Last modified: 2014-06-29 20:26:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T31793, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 29793 - Check uploaded images with Google image search to find copyright violations


Summary:	Check uploaded images with Google image search to find copyright violations

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	UploadWizard (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-07-09 21:10 UTC by Raimond Spekking
Modified:	2014-06-29 20:26 UTC (History)
CC List:	8 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Raimond Spekking 2011-07-09 21:10:02 UTC

Idea: Check images uploaded with the wizard against Google image search to identify potential copyright violations. Example: http://goo.gl/XbNPB (hopefully the link inside this shortened link is stable).

If Google finds an identical image, add the newly uploaded image to a hidden category to be processed by Commons admins.
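
A minimal sketch of the flow proposed above, assuming a reverse image search is available as a plain callable. The names `categories_for_upload`, `find_identical`, and `REVIEW_CATEGORY` are hypothetical and do not correspond to any existing MediaWiki, UploadWizard, or Google API; the lookup is left pluggable because no concrete search service is specified here.

```python
from typing import Callable, Iterable, List

# Hypothetical category name; the report only asks for "a hidden cat".
REVIEW_CATEGORY = "Category:Possible copyright violations (unreviewed)"


def categories_for_upload(
    image_bytes: bytes,
    find_identical: Callable[[bytes], Iterable[str]],
) -> List[str]:
    """Return the hidden categories to add to a newly uploaded image.

    find_identical is a placeholder for whatever reverse image search is
    available. It is assumed to yield the URLs of pages that host an
    identical image, and nothing when no identical image is known.
    """
    matches = list(find_identical(image_bytes))
    # Only flag the upload for admin review; never delete automatically.
    return [REVIEW_CATEGORY] if matches else []
```

With a stub lookup that returns one URL, the function returns a list containing only `REVIEW_CATEGORY`; with a stub that returns nothing, it returns an empty list and the upload is left uncategorised.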

Comment 1 Neil Kandalgaonkar 2011-07-09 21:16:55 UTC

That's a great idea, but should it be in the UploadWizard or should a bot be doing that to everything uploaded? Things can be uploaded via API too.

Comment 2 Raimond Spekking 2011-07-09 21:28:41 UTC

In principle, for every upload. But with an integration into the UploadWizard, the user can be warned before he finishes the upload.

Comment 3 User:Docu 2011-07-10 12:32:39 UTC

What I like about this search is that it finds similar images, but doesn't it always find some? -- Sometimes they are similar in ways one hadn't thought of, but they are unlikely to be copies of the one I started out with.

Comment 4 Neil Kandalgaonkar 2011-07-11 23:48:31 UTC

Yeah, I'm not sure what it would mean if we found similar images. Does that mean it's bad?

In any case, I don't see any supported API for Google's image similarity search, and certainly not one that returns some sort of similarity rating.

Also, who is the target audience here? Someone who is determined to copyvio will do it anyway. Perhaps there are some users who might abandon their upload once they realized we didn't want copyvio images (having missed every other warning)....

It's a neat idea but I'm not seeing an easy way to make it work. Deferring for now.

Comment 5 Carl Austin Bennett 2012-02-25 19:26:19 UTC

Is this search finding images with the same content or merely images with the same title?

If (as I suspect) it's the latter, it may be better to look into something like TinEye.com - again not ideal, as it merely detects the same image to be on a hundred other sites without indicating the original license for any of them. It'd keep a few very tired memes and visual Internet clichés off the site, but that's about it.

Then again, under the current system I could grab a camera, take a photo of some non-notable elementary school that someone requested on some WikiProject, upload it with no tags and a textual description of "I took this photo twenty minutes ago; do what you want with it, I don't care." and rest assured that some obnoxious robot would delete the image as a copyvio before the week is done.

That's what happens when this sort of thing is entrusted to entirely-automated processes.

Comment 6 Mark Holmquist 2012-06-01 20:10:05 UTC

I have to question this.

Now, it could be interesting to fill in a source URL this way, but I'm not sure it's worth a call to an API that may or may not exist... Anyway, I digress from the original point.

If we tried to use this API (which may not exist) to detect whether the image exists, a large portion of the traffic would be turned away, as I understand it. Many of the images uploaded through UW are uploaded from another source, which is perfectly legal if the license is right. Since I don't see any way we could detect the license of the image from a Google Images search, with or without an API, we are in the soup.

I could maybe see this working with an image host's API, because I think they might store licensing information in a simple format, and that would allow us to pre-fill a lot of information (original source, author name, EXIF data possibly, licensing information), but it's a pretty small chance that the image exists on the image host. Maybe. Of course, this is all contingent on the image hosts' ability to search by image contents, which could be tough.

Maybe this is the sort of thing that we could consider implementing as a super-extra feature for communities that extensively use a specific image host and disable it for Commons, etc., where the images come from all over.

Just a thought!

Comment 7 Thehelpfulone 2012-06-22 19:41:02 UTC

Reassigning to wikibugs-l per bug 37789
