Last modified: 2014-06-29 20:26:28 UTC
Idea: Check images uploaded with the wizard against Google image search to identify potential copyright violations. Example: http://goo.gl/XbNPB (hopefully the link inside this shortened link is stable). If Google finds an identical image, add the newly uploaded image to a hidden category to be processed by Commons admins.
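For illustration only, the proposed flow might be sketched roughly like this in Python. It assumes some reverse image search is available and passes it in as a plain function, since no supported API has been identified; `find_identical`, `check_upload` and the category name are hypothetical and not part of UploadWizard or any existing service.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical name for the hidden maintenance category; not an existing one.
REVIEW_CATEGORY = "Category:Uploads matching external images (to review)"


@dataclass
class UploadCheck:
    needs_review: bool
    matches: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)


def check_upload(
    image_bytes: bytes,
    find_identical: Callable[[bytes], List[str]],
) -> UploadCheck:
    """Flag an upload for human review when identical copies exist elsewhere.

    find_identical is a stand-in for whatever reverse image search is
    available (an assumption); it is expected to return the URLs of pages
    hosting an identical image, or an empty list.
    """
    matches = find_identical(image_bytes)
    if not matches:
        return UploadCheck(needs_review=False)
    # A match is only a hint: the file is queued for an admin, never deleted.
    return UploadCheck(needs_review=True, matches=matches,
                       categories=[REVIEW_CATEGORY])
```

Passing the search in as a parameter keeps the sketch independent of any particular provider, and the same check could run from the wizard (to warn the user) or from a bot (to cover every upload).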
That's a great idea, but should it be in the UploadWizard or should a bot be doing that to everything uploaded? Things can be uploaded via API too.
In principle, for every upload. But with an integration into the UploadWizard, the user can be warned before they finish the upload.
What I like about this search is that it finds similar images, but doesn't it always find some? They are sometimes similar in ways one hadn't thought of, but they are unlikely to be copies of the image I started out with.
Yeah, I'm not sure what it would mean if we found similar images. Does that mean it's bad? In any case, I don't see any supported API for Google's image similarity search, and certainly not one that returns some sort of similarity rating. Also, who is the target audience here? Someone who is determined to copyvio will do it anyway. Perhaps there are some users who might abandon their upload once they realized we didn't want copyvio images (having missed every other warning)... It's a neat idea, but I'm not seeing an easy way to make it work. Deferring for now.
Is this search finding images with the same content or merely images with the same title? If (as I suspect) it's the latter, it may be better to look into something like TinEye.com. Again, that is not ideal, as it merely detects that the same image is on a hundred other sites without indicating the original license for any of them. It'd keep a few very tired memes and visual Internet clichés off the site, but that's about it. Then again, under the current system I could grab a camera, take a photo of some non-notable elementary school that someone requested on some WikiProject, upload it with no tags and a textual description of "I took this photo twenty minutes ago; do what you want with it, I don't care." and rest assured that some obnoxious robot would delete the image as a copyvio before the week is done. That's what happens when this sort of thing is entrusted to entirely automated processes.
I have to question this. Now, it could be interesting to pre-fill a source URL this way, but I'm not sure it's worth a call to an API that may or may not exist... Anyway, I digress from the original point. If we tried to use this API (which may not exist) to detect whether the image exists elsewhere, a large portion of the traffic would be turned away, as I understand it. Many of the images uploaded through UW are uploaded from another source, which is perfectly legal if the license is right. Since I don't see any way we could detect the license of the image from a Google Images search, with or without an API, we are in the soup. I could maybe see this working with an image host's API, because I think they might store licensing information in a simple format, and that would allow us to pre-fill a lot of information (original source, author name, possibly EXIF data, licensing information), but the chance that the image exists on that image host is pretty small. Maybe. Of course, this is all contingent on the image host's ability to search by image contents, which could be tough. Maybe this is the sort of thing that we could consider implementing as a super-extra feature for communities that extensively use a specific image host, and disable for Commons, etc., where the images come from all over. Just a thought!
Reassigning to wikibugs-l per bug 37789