Last modified: 2014-09-21 08:20:30 UTC
I've been thinking. Even though we don't have reliable information just yet, we could probably make some sort of 'reliability' assessment of the data that we scraped. The goal is to improve the quality of the data presented by MediaViewer, and it would spur the community to action on these issues (which are currently only visible as 'failures' in MediaViewer, not as 'editor feedback' on file description pages).

On the file description page, we could show a bar to registered users with more than 100 edits saying: "This image has 5% machine-readable data. If you would care to help improve this, please join the cleanup campaign." Clicking it would show a list of things that are missing, unknown, or badly formatted. The campaign page would list tools such as the "Add {{Information}}" gadget that is available to users, and similar resources. We could also give 'bonus' points for use of {{data}}, {{author}} or similar templates that are able to provide more specific and more semantic data when moving things to a Wikidata layer. Perhaps Magnus Manske and Rillke would be able to assist in setting something like that up with the community.

Later we could retool all of this into a gadget that works for the migration to the Wikidata layer, and people would already be used to the workflow (and we could perhaps provide aggregate metrics once everything has moved to that better layer).
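To make the "X% machine readable" idea concrete, here is a minimal sketch of how such a score could be computed per file. The field names, the bonus weighting, and the `machineReadable` flag are all assumptions for illustration, not an agreed-upon scheme:

```javascript
// Hypothetical per-file "machine readability" score.
// Each required field that parses cleanly earns one point; semantic
// templates (e.g. {{Author}}-style) earn a half-point bonus each.
// Field list and weights are made up for this sketch.
function reliabilityScore(fields) {
  var required = ['description', 'author', 'date', 'source', 'license'];
  var score = 0;
  required.forEach(function (f) {
    if (fields[f] && fields[f].machineReadable) {
      score += 1;
    }
  });
  // Bonus points for more semantic templates, capped by the 100% ceiling below.
  var bonus = (fields.semanticTemplates || []).length * 0.5;
  return Math.min(100, Math.round(100 * (score + bonus) / required.length));
}
```

A file with only a machine-readable description and license would score 40 under this scheme; the bonus lets well-templated files reach 100 even if some plain fields are messy.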
Currently the only things we can easily check are whether an Information template is present and whether a license template is present. (And coordinates, but we can't tell whether a file that lacks a coordinates template should have one.)
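Those two checks could look something like the sketch below. The regex approach and the short license list are assumptions for illustration; a real check would use the parsed template list from the API rather than raw wikitext matching:

```javascript
// Hypothetical sketch: detect whether a file page's wikitext contains an
// {{Information}} template and one of a few known license templates.
// The license list here is a tiny assumed subset, not Commons' real list.
function hasTemplate(wikitext, name) {
  // Match "{{Name|" or "{{Name}}", tolerating whitespace and case differences.
  var re = new RegExp('\\{\\{\\s*' + name + '\\s*[|}]', 'i');
  return re.test(wikitext);
}

function checkFilePage(wikitext) {
  var licenses = ['Cc-by-sa-3.0', 'Cc-zero', 'PD-self', 'GFDL']; // assumed subset
  return {
    hasInformation: hasTemplate(wikitext, 'Information'),
    hasLicense: licenses.some(function (t) { return hasTemplate(wikitext, t); })
  };
}
```

This stays deliberately dumb: it only answers "is the template there at all", which matches the limited check described above, and it cannot tell whether a missing coordinates template is an omission or simply not applicable.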
A JS evaluator for this is under development at https://commons.wikimedia.org/wiki/User:TheDJ/datacheck.js (it will probably take me a few weeks to finish, given my time constraints). It still needs a project page on Commons, etc.