Last modified: 2011-08-29 15:59:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32582, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30582 - importImages.php with option --source-wiki-url uses latest comment as metadata, violating CC licenses
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: 1.18.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
 
Reported: 2011-08-26 11:34 UTC by Gregor Hagedorn
Modified: 2011-08-29 15:59 UTC (History)

Description Gregor Hagedorn 2011-08-26 11:34:37 UTC
importImages.php contains code to obtain the metadata from another wiki (e.g. Commons) when importing images from that wiki. However, the "metadata" is erroneously taken to be the comment of the most recent upload. An example shows why this is misleading. When importing:

http://commons.wikimedia.org/wiki/File:Esox_lucius1.jpg

the actual metadata (which includes the author, "Timothy Knepp") is ignored, and instead the upload comment of the second (latest) revision, i.e. the text "bigger version", is treated as the metadata.

----

To reproduce: Download http://upload.wikimedia.org/wikipedia/commons/c/c5/Esox_lucius1.jpg and place it in SOME_PATH (your choice).

Go to the root of your wiki and run (replacing YOURNAME and SOME_PATH):

php ./maintenance/importImages.php  --conf ./LocalSettings.php --user YOURNAME --source-wiki-url 'http://commons.wikimedia.org/w/' SOME_PATH

----

Presently, the function getFileCommentFromSourceWiki( $wiki_host, $file ) is used, which in this case requests:

http://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Esox_lucius1.jpg&prop=imageinfo&iiprop=comment

What is needed instead is a new function, getMetadataFromSourceWiki, that calls something like:

http://commons.wikimedia.org/w/api.php?action=query&format=xml&prop=revisions&rvprop=content&titles=File:Esox_lucius1.jpg
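
For illustration only, here is a minimal sketch of what an api.php-based variant might look like. It is not the code we run and not part of MediaWiki. The helper names buildRevisionContentUrl and extractRevisionContent are hypothetical, and the sketch assumes the XML response carries the page wikitext inside a <rev> element; the URL building and the parsing are kept separate so that each can be checked without network access.

```php
<?php
// Hypothetical helper: build the api.php URL that asks for the wikitext
// of the latest revision of a file description page.
function buildRevisionContentUrl( $wikiHost, $file ) {
	$query = http_build_query( array(
		'action' => 'query',
		'format' => 'xml',
		'prop'   => 'revisions',
		'rvprop' => 'content',
		'titles' => 'File:' . $file,
	) );
	// Trim a trailing slash so 'http://host/w/' does not yield 'w//api.php'.
	return rtrim( $wikiHost, '/' ) . '/api.php?' . $query;
}

// Hypothetical helper: pull the wikitext out of the XML response body.
// Assumption: the content sits in the first <rev> element.
// Returns false if the body is not valid XML or has no <rev> element.
function extractRevisionContent( $xmlBody ) {
	$xml = @simplexml_load_string( $xmlBody );
	if ( $xml === false ) {
		return false;
	}
	$revs = $xml->xpath( '//rev' );
	if ( !$revs ) {
		return false;
	}
	return (string)$revs[0];
}
```

Inside importImages.inc, the body passed to extractRevisionContent would presumably come from Http::get( buildRevisionContentUrl( $wiki_host, $file ) ), in the same way the existing helper fetches its response.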

We currently work around this with the following code in importImages.inc (which uses index.php rather than api.php):

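function getMetadataFromSourceWiki( $wiki_host, $file ) {
	// Fetch the raw wikitext of the file description page, e.g.
	// http://commons.wikimedia.org/w/index.php?action=raw&title=File:Esox_lucius1.jpg
	$url = rtrim( $wiki_host, '/' ) . '/index.php?action=raw&title=File:' . rawurlencode( $file );
	$body = Http::get( $url );
	// Http::get() returns false on failure; pass that on instead of decoding it.
	if ( $body === false ) {
		return false;
	}
	return html_entity_decode( $body );
}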

In importImages.php it is then called as:

$real_comment = getMetadataFromSourceWiki( $options['source-wiki-url'], $base );

This replaces the existing call to getFileCommentFromSourceWiki.

----

Please update trunk, so we no longer have to patch.
Comment 1 Mark A. Hershberger 2011-08-29 15:59:02 UTC
Could you supply the patch you're using as a file in diff format attached to this bug?
