Last modified: 2014-08-26 17:11:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T59259, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 57259 - No proper handling of multivalued fields
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CommonsMetadata
Version: unspecified
Hardware: All All
Importance: Normal normal with 2 votes
Target Milestone: ---
Assigned To: Tisza Gergő
Duplicates: 64803 64888
Depends on:
Blocks: 62254
Reported: 2013-11-19 16:53 UTC by Tisza Gergő
Modified: 2014-08-26 17:11 UTC
CC List: 9 users

Description Tisza Gergő 2013-11-19 16:53:09 UTC
Fields can have multiple values, in several different ways:
* some file metadata (EXIF etc) fields can have multiple values
* we parse some data from HTML code of license templates; some images have multiple license templates
* sometimes the same property can have a value from both the file and the description
* categories, and any properties based on categories, are in many-to-many relation with images
* (there are also multi-languaged values which can be multivalued when all languages are requested, but we already deal with that)

Right now we handle this in a very hacky way for some fields (e.g. concatenate categories with "|") and don't handle it at all for most (one of the values is selected by some random aspect of the code). This will be especially problematic if we want to use CommonsMetadata as a helper tool for the Wikidata migration.

A proper multivalue handling should probably be able to:
* indicate whether or not the given field is multivalued
* indicate the source (e.g. if one of the values comes from the file, the other from the description, we should be able to somehow tell that)
* synchronize properties somehow (e.g. a multilicensed image will have multiple license names and multiple license URLs; the user of the API has to be able to match the right name to the right link)
Comment 1 Tisza Gergő 2013-11-20 14:57:07 UTC
We already use arrays with _type key for multilanguaged arrays (even though it is an ugly hack), so it seems logical to use the same format (_type=ul, see [1]) for multivalued properties. 

We currently return an array with 'value' and 'source' fields for a single property; for multivalued properties we could maybe return such an array for each value. That would make it easy to indicate different sources (although it is ugly and not very compact).

For marking which values of multivalued properties belong together, we could maybe use an additional 'group' field (e.g. License, LicenseShortName etc. read from the first license template would have group=1).

[1] https://www.mediawiki.org/wiki/Manual:File_metadata_handling#Format_of_this_merged_metadata
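The format proposed in this comment could look roughly like the sketch below. This is only an illustration in Python, with a dict standing in for the PHP array; the helper names, the source labels ("description", "file") and the sample license values are assumptions, not the actual CommonsMetadata API or output.

```python
# Hypothetical sketch of the proposed layout; not the CommonsMetadata API.

def make_value(value, source, group=None):
    """One entry of a property: the value, where it came from, and an
    optional group number tying it to values of other properties."""
    entry = {"value": value, "source": source}
    if group is not None:
        entry["group"] = group
    return entry


def make_multivalued(entries):
    """Wrap several entries in one array-like dict marked with _type='ul'."""
    field = {index: entry for index, entry in enumerate(entries)}
    field["_type"] = "ul"
    return field


def is_multivalued(field):
    """A field is multivalued if it carries the _type='ul' marker."""
    return field.get("_type") == "ul"


def values_in_group(field, group):
    """Return the values of a multivalued field that belong to one group."""
    return [
        entry["value"]
        for key, entry in field.items()
        if key != "_type" and entry.get("group") == group
    ]


# Sample data (assumed): a dual-licensed image with two license templates.
license_short_name = make_multivalued([
    make_value("CC BY-SA 3.0", "description", group=1),
    make_value("GFDL", "description", group=2),
])
license_url = make_multivalued([
    make_value("https://creativecommons.org/licenses/by-sa/3.0/", "description", group=1),
    make_value("http://www.gnu.org/copyleft/fdl.html", "description", group=2),
])
```

With such a layout, an API user could call `values_in_group(license_short_name, 2)` and `values_in_group(license_url, 2)` to get the name and link that came from the same license template.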
Comment 2 Tisza Gergő 2014-05-04 03:58:48 UTC
*** Bug 64803 has been marked as a duplicate of this bug. ***
Comment 3 Tisza Gergő 2014-05-05 16:24:50 UTC
*** Bug 64888 has been marked as a duplicate of this bug. ***
Comment 4 Lokal_Profil 2014-05-06 07:38:43 UTC
Copying across a point raised in bug 64888, which makes this issue more severe.

This bug results in dual-licensed material pointing to the wrong license. For example, the text will say "CC BY-SA 3.0" but will link to http://www.gnu.org/copyleft/fdl.html. Apart from being highly confusing, it most likely violates one of the two licenses.

Example: https://sv.wikipedia.org/wiki/Sveriges_l%C3%A4n#mediaviewer/Fil:Greater_coat_of_arms_of_Sweden.svg
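The mismatch can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the actual CommonsMetadata code: the record layout, the function names, the "GFDL" label and the way each field picks its value are all made up to show how a name and a link can end up coming from different license templates.

```python
# Hypothetical illustration; record layout and function names are assumptions.

# One record per license template found on the description page.
licenses = [
    {"group": 1, "name": "CC BY-SA 3.0",
     "url": "https://creativecommons.org/licenses/by-sa/3.0/"},
    {"group": 2, "name": "GFDL",
     "url": "http://www.gnu.org/copyleft/fdl.html"},
]


def pick_independently(records):
    """Each field picks its single value by its own rule, so the name
    and the URL may come from different templates."""
    name = records[0]["name"]   # this field happens to keep the first value
    url = records[-1]["url"]    # this field happens to keep the last value
    return name, url


def pick_consistently(records, group):
    """Take the name and the URL from the same template."""
    for record in records:
        if record["group"] == group:
            return record["name"], record["url"]
    raise LookupError(f"no license in group {group}")


mismatched = pick_independently(licenses)
matched = pick_consistently(licenses, 1)
```

Here `mismatched` pairs "CC BY-SA 3.0" with the GFDL link, which is the symptom described above, while `matched` keeps the name and link from the same template.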
Comment 5 Gerrit Notification Bot 2014-05-24 09:55:41 UTC
Change 135194 had a related patch set uploaded by Gergő Tisza:
[WIP] Handle multiple templates in TemplateParser

https://gerrit.wikimedia.org/r/135194
Comment 6 Gerrit Notification Bot 2014-05-30 06:09:41 UTC
Change 135194 merged by jenkins-bot:
Handle multiple templates in TemplateParser

https://gerrit.wikimedia.org/r/135194
Comment 7 Jean-Fred 2014-06-10 12:34:49 UTC
What’s the current status of this patch?

The issue described at bug 64888 is very problematic. I had not noticed that bug before, but frankly I would have been inclined to consider it a blocker for wide deployment on Wikimedia sites.

(Here is another example if needed: <https://commons.wikimedia.org/wiki/File:Silver_crystal.jpg#mediaviewer/File:Silver_crystal.jpg>)
Comment 8 Tisza Gergő 2014-06-10 16:34:02 UTC
That specific issue should be fixed; CommonsMetadata handles multivalued fields correctly internally, but only returns one value due to limitations of the API format. This is done more consistently now.

The caching for CommonsMetadata is pretty complicated (there is a memcached layer on both the frontend and backend wiki, plus whatever the API framework uses, plus Varnish), so I am waiting to see if the issue is properly fixed (all the caches involved should wear out in 30 days) or some sort of manual purging will be necessary.
Comment 9 Raimond Spekking 2014-06-10 16:38:05 UTC
(In reply to Tisza Gergő from comment #8)

> The caching for CommonsMetadata is pretty complicated (there is a memcached
> layer on both the frontend and backend wiki, plus whatever the API framework
> uses, plus Varnish), so I am waiting to see if the issue is properly fixed
> (all the caches involved should wear out in 30 days) or some sort of manual
> purging will be necessary.

Please put more effort into this issue. Acceptance of MultimediaViewer, at least in the German Wikipedia community, is declining day by day due to such critical bugs :-(
Comment 10 Tisza Gergő 2014-06-16 00:55:33 UTC
All examples from this and duplicate tickets provide correct data now. Is anyone aware of images which are still showing inconsistent licence information?
Comment 11 Andre Klapper 2014-07-06 22:11:00 UTC
Asking again before closing this ticket: Is anyone aware of images which are still showing inconsistent licence information?
Comment 12 Tisza Gergő 2014-07-11 19:02:00 UTC
Setting back to new, since the original issue described in comment 0 still stands. I'll assume the problem with mixing up different licenses is fixed.
Comment 13 Tisza Gergő 2014-07-11 19:02:52 UTC
...setting back the state to new...
