Last modified: 2012-05-15 14:51:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38799, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36799 - API mangles certain UTF characters when querying 16 or more pages
Status: RESOLVED DUPLICATE of bug 36839
Product: MediaWiki
Classification: Unclassified
Component: API
Version: 1.20.x
Hardware: All All
Importance: Low minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-05-12 21:19 UTC by magog.the.ogre
Modified: 2012-05-15 14:51 UTC
CC List: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description magog.the.ogre 2012-05-12 21:19:17 UTC
This is a fairly bizarre error... it only appears when querying 16 or more pages through the API. If a page name contains a certain non-ASCII character ("–", the en dash U+2013, a.k.a. E28093 in UTF-8 hex), the API mangles the title and then states that the mangled title doesn't exist.
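The report says only that the title was "mangled into mojibake"; it does not say which wrong decoding was applied. As a minimal sketch, assuming the UTF-8 bytes were misread as Windows-1252 (cp1252), the Python below shows how the three bytes E2 80 93 of the en dash turn into three unrelated characters, and how the damage can be undone when that assumption holds. The helper names are illustrative, not part of any MediaWiki API.

```python
# Sketch: how the en dash (U+2013) in the title turns into mojibake when its
# UTF-8 bytes are decoded with the wrong codec. cp1252 is an ASSUMPTION here;
# the bug report does not say which decoding the API applied.

EN_DASH = "\u2013"
TITLE = "File:Flag Dubrovnik\u2013Neretva County.gif"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as uppercase hex, e.g. 'E28093'."""
    return text.encode("utf-8").hex().upper()


def mangle(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then wrongly decode the bytes with `wrong_codec`.

    Note: Python's cp1252 codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90,
    0x9D) undefined, so this raises UnicodeDecodeError for some inputs.
    """
    return text.encode("utf-8").decode(wrong_codec)


def unmangle(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo mangle(): re-encode with `wrong_codec`, then decode as UTF-8."""
    return text.encode(wrong_codec).decode("utf-8")


if __name__ == "__main__":
    print(utf8_hex(EN_DASH))                 # E28093
    print(ascii(mangle(EN_DASH)))            # '\xe2\u20ac\u201c'
    print(unmangle(mangle(TITLE)) == TITLE)  # True
```

Under this assumption the en dash becomes "â", "€" and a left double quotation mark, and ASCII characters pass through unchanged.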

Here is the reproduction:
*Mangled result, 16 files being queried: http://commons.wikimedia.org/w/api.php?action=query&titles=File:Krizius%204.jpg|File:Abraszewski%20Bayview.jpg|File:Abraszewski%20flowers%20small.jpg|File:Abraszewski%20gray%20mansion%20small.jpg|File:Saasveld04.jpg|File:Oyama-jinja%20004.jpg|File:Sila%20o%20Tonga%20-%20Coat%20of%20arms%20of%20the%20Kingdom%20of%20Tonga.svg|File:Ft-Banks-1946-1953-C.pdf|File:Ounguicularis.jpg|File:Royal%20Dublin%20Fusileers.jpg|File:Edsim%20Vascular.jpg|File:Flag%20Dubrovnik%E2%80%93Neretva%20County.gif|File:1824%20laver%20coral.jpg|File:1928%20new%20chambers.jpg|File:1933%20Thicknesse%20w480.jpg|File:2-10%20Armoured%20Regt%20(AWM%20043801).jpg&prop=imageinfo|revisions|templates&iiprop=sha1
**Notice that at the top, the API returns: <page ns="6" title="File:Flag Dubrovnik–Neretva County.gif" missing="" imagerepository="" />
**The en dash in the title has thus been mangled into mojibake.
*Now, to get a non-mangled result, remove any one of the other files from the query above, so that only 15 files are queried:
**Removing the last file from the list (File:...(AWM%20043801).jpg): http://commons.wikimedia.org/w/api.php?action=query&titles=File:Krizius%204.jpg|File:Abraszewski%20Bayview.jpg|File:Abraszewski%20flowers%20small.jpg|File:Abraszewski%20gray%20mansion%20small.jpg|File:Saasveld04.jpg|File:Oyama-jinja%20004.jpg|File:Sila%20o%20Tonga%20-%20Coat%20of%20arms%20of%20the%20Kingdom%20of%20Tonga.svg|File:Ft-Banks-1946-1953-C.pdf|File:Ounguicularis.jpg|File:Royal%20Dublin%20Fusileers.jpg|File:Edsim%20Vascular.jpg|File:Flag%20Dubrovnik%E2%80%93Neretva%20County.gif|File:1824%20laver%20coral.jpg|File:1928%20new%20chambers.jpg|File:1933%20Thicknesse%20w480.jpg&prop=imageinfo|revisions|templates&iiprop=sha1
**Removing the first of the files from the list (File:Krizius%204.jpg): http://commons.wikimedia.org/w/api.php?action=query&titles=File:Abraszewski%20Bayview.jpg|File:Abraszewski%20flowers%20small.jpg|File:Abraszewski%20gray%20mansion%20small.jpg|File:Saasveld04.jpg|File:Oyama-jinja%20004.jpg|File:Sila%20o%20Tonga%20-%20Coat%20of%20arms%20of%20the%20Kingdom%20of%20Tonga.svg|File:Ft-Banks-1946-1953-C.pdf|File:Ounguicularis.jpg|File:Royal%20Dublin%20Fusileers.jpg|File:Edsim%20Vascular.jpg|File:Flag%20Dubrovnik%E2%80%93Neretva%20County.gif|File:1824%20laver%20coral.jpg|File:1928%20new%20chambers.jpg|File:1933%20Thicknesse%20w480.jpg|File:2-10%20Armoured%20Regt%20(AWM%20043801).jpg&prop=imageinfo|revisions|templates&iiprop=sha1
**In both of the above cases, the API correctly returns: <page pageid="25721149" ns="6" title="File:Flag Dubrovnik–Neretva County.gif" imagerepository="local">...</page>
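A bot can catch this failure mode by looking for pages flagged `missing` whose title matches none of the titles it asked for, which is exactly the symptom in the 16-file result. A minimal sketch, parsing XML shaped like the `<page>` elements quoted in the reproduction; the sample response is hypothetical (the en dash rendered as cp1252 mojibake is an assumption), and replacing underscores with spaces is only a rough stand-in for MediaWiki's own title normalization.

```python
# Sketch: flag pages that the API reports as missing but whose title matches
# none of the requested titles. Underscore-to-space replacement is a rough
# stand-in for MediaWiki title normalization (an assumption; the real rules
# also cover case, namespaces, and more).
import xml.etree.ElementTree as ET


def suspicious_missing(requested: list[str], response_xml: str) -> list[str]:
    """Return titles of pages marked missing that were never requested."""
    wanted = {t.replace("_", " ") for t in requested}
    root = ET.fromstring(response_xml)
    flagged = []
    for page in root.iter("page"):  # finds <page> elements at any depth
        title = page.get("title", "")
        is_missing = page.get("missing") is not None  # missing="" counts
        if is_missing and title.replace("_", " ") not in wanted:
            flagged.append(title)
    return flagged


if __name__ == "__main__":
    requested = ["File:Flag Dubrovnik\u2013Neretva County.gif"]
    # Hypothetical response in which the en dash came back as mojibake.
    response = (
        '<api><query><pages>'
        '<page ns="6" title="File:Flag Dubrovnik\u00e2\u20ac\u201cNeretva County.gif"'
        ' missing="" imagerepository="" />'
        '</pages></query></api>'
    )
    print(suspicious_missing(requested, response))
```

A page that is genuinely missing under its requested title is not flagged, so the check separates this bug from ordinary missing files.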

I have literally never encountered this error with any other file, and my bot has queried a LOT of files, so I don't know how many different UTF-8 characters the API will mangle.
Comment 1 magog.the.ogre 2012-05-12 21:39:42 UTC
...and the same issue is occurring with File:José de Ribera-St Sebastian.jpg (http://en.wikipedia.org/wiki/File:Jos%C3%A9_de_Ribera-St_Sebastian.jpg). Is there perhaps a new feature that is breaking things on English Wikipedia?
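The second affected title contains "é" (U+00E9, C3 A9 in UTF-8) rather than an en dash, which suggests the problem is not tied to one character. Assuming (unconfirmed in this thread) that any non-ASCII character can be hit, a bot could at least list which of its titles are exposed. The helpers below are hypothetical triage aids, not a fix.

```python
# Sketch: list the characters in a title that a UTF-8 mis-decoding could
# affect. ASSUMPTION (not confirmed in the report): any non-ASCII character
# is at risk, because its UTF-8 form consists only of bytes >= 0x80, while
# ASCII characters are single bytes below 0x80.


def risky_chars(title: str) -> list[tuple[str, str]]:
    """Return (character, UTF-8 hex) pairs for each non-ASCII character."""
    return [
        (ch, ch.encode("utf-8").hex().upper())
        for ch in title
        if ord(ch) > 0x7F
    ]


def at_risk(titles: list[str]) -> list[str]:
    """Return the titles that contain at least one non-ASCII character."""
    return [t for t in titles if risky_chars(t)]


if __name__ == "__main__":
    ribera = "File:Jos\u00e9 de Ribera-St Sebastian.jpg"
    print(risky_chars(ribera))  # [('é', 'C3A9')]
    print(at_risk([ribera, "File:Saasveld04.jpg"]))
```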
Comment 2 Umherirrender 2012-05-14 18:40:15 UTC
Maybe it depends on the length of the request? See bug 36839.
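To make the request-length idea easy to check, the sketch below rebuilds the query URL from the 16 titles in the description and prints its length next to the two 15-title variants. The titles are decoded from the report's URLs; `quote(..., safe=":()")` is an assumption chosen to reproduce the encoding seen there (spaces as %20, the en dash as %E2%80%93, ":" and parentheses left as-is). It claims no particular length threshold.

```python
# Sketch: rebuild the report's query URLs and compare their lengths.
# quote(..., safe=":()") is an assumption picked to match the report's URLs.
from urllib.parse import quote

BASE = "http://commons.wikimedia.org/w/api.php"
TITLES = [
    "File:Krizius 4.jpg",
    "File:Abraszewski Bayview.jpg",
    "File:Abraszewski flowers small.jpg",
    "File:Abraszewski gray mansion small.jpg",
    "File:Saasveld04.jpg",
    "File:Oyama-jinja 004.jpg",
    "File:Sila o Tonga - Coat of arms of the Kingdom of Tonga.svg",
    "File:Ft-Banks-1946-1953-C.pdf",
    "File:Ounguicularis.jpg",
    "File:Royal Dublin Fusileers.jpg",
    "File:Edsim Vascular.jpg",
    "File:Flag Dubrovnik\u2013Neretva County.gif",
    "File:1824 laver coral.jpg",
    "File:1928 new chambers.jpg",
    "File:1933 Thicknesse w480.jpg",
    "File:2-10 Armoured Regt (AWM 043801).jpg",
]


def build_url(titles: list[str]) -> str:
    """Build an action=query URL in the same shape as the report's URLs."""
    joined = "|".join(quote(t, safe=":()") for t in titles)
    return (f"{BASE}?action=query&titles={joined}"
            "&prop=imageinfo|revisions|templates&iiprop=sha1")


if __name__ == "__main__":
    variants = [("16 titles", TITLES),
                ("15, last removed", TITLES[:-1]),
                ("15, first removed", TITLES[1:])]
    for label, subset in variants:
        print(f"{label}: {len(build_url(subset))} characters")
```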
Comment 3 Alex Monk 2012-05-15 14:51:59 UTC
I know the dupe should really be the other way around, but it looks like everyone's attention is on the other bug, so I'll mark this one as duped instead.

*** This bug has been marked as a duplicate of bug 36839 ***
