Last modified: 2012-08-31 04:51:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T39587, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 37587 - character filter for uselang
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.19.1
Hardware: All All
Importance: High normal
Target Milestone: 1.19.x release
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-06-14 11:10 UTC by Fomafix
Modified: 2012-08-31 04:51 UTC (History)

Description Fomafix 2012-06-14 11:10:37 UTC
Bug 36938 is fixed and adds escaping of uselang for HTML.

For the JavaScript variable mw.config.get( 'wgUserLanguage' ), many characters are still allowed, while some are filtered:

https://www.mediawiki.org/w/index.php?uselang= >>> "en"
https://www.mediawiki.org/w/index.php?uselang=%20 >>> " "
https://www.mediawiki.org/w/index.php?uselang=%21 >>> "!"
https://www.mediawiki.org/w/index.php?uselang=%22 >>> """
https://www.mediawiki.org/w/index.php?uselang=%23 >>> "en"
https://www.mediawiki.org/w/index.php?uselang=%24 >>> "$"
https://www.mediawiki.org/w/index.php?uselang=%25 >>> "%"
https://www.mediawiki.org/w/index.php?uselang=%2525 >>> "en"
https://www.mediawiki.org/w/index.php?uselang=%26 >>> "&"
https://www.mediawiki.org/w/index.php?uselang=%26amp >>> "&amp"
https://www.mediawiki.org/w/index.php?uselang=%26amp; >>> "en"
https://www.mediawiki.org/w/index.php?uselang=; >>> ";"
https://www.mediawiki.org/w/index.php?uselang=: >>> "en"
https://www.mediawiki.org/w/index.php?uselang=%3d >>> "="
https://www.mediawiki.org/w/index.php?uselang== >>> "="
https://www.mediawiki.org/w/index.php?uselang=/ >>> "en"
https://www.mediawiki.org/w/index.php?uselang=" >>> """
https://www.mediawiki.org/w/index.php?uselang=' >>> "'"

Many scripts use wgUserLanguage unescaped. Examples:
https://commons.wikimedia.org/wiki/MediaWiki:Common.js
https://commons.wikimedia.org/wiki/MediaWiki:Gadget-HotCat.js

When you open the following link on dewiki with the HotCat gadget enabled
  https://de.wikipedia.org/w/index.php?uselang=en%26curid=19891835
the page https://commons.wikimedia.org/wiki/User:Fomafix/xss.js is loaded.

Of course this is a bug in the gadget, but many other gadgets may contain the same error.
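
To illustrate the kind of gadget error described above, here is a minimal sketch. The helper names and the example.org URL are hypothetical and this is not HotCat's actual code; it only assumes that a gadget concatenates wgUserLanguage into a query string without escaping it.

```javascript
// Hypothetical gadget helpers; names and URL layout are assumptions for illustration.

// Unsafe: the language value is concatenated into the query string as-is,
// so a value containing "&" can smuggle in extra parameters.
function buildRawUrlUnsafe(base, userLang) {
  return base + '?action=raw&uselang=' + userLang;
}

// Safer: encodeURIComponent() escapes "&", "=", and other reserved characters,
// so the whole value stays inside the uselang parameter.
function buildRawUrlSafe(base, userLang) {
  return base + '?action=raw&uselang=' + encodeURIComponent(userLang);
}

// What wgUserLanguage would contain for ?uselang=en%26curid=19891835
const injected = 'en&curid=19891835';
const base = 'https://example.org/w/index.php';

const unsafeUrl = new URL(buildRawUrlUnsafe(base, injected));
const safeUrl = new URL(buildRawUrlSafe(base, injected));

// In the unsafe URL, curid becomes a separate parameter; in the safe URL it does not.
console.log(unsafeUrl.searchParams.get('curid'));
console.log(safeUrl.searchParams.get('curid'));
```

Escaping at the point of use protects a gadget regardless of which characters the server lets through in uselang.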

Expected result:
wgUserLanguage should be set from uselang only when uselang consists solely of allowed characters.
Comment 2 Fomafix 2012-06-14 13:55:34 UTC
BCP 47 states at https://tools.ietf.org/html/bcp47#section-7:

 language tags use only the characters A-Z, a-z, 0-9, and HYPHEN-MINUS

These should be the only allowed characters.
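
A minimal sketch of such a character filter, written in JavaScript for illustration only. The function names and the fallback parameter are hypothetical. It checks the character set quoted above, not whether the value is a well-formed BCP 47 tag.

```javascript
// Hypothetical check: true only if the code consists solely of
// A-Z, a-z, 0-9 and HYPHEN-MINUS, and is not empty.
function hasOnlyBcp47Characters(code) {
  return typeof code === 'string' && /^[A-Za-z0-9-]+$/.test(code);
}

// Hypothetical wrapper: fall back to a default language when the
// requested value contains any other character.
function pickUserLanguage(requested, fallback) {
  return hasOnlyBcp47Characters(requested) ? requested : fallback;
}

console.log(pickUserLanguage('en-upload-ownwork', 'en'));
console.log(pickUserLanguage('en&curid=19891835', 'en'));
```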
Comment 3 Chris Steipp 2012-06-25 18:16:57 UTC
Fomafix,

Thanks for reporting this too! I've been out on leave for the last two weeks, so apologies for the slow response.

I see exactly what you mean and yes, that is bad. We need to figure out the best place to put in the fix for this, but we will get it addressed asap.
Comment 4 Tim Starling 2012-06-25 22:01:16 UTC
The uselang attribute commonly contains punctuation characters that aren't allowed by BCP-47, due to the {{int:}} hack commonly used on multilanguage wikis. Only the minimum set of characters required for security should be rejected, plus the ones rejected by Language::isValidCode().
Comment 5 Krinkle 2012-06-30 06:15:26 UTC
(In reply to comment #4)
> The uselang attribute commonly contains punctuation characters that aren't
> allowed by BCP-47

Such as? I thought it was only used for things like en-upload-ownwork. But always within BCP-47, in general even stricter (never numbers or uppercase).
Comment 6 Chris Steipp 2012-07-06 16:51:41 UTC
Working with Tim on this yesterday, he pulled a list of all of the uselang values that hit WMF sites from the cache (http://paste.tstarling.com/p/qzhZBz.html). There were several obvious attack strings, and some that looked like they probably were errors. Almost all the rest were a-zA-Z0-9.-+ characters, with a few ?, =, and ncr-encoded characters where it was hard to figure out if they were errors or intentional.

From a security perspective, I think we should at least implement Nikerabbit's patch now and if anyone was intentionally using ', ", or &, we can work with the site admins to get those cleaned up. Then we can later look at whitelisting [a-zA-Z+.-] only.
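
The two stages can be sketched as follows, in JavaScript for illustration. This is not the actual patch: the function names are hypothetical, the first stage assumes the patch rejects ', ", and &, and the second stage uses the character class exactly as written above.

```javascript
// Stage 1 (assumed behavior of the immediate fix): reject only the
// characters named above as dangerous: single quote, double quote, ampersand.
function passesDenylist(code) {
  return code !== '' && !/['"&]/.test(code);
}

// Stage 2 (the later, stricter option): accept only characters from the
// class [a-zA-Z+.-], mirroring the comment as written.
function passesAllowlist(code) {
  return /^[a-zA-Z+.-]+$/.test(code);
}

// Compare how the two stages treat a few values.
for (const code of ['en-upload-ownwork', 'a=b', 'en&curid=19891835']) {
  console.log(code, passesDenylist(code), passesAllowlist(code));
}
```

A value such as a=b shows the difference between the stages: it contains none of the three rejected characters, so the first stage lets it through, while the stricter character class does not.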
Comment 7 Chris Steipp 2012-08-01 18:54:59 UTC
With the rollout of wmf8 today on de.wikipedia.org, the particular issues reported by Fomafix appear to be resolved. Thanks everyone!
