Last modified: 2013-04-05 08:49:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T41623, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 39623 - Invalid language codes via uselang are used for lang HTML tag et al.
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: 1.20.x
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-08-24 16:38 UTC by jeblad
Modified: 2013-04-05 08:49 UTC (History)
Description jeblad 2012-08-24 16:38:04 UTC
ULS should reject undefined language codes and not pass them on.
Comment 1 Niklas Laxström 2012-08-25 10:57:25 UTC
Can you provide more details?
Comment 2 jeblad 2012-08-25 15:27:26 UTC
There is more in bug 37459, which also has a proposed solution. In general, the user follows a link like http://en.wikipedia.org/wiki/Berlin?uselang=xyzzy or http://wikidata-test-repo.wikimedia.de/wiki/Data:Q2?uselang=xyzzy, and even if the language is way off it is silently ignored for the most part but retained in wgContentLanguage. If the language fails completely, it should also be removed from wgContentLanguage.

It seems to me that there are at least three levels: language codes that are formatted correctly, language codes that exist, and language codes for languages we support with message files. The existing code only checks for correct formatting.

In the Wikidata project it could be necessary to limit the language codes to at least the ones used for existing languages.
Comment 3 Niklas Laxström 2012-08-27 06:10:19 UTC
The uselang parameter comes from core code, even though the ULS extension uses setlang for the language change callback.

I'm recategorizing this bug.
Comment 4 jeblad 2012-08-27 07:15:46 UTC
It's not the uselang parameter in itself that is the problem, it is the validation of its argument. The argument is sanitized at line 93 in UniversalLanguageSelector.php, and sanitization through RequestContext::sanitizeLangCode is too simplistic.

It seems like setlang works as expected, but I've only checked the resulting wgUserLang value.
Comment 5 Niklas Laxström 2012-08-27 07:21:38 UTC
So you are talking about setlang and not uselang after all?
Comment 6 jeblad 2012-08-27 07:23:58 UTC
This is about both of them.
Comment 7 Niklas Laxström 2012-08-27 07:51:27 UTC
Sorry if I start sounding grumpy, but let's process one issue at a time.

There are too many moving parts right now: uselang vs. setlang, trying to determine whether they actually have the same issue, and even Wikidata is thrown into the mix.

You also confused me when you stated that uselang=xyzzy would affect wgContentLanguage - this is not true, it only affects the lang tag on the html element (and some other places).

Let's concentrate on uselang=xyzzy first. We allow codes like this because Commons supposedly uses hacks like Special:Upload?uselang=ownwork.

In this case setlang differs, because it should only accept supported language codes (those which have any l10n at all) as opposed to known language codes (basically that we know the name of the language in some languages).

The fix for setlang is already in Gerrit, waiting for review.

Which brings me to the important question: what is the issue you need to have fixed?
Comment 8 jeblad 2012-08-27 08:43:21 UTC
Sorry, wgContentLang was a typo on my part, it should have been wgUserLang. I'm an expert in confusing people! =)

We first spotted this in Wikidata, but I think it would be wrong for us to make a workaround in Wikidata. This is something that shows up in "ULS" but probably needs a fix upstream in core.

In Wikidata we should be able to limit the codes in uselang/setlang to valid languages, and possibly to a more limited set of supported languages. The supported languages in our context could be the languages that have a working Wikipedia-project, but shifting from use of Language::isValid to Language::isValidBuiltInCode is probably sufficient.

As a rough idea: a config var that switches RequestContext::sanitizeLangCode from using isValid to the more stringent isValidBuiltInCode would solve our problem.

Then there is also the problem of what to do if the code fails, but I think it is acceptable to simply reset it to "en".
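
The idea above (a config switch between a lenient and a stricter check, with a reset to "en" on failure) could look roughly like this. It is a minimal Python sketch for illustration only, not MediaWiki code: the `strict_lang_codes` flag, the function names, and the set of unsafe characters are all assumptions.

```python
import re

# Hypothetical configuration flag; no actual MediaWiki setting is implied.
CONFIG = {"strict_lang_codes": False}
FALLBACK_LANG = "en"

# Assumed set of characters that are unsafe in titles or HTML.
UNSAFE_CHARS = set(":/\\&<>'\"\0")


def is_valid_code(code: str) -> bool:
    # Lenient check: non-empty and free of the assumed unsafe characters.
    return bool(code) and not any(ch in UNSAFE_CHARS for ch in code)


def is_valid_built_in_code(code: str) -> bool:
    # Stricter check: only lowercase ASCII letters, digits and hyphens.
    return re.fullmatch(r"[a-z0-9-]+", code) is not None


def sanitize_lang_code(code: str, config: dict = CONFIG) -> str:
    """Return the code if it passes the configured check, else the fallback."""
    check = is_valid_built_in_code if config["strict_lang_codes"] else is_valid_code
    return code if check(code) else FALLBACK_LANG
```

With the flag off, `sanitize_lang_code("xy_zzy")` keeps the code; with `{"strict_lang_codes": True}` it is reset to "en". In this sketch a code such as `xyzzy` passes either check, since it consists only of lowercase letters.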
Comment 9 Niklas Laxström 2012-08-27 10:06:49 UTC
isValidBuiltInCode is not strict enough for you.

List of validation functions:
;isValidCode: checks that the code doesn't contain some problematic characters not valid in titles or not safe in html
;isValidBuiltInCode: checks that the code only contains [a-z0-9-]

To be added:
;isKnownLanguageCode: checks that code is among the known defined language codes
;isSupportedLanguageCode: checks that the code is one that has any l10n in MediaWiki
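
To illustrate how the four checks differ in strictness, here is a rough Python sketch. It is not the MediaWiki implementation: the function names are snake_case stand-ins, the `KNOWN_CODES` and `SUPPORTED_CODES` sets are made-up sample data, and the list of problematic characters is an assumption.

```python
import re
from typing import List

# Hypothetical sample data standing in for the real lists.
KNOWN_CODES = {"en", "de", "fi", "nb", "tlh"}   # codes with a known language name
SUPPORTED_CODES = {"en", "de", "fi", "nb"}      # codes that have any l10n

# Assumed set of problematic characters; the real list may differ.
UNSAFE_CHARS = set(":/\\&<>'\"\0")


def is_valid_code(code: str) -> bool:
    # No characters that are invalid in titles or unsafe in HTML.
    return bool(code) and not any(ch in UNSAFE_CHARS for ch in code)


def is_valid_built_in_code(code: str) -> bool:
    # Only characters from [a-z0-9-].
    return re.fullmatch(r"[a-z0-9-]+", code) is not None


def is_known_language_code(code: str) -> bool:
    return code in KNOWN_CODES


def is_supported_language_code(code: str) -> bool:
    return code in SUPPORTED_CODES


def passed_checks(code: str) -> List[str]:
    """List the names of the checks a code passes, loosest first."""
    checks = [
        is_valid_code,
        is_valid_built_in_code,
        is_known_language_code,
        is_supported_language_code,
    ]
    return [check.__name__ for check in checks if check(code)]
```

Under these assumptions, `passed_checks("xyzzy")` and `passed_checks("ownwork")` both return only the first two checks, which is why the purely syntactic checks cannot reject such codes.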
Comment 10 jeblad 2012-08-27 11:26:02 UTC
I read the code a little too fast and missed a test, yes you're right! =D
Comment 11 Nemo 2013-04-05 08:49:14 UTC
(In reply to comment #7)
> You also confused me when you stated that uselang=xyzzy would affect
> wgContentLanguage - this is not true, it only affects the lang tag on the
> html element (and some other places).

So this seems to be the problem; this tag seems to trickle down to wbDataLangName and to be abused by Wikidata label adder JS.

> Let's concentrate on uselang=xyzzy first. We allow codes like this because
> Commons supposedly uses hacks like Special:Upload?uselang=ownwork.

Indeed. So bug 37459 is the problem.
The language validation functions have been added in the meantime, so Wikidata can fix itself.
