Last modified: 2014-10-31 06:18:42 UTC
JSON files may not contain duplicate keys, yet Jenkins does not detect them. https://gerrit.wikimedia.org/r/#/c/160541/9/languages/i18n/qqq.json
Hmm. According to the config file, jsonlint is run as part of the jslint job, along with jshint, for mwcore… http://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/HEAD/mediawiki.yaml#L21
I don't see output from any of those on https://gerrit.wikimedia.org/r/#/c/160541/9
(In reply to Greg Grossmeier from comment #2)
> I don't see output from any of those on
> https://gerrit.wikimedia.org/r/#/c/160541/9

Yeah, good point. Also, per https://github.com/zaach/jsonlint/issues/13 duplicate keys are valid (if useless) JSON and so aren't going to trigger jsonlint to error.
(In reply to James Forrester from comment #3)
> (In reply to Greg Grossmeier from comment #2)
> > I don't see output from any of those on
> > https://gerrit.wikimedia.org/r/#/c/160541/9
>
> Yeah, good point.

Found the cause: jslint and jsonlint are now run for mw-core via npm, and the npm failure was due to a submodule checkout error. Re-writing this bug to be a feature request.
The script is in integration/jenkins.git, bin/json-lint.php. It uses PHP's json_decode() to parse the JSON. With PHP 5.4.24:

php > var_dump( json_decode( '{ "key": 1, "key": 2}' ) );
class stdClass#1 (1) {
  public $key =>
  int(2)
}
php >

The first key is overridden. The JSON RFC, http://tools.ietf.org/html/rfc7159#section-4, states:

> [..] When the names within an object are not
> unique, the behavior of software that receives such an object is
> unpredictable. Many implementations report the last name/value pair
> only. Other implementations report an error or fail to parse the
> object, and some implementations report all of the name/value pairs,
> including duplicates.

Ideally we would propose to upstream PHP an option to json_decode() which would warn/fatal/except whenever a dupe is encountered. Meanwhile one will have to figure out how to parse the JSON "manually" and detect such errors, which is cumbersome. Patch welcome.
Python 2.7.8 has the same behavior:

> Repeated names within an object are accepted, and only the value of the
> last name-value pair is used.

https://docs.python.org/2/library/json.html#standard-compliance
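For the record, a quick snippet (mine, not from the thread) demonstrating the stdlib behavior quoted above: the last value silently wins, just like PHP's json_decode().

```python
import json

# The stdlib json module silently keeps the value of the last duplicate
# key; no warning or error is raised.
doc = '{ "key": 1, "key": 2 }'
print(json.loads(doc))  # {'key': 2}
```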
To tell you the truth, I would much rather have Jenkins throw an error and reject it, and let the user override it manually. I can't imagine any reason to allow dupes IMHO, even if some languages are OK with them.
What I really meant is that neither PHP's json_decode() nor Python's json decoder lets us detect duplicate keys. So one needs to find an implementation that warns/dies, or write one to do so.
(In reply to Antoine "hashar" Musso from comment #8)
> What I really meant is that neither PHP json_decode() or python json decoder
> let us detects duplicate key. So one needs to find an implementation that
> warns/dies or write one to do so.

According to <https://docs.python.org/2/library/json.html#repeated-names-within-an-object> we can override that behavior. <https://mail.python.org/pipermail/python-list/2013-May/647954.html> has an example snippet on how to do so.
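A sketch of that approach (the hook name reject_duplicates is mine, not from the mailing-list post): json.loads() accepts an object_pairs_hook callable which sees each object's key/value pairs before they are collapsed into a dict, so duplicates are still visible there.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook: raise instead of silently keeping the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % key)
        result[key] = value
    return result

# The hook is applied to every object, including nested ones.
print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("rejected:", err)
```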
Setting lowest priority until someone is willing to take this and figure out a solution.
(In reply to Antoine "hashar" Musso (WMF) from comment #10)
> Setting lowest priority until someone is willing to take this and figure out
> a solution.

Ok, I wrote a Python script that recursively searches for JSON files and checks whether they have any duplicate keys: <https://github.com/legoktm/dupe-keys-json>. I don't know how to turn it into something run by Jenkins though.
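For anyone reading along, a minimal sketch of what such a recursive checker could look like (my own illustration using the object_pairs_hook trick; it is not the actual code from dupe-keys-json):

```python
import json
import os

def _no_dupes(pairs):
    # object_pairs_hook: report every key that occurs more than once
    keys = [k for k, _ in pairs]
    dupes = sorted(k for k in set(keys) if keys.count(k) > 1)
    if dupes:
        raise ValueError("duplicate keys: %s" % ", ".join(dupes))
    return dict(pairs)

def check_tree(root):
    """Walk root, parse every *.json file, collect (path, error) failures."""
    failures = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                path = os.path.join(dirpath, name)
                with open(path) as f:
                    try:
                        json.load(f, object_pairs_hook=_no_dupes)
                    except ValueError as err:
                        failures.append((path, str(err)))
    return failures
```

A Jenkins job could run this over the workspace and exit non-zero whenever check_tree() returns any failures.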
I like the checker since it has tox and tests :-D I am not too sure how to get it deployed on all the CI slaves though. Ideally we would add a setup.py and make it a Debian package; not sure whether it is worth the effort though. Kunal, can you sync with Yurik to figure out whether that matches his use case? If anyone has an idea how to deploy the script on all the CI slaves, I am taking input :-)
Seems like it does. Didn't think it would require such elaborate Python code though! And thanks!!! :)