Last modified: 2014-10-31 06:18:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T73284, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 71284 - Augment jsonlint test with a test for duplicate keys
Augment jsonlint test with a test for duplicate keys
Status: NEW
Product: Wikimedia
Classification: Unclassified
Continuous integration (Other open bugs)
unspecified
All All
: Lowest enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-09-25 05:31 UTC by Yuri Astrakhan
Modified: 2014-10-31 06:18 UTC (History)
5 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Yuri Astrakhan 2014-09-25 05:31:04 UTC
json files may not contain duplicate keys, yet Jenkns does not detect it.

https://gerrit.wikimedia.org/r/#/c/160541/9/languages/i18n/qqq.json
Comment 1 James Forrester 2014-09-25 15:42:47 UTC
Hmm. According to the config file, jsonlint is run as part of the jslint job, along with jshint, for mwcore…

http://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/HEAD/mediawiki.yaml#L21
Comment 2 Greg Grossmeier 2014-09-25 15:44:57 UTC
I don't see output from any of those on https://gerrit.wikimedia.org/r/#/c/160541/9
Comment 3 James Forrester 2014-09-25 15:52:26 UTC
(In reply to Greg Grossmeier from comment #2)
> I don't see output from any of those on
> https://gerrit.wikimedia.org/r/#/c/160541/9

Yeah, good point.

Also, per https://github.com/zaach/jsonlint/issues/13 duplicate keys are valid (if useless) JSON and so aren't going to trigger jsonlint to error.
Comment 4 James Forrester 2014-09-25 15:55:27 UTC
(In reply to James Forrester from comment #3)
> (In reply to Greg Grossmeier from comment #2)
> > I don't see output from any of those on
> > https://gerrit.wikimedia.org/r/#/c/160541/9
> 
> Yeah, good point.

Found the cause; jslint and jsonlint are now done for mw-core in npm, so the failure of npm to due to a submodule checkout error.


Re-writing this bug to be a feature request.
Comment 5 Antoine "hashar" Musso (WMF) 2014-09-25 16:43:31 UTC
The script is in integration/jenkins.git bin/json-lint.php  It uses PHP json_decode() to parse the json.

And with PHP 5.4.24:

 php > var_dump( json_decode( '{ "key": 1, "key": 2}' ) );
 class stdClass#1 (1) {
   public $key =>
   int(2)
 }
 php > 

The first key is overridden.

One can look at the JSON RFC http://tools.ietf.org/html/rfc7159#section-4 which states:

>   [..] When the names within an object are not
>   unique, the behavior of software that receives such an object is
>   unpredictable.  Many implementations report the last name/value pair
>   only.  Other implementations report an error or fail to parse the
>   object, and some implementations report all of the name/value pairs,
>   including duplicates.


Ideally we would propose to upstream PHP an option to json_decode() which would warn/fatal/except whenever a dupe is encountered.

Meanwhile one will have to figure out how to parse json "manually" and detect such error. Which is hmm cumbersome.

Patch welcome.
Comment 6 Antoine "hashar" Musso (WMF) 2014-09-25 16:46:28 UTC
python 2.7.8 has the same behavior:

> Repeated names within an object are accepted, and only the value of the
> last name-value pair is used.

https://docs.python.org/2/library/json.html#standard-compliance
Comment 7 Yuri Astrakhan 2014-09-25 16:53:16 UTC
Tell you the truth, I would much rather have jenkins throw an error and reject it, and let user override it manually. Can't imagine any reason to allow dups IMHO, even if some langs are ok with it.
Comment 8 Antoine "hashar" Musso (WMF) 2014-09-25 21:11:06 UTC
What I really meant is that neither PHP json_decode() or python json decoder let us detects duplicate key. So one needs to find an implementation that warns/dies or write one to do so.
Comment 9 Kunal Mehta (Legoktm) 2014-09-26 22:13:56 UTC
(In reply to Antoine "hashar" Musso from comment #8)
> What I really meant is that neither PHP json_decode() or python json decoder
> let us detects duplicate key. So one needs to find an implementation that
> warns/dies or write one to do so.

According to <https://docs.python.org/2/library/json.html#repeated-names-within-an-object> we can override that behavior. <https://mail.python.org/pipermail/python-list/2013-May/647954.html> has an example snippet on how to do so.
Comment 10 Antoine "hashar" Musso (WMF) 2014-10-24 13:53:27 UTC
Setting lowest priority until someone is willing to take this and figure out a solution.
Comment 11 Kunal Mehta (Legoktm) 2014-10-25 05:26:04 UTC
(In reply to Antoine "hashar" Musso (WMF) from comment #10)
> Setting lowest priority until someone is willing to take this and figure out
> a solution.

Ok, I wrote a python script that recursively searches for JSON files, and checks if they have any duplicate keys: <https://github.com/legoktm/dupe-keys-json>.

I don't know how to turn it into something run by jenkins though.
Comment 12 Antoine "hashar" Musso (WMF) 2014-10-30 22:43:10 UTC
I like the checker since it has tox and tests :-D  I am not too sure how to have it deployed on all the CI slaves though.  Ideally we would add a setup.py and make it a Debian package, not sure whether it is worth the effort though.

Kunal, can you sync with Yurik to figure out whether that match is use case?

If anyone has an idea how to deploy the script on all the CI slaves I am taking inputs :-)
Comment 13 Yuri Astrakhan 2014-10-31 06:18:42 UTC
Seems like it does. Didn't think it required such an elaborate python code though! And thanks!!! :)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links