Last modified: 2013-10-10 21:34:48 UTC
We currently use plain objects in many places where users can pass in keys. Even with Object.create(null), users can pass in the key '__proto__' which will happily set the prototype, for example to an object if that is what we set as a value. ES6 / harmony Map and Set (see http://wiki.ecmascript.org/doku.php?id=harmony:simple_maps_and_sets) fix this problem (and allow objects as keys), so we should use them. Native Map / Set is not yet complete in node 0.10, so we should probably use one of the existing shims to get it: https://github.com/paulmillr/es6-shim https://github.com/Benvie/harmony-collections The es6 shim also gives us more goodies which might be useful to have for cleaner code. We should not go overboard though, and only use features where a native implementation is likely to happen soon.
Another contender: https://github.com/WebReflection/es6-collections
(In reply to comment #1) > Another contender: https://github.com/WebReflection/es6-collections This one uses linear search on a key array to support object keys. Search for 'function betterIndexOf' in https://github.com/WebReflection/es6-collections/blob/master/src/es6-collections.js.
https://github.com/paulmillr/es6-shim/blob/master/es6-shim.js uses iteration on a linked list in its Map implementation. Search for 'function Map'.
https://github.com/Benvie/harmony-collections/blob/master/harmony-collections.js seems to be the only one of the contenders so far that uses native objects for primitive keys (and linear search for object keys), so only incurs constant overhead for each string key access. __proto__ is mapped to a randomized string. Memory overhead is only incurred per Map, but not per item. Seems to be the best option for large maps. For small maps linear search is likely faster.
The standard thunk is something like: function Map() { this.store = Object.create(null); } Map.prototype.get = function( k ) { return this.store['$'+k]; } Map.prototype.set = function( k, v ) { this.store['$'+k] = v; } ...but we should be a little careful not to go overboard here. In many cases we know that '__proto__' cannot be a possible key. In other case we are doing queries only (for example, to determine whether a tag name is a member of a specific set), and Object.create(null).__proto__===null, so that's perfectly safe as well. So, sure we should use safer maps where needed, but it probably doesn't have to be a hugely-invasive patch set.
(In reply to comment #5) > The standard thunk is something like: > > function Map() { this.store = Object.create(null); } > Map.prototype.get = function( k ) { return this.store['$'+k]; } > Map.prototype.set = function( k, v ) { this.store['$'+k] = v; } This is basically what harmony-collections does for primitive keys when native maps are not available. It also implements the full spec, which is good for consistency even if we don't need objects as keys right now. > So, sure we should use safer maps where needed, but it probably doesn't have > to > be a hugely-invasive patch set. Yup, only where users pass in arbitrary keys for new items, as I noted in comment #1.
Change 85432 had a related patch set uploaded by Arlolra: WIP: ES6 shim cleanup. https://gerrit.wikimedia.org/r/85432
Change 85432 merged by jenkins-bot: ES6 shim cleanup https://gerrit.wikimedia.org/r/85432
harmony-collections are quite slow in production: ---------------------------------------------------------- Latest master: [subbu@earth tests] time node parserTests --wt2html --wt2wt --selser --html2wt --html2html >&! /dev/null 94.013u 0.728s 1:34.32 100.4% 0+0k 0+0io 0pf+0w ---------------------------------------------------------- [subbu@earth lib] git checkout 064cee74 [subbu@earth tests] time node parserTests --wt2html --wt2wt --selser --html2wt --html2html > & ! /dev/null 59.475u 0.456s 0:59.65 100.4% 0+0k 0+0io 0pf+0w ---------------------------------------------------------- Cscott has a fork that performs better at https://github.com/cscott/harmony-collections.
I've fixed the performance regression by patching harmony-collections. I might also patch es6-shim to be O(1) so that would be another option.
(In reply to comment #10) > I've fixed the performance regression by patching harmony-collections. > > I might also patch es6-shim to be O(1) so that would be another option. Can you post the current timings?
$ node --version v0.10.20 $ git checkout master $ time tests/parserTests.js --quiet real 1m8.476s user 1m3.036s sys 0m0.628s $ git checkout no-es6 $ time tests/parserTests.js --quiet real 0m37.091s user 0m37.036s sys 0m0.444s With the gross hack in https://gerrit.wikimedia.org/r/88148 : $ time tests/parserTests.js --quiet real 0m39.008s user 0m38.876s sys 0m0.540s With https://gerrit.wikimedia.org/r/88194 and https://gerrit.wikimedia.org/r/88193 : $ time tests/parserTests.js --quiet real 0m42.924s user 0m42.668s sys 0m0.456s
With es6-shim (https://gerrit.wikimedia.org/r/88241 and https://gerrit.wikimedia.org/r/88242 ): $ time tests/parserTests.js --quiet real 0m46.265s user 0m46.228s sys 0m0.488s Unlike the harmony-collections patch, this is a fork that can be upstreamed (https://github.com/paulmillr/es6-shim/issues/148). Don't know why exactly this is slower than the harmony-collections version; I suspect it's due to the other stuff hacked by es6-shim. The actual Map/Set implementation should be faster.
(In reply to comment #13) > Don't know why exactly this is slower than the harmony-collections version; I > suspect it's due to the other stuff hacked by es6-shim. The actual Map/Set > implementation should be faster. I was wrong. Simply adding es6-shim does not make the parserTests benchmark, in my tests. The es6-shim implementation seems to be slower solely because its constructor is slower. We make 88,143 instances of Set via JSUtils.arrayToSet during a run of parserTests, and those constructor calls seem to account for the extra 7s of runtime.
OK. I suggest moving to es6-shim. The codebase is much simpler. Things left to do: a) profile our code and figure out why we are creating so many different Set objects during a parserTests run. Possible we can hoist many of them out (into mediawiki.wikitext.constants.js) which would greatly reduce the speed penalty from using es6-shim. b) (possibly) hack es6-shim to lazy-initialize the Map objects. Only create the backing linked-list if non-string keys are ever used. [gwicke suggests that numeric keys could also be sped up, by prefixing string keys with '$' instead of testing against '__proto__'] c) (possibly) add a FastStringSet to JSUtils (along the lines of https://gerrit.wikimedia.org/r/88148 ). c2) figure out what the hotspots are for Set.has() -- possibly only a few sites actually need to use FastStringSet. d) subbu has also requested re-doing some of the performance comparisons above using a single (large) article as a benchmark, instead of ParserTests. This would help determine whether the performance loss was significant in real use (as opposed to just in parserTests). Arlo: I'm moving on to other tasks right now. Go ahead and tackle any of the above that you like.
Regarding (d), here is one possible page: fr:Coupe_du_pays_de_Galles_de_football See https://gist.github.com/subbuss/6876447 for some tests I did y'day morning.
wrt (b) https://github.com/cscott/es6-shim/tree/faster-map recovers the lost speed: $ time tests/parserTests.js --quiet real 0m38.924s user 0m38.896s sys 0m0.464s
For a), this seems to be the largest culprit: https://gerrit.wikimedia.org/r/88927
Subbu, here's a comparison as requested in d), https://gist.github.com/arlolra/6913667 So it's clear, in this case: master is before that merge that just went in, so ec485737eaa10f5336578e5a9f54fe944de9fc23 no-es6 is master with, git revert 560c79e9a1ee8d01bc9cc4b862f280fa6752a610 and es6regress is what just went in dea9cbfa8ff4a6aea12930ca13c2d0f3491d8477 Still seeing a regression so taking one of Cscott's solutions is probably best.
Change 88242 had a related patch set uploaded by Cscott: Replace harmony-collections with es6-shim. https://gerrit.wikimedia.org/r/88242
Change 88242 merged by jenkins-bot: Replace harmony-collections with es6-shim https://gerrit.wikimedia.org/r/88242
Seems good. starting parsing of fr:Coupe_du_pays_de_Galles_de_football completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 6671 ms starting parsing of fr:Coupe_du_pays_de_Galles_de_football completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 4247 ms starting parsing of fr:Coupe_du_pays_de_Galles_de_football completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 4395 ms