Last modified: 2014-05-02 00:20:01 UTC
PEG.js 0.8 promises better performance, along with a number of other improvements (see the changelog at https://github.com/gwicke/pegjs/blob/master/CHANGELOG.md). At a minimum, we'll need to change all references to pos and pos0 to peg$pos etc., and also rework the cache key patch regexp in mediawiki.tokenizer.peg.js. Also relevant: "Removed the toSource method of generated parsers and introduced a new output option of the PEG.buildParser method. It allows callers to specify whether they want to get back the parser object or its source code."
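A mechanical rename like this could be scripted. The following is a minimal sketch, not part of PEG.js or Parsoid: the helper name renameIdentifiers and the rename map are hypothetical. Because the correct 0.8 counterpart of pos0 is not settled here, the map is left to the caller. The helper renames only whole, bare identifiers. It skips property accesses such as tok.pos and names that already carry the peg$ prefix.

```javascript
// Hypothetical one-off migration helper (not part of PEG.js or Parsoid):
// renames bare identifiers such as `pos` in grammar action code.
function renameIdentifiers(source, renames) {
  const names = Object.keys(renames);
  if (names.length === 0) return source;

  // Escape regex metacharacters; `$` is legal in JavaScript identifiers.
  const escaped = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));

  // Match whole identifiers only:
  // - not preceded by an identifier character or `.`, so `tok.pos` and
  //   the `pos` inside `peg$pos` are left alone;
  // - not followed by an identifier character, so `pos` does not match
  //   inside `pos0`.
  const re = new RegExp(
    "(?<![\\w$.])(" + escaped.join("|") + ")(?![\\w$])",
    "g"
  );

  // A replacer function keeps `$` in the new names from being read as
  // special replacement patterns.
  return source.replace(re, (name) => renames[name]);
}
```

For example, renameIdentifiers("return [pos0, pos, tok.pos, peg$pos];", { pos: "peg$pos" }) yields "return [pos0, peg$pos, tok.pos, peg$pos];". Running it a second time changes nothing. The helper works on raw text and needs an engine with regex lookbehind support. It cannot tell identifiers apart from string literals or comments, so the output would still need a manual review.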
Rather than using peg$pos, we should probably use the public offset() method now available in 0.8.
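The appeal of offset() is that action code stops depending on the generated parser's internal variable names, which is exactly what 0.8 renames. The sketch below illustrates that separation with a tiny hand-written word tokenizer. It is hypothetical and does not reproduce PEG.js-generated code. The parser keeps its position in a private variable. The action sees positions only through offset() and text() accessors, which are modeled loosely on the ones PEG.js 0.8 exposes to actions.

```javascript
// Hypothetical sketch, not PEG.js output: action code reads positions only
// through accessor functions, never through the parser's internal state.
function parseWords(input) {
  let currPos = 0;    // internal cursor; actions never reference it
  let matchStart = 0; // start of the match currently being reduced

  // Accessors handed to action code.
  const offset = () => matchStart;                        // start of match
  const text = () => input.slice(matchStart, currPos);    // matched text

  // "Action" for a word: build a token carrying its source range.
  const wordAction = (word) => ({
    word,
    range: [offset(), offset() + text().length],
  });

  const tokens = [];
  while (currPos < input.length) {
    if (input[currPos] === " ") {
      currPos++; // skip separators
      continue;
    }
    matchStart = currPos;
    while (currPos < input.length && input[currPos] !== " ") currPos++;
    tokens.push(wordAction(input.slice(matchStart, currPos)));
  }
  return tokens;
}
```

Here parseWords("foo bar") returns two tokens with ranges [0, 3] and [4, 7]. If the internal cursor were renamed, as pos became peg$pos, wordAction would not need to change.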
The current tokenizer (built with PEG.js 0.6) accounts for 24% of our CPU time when parsing [[en:Barack Obama]], so there is substantial room for improvement here.
Change 130561 had a related patch set uploaded by Arlolra: WIP: Upgrade to pegjs v0.8 https://gerrit.wikimedia.org/r/130561
Change 130561 merged by GWicke: Upgrade to pegjs v0.8 https://gerrit.wikimedia.org/r/130561