Last modified: 2014-07-15 07:49:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T59670, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57670 - Stack overflow/long parse times parsing {{中央線経路図}}
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Keywords: performance
Depends on:
Blocks:
Reported: 2013-11-27 17:20 UTC by ssastry
Modified: 2014-07-15 07:49 UTC (History)
CC List: 2 users

Description ssastry 2013-11-27 17:20:46 UTC
Nov 20 18:43:57 <arlolra> the max recurse from jawiki/中央線_(韓国)
Nov 20 18:44:32 <arlolra> {{中央線経路図}} seems to be from this template
Nov 20 18:45:28 <arlolra> just trying to parse that alone causes it
Nov 20 18:45:53 <arlolra> but parsing the content of the template works just fine
...
Nov 20 18:49:03 <subbu> confirmed by this:
Nov 20 18:49:04 <subbu> [subbu@earth lib] echo '{{中央線経路図}}' | node parse --prefix jawiki
Nov 20 18:49:04 <subbu> ERROR in Main_Page:
Nov 20 18:49:04 <subbu> Maximum call stack size exceeded
Nov 20 18:49:04 <subbu> Stack trace: undefined
Comment 1 Arlo Breault 2013-11-28 05:52:51 UTC
The source of that template is *really* big and the error seems to be happening in the tokenizer. It's not infinite recursion, because this works:

echo "{{中央線経路図}}" | node --stack-size=2048 parse --prefix jawiki
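
The reasoning above is that the overflow depends on how many frames fit on the stack, not on a loop that never ends. A minimal sketch of that idea follows; `probeMaxDepth` is a made-up helper for illustration and is not part of Parsoid. It counts how deep a trivial recursion gets before V8 throws:

```javascript
// Hypothetical helper, not part of Parsoid: counts how many nested calls
// fit on the stack before V8 throws a RangeError.
function probeMaxDepth() {
  let depth = 0;
  function descend() {
    depth++;
    descend();
  }
  try {
    descend();
  } catch (err) {
    // Only swallow the stack overflow; anything else is unexpected.
    if (!(err instanceof RangeError)) throw err;
  }
  return depth;
}

console.log('frames before overflow:', probeMaxDepth());
```

Running this once with plain node and once with --stack-size=2048 should print noticeably different numbers. The exact values depend on the Node version and platform, so none are quoted here.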
Comment 2 ssastry 2013-12-02 17:36:33 UTC
Interesting ... it would be worthwhile to figure out if we can get this to parse without getting into deep call stacks. But, definitely lower priority for now.
Comment 3 ssastry 2014-05-06 15:32:46 UTC
Couple more pages from production logs:

* itwiki:IV_Copa_Brasil oldid:64293916
* dewiki:Präsidentschaftswahl_in_Frankreich_2012 oldid:126550548
Comment 4 Gabriel Wicke 2014-05-29 02:45:22 UTC
The bulk of the recursion is via table_data_tag calling nested_block_in_table, which in turn matches other table syntax. The remainder of the table is matched using this recursion, which causes the overflow as this table is large.
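
To illustrate the shape of the problem (not the actual grammar rules), here is a toy sketch. It assumes a made-up input format where every cell starts with '|'. The function names are hypothetical and unrelated to table_data_tag or nested_block_in_table. The first variant matches the rest of the table by calling itself, so stack depth grows with the number of cells; the second consumes the same input in a loop:

```javascript
// Toy input format (an assumption for this sketch): '|a|b|c' is three cells.
function nextCellStart(src, from) {
  const i = src.indexOf('|', from);
  return i === -1 ? src.length : i;
}

// Recursive: one stack frame per cell, so a large table exhausts the stack.
function parseCellsRecursive(src, pos = 0, out = []) {
  if (pos >= src.length) return out;
  const end = nextCellStart(src, pos + 1);
  out.push(src.slice(pos + 1, end));
  return parseCellsRecursive(src, end, out);
}

// Iterative: same result, constant stack depth.
function parseCellsIterative(src) {
  const out = [];
  let pos = 0;
  while (pos < src.length) {
    const end = nextCellStart(src, pos + 1);
    out.push(src.slice(pos + 1, end));
    pos = end;
  }
  return out;
}

const bigTable = '|x'.repeat(200000);
console.log('iterative cells:', parseCellsIterative(bigTable).length);
try {
  parseCellsRecursive(bigTable);
  console.log('recursive variant finished');
} catch (err) {
  console.log('recursive variant failed:', err.message);
}
```

With a default stack size the recursive variant is expected to fail on the large input, while both variants agree on small inputs.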
Comment 5 Gerrit Notification Bot 2014-05-29 03:18:27 UTC
Change 135992 had a related patch set uploaded by GWicke:
Bug 57670: Avoid recursion via nested_block_in_table

https://gerrit.wikimedia.org/r/135992
Comment 6 Gabriel Wicke 2014-05-29 03:19:45 UTC
Debugging note: Node 0.11 finally has stack traces on stack overflow. To analyze recursions, it's additionally useful to increase the stack trace limit:

node --stack-trace-limit=1000
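
The same limit can also be set from within a script through V8's Error.stackTraceLimit property. The sketch below uses made-up helper names (`captureAtDepth`, `framesWithLimit`) and an ordinary Error thrown at a known depth rather than a real overflow. It shows how a higher limit exposes more of a recursive call chain:

```javascript
// Hypothetical helper: recurse to a known depth, then capture a stack trace.
function captureAtDepth(depth) {
  if (depth === 0) return new Error('probe').stack;
  return captureAtDepth(depth - 1);
}

// Count the frames in a stack string that belong to the named function.
function countFrames(stack, name) {
  return stack.split('\n').filter((line) => line.includes(`at ${name} `)).length;
}

// Temporarily set Error.stackTraceLimit, capture, then restore the old value.
function framesWithLimit(limit, depth) {
  const previous = Error.stackTraceLimit;
  Error.stackTraceLimit = limit;
  try {
    return countFrames(captureAtDepth(depth), 'captureAtDepth');
  } finally {
    Error.stackTraceLimit = previous;
  }
}

console.log('frames visible with limit 10:', framesWithLimit(10, 50));
console.log('frames visible with limit 1000:', framesWithLimit(1000, 50));
```

With the small limit only the innermost frames are recorded, which hides where a recursion loops back on itself. The larger limit keeps the whole chain.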
Comment 7 Gabriel Wicke 2014-05-29 04:29:47 UTC
(In reply to ssastry from comment #3)
> Couple more pages from production logs:
> 
> * itwiki:IV_Copa_Brasil oldid:64293916

Now finishes in 10s & looks correct.

> * dewiki:Präsidentschaftswahl_in_Frankreich_2012 oldid:126550548

Now finishes in 35s & looks correct.
Comment 8 Gerrit Notification Bot 2014-05-29 22:42:17 UTC
Change 135992 merged by jenkins-bot:
Bug 57670: Avoid recursion via nested_block_in_table

https://gerrit.wikimedia.org/r/135992
Comment 9 ssastry 2014-06-10 22:51:59 UTC
New errors from production logs:

[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651050] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651060] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651091] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651094] Maximum call stack size exceeded
Comment 10 Gabriel Wicke 2014-06-10 23:04:42 UTC
Seems to work fine with both node 0.10 and 0.11:

[info][eswikisource/Usuario:Cárdenas/PRUEBAS] starting parsing
[info][eswikisource/Usuario:Cárdenas/PRUEBAS] completed parsing in 2094 ms
Comment 11 Arlo Breault 2014-06-10 23:50:33 UTC
@gwicke: did you try with the right oldid?
Comment 12 Gabriel Wicke 2014-06-11 00:05:40 UTC
(In reply to Arlo Breault from comment #11)
> @gwicke: did you try with the right oldid?

I assumed that the issue was still there in the latest revision, which evidently wasn't true. I now got a stack trace with node 0.11. This is the loop:

    at peg$parsetemplate (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6695:26)
    at peg$parsetplarg_or_template (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6315:20)
    at peg$parsetplarg_or_template_or_broken (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6337:12)
    at peg$parseinline_element (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:3418:16)
    at peg$parseinlineline (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:3257:18)
    at peg$parseblock (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:2269:20)
    at peg$parsenested_block (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:2330:14)
    at peg$parsetemplate_param_text (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:7297:18)
    at peg$parsetemplate_param_name (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:7169:14)
    at peg$parsetemplate_param (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6985:12)
    at peg$parsetemplate (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6695:26)
Comment 13 Gabriel Wicke 2014-06-11 00:17:16 UTC
Looking at the source at http://es.wikisource.org/w/index.php?title=Usuario:C%C3%A1rdenas/PRUEBAS&action=edit&oldid=651094, there are *a lot* of unclosed template calls, which means that the tokenizer is (correctly) trying to parse this as deeply nested template parameters.

The PHP parser handles this page without crashing. The reason for this is that it sets $wgMaxTemplateDepth = 40; by default, and aborts a recursion beyond that point without crashing.

I'll see if I can add a similar mechanism in the tokenizer.
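
A rough sketch of what such a depth limit could look like, on a toy grammar that only knows '{{', '}}' and plain text. This is not the change that was later uploaded. All names here (`MAX_TEMPLATE_DEPTH`, `parseTemplate`, `tokenizeTemplate`, `treeDepth`) are assumptions for illustration, and only the value 40 comes from the PHP default mentioned above:

```javascript
const MAX_TEMPLATE_DEPTH = 40; // mirrors the PHP default quoted above

// Position of the next '{{' or '}}' at or after `from`, or end of input.
function nextDelimiter(src, from) {
  const candidates = [src.indexOf('{{', from), src.indexOf('}}', from)]
    .filter((i) => i !== -1);
  return candidates.length ? Math.min(...candidates) : src.length;
}

// Parses one template starting at `pos` (which must point at '{{').
// Returns the parsed node and the position just after it.
function parseTemplate(src, pos, depth, maxDepth) {
  const node = { type: 'template', children: [] };
  pos += 2; // skip '{{'
  while (pos < src.length) {
    if (src.startsWith('}}', pos)) return { node, pos: pos + 2 };
    if (src.startsWith('{{', pos)) {
      if (depth >= maxDepth) {
        // Limit reached: stop recursing and keep the rest as literal text.
        node.children.push({ type: 'text', value: src.slice(pos) });
        return { node, pos: src.length };
      }
      const inner = parseTemplate(src, pos, depth + 1, maxDepth);
      node.children.push(inner.node);
      pos = inner.pos;
    } else {
      const end = nextDelimiter(src, pos);
      node.children.push({ type: 'text', value: src.slice(pos, end) });
      pos = end;
    }
  }
  return { node, pos }; // unclosed template: accept at end of input
}

function tokenizeTemplate(src, maxDepth = MAX_TEMPLATE_DEPTH) {
  return parseTemplate(src, 0, 1, maxDepth).node;
}

// Nesting depth of the resulting tree (text nodes count as 0).
function treeDepth(node) {
  if (node.type !== 'template') return 0;
  return 1 + Math.max(0, ...node.children.map(treeDepth));
}

const unclosed = '{{a'.repeat(100000); // many unclosed template calls
console.log('tree depth:', treeDepth(tokenizeTemplate(unclosed)));
```

The design choice here is to degrade to literal text once the limit is hit, so pathological input yields a shallow tree instead of a crash. What the real tokenizer should emit at that point is a separate question.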
Comment 14 Gerrit Notification Bot 2014-06-11 00:35:58 UTC
Change 138763 had a related patch set uploaded by GWicke:
Bug 57670: Limit template expansion depth similar to the PHP parser

https://gerrit.wikimedia.org/r/138763
Comment 15 Gerrit Notification Bot 2014-07-10 19:16:45 UTC
Change 138763 had a related patch set uploaded by Arlolra:
WIP: Limit template expansion depth similar to the PHP parser

https://gerrit.wikimedia.org/r/138763
Comment 16 Gerrit Notification Bot 2014-07-12 13:40:30 UTC
Change 145683 had a related patch set uploaded by Arlolra:
Allow backtracking in async tokenization

https://gerrit.wikimedia.org/r/145683
Comment 17 Gerrit Notification Bot 2014-07-14 16:23:19 UTC
Change 138763 merged by jenkins-bot:
Limit template expansion depth similar to the PHP parser

https://gerrit.wikimedia.org/r/138763
Comment 18 Gerrit Notification Bot 2014-07-14 20:57:34 UTC
Change 145683 merged by jenkins-bot:
Allow backtracking in async tokenization

https://gerrit.wikimedia.org/r/145683
