Last modified: 2014-10-25 00:25:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T53457, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51457 - Excessive backtracking in attribute_preprocessor_text_line when parsing table cell causes 'hanging' workers in production
Status: RESOLVED WORKSFORME
Product: Parsoid
Classification: Unclassified
Component: tokenizer
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2013-07-16 18:42 UTC by Gabriel Wicke
Modified: 2014-10-25 00:25 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Gabriel Wicke 2013-07-16 18:42:24 UTC
Several busy ('hanging') workers in production were backtracking when parsing pathological tables in http://el.wikipedia.org/wiki/%CE%A0%CE%BF%CF%81%CE%B5%CE%AF%CE%B1_%CF%84%CF%89%CE%BD_%CE%BA%CF%85%CF%80%CF%81%CE%B9%CE%B1%CE%BA%CF%8E%CE%BD_%CE%BF%CE%BC%CE%AC%CE%B4%CF%89%CE%BD_%CF%83%CF%84%CE%B1_%CE%BA%CF%8D%CF%80%CE%B5%CE%BB%CE%BB%CE%B1_%CE%95%CF%85%CF%81%CF%8E%CF%80%CE%B7%CF%82
I tracked this down by attaching the node debugger to those workers.

Backtracking when parsing table cells with optional attributes is hard to avoid, but in this case there might be a bug in the cache key construction used for memoization. The many quotes in these tables additionally slow down the parsing of potential attributes.
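The memoization concern can be illustrated with a toy packrat parser. This is a minimal Python sketch under stated assumptions, not Parsoid's tokenizer: the two-rule grammar, the class `ToyPackratParser` and its `leaky_key` flag are all hypothetical. The grammar makes several ordered alternatives retry the same sub-parse, loosely mimicking a table cell whose optional attribute prefix fails and is re-attempted. A cache keyed on (rule, position) absorbs those retries; a key that also folds in a value that changes on every call never produces a hit, so the parser degrades to exponential backtracking.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyPackratParser:
    """Hypothetical grammar chosen only to show the cost of backtracking:

        cell  <- attrs 'x' / attrs 'y' / attrs
        attrs <- 'a' cell / 'a'

    Each rule returns the end position of its match, or None on failure.
    """

    def __init__(self, text: str, memoize: bool, leaky_key: bool = False) -> None:
        self.text = text
        self.memoize = memoize
        self.leaky_key = leaky_key  # simulate a faulty cache key
        self.memo: Dict[Tuple[object, ...], Optional[int]] = {}
        self.calls = 0  # number of rule bodies actually executed

    def _apply(self, name: str, body: Callable[[int], Optional[int]], pos: int) -> Optional[int]:
        # A correct key covers exactly what the result depends on: (rule, position).
        # The leaky key also includes a counter that changes on every executed
        # rule body, so lookups never hit and the cache is useless.
        key: Tuple[object, ...] = (name, pos, self.calls) if self.leaky_key else (name, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]
        self.calls += 1
        result = body(pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def _lit(self, ch: str, pos: int) -> Optional[int]:
        if pos < len(self.text) and self.text[pos] == ch:
            return pos + 1
        return None

    def cell(self, pos: int) -> Optional[int]:
        return self._apply("cell", self._cell, pos)

    def attrs(self, pos: int) -> Optional[int]:
        return self._apply("attrs", self._attrs, pos)

    def _cell(self, pos: int) -> Optional[int]:
        # Ordered choice: every failed alternative re-parses attrs from `pos`.
        for suffix in ("x", "y"):
            end = self.attrs(pos)
            if end is not None:
                end = self._lit(suffix, end)
                if end is not None:
                    return end
        return self.attrs(pos)

    def _attrs(self, pos: int) -> Optional[int]:
        end = self._lit("a", pos)
        if end is None:
            return None
        rest = self.cell(end)
        return rest if rest is not None else end  # fall back to plain 'a'


if __name__ == "__main__":
    for memoize, leaky in ((False, False), (True, True), (True, False)):
        p = ToyPackratParser("a" * 10, memoize=memoize, leaky_key=leaky)
        print(memoize, leaky, p.cell(0), p.calls)
```

Run on ten `a` characters, the unmemoized and leaky-key configurations each execute 354292 rule bodies, while the correctly keyed cache executes 22, one per (rule, position) pair.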

I have some WIP code that speeds things up considerably by not attempting to parse attributes with clearly invalid names, but it causes some test failures in cases where the PHP parser simply strips invalid attribute names. This needs further investigation.
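The idea of bailing out early on clearly invalid attribute names can be sketched as a single left-to-right scan. This Python sketch is an illustration under assumptions, not the WIP patch: `scan_attributes`, its regular expression and the set of characters treated as invalid in a name are all hypothetical. A character that cannot start a name is skipped once and collected in `junk`, instead of triggering a full attribute parse that later has to backtrack. Discarding `junk` only roughly approximates the PHP parser's stripping of invalid names; matching its exact rules is the open question.

```python
import re
from typing import List, Tuple

# Hypothetical rule: a name is a run of characters other than whitespace,
# quotes, '<', '>', '/', '=' and '|'. An optional value follows '=' and is
# double-quoted, single-quoted, or a bare run without whitespace/quotes/'|'.
ATTR_RE = re.compile(
    r"""([^\s"'<>/=|]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'|]+))?"""
)


def scan_attributes(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (name, value) pairs plus the characters skipped as invalid."""
    attrs: List[Tuple[str, str]] = []
    junk: List[str] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = ATTR_RE.match(text, pos)
        if m is None:
            # Cheap rejection: this character cannot start a name, so skip it
            # once instead of attempting an attribute parse and backtracking.
            junk.append(text[pos])
            pos += 1
            continue
        value = m.group(2) or ""
        if value[:1] in ("'", '"'):
            value = value[1:-1]  # strip the surrounding quotes
        attrs.append((m.group(1), value))
        pos = m.end()
    return attrs, junk
```

For example, `scan_attributes('class="wikitable" "" style=width:50%')` returns `([('class', 'wikitable'), ('style', 'width:50%')], ['"', '"'])`: the stray empty quotes are skipped character by character rather than being tried as an attribute.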
Comment 1 Gerrit Notification Bot 2013-07-18 21:22:03 UTC
Change 74527 had a related patch set uploaded by GWicke:
Bug 51457: Avoid some table attribute parsing backtracking

https://gerrit.wikimedia.org/r/74527
Comment 2 Gerrit Notification Bot 2013-07-18 23:04:58 UTC
Change 74527 merged by jenkins-bot:
Bug 51457: Avoid some backtracking when tokenizing table attributes

https://gerrit.wikimedia.org/r/74527
Comment 3 Gabriel Wicke 2013-07-18 23:34:32 UTC
Lowering the priority, as this patch improves parse times from hours to minutes. I'm not closing this bug yet, since there might be more optimization potential here and there might also be other causes of hangs. We'll see how it goes after the next deployment.
Comment 4 Gabriel Wicke 2013-08-26 18:13:03 UTC
This is no longer a problem in production. There have been no major hangs in recent weeks: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Parsoid%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1377540766&g=cpu_report&z=large

I'm keeping this bug open, but am de-assigning it in case somebody else would like to continue optimizing attribute tokenization.
Comment 5 Arlo Breault 2014-10-25 00:25:08 UTC
That page seems to parse pretty reasonably these days. Too bad there's no oldid provided here.

[info][elwiki/Πορεία_των_κυπριακών_ομάδων_στα_ευρωπαϊκά_κύπελλα_ποδοσφαίρου?oldid=4859043] redirecting to revision 4859043
[info][elwiki/Πορεία_των_κυπριακών_ομάδων_στα_ευρωπαϊκά_κύπελλα_ποδοσφαίρου?oldid=4859043] started parsing
[info][elwiki/Πορεία_των_κυπριακών_ομάδων_στα_ευρωπαϊκά_κύπελλα_ποδοσφαίρου?oldid=4859043] completed parsing in 15396 ms
[info][elwiki/Πορεία_των_κυπριακών_ομάδων_στα_ευρωπαϊκά_κύπελλα_ποδοσφαίρου?oldid=1678641] started parsing
[info][elwiki/Πορεία_των_κυπριακών_ομάδων_στα_ευρωπαϊκά_κύπελλα_ποδοσφαίρου?oldid=1678641] completed parsing in 12970 ms
