Last modified: 2014-01-08 22:29:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T60952, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 58952 - node 0.10 memory leak / lockup of client (possible test cases)
node 0.10 memory leak / lockup of client (possible test cases)
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: High normal
: ---
Assigned To: Gabriel Wicke
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-12-24 23:00 UTC by ssastry
Modified: 2014-01-08 22:29 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description ssastry 2013-12-24 23:00:29 UTC
On RT-testing clients, a few pages seem to cause long post-result pauses (could be serializer, could be diff classifier) .. which could either be memory leak, a library issue, ..

plwiki:Podsumowanie_startów_zespołu_Lotus_w_Formule_1 (88064 ms)
itwiki:Campionato_Dilettanti_1958-1959 (parses in 11558 ms)
itwiki:Città_della_Pennsylvania_(M-P) (parses in 35763 ms)

This could be node 0.10 specific, but I haven't tested on node 0.8 yet. I can verify that at least the plwiki page above takes a loooooong time to parse.

But, adding info for now before I run out so this can be investigated later.
Comment 1 ssastry 2013-12-24 23:02:29 UTC
Another: enwiki:Russia_at_the_2004_Summer_Olympics in 35387 ms
Comment 2 Gerrit Notification Bot 2013-12-25 06:16:18 UTC
Change 103639 had a related patch set uploaded by GWicke:
Bug 58952: Avoid repeated require of Util from defines

https://gerrit.wikimedia.org/r/103639
Comment 3 Gerrit Notification Bot 2013-12-26 17:07:52 UTC
Change 103756 had a related patch set uploaded by GWicke:
Bug 58952 WIP: Try to make sure env and old documents are freed early

https://gerrit.wikimedia.org/r/103756
Comment 4 ssastry 2013-12-26 19:56:02 UTC
Actually, on some of these pages, it looks like at least in RT testing, diff classification is the culprit. Same behavior on node 0.8.26, node 0.10.19,and node 0.10.24.

----------
[subbu@earth tests] time node roundtrip-test.js --prefix itwiki Campionato_Dilettanti_1958-1959 > /dev/null
completed parsing of itwiki:Campionato_Dilettanti_1958-1959 in 11524 ms
wt2html + html2wt time: 14674
130.808u 1.072s 2:28.11 89.0%	0+0k 0+0io 0pf+0w
----------
[subbu@earth tests] time node roundtrip-test.js --prefix plwiki 'Podsumowanie_startów_zespołu_Lotus_w_Formule_1' > /dev/null
completed parsing of plwiki:Podsumowanie_startów_zespołu_Lotus_w_Formule_1 in 57277 ms
wt2html + html2wt time: 64485
464.153u 3.188s 9:25.44 82.6%	0+0k 0+0io 0pf+0w
----------

node 0.10 seems faster than 0.8 on the plwiki page. In addition, roundtrip-test.js on node 0.8 failed with FATAL ERROR: JS Allocation failed - process out of memory.

So, RT-issues on these pages are present in both 0.8 and 0.10, it seems.
Comment 5 ssastry 2013-12-26 22:06:44 UTC
It looks like jsdiff is the culprit at least in rt-testing for inefficient processing of some pages -- main cause for both long processing times + memory usage (peak 2.3g virtual, 1.3g resident) on the plwiki test page.

----------------------
[subbu@earth tests] time node roundtrip-test.js --prefix itwiki Campionato_Dilettanti_1958-1959 > /dev/null
completed parsing of itwiki:Campionato_Dilettanti_1958-1959 in 45254 ms
after parse: Thu Dec 26 2013 15:48:46 GMT-0600 (CST)
wt2html + html2wt time: 49303
after serialization before rt-diff: Thu Dec 26 2013 15:48:49 GMT-0600 (CST)
after jsdiff: Thu Dec 26 2013 15:51:12 GMT-0600 (CST)
after convert-diff: Thu Dec 26 2013 15:51:12 GMT-0600 (CST)
before double-rt-diff: Thu Dec 26 2013 15:51:13 GMT-0600 (CST)
before checkIfSignificant: Thu Dec 26 2013 15:51:55 GMT-0600 (CST)
159.505u 1.272s 3:58.13 67.5%	0+0k 0+0io 0pf+0w
----------------------

Relevant entry for plwiki:Podsumowanie_startów_zespołu_Lotus_w_Formule_1

----------------------
after serialization before rt-diff: Thu Dec 26 2013 15:56:42 GMT-0600 (CST)
after jsdiff: Thu Dec 26 2013 16:03:40 GMT-0600 (CST)
----------------------
Comment 6 Gabriel Wicke 2014-01-03 18:31:27 UTC
The general GC / possible memory leak issue happens without jsdiff too. The backend server forked by roundtrip-test.js grows in rss when processing many requests in a row until 1.7G is reached. Only pure wt2html and html2wt modes are used there, and only html2wt POSTs seem to trigger the memory growth. Some of my patches above are geared towards making it more likely for the small incremental collections in 0.10 to still pick up garbage as early as possible by explicitly breaking reference cycles.

The v8 GC in node 0.10 is apparently geared more towards low pause times than thorough GC. I experimented with a limited --max_old_space_size and a few other GC related parameters to force full old space collections, and was able to get rt tests to run within 300m rss for a while. GC would use most time when it got close to the limit  though, so more experimentation with a slightly higher limit is needed.

This issue is also discussed at https://github.com/joyent/node/issues/5828. The GC in v8 3.20 (will be in node 0.12) might have relevant fixes.
Comment 7 ssastry 2014-01-03 19:08:01 UTC
Hmm ... okay .. so, I guess it looks like we have to do a direct upgrade to 0.12 instead then unless we get a reliable workaround with the --max_old_space_size param.
Comment 8 Gabriel Wicke 2014-01-03 19:32:05 UTC
Also possibly relevant: The "LATENCY AND IDLE GARBAGE COLLECTION" paragraph in http://blog.nodejs.org/2013/03/11/node-v0-10-0-stable/
Comment 9 Gerrit Notification Bot 2014-01-07 00:34:38 UTC
Change 105875 had a related patch set uploaded by GWicke:
WIP Bug 58952: Help incremental GC further by breaking refs

https://gerrit.wikimedia.org/r/105875
Comment 10 Gerrit Notification Bot 2014-01-07 20:33:29 UTC
Change 105875 merged by jenkins-bot:
Bug 58952: Help incremental GC further by breaking refs

https://gerrit.wikimedia.org/r/105875
Comment 11 Gabriel Wicke 2014-01-08 22:26:23 UTC
The 0.10 specific memory leak issue seems to be solved by these patches. After the last round-trip test run, the largest node process used 600m with most using around 300, which is very close to what we have seen in 0.8.

The jsdiff issue on large pages is however still present and indpendent of the node version. I have created bug 59840 to track just the jsdiff issue.

I'm closing this bug as fixed, as it was primarily about node 0.10 issues.
Comment 12 Gabriel Wicke 2014-01-08 22:29:02 UTC
A dsh command to list memory of all node processes in rt testing (from https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing#Some_dsh_tricks):

dsh -M -cf /home/gwicke/rtclients ps -C nodejs -o rss= | sort -n -k2

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links