Last modified: 2014-06-24 16:30:00 UTC
Created attachment 15142 [details]
Time-series graph showing save timing from 1/28 to 4/20

Compared to the start of February 2014, median user-perceived save time is up by approximately a full second, while save time at the 75th percentile is up ~4.5 seconds, jumping from 6.5 to 11 seconds. Backend profiling data for action=visualeditoredit corroborates this and suggests the problem is in the backend.

The attached graph can be manipulated in Graphite Composer at <http://graphite.wikimedia.org/>; look for it under User Graphs -> ori.
Do we have data about page save time trends for wikitext saves? I ask because slowness on save in the backend can only really be in Parsoid or in code paths that are taken on every save. Maybe the "backend profiling data" you're talking about can shed more light on this?
(In reply to Roan Kattouw from comment #1)
> Maybe the "backend profiling data" you're talking about can shed more light
> on this?

The backend profiling data is available in Graphite under the 'MediaWiki' namespace; it consists of aggregated wfProfile{In,Out} measurements collected on 2% of requests. Some of the useful metrics are MediaWiki.API.visualeditoredit, MediaWiki.API.edit, and MediaWiki.WikiPage.doEditContent.

So far the data seems to suggest that the regression is Parsoid-related, but some additional analysis is required before that can be established definitively.

http://graphite.wikimedia.org/render/?width=947&height=521&_salt=1398121734.894&from=00%3A00_20140201&until=23%3A59_20140421&lineMode=connected&target=movingAverage(MediaWiki.API.visualeditoredit.tp90%2C300)&target=movingAverage(MediaWiki.WikiPage.doEditContent.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)
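[Editorial sketch] The sampled wfProfile{In,Out} pattern described above can be illustrated with a minimal Node sketch: only a fraction of requests are timed, and each measurement becomes a "namespace.name:ms|ms" StatsD-style line that Graphite can aggregate. The function names, sampling API, and metric format here are illustrative, not MediaWiki's actual implementation.

```javascript
// Hypothetical sketch of sampled request timing, loosely modeled on
// MediaWiki's wfProfileIn/wfProfileOut. Only sampleRate of requests are
// measured; each measurement yields a StatsD-style timing line.
function makeProfiler(namespace, sampleRate = 0.02) {
  const starts = new Map();
  return {
    profileIn(name) {
      // Decide at profileIn time whether this request is sampled.
      if (Math.random() < sampleRate) {
        starts.set(name, process.hrtime.bigint());
      }
    },
    // Returns the StatsD line that would be sent, or null if this
    // request was not sampled.
    profileOut(name) {
      const start = starts.get(name);
      if (start === undefined) {
        return null;
      }
      starts.delete(name);
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      return `${namespace}.${name}:${ms.toFixed(1)}|ms`;
    },
  };
}
```

A metric emitted this way would appear in Graphite under, e.g., MediaWiki.API.visualeditoredit, matching the namespace convention mentioned above.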
You have a pending change against Parsoid to add instrumentation, right? Mind linking that change here?
(In reply to Roan Kattouw from comment #3)
> You have a pending change against Parsoid to add instrumentation, right?
> Mind linking that change here?

https://gerrit.wikimedia.org/r/#/c/112647/
(In reply to Ori Livneh from comment #4)
> (In reply to Roan Kattouw from comment #3)
> > You have a pending change against Parsoid to add instrumentation, right?
> > Mind linking that change here?
>
> https://gerrit.wikimedia.org/r/#/c/112647/

Note that the request timings are readily available in the VE extension. The internal Parsoid timings are returned by Parsoid in a header.

In [1], the visualeditoredit API endpoint does not seem to change much. My understanding is that this is the time for the entire save request, including Parsoid serialization. Is that correct?

[1]: http://graphite.wikimedia.org/render/?width=947&height=521&_salt=1398121734.894&from=00%3A00_20140101&until=23%3A59_20140421&lineMode=connected&target=movingAverage(MediaWiki.API.visualeditoredit.tp90%2C300)&target=movingAverage(MediaWiki.WikiPage.doEditContent.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300
(In reply to Gabriel Wicke from comment #5)
> Note that the request timings are readily available in the VE extension.
> The internal Parsoid timings are returned by Parsoid in a header.

I should know -- I added this header. But as I recall, we agreed it'd be more practical to report this directly from Parsoid, which is why I submitted <https://gerrit.wikimedia.org/r/#/c/112647/>.

We really need to get Parsoid performance data into Graphite. Could you either accept/amend my patch, or provide an alternate implementation?
As I said earlier, the most important metric is the request timing to the Parsoid API, which includes the cache layers. Parsoid itself does not see most requests, so it naturally can't report on them. The information you are looking for can be readily measured by timing getHTML (line 85) and postHTML (line 152) in ApiVisualEditor.php.
Change 130543 had a related patch set uploaded by GWicke: WIP Bug 64171: Log Parsoid request timings to eventlogging https://gerrit.wikimedia.org/r/130543
I think the instrumentation in the WIP patch above should shed some light by breaking the save-pipeline time into parsoid-serialization time + everything else.

In addition to the instrumentation in the VE extension to measure Parsoid serialization time, on the Parsoid side we could instrument our serialization pipeline into (a) dom-diff times, (b) varnish hit/miss info, and (c) serialization time.

In round-trip testing on slower servers, where there is no dom-diff and serialization is always full (unlike the faster selective serialization in production), we found html2wt times to be consistently fast (< 3s in almost all cases). Unfortunately, those numbers are currently buried in our rt-server db, which has grown too big. Once the current round of rt-testing finishes, I'll see if I can access those times and post some numbers here.

On my laptop, a full serialization of the 3.8M Barack Obama page completed in ~4.2s, and a dom-diff of a lightly edited (~5-6 text edits) version of the BO HTML completed in ~2s. All that said, dom-diff could still be a bottleneck in some heavy-edit scenarios -- we haven't really done any real perf tests on that component.

If this first-level breakdown of time to get Parsoid HTML reveals that we need a further breakdown of Parsoid-internal numbers as above, we can prioritize that and add instrumentation in Parsoid (or repurpose Ori's existing patch).
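[Editorial sketch] The per-phase breakdown proposed above (dom-diff time, cache lookup, serialization time) amounts to wrapping each pipeline stage in a timer so the total decomposes into named parts. A minimal sketch, with phase names and the synchronous API purely illustrative (this is not Parsoid's actual pipeline code):

```javascript
// Hypothetical sketch: run a map of named pipeline phases, recording
// elapsed milliseconds per phase so the total time can be decomposed
// into e.g. domDiff + serialize, as suggested above.
function timePhases(phases) {
  const timings = {};
  const results = {};
  for (const [name, fn] of Object.entries(phases)) {
    const start = process.hrtime.bigint();
    results[name] = fn(); // run the phase
    timings[name] = Number(process.hrtime.bigint() - start) / 1e6;
  }
  return { results, timings };
}
```

Reporting each `timings[name]` as a separate Graphite metric would give exactly the first-level breakdown discussed here.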
It's also worth keeping in mind that most of the user-perceived save time on large articles and typical DSL connections is currently the network transfer to the server. The attached screenshot shows a save of the [[en:San Francisco]] article from a residential 16mbit DSL connection in SF: 23s is spent uploading 3MB of uncompressed HTML, and 7s is spent waiting for and transferring the response.

We already discussed this in https://bugzilla.wikimedia.org/show_bug.cgi?id=53093#c8. Since then the situation has already slightly improved by using multipart/form-data content encoding in VE POSTs. We are currently working on removing data attributes from the HTML, which make up the bulk of the uncompressed HTML content due to the excessive &quot; escaping. See https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs and bug 52936.
Created attachment 15292 [details]
Save request from VE for [[en:San Francisco]] from a 16mbit DSL connection in San Francisco

Most of the time is spent transferring the 3MB of uncompressed HTML to the server.