Last modified: 2014-05-17 08:17:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T33387, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31387 - ResourceLoader can take a long time to package JS
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: ResourceLoader
Version: 1.18.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: performance
Depends on:
Blocks:
Reported: 2011-10-05 18:27 UTC by Neil Kandalgaonkar
Modified: 2014-05-17 08:17 UTC (History)


Description Neil Kandalgaonkar 2011-10-05 18:27:38 UTC
When deploying 1.18, we had many issues with this package-loading URL:

http://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=ext.gadget.mwEmbed%7Cext.uploadwizard.mediawiki.language.parser%7Cjquery.autoEllipsis%2CcheckboxShiftClick%2CcollapsibleTabs%2Ccookie%2CdelayedBind%2ChighlightText%2CmakeCollapsible%2CmessageBox%2CmwPrototypes%2Cplaceholder%2Csuggestions%2CtabIndex%7Cmediawiki.Uri%2Chtmlform%2Clanguage%2Cuser%2Cutil%7Cmediawiki.action.watch.ajax%7Cmediawiki.legacy.ajax%2Cmwsuggest%2Cwikibits%7Cmediawiki.page.ready&skin=vector&version=20111005T161514Z&*

Varnish would return a 503 Service Unavailable error after 5 seconds (plus a few hundredths of a second). A few other load.php URLs also returned 503, but I didn't make a note of them.

I guessed that this was due to a timeout when RL was compressing, packaging, or caching this URL. We initially tried raising the first_byte_timeout of the Varnish servers to 10 seconds, which seemed to work for a few minutes, but then we started getting similar 503 errors after just over 10 seconds.

I tried calling some Apaches directly with the problematic URL (hoping that RL would then memcache the results), which may have worked. But then the JS messages didn't work; in their place we got the message key names in brackets, e.g. a button labelled "[mwe-upwiz-some-button]". This is what the message library does when it can't find the translated message.

While I was still investigating, the errors seemed to fix themselves at some point. Then I went home.

There are probably a lot of confounding issues with the 1.18 rollout, including the fact that we were scapping every now and then for other reasons. I am just speculating here, but if scap touches every file, that will cause RL to unnecessarily re-package JS, since it does not check file contents, only the last modified time. But in any case, that shouldn't matter; the underlying issue is that rebuilding the packages can take an inordinately long time.
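
To illustrate the speculation about timestamps versus contents, here is a minimal Python sketch. It is not ResourceLoader's code; the function names (mtime_version, content_version, demo) are hypothetical and exist only for this example. It shows that a version derived from the last-modified time changes when a file is merely touched, while a version derived from a content hash does not.

```python
import hashlib
import os
import tempfile
import time


def mtime_version(path):
    """Hypothetical version stamp based only on the last-modified time."""
    return str(int(os.path.getmtime(path)))


def content_version(path):
    """Hypothetical version stamp based on a hash of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def demo():
    """Touch a file without changing it and report which stamps changed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "module.js")
        with open(path, "w") as f:
            f.write("var x = 1;")

        old_mtime = mtime_version(path)
        old_hash = content_version(path)

        # Simulate a deploy step that updates the timestamp of every file
        # without modifying the contents.
        future = time.time() + 3600
        os.utime(path, (future, future))

        mtime_changed = mtime_version(path) != old_mtime
        hash_changed = content_version(path) != old_hash
        return mtime_changed, hash_changed
```

With the mtime-based stamp, every touched file looks new and would trigger a rebuild; with the content-based stamp, only files whose bytes changed would. Hashing has its own cost, since every file must be read, so this is a trade-off rather than a free win.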
Comment 1 Roan Kattouw 2011-10-05 18:29:14 UTC
Wasn't this fixed by r99010?
Comment 2 Neil Kandalgaonkar 2011-10-05 18:45:10 UTC
(In reply to comment #1)
> Wasn't this fixed by r99010?

I don't even know what the issue was in production. Maybe it is fixed by r99010.
Comment 3 Roan Kattouw 2011-10-06 11:39:58 UTC
As mentioned on private-l, I have two plans to improve RL performance a little bit:

* Defragment the minification/transformation caches in memcached by caching each module separately rather than caching full responses
* Fix the RL registration performance issues Domas has been talking to me about, which cause a small performance hit in the MediaWiki initialization phase (i.e. during every non-Squid-cached request)
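
A rough Python sketch of the first idea, under stated assumptions: a plain dict stands in for memcached, and CountingMinifier and build_response are hypothetical names, not ResourceLoader APIs. The cache key is per module (name plus a hash of its source) rather than per full response, so two requests that share a module reuse its minified output.

```python
import hashlib


class CountingMinifier:
    """Stand-in for an expensive minifier; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, source):
        self.calls += 1
        # Toy "minification": collapse all runs of whitespace.
        return " ".join(source.split())


def build_response(module_names, sources, cache, minify):
    """Assemble a response, caching minified output per module."""
    parts = []
    for name in module_names:
        source = sources[name]
        # Key on module name and content, not on the whole request.
        key = (name, hashlib.md5(source.encode("utf-8")).hexdigest())
        if key not in cache:
            cache[key] = minify(source)
        parts.append(cache[key])
    return "\n".join(parts)


sources = {
    "a": "var  a = 1;",
    "b": "var  b = 2;",
    "c": "var  c = 3;",
}
cache = {}
minify = CountingMinifier()

first = build_response(["a", "b"], sources, cache, minify)
second = build_response(["b", "c"], sources, cache, minify)
```

After both requests the minifier has run three times rather than four, because module "b" was served from the cache the second time. With full-response caching, the two different module combinations would have been two separate cache entries, each minified from scratch.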

However, I maintain that we haven't actually seen "real" slowness. Yesterday's slowness was caused by DB slowness which was in turn caused by RL overloading the DB with TRUNCATE queries. The slowness in May was caused by a flaw in the cache freshness logic combined with a bug that caused i18n recaching to happen for every language rather than just for the requested language, meaning i18n recaching (which happened on every request due to the broken cache freshness check) was 278 times as slow.

"Things break if they suddenly get 100+ times slower due to a bug" is not a bug in itself. That said, performance improvements are always good, so I'll work on getting those two improvements in.
Comment 4 Krinkle 2012-06-20 04:22:23 UTC
(In reply to comment #3)
> * Defragment the minification/transformation caches in memcached by caching
> each module separately rather than caching full responses
> * Fix the RL registration performance issues Domas has been talking to me
> about, which cause a small performance hit in the MediaWiki initialization
> phase (i.e. during every non-Squid-cached request)
> 

What's the status of these two? I thought the former was solved in the meantime, right?
Comment 5 Nemo 2014-05-17 08:17:13 UTC
Assuming this is fixed.
