
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T49289, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 47289 - l10n cache doesn't actually cache (due to specific extension localization pair not in hash, marking the cache as invalid)
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: Database (Other open bugs)
Version: 1.20.x
Hardware: All
OS: All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Whiteboard: aklapper-moreinfo
Keywords:
Depends on:
Blocks:
Reported: 2013-04-16 16:43 UTC by Mark A. Hershberger
Modified: 2014-02-12 23:32 UTC (History)
CC: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
The log of a sample pathological request doing full localisation recache (64.31 KB, application/octet-stream)
2013-04-17 18:28 UTC, Victor Danilchenko

Description Mark A. Hershberger 2013-04-16 16:43:04 UTC
From http://permalink.gmane.org/gmane.org.wikimedia.mediawiki/41444

I have a strange problem on my (clustered and load-balanced) MediaWiki instance: the 'l10n_cache' table is getting written to *very* excessively (and constantly). MediaWiki seems to write the same localization information to the database every 3-5 seconds. This causes MySQL to constantly write binlog files (for replication to the slaves), which fill the partition after a number of hours. Removing all my extensions does not stop the problem. I have APC installed for opcode caching, and Memcached installed for caching other things (sessions, parser stuff, etc.). I assume that this is all due to the constant requests made by the load balancer to the wiki, which in turn cause MW to recalculate the cache.

Why is the l10n_cache table getting completely rewritten every few seconds? The cache is not functioning as a cache. 

I found one thing that will stop this behavior: $wgLocalisationCacheConf['manualRecache'] = true;
But this feels wrong because it makes me manually recompute the l10n cache whenever I update/release. (Yet another thing to tack onto the release process.) There are also some configuration variables which will slow the process ($wgUpdateRowsPerQuery) or cache it to disk ($wgLocalisationCacheConf), but this merely masks the problem. Does anyone have any insight into why the l10n_cache table is getting written to so much? Or why MediaWiki isn't actually *caching* the localization messages in the database?
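
For illustration, the workaround and the throttling knob mentioned above amount to roughly the following LocalSettings.php lines (a sketch; the $wgUpdateRowsPerQuery value is just an example):

// LocalSettings.php (sketch)
// Stop MediaWiki from rebuilding the localisation cache during web requests.
// The cache then has to be rebuilt by hand after every update/release, e.g.:
//   php maintenance/rebuildLocalisationCache.php --force
$wgLocalisationCacheConf['manualRecache'] = true;

// Alternatively, only throttle how many rows each cache write touches per
// query; this slows the writes down but does not stop them.
$wgUpdateRowsPerQuery = 50;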
Comment 1 Daniel Renfro 2013-04-16 17:11:07 UTC
This doesn't happen when using $wgCacheDirectory (provided the path has the appropriate ownership/permissions). It seems to happen only when the localization messages are "cached" in the database (in the l10n_cache table).
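
For reference, the file-backed setup looks roughly like this (a sketch; the path is an example and must be writable by the web server user):

// LocalSettings.php (sketch): with the default
// $wgLocalisationCacheConf['store'] = 'detect', setting a cache directory
// makes the localisation cache use files under this path instead of the
// l10n_cache database table.
$wgCacheDirectory = '/var/cache/mediawiki';   // example path, adjust as needed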
Comment 2 Tim Starling 2013-04-17 03:51:14 UTC
Can you enable the debug log for an individual request that writes to the l10n_cache table, say with a config hack like this:

if ( isset( $_GET['forcedebuglog'] ) ) {
    $wgDebugLogFile = '/tmp/mw.log';
}

Then attach the log to this bug, with session cookies redacted. The log should have LocalisationCache::recache() in it somewhere -- if not, the request probably didn't trigger any cache writes.

Also, if possible, can you get on IRC, on #mediawiki on irc.freenode.net?
Comment 3 Victor Danilchenko 2013-04-17 18:24:37 UTC
I have just done that. There's no 'recache' entry when caching to file, but there is when caching to DB:

LocalisationCache::isExpired(en): cache for en expired due to GlobalDependency
LocalisationCache::recache: got localisation for en from source
DatabaseBase::query: Writes done: DELETE FROM `vpw_l10n_cache` WHERE lc_lang = 'en'
Fully initialised
...
LocalisationCache::recache                                                       1       914.617       914.617        37.637%   3030028  (      914.617 -      914.617) [114]

Full request log is attached.
Comment 4 Victor Danilchenko 2013-04-17 18:28:15 UTC
Created attachment 12129 [details]
The log of a sample pathological request doing full localisation recache

The recache to DB happens on every request; but when caching to file, it doesn't.
Comment 5 Tim Starling 2013-04-17 23:28:10 UTC
"expired due to GlobalDependency" implies that the set of extensions in use has changed, or more specifically,  $wgExtensionMessagesFiles has changed. If this problem doesn't happen when you use local file caching, then maybe the set of enabled extensions depends on what server MediaWiki runs on, say due to LocalSettings.php not being properly synchronised.
Comment 6 Dan Barrett 2013-04-17 23:43:46 UTC
In this server setup, the MediaWiki file tree (including LocalSettings.php) is on a single disk partition, which is mounted via NFS by multiple webservers which are load-balanced.

Is there anything about this setup that might contribute to "LocalSettings.php not being properly synchronised"?
Comment 7 Tim Starling 2013-04-18 04:01:48 UTC
If LocalSettings.php is on NFS then it probably isn't the problem. But there are other ways $wgExtensionMessagesFiles can change. You should patch GlobalDependency::isExpired() in includes/cache/CacheDependency.php so that it starts with:

wfDebug( __METHOD__ . ": {$this->name} old value: " . var_export( $this->value, true ) . "\n" );
wfDebug( __METHOD__ . ": {$this->name} new value: " . var_export( $GLOBALS[$this->name], true ) . "\n" );

Then extract the relevant log lines from the debug log, work out what has changed between the two arrays, and examine the relevant inclusion in LocalSettings.php and the relevant extension setup file.
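
For context, the patched method would look roughly like this (the class body is paraphrased from includes/cache/CacheDependency.php around MediaWiki 1.20 and may differ in detail; only the two wfDebug() lines are the suggested addition):

class GlobalDependency extends CacheDependency {
	var $name, $value;

	function __construct( $name ) {
		$this->name = $name;
		$this->value = $GLOBALS[$name];
	}

	function isExpired() {
		// Suggested debugging lines go first:
		wfDebug( __METHOD__ . ": {$this->name} old value: " . var_export( $this->value, true ) . "\n" );
		wfDebug( __METHOD__ . ": {$this->name} new value: " . var_export( $GLOBALS[$this->name], true ) . "\n" );

		if ( !isset( $GLOBALS[$this->name] ) ) {
			return true;
		}
		// The dependency is expired whenever the stored snapshot no longer
		// matches the live global (here, $wgExtensionMessagesFiles).
		return $GLOBALS[$this->name] != $this->value;
	}
}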
Comment 8 Daniel Renfro 2013-04-19 21:13:12 UTC
Thanks for the help, Tim. I added some wfDebugLog() calls in GlobalDependency::isExpired(), and came up with this:

  GlobalDependency::isExpired: array (
    'count($GLOBALS[$this->name])' => 120,
    'count($this->value)' => 120,
    'count(array_intersect($GLOBALS[$this->name], $this->value))' => 120,
    'count(array_diff($GLOBALS[$this->name], $this->value))' => 0,
    '$GLOBALS[$this->name] == $this->value' => true,
    '$GLOBALS[$this->name] === $this->value' => true,
  )

Obviously $GLOBALS[$this->name] is set and the arrays are the same. GlobalDependency::isExpired() should then return false, but sometimes it doesn't, and recache() is called. I am kind of stuck and have handed it off to my colleague Victor to look at.
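
For reference, output in that shape could have been produced with something along these lines (a reconstruction, not the exact calls that were used; the 'l10n' log group name is arbitrary):

// Added inside GlobalDependency::isExpired() for debugging (reconstruction):
wfDebugLog( 'l10n', __METHOD__ . ': ' . var_export( array(
	'count($GLOBALS[$this->name])' => count( $GLOBALS[$this->name] ),
	'count($this->value)' => count( $this->value ),
	'count(array_intersect($GLOBALS[$this->name], $this->value))' =>
		count( array_intersect( $GLOBALS[$this->name], $this->value ) ),
	'count(array_diff($GLOBALS[$this->name], $this->value))' =>
		count( array_diff( $GLOBALS[$this->name], $this->value ) ),
	'$GLOBALS[$this->name] == $this->value' => $GLOBALS[$this->name] == $this->value,
	'$GLOBALS[$this->name] === $this->value' => $GLOBALS[$this->name] === $this->value,
), true ) );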
Comment 9 Victor Danilchenko 2013-04-22 16:19:15 UTC
We found it.

The problem turned out to be an extension that was loading another extension, thus affecting the extension execution scope. When this happened, the localization key/value pair for the included extension ended up being present in the $GLOBALS[$this->name] hash (in the GlobalDependency::isExpired() method in includes/cache/CacheDependency.php, line 410 or thereabouts), but absent from the $this->value hash in the same code. Therefore, the cache would ALWAYS be flagged as invalid in such a case, because the two hashes would ALWAYS mismatch.

This seems to be clearly incorrect behavior. While including extensions from other extensions may or may not be a good idea, they should not be treated differently with respect to caching; i.e. you shouldn't end up with the extension's localization key present in the $GLOBALS hash but missing from the $this->value one.
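
To make the failure mode concrete, here is a sketch of one way this can come about, with hypothetical extension names (the real extensions involved were never published in this report):

// extensions/OuterExtension/OuterExtension.php  (hypothetical)
$wgExtensionMessagesFiles['OuterExtension'] = dirname( __FILE__ ) . '/OuterExtension.i18n.php';
// Pulling in a second extension from inside this setup file...
require_once dirname( __FILE__ ) . '/../InnerExtension/InnerExtension.php';

// extensions/InnerExtension/InnerExtension.php  (hypothetical)
$wgExtensionMessagesFiles['InnerExtension'] = dirname( __FILE__ ) . '/InnerExtension.i18n.php';

// ...means the live $wgExtensionMessagesFiles ends up with an 'InnerExtension'
// entry that the snapshot stored alongside the cached localisation data lacks,
// so GlobalDependency::isExpired() reports the cache as expired on every request.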

I am not sure how DB cache store plays into it though.

Thanks for all your help guys, this was invaluable!
Comment 10 Tim Starling 2013-04-23 02:39:04 UTC
Victor, if you still think this is a MediaWiki bug, can you give links to the source code of the relevant two extensions?
Comment 11 Andre Klapper 2013-05-13 00:13:40 UTC
Victor: Could you answer comment 10, please?
Comment 12 Andre Klapper 2013-07-03 12:17:32 UTC
Victor Danilchenko: Could you answer comment 10, please?
Comment 13 Victor Danilchenko 2013-07-03 12:49:24 UTC
I will need to get authorization to do so, after my manager returns from vacation. Give me a little time.
Comment 14 Andre Klapper 2013-09-20 15:28:32 UTC
Victor: Did you find time to ask?
Comment 15 Daniel Renfro 2013-10-01 13:21:00 UTC
Hi Andre,

Sorry for the seemingly abandoned ticket -- Victor himself went on vacation! I came across this ticket in my backlog of issues and am going to have another look at it. 

To review: 
* Our MediaWiki installation is quite complex, which does not make debugging any easier. We have three load-balanced web nodes which share a codebase mounted on an NFS partition. We also have three database (MySQL) nodes: one master, two slaves.
* We have a plethora of extensions, both third-party and ones developed in-house for various enterprise functionality.
* The issue here is that the localization cache constantly updates. When requests come in, MediaWiki writes a bunch of l10n stuff (I'm assuming these are l10n messages) to the database. Aside from unnecessarily using resources, in a replicated environment the DB_MASTER writes these INSERTs to the binlog files[1]. These files get replayed on the SLAVES, but in the meantime they slowly grow to fill the disk. This is bad.
* From my cursory debugging, it seems that this behavior only occurs on the DB_MASTER (which makes sense). When $wgCacheDirectory is set, this behavior stops (I assume because MediaWiki writes the cache only once). This is the current workaround we have in place.

I'll use the stuff Tim posted above to have another look. Thanks for keeping up with this issue -- I'll post back here as soon as I find something!

--Daniel (User:AlephNull)

References:
[1] https://dev.mysql.com/doc/refman/5.0/en/binary-log.html
Comment 16 Andre Klapper 2014-01-09 11:35:25 UTC
(In reply to comment #9)
> The problem turned out to be an extension that was loading another extension,

(In reply to comment #10)
> can you give links to the source code of the relevant two extensions?

(In reply to comment #15)
> I'll post back here as soon as I find something!

Hi Daniel, any news here?
Comment 17 Daniel Renfro 2014-01-15 16:16:24 UTC
For the life of me I cannot reproduce this. It seems to be caused by sunspots or maybe the phase of the moon. 

For a while there the localization cache was convinced it was out of date and re-cached itself (constantly.) I dumped some log files and compared them to the one Victor uploaded a while back (in comment 4.) I couldn't find any substantive differences. 

I have a vague feeling this was due to our MySQL replication and/or lag. I know the LoadBalancer/ChronologyProtector is supposed to guard against things like this, but I can't help but feel that this was the problem. (No real reason -- I definitely could be wrong.) 

Feel free to mark this bug as "Not fixed"/"Not reproducible".
Comment 18 Andre Klapper 2014-01-15 16:56:02 UTC
Alright then. Thanks for investigating and the update here!
