Last modified: 2013-05-07 16:29:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T46570, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 44570 - Time prior to removal of old wmfbranch directories from cluster MUST be higher than longest cache of ANY kind; leads to missing resources
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All
OS: All
Importance: High major
Target Milestone: ---
Assigned To: Aaron Schulz
Keywords: code-update-regression
Duplicates: 40126
Depends on:
Blocks:
Reported: 2013-02-01 05:04 UTC by Krinkle
Modified: 2013-05-07 16:29 UTC (History)
CC List: 15 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Krinkle 2013-02-01 05:04:51 UTC
I just noticed when browsing some of our staff's user pages on wikimediafoundation.org that the search icon on the top left is not appearing for several user pages. This was because we removed the wmfbranch directories from the server before the last cache expired.

For example, the search-magnify icon is referred to in the HTML via the static-{wmfbranch} path on the "bits" server.

via https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4

# Reproduce

- log out
- clear cookies
  // just logging out apparently still leaves 3 cookies
  // which cause some part of the cluster to serve
  // a new version instead >>> bug?
- page last modified before December 1, 2012
- current date after January 30, 2013
- visit one of:
  - https://wikimediafoundation.org/wiki/User:Gyoung
  - https://wikimediafoundation.org/wiki/User:Catrope


## Request

Request URL: https://wikimediafoundation.org/wiki/User:Gyoung
Request Method: GET
Status Code: 200 OK

### Request Headers
GET /wiki/User:Gyoung HTTP/1.1
Host: wikimediafoundation.org
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

### Response Headers
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 01 Feb 2013 04:47:40 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 7118
Connection: keep-alive
X-Content-Type-Options: nosniff
Content-Language: en
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=foundationwikiToken;string-contains=foundationwikiLoggedOut;string-contains=foundationwiki_session;string-contains=mf_useformat
Last-Modified: Wed, 19 Sep 2012 20:02:07 GMT
Content-Encoding: gzip
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0
Vary: Accept-Encoding,Cookie
X-Cache: HIT from sq72.wikimedia.org
X-Cache-Lookup: HIT from sq72.wikimedia.org:3128
X-Cache: MISS from sq64.wikimedia.org
X-Cache-Lookup: HIT from sq64.wikimedia.org:80
Via: 1.1 sq72.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 sq64.wikimedia.org:80 (squid/2.7.STABLE9)


## Response

<!DOCTYPE html>
<html>
 <meta name="generator" content="MediaWiki 1.21wmf1">
 ..
 <div id="mw-content-text" ..>
  ..
<!-- 
NewPP limit report
Preprocessor visited node count: 62/1000000
Preprocessor generated node count: 349/1000000
Post-expand include size: 4937/2048000 bytes
Template argument size: 1975/2048000 bytes
Highest expansion depth: 4/40
Expensive parser function count: 0/500
-->

<!-- Saved in parser cache with key foundationwiki:pcache:idhash:21087-0!*!0!!*!4!* and timestamp 20120919200207 -->
  ..
 </div>
 ..
 <img src="//bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4" alt="Search" width="12" height="13">
 ..
 <!-- Served by srv231 in 0.140 secs. -->
 ..
</html>


## Errors

404 (Not Found)
 GET https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4

404 (Not Found)
 GET https://bits.wikimedia.org/static-1.21wmf1/skins/common/images/poweredby_mediawiki_88x31.png



So, to conclude: these paths can be cached in the database, memcached, Squid, Varnish, whatever the case. If some component somewhere is not modified (modules, files, wiki pages, configuration, epoch, whatever it is), it may be cached by one of the caches somewhere, which means we must be sure never to remove publicly exposed paths before the longest cache has expired.

Marking as regression as this is a regression from the het deploy process.

We just need to make sure that we don't perform the teardown of an iteration until the longest cache is expired.

We could document this and hope that everyone remembers, but though it is only a small image this time, it could cause more significant and visible damage another time. The principle is the same, so let's not find out the hard way but be smart about it.

If I recall correctly there is a maintenance script in multiversion that removes the paths and symlinks, essentially the teardown counterpart of `bin/checkoutMediaWiki`, namely `bin/deleteMediaWiki` [1]. I propose we add some logic there that determines how old a branch is (the commit date of the first commit in the branch deriving from master) and ensures that it is
> older than (current time) - (CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN)

These constants can be hardcoded in the script since there is no realistically feasible way to determine the maximum max-age of all the caching layers we have. As a guess I'd say the max max-age is 31 days and the margin 7 days.

If the condition is false, the shell user is NOT allowed to execute the script further.
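
A minimal sketch of what such a guard could look like inside `deleteMediaWiki` (the constant values, the branch ref layout and the `getBranchFirstCommitTime()` helper are all assumptions, not existing multiversion code; the margin is subtracted so it only ever makes the check stricter):

<?php
// Hypothetical guard for deleteMediaWiki -- names and values are assumptions.
define( 'CACHE_MAX_MAX_AGE', 31 * 86400 );           // longest max-age of any cache layer (guess)
define( 'CACHE_HERE_BE_DRAGONS_MARGIN', 7 * 86400 ); // extra safety margin

// Commit timestamp (Unix time) of the first commit on the wmf branch that
// derives from master. Assumes branch refs look like origin/wmf/<branch>.
function getBranchFirstCommitTime( $branch ) {
	$range = escapeshellarg( "origin/master..origin/wmf/$branch" );
	$out = shell_exec( "git log --reverse --format=%ct $range | head -n1" );
	return (int)trim( $out );
}

$branch = $argv[1]; // e.g. "1.21wmf1"
$cutoff = time() - ( CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN );

if ( getBranchFirstCommitTime( $branch ) >= $cutoff ) {
	// Too recent: some cache somewhere may still reference this branch.
	fwrite( STDERR, "Refusing to delete $branch: caches may still reference it.\n" );
	exit( 1 );
}
// ...proceed with the existing directory/symlink teardown...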



-----

[1]
 https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=tree
 https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=checkoutMediaWiki;h=677f17d0121743ed4b94bfc259d4b46255edc0ce;hb=HEAD
 https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=deleteMediaWiki;h=b90bf0c0a7b4687a880d077dcfab360e3add5949;hb=HEAD
Comment 1 Krinkle 2013-02-01 05:07:48 UTC
CC-ing @reedy (since he runs them most often), and @catrope and @AaronSchulz because they appear to have written most of the code.
Comment 2 Sam Reed (reedy) 2013-02-01 10:16:58 UTC
Faidon noticed this earlier this week. ;) I wonder if there was already a bug for it. I seem to recall it being much more frequent in 1.20

Noting wmf1 went out of production in October, I seem to recall.

I don't remove any till at least a month after the last used date.

I guess the parser cache expiry is far too long.
Comment 4 Andre Klapper 2013-02-01 10:29:28 UTC
Duplicate of bug 40126?
Comment 5 Sam Reed (reedy) 2013-02-01 10:35:15 UTC
(In reply to comment #4)
> Duplicate of bug 40126?

Yup. Though I think it probably makes more sense to mark 40126 as a dupe of this one...
Comment 6 Sam Reed (reedy) 2013-02-02 03:34:34 UTC
*** Bug 40126 has been marked as a duplicate of this bug. ***
Comment 7 Sam Reed (reedy) 2013-02-02 03:35:14 UTC
'wgParserCacheExpireTime' => array(
	'default' => 86400 * 365,
),

Down to 28 days in https://gerrit.wikimedia.org/r/47202
Comment 8 Sam Reed (reedy) 2013-02-02 03:40:08 UTC
Note, it's not deployed! (for obvious reasons)

Though, it doesn't help us with other old, stale parser cache entries lying around.

'wgSquidMaxage' => array(
	'default' => 2678400, // 31 days seems about right
	'foundationwiki' => 3600, // template links may be funky
),


I guess it should maybe match the Squid max-age... Should the parser cache also be lower for foundationwiki? 60 minutes seems a bit on the low side though.
Comment 9 Sam Reed (reedy) 2013-02-02 03:45:37 UTC
(In reply to comment #8)
> Though, it doesn't help us with other old stale parser cache entries laying
> around.

maintenance/purgeParserCache.php

Using an --age option set to the value we want the new expire time to be. Might want to do it in stages...
Comment 10 Sam Reed (reedy) 2013-02-02 17:29:17 UTC
class misc::maintenance::parsercachepurging {

	system_role { "misc::maintenance::parsercachepurging": description => "Misc - Maintenance Server: parser cache purging" }

	cron { 'parser_cache_purging':
		user => apache,
		minute => 0,
		hour => 1,
		weekday => 0,
		# Purge entries older than 30d * 86400s/d = 2592000s
		command => '/usr/local/bin/mwscript purgeParserCache.php --wiki=aawiki --age=2592000 >/dev/null 2>&1',
		ensure => present,
	}

}


Those parser cache entries should've been removed at 30 days old...
Comment 11 Sam Reed (reedy) 2013-02-03 17:58:47 UTC
*** Bug 42858 has been marked as a duplicate of this bug. ***
Comment 12 Tim Starling 2013-02-04 01:16:17 UTC
Those caches have an expiry time longer than 30 days because it takes longer than 30 days for them to fill. We didn't just pick those numbers out of a hat.
Comment 13 Sam Reed (reedy) 2013-02-20 19:05:02 UTC
*** Bug 42858 has been marked as a duplicate of this bug. ***
Comment 14 Krinkle 2013-02-20 19:05:26 UTC
We're still serving links to 1.21wmf1 and 1.21wmf2.

I propose we:
* Short term: Stop deleting branches from now on.
* Short term: Re-deploy missing versions between 1.21wmf1 and current

* Medium term: Find out a reliable time duration at which no pages are being served anymore linking to an old version.
* Medium term: Schedule deletions of old versions from production only after a version is completely "expired" and obsolete.
Comment 15 Krinkle 2013-02-20 19:11:20 UTC
(In reply to comment #14)
> We're still serving links to 1.21wmf1 and 1.21wmf2.
> 
> I propose we:
> * Short term: Stop deleting branches from now on.
> * Short term: Re-deploy missing versions between 1.21wmf1 and current
> 
> * Medium term: Find out a reliable time duration at which no pages are being
> served anymore linking to an old version.
> * Medium term: Schedule deletions of old versions from production only after
> a
> version is completely "expired" and obsolete.


* Long term: Modify references so they are not hardcoded in the main HTML output, so that this doesn't matter anymore and it is all handled by ResourceLoader instead. For images that means CSS, for scripts that means mw.loader.load, and for the csshover file... well, I guess we could maybe set $wgLocalStylePath to the generic /skins/ symlink that points to one of the HET-deployed versions (may not be the right version though). Or perhaps make it so that the docroot /w of each wiki points to the correct PHP dir. Anyway, that's long-term idealism.
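
For illustration only, that last idea might look something like this (whether such a version-neutral symlink should exist at all is exactly the open question):

// Hypothetical sketch: point the local style path at a version-neutral docroot
// symlink (here assumed to be /skins, resolving to one of the deployed
// branches) instead of a hardcoded /skins-1.21wmfN or static-1.21wmfN path.
$wgLocalStylePath = '/skins';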
Comment 16 Sam Reed (reedy) 2013-02-20 19:17:04 UTC
'wgParserCacheExpireTime' => array(
	'default' => 86400 * 365,
),


Should we keep files around for a year? I don't think so...

Considering we still have space issues on some machines (non-Apache), just storing many times more files is wasting space.
Comment 17 MZMcBride 2013-02-20 19:30:03 UTC
(In reply to comment #15)
> * Long term: Modify references to not be hardcoded in the main html output,
> so that this doesn't matter anymore and it is all handled by ResourceLoader
> instead. For images that means css, for scripts that means mw.loader.load,
> and for the csshover file... well, I guess we could maybe set $wgLocalStylePath
> to the generic /skins/ symlink that points to one of the HET-deployed versions
> (may not be the right version though). Or perhaps make it so that the docroot
> /w of each wiki is pointing to the correct php dir). Anyway, that's long term
> idealism.

This seems like a sensible approach. Why is it long term idealism, though?
Comment 18 Krinkle 2013-02-20 23:34:17 UTC
(In reply to comment #17)
> (In reply to comment #15)
> > * Long term: Modify references to not be hardcoded in the main html output,
> > so that this doesn't matter anymore and it is all handled by ResourceLoader
> > instead. For images that means css, for scripts that means mw.loader.load,
> > and for the csshover file... well, I guess we could maybe set $wgLocalStylePath
> > to the generic /skins/ symlink that points to one of the HET-deployed versions
> > (may not be the right version though). Or perhaps make it so that the docroot
> > /w of each wiki is pointing to the correct php dir). Anyway, that's long term
> > idealism.
> 
> This seems like a sensible approach. Why is it long term idealism, though?

Because we already have a ton of stuff in various layers of cache that we can't just get rid of (so there's the short term first). And certain things can't use mw.loader.load yet because of other issues, and it isn't without controversy to start linking to /skins/ instead of /skins-{version}/; after all, we version these for a reason. Blindly linking to version A from the output of a wiki on version B can cause all sorts of undocumented trouble.

And there's maybe some layout reason or semantic reason for certain references to be in HTML instead of CSS.
Comment 19 Bawolff (Brian Wolff) 2013-02-22 21:55:55 UTC
So from what I understand:
* we want to have version numbers in static resource URLs
* we want to delete old skins directories (I must say I'm surprised that space is an issue; I would expect the static assets that are not loaded via load.php (only UI images?) to total about a megabyte)
* we want to have really long-lived parser cache entries that contain refs to static assets, and the references shouldn't expire

Possibly stupid idea - why not have a 404 handler (or rewrite rule, etc.) which, if given a URL with an outdated skins path, issues an HTTP redirect to the new URL? In most cases showing a newer version of an image is not a bad thing.
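
A rough sketch of what such a handler could look like, assuming it is wired up as the 404 handler for the static-* document roots (the current-branch lookup and the docroot layout are assumptions):

<?php
// Hypothetical 404 handler for bits: redirect requests for a deleted
// static-<branch> path to the same file under the current branch.
$currentBranch = trim( file_get_contents( '/srv/mediawiki/current-branch' ) ); // assumed
$uri = $_SERVER['REQUEST_URI']; // e.g. /static-1.21wmf1/skins/vector/images/search-ltr.png?303-4

if ( preg_match( '!^/static-[^/]+(/[^?]*)!', $uri, $m )
	&& is_file( "/srv/mediawiki/php-$currentBranch" . $m[1] ) // assumed layout
) {
	header( "Location: https://bits.wikimedia.org/static-$currentBranch" . $m[1], true, 302 );
	exit;
}

header( 'HTTP/1.1 404 Not Found' );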
Comment 20 Jesús Martínez Novo (Ciencia Al Poder) 2013-02-23 11:52:19 UTC
Maybe a solution would be to change the way we're dealing with versioning. Instead of creating new directories for each wmfbranch, place all of them in the same directory (just upgrade over it) and, on all links to it (images, scripts, all static content), use a query string that differentiates each version, for example:

/static/skins/vector/images/search-ltr.png?1.21wmf1

Then the server cache could store it for a very long time, and whenever a new version is made, change the query string so it forces a new entry in the cache. If that query string is problematic (because of the dot, it may be interpreted as a file extension), just use the same scheme and make all of them point to the same directory, as Bawolff suggests.

But then we could have a problem if not all wikis use the same wmfbranch and we need to keep two different versions accessible at the same time in case the server cache is cleared. I'm not familiar with how Wikimedia is doing the upgrade of all wikis. All upgraded at once? By groups? One by one? Maybe creating two or more groups of static content could deal with that (bits1.wikimedia.org, bits2.wikimedia.org, etc.).
Comment 21 Krinkle 2013-02-24 15:00:57 UTC
(In reply to comment #20)
> Maybe a solution would be to change the way we're dealing with versioning.
> Instead of creating new directories for each wmfbranch, place all of them in
> the same (just upgrade over it) and on all links to it (images, scripts, all
> static content) use a query string that differentiates each version, like for
> example:
> 
> /static/skins/vector/images/search-ltr.png?1.21wmf1
> 

No.

First of all, that kind of path is not supported in MediaWiki core right now (only a prefix, not a suffix), though support could be added.

The problem here is this:

* Cache older than the oldest wmfbranch we have can't access the resources anymore
* We have cache older than the oldest wmfbranch.
* Or.. we removed wmf branches before the oldest cache expired.

By changing the URL structure, we mask one problem and add a new problem. It doesn't truly solve *any* problem:

1) Files that haven't changed will appear to work, but:
2) files that have changed are either still a 404 error (if they were moved), or worse, if they're incompatible they'll break all sorts of stuff (by applying new styles to old content, or executing parts of a script, etc.).

Even in the current system we occasionally get mismatches between versions, causing headers to stick out of boxes and things to look ugly on the site for several weeks. We've had that and it wasn't fun.

Implementing something that by design will cause an unpredictable combination of version mismatches is unacceptable.

Our problem is relatively simple, so let's try to solve it without all kinds of weird workflow and infrastructure changes that introduce new problems.

I refer to comment 14, with the additional thought that we may have to tighten up certain caches if we don't want to keep them around that long.
Comment 22 Andre Klapper 2013-02-25 15:26:47 UTC
For the record: the problem has been brought to a wider audience: http://lists.wikimedia.org/pipermail/wikitech-l/2013-February/066770.html
Comment 23 Krinkle 2013-02-27 01:36:55 UTC
https://wikitech.wikimedia.org/index.php?title=Server_admin_log&diff=56842&oldid=56841
> Bringing back checkouts of MediaWiki 1.21wmf5 - 1.21wmf1 on fenari for [[bugzilla:44570|bug 44570]]
Comment 24 Ariel T. Glenn 2013-02-27 06:55:04 UTC
So who would be the person or persons to come up with a reasonable lifetime for the parser cache?  I agree that a year is ridiculously long (see comment 16).
Comment 25 Krinkle 2013-02-27 07:29:31 UTC
(In reply to comment #16)
> 'wgParserCacheExpireTime' => array(
>     'default' => 86400 * 365,
> ),
> 
> 
> Should we keep files around for a year? I don't think so..
> 

Proposal to shorten it to 30 days:
 Change If7dad7f5a8 (author=reedy)

To get an insight into how far back the cache really goes in production (wgParserCacheExpireTime may not be the only factor), we should get statistics on hits (including 404 hits).

e.g. any GET request for //bits.wikimedia.org/static-* in the last 30 days (number of hits by unique URL, regardless of whether the response was 200/304/404).

Then once we know the oldest version we're still serving, we can periodically re-query this to see if our changes are helping to move up the cut-off point.
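
As a sketch of how that could be gathered, assuming a (sampled) access log where the full request URL appears as a whitespace-separated field (the log format and invocation are assumptions):

<?php
// Hypothetical helper: pipe a sampled access log into this script and it
// counts hits per bits static-* URL, regardless of response code.
$counts = array();
while ( ( $line = fgets( STDIN ) ) !== false ) {
	foreach ( preg_split( '/\s+/', trim( $line ) ) as $field ) {
		if ( strpos( $field, 'bits.wikimedia.org/static-' ) !== false ) {
			$counts[$field] = isset( $counts[$field] ) ? $counts[$field] + 1 : 1;
		}
	}
}
arsort( $counts );
foreach ( $counts as $url => $n ) {
	echo "$n\t$url\n";
}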
Comment 26 Ariel T. Glenn 2013-02-27 09:50:44 UTC
So that change is now live. Still a few things pending; recording them here so we don't forget.

1) Really the cron purge job should have taken care of that. It's fine that the purge job and the parser cache expiry limit are in sync, but with that change we now have a bunch of things that won't get purged by it; we need to reset the expiry on all items with an expiry past the end of March to the end of March, or some such.

2) There was an entry for foundationwiki:pcache:idhash:21087-0!*!0!!*!4!* with timestamp 20120919200207, but where was it? And how did it survive purges?

3) The footer on pages has a reference to the "powered by MediaWiki" icon, which varies by version, and that breaks. It would be nice to handle things like this differently.
Comment 27 Tim Starling 2013-02-28 22:17:07 UTC
(In reply to comment #26)
> So that change is now live.

I reverted it. I thought I was pretty clear that I didn't think it was a good idea.
Comment 28 Ariel T. Glenn 2013-03-01 08:10:45 UTC
(In reply to comment #27)

We have been running /usr/local/bin/mwscript purgeParserCache.php with --age=2592000 for a couple of months now (see r38275); did you not agree with this? Should we be changing it?
Comment 29 Sam Reed (reedy) 2013-03-01 08:32:45 UTC
Tim suggested that it didn't seem to be working...

mysql:wikiadmin@pc1001 [parsercache]> SELECT exptime FROM pc001 ORDER BY exptime ASC limit 1;
+---------------------+
| exptime             |
+---------------------+
| 2013-03-01 03:35:55 |
+---------------------+
1 row in set (0.02 sec)

mysql:wikiadmin@pc1001 [parsercache]> SELECT exptime FROM pc001 ORDER BY exptime DESC limit 1;
+---------------------+
| exptime             |
+---------------------+
| 2014-03-01 08:30:07 |
+---------------------+
1 row in set (0.03 sec)
Comment 30 MZMcBride 2013-03-01 08:37:43 UTC
(In reply to comment #28)
> We have been running /usr/local/bin/mwscript purgeParserCache.php with
> --age=2592000 for a couple months now (see r38275), [...]

You mean <https://gerrit.wikimedia.org/r/38275>, of course. :-)
Comment 31 Ariel T. Glenn 2013-03-01 08:48:01 UTC
(In reply to comment #30)
Er yes, I do. :-)

(In reply to comment #29)
Ok, if it's broken maybe we should look at that.
Comment 32 Tim Starling 2013-03-01 19:15:37 UTC
(In reply to comment #28)
> (In reply to comment #27)
> 
> We have been running /usr/local/bin/mwscript purgeParserCache.php with
> --age=2592000 for a couple months now (see r38275), did you not agree with
> this?  Should we be changing it?

I've reviewed the hit rate data on graphite. From the perspective of parser cache hit rate, the expiry time should probably be 2-3 months, but judging by the time between parser resets, we can't store much more than 1 month without running out of disk space. 

In February 2012, we had an absent rate of only 2%, with an expired rate of 5%, after 6 months of fill time. We never achieved anything like that again, apparently because of disk space constraints. But with 1 month we should see something like 7% absent plus 2% expired. At least it's a big improvement over

http://tstarling.com/stuff/hit-rate-2011-03-25.png

Some data I gathered before the start of the MySQL parser cache project.

I don't think it's appropriate to set the parser cache expiry time based on the number of MW instances we can store on the cluster. The CPU cost of rewriting the bits URLs would be negligible compared to the CPU cost of reparsing the article from scratch. We don't want to have to increase our deployment period just to achieve a higher hit rate, and we don't want the deployment period to affect how much disk space we buy for the parser cache. There are plenty of ways to decouple the two.

Ultimately, I'd like to use an LRU expiry policy for the parser cache, instead of deleting objects based on creation time. That will make a decoupling between expiry time and MW deployment cycle even more necessary.
Comment 33 Tim Starling 2013-03-01 19:28:14 UTC
(In reply to comment #31)
> (In reply to comment #30)
> Er yes, I do. :-)
> 
> (In reply to comment #29)
> Ok, if it's broken maybe we should look at that.

mysql:root@localhost [parsercache]> select date_format(exptime,'%Y-%m') as mo,count(*) from pc255 group by mo;
+---------+----------+
| mo      | count(*) |
+---------+----------+
| 2013-02 |     2144 |
| 2013-03 |    44279 |
| 2013-04 |   298564 |
| 2014-02 |     1156 |
| 2014-03 |    18231 |
+---------+----------+
5 rows in set (0.46 sec)

The objects expiring in 2013-02 are probably ones with "old magic", i.e. the parser overrides the expiry time to be 1 hour. The ones expiring in 2013-03 and 2013-04 would be the objects written in the last few days, with one-month expiries. The objects with expiries of 2014-02 and 2014-03 are from when the expiry time was 12 months -- they will not be deleted for 11 months due to the way purgeParserCache.php determines creation times. Just changing $wgParserCacheExpireTime causes purgeParserCache.php to stop purging things, because it makes those objects look like they were created in the future.
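
In other words, the purge script has to infer the creation time from the stored expiry, roughly like this (a simplified illustration, not the actual purgeParserCache.php code; variable names are illustrative):

// Simplified illustration of why lowering $wgParserCacheExpireTime after the
// fact confuses the purge script: creation time is inferred from exptime.
$inferredCreationTime = $storedExptime - $wgParserCacheExpireTime;

// An object written with a 12-month expiry, re-read with a 1-month setting,
// gets an inferred creation time ~11 months in the future, so
// "older than --age seconds" is never true and it is never purged.
$shouldPurge = ( time() - $inferredCreationTime ) > $ageOption;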
Comment 34 Tim Starling 2013-03-01 19:43:12 UTC
Interestingly, this also means that now that the parser cache expiry time is back to 12 months, purgeParserCache.php will purge 95% of the parser cache on Saturday night, as fast as the script can manage to do it. It won't even wait for replication. Not sure what sort of CPU spike that will make. I'll be in the air, do have fun with that.

The reason this change makes me so angry is because this critical site-wide parameter was changed without any kind of research being done into the possible consequences.
Comment 35 Sam Reed (reedy) 2013-03-01 20:43:14 UTC
And of course, due to Timo creating all those checkouts again, we've no way of confirming that the fix has actually fixed anything, as users getting old pages (from 1.21 at least) will be able to see the images, meaning they see no issue in the real world.

They're going to have to go again
Comment 36 Sam Reed (reedy) 2013-03-01 21:08:15 UTC
(In reply to comment #35)
> They're going to have to go again

Or just delete/break the symlinks
Comment 37 Tim Starling 2013-03-01 21:32:13 UTC
(In reply to comment #34)
> Interestingly, this also means that now that the parser cache expiry time is
> back to 12 months, purgeParserCache.php will purge 95% of the parser cache on
> Saturday night, as fast as the script can manage to do it. It won't even wait
> for replication. Not sure what sort of CPU spike that will make. I'll be in
> the
> air, do have fun with that.

I reduced the parser cache expiry again so that this won't happen.
Comment 38 Sam Reed (reedy) 2013-03-02 00:47:07 UTC
Timo: Could we add some sort of JS workaround, so that if a page loads and there are missing (specific) resources, the page gets purged and maybe even refreshed?
Comment 39 MZMcBride 2013-03-02 07:08:06 UTC
For reference:

(Reedy's original commit)
If7dad7f5a8b0081f1118941f4aa63e963986cf6a

(In reply to comment #27)
> I reverted [...]
Ic453ad0a10a7189c0f3281c06f98227c57cbf81d

(In reply to comment #37)
> I reduced the parser cache expiry again [...]
I61a706d931ff2e53108c082da88fa91b82ea1214
Comment 40 Ariel T. Glenn 2013-03-02 07:20:30 UTC
(In reply to comment #33)

> +---------+----------+
> | mo      | count(*) |
> +---------+----------+
> | 2013-02 |     2144 |
> | 2013-03 |    44279 |
> | 2013-04 |   298564 |
> | 2014-02 |     1156 |
> | 2014-03 |    18231 |
> +---------+----------+

I'm trying to understand these results; we were running with the month-long setting for about a day and a half, yet the vast majority of the entries have these short expiries. I would have expected that most of the entries would have had long expiration dates, since nothing would have removed them. What am I missing?

> ... Just changing
> $wgParserCacheExpireTime causes purgeParserCache.php to stop purging things,
> because it makes those objects look like they were created in the future.

Yes, the expiration times would have needed to be adjusted, see comment #26, first item. And I guess they do again, since we are back at one-month expiry.
Comment 41 Roan Kattouw 2013-03-21 01:16:39 UTC
I had a chat about this with Greg, Aaron, Peter and Asher.

We realized that we have pages in our cache that were generated more than 30 days ago, most likely due to a phenomenon I'm calling "304 extension". MediaWiki serves pages with a cache expiry of 30 days but with must-revalidate. This means that every time someone requests the page from Squid, Squid will issue an If-Modified-Since request to MediaWiki, and MediaWiki will respond with a 304 if the page hasn't been edited. This 304 also comes with a 30-day cache expiry, so the cache expiry timer now rewinds back to zero and starts counting to 30 days again. This way, a page that is never edited will never be recached, as long as it is requested at least once every 30 days. So our assumption that the Squid cache turns over every 30 days is faulty, and there are pages that have been in the cache for longer than that. http://en.wikipedia.org/wiki/Wikipedia:No_climbing_the_Reichstag_dressed_as_Spider-Man is an example: visit it as an anon and you'll see <meta name="generator" content="MediaWiki 1.21wmf8"> and Last-Modified: Mon, 04 Feb 2013 18:34:48 GMT.

The suggested workaround for this issue is to modify MediaWiki such that it only sends a 304 when the If-Modified-Since timestamp is after the page_touched timestamp AND the If-Modified-Since timestamp is not more than 30 days ago. That way, Squid will do a full revalidation every 30 days, and we never have pages older than 30 days in the Squid cache.
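
A sketch of that check (simplified; the real change would go into MediaWiki's If-Modified-Since handling, and the variable names here are only illustrative):

// Only answer 304 if the client's copy is newer than page_touched AND the
// If-Modified-Since timestamp itself is within the Squid max-age window;
// otherwise send a full response, so Squid re-caches at least every 30 days.
$ims = strtotime( $_SERVER['HTTP_IF_MODIFIED_SINCE'] );
$maxWindow = 30 * 86400; // should match the s-maxage sent with the page

$canSend304 = $ims !== false
	&& $ims >= wfTimestamp( TS_UNIX, $page->getTouched() )
	&& $ims > time() - $maxWindow;

if ( $canSend304 ) {
	header( 'HTTP/1.1 304 Not Modified' );
	exit;
}
// ...otherwise render and send the full page as usual...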
Comment 42 Jesús Martínez Novo (Ciencia Al Poder) 2013-03-21 20:14:45 UTC
Wait. I cannot believe MediaWiki is sending pages with Cache-Control: max-age=2592000 (2592000 = 30 days in seconds).

I've accessed that page as anon and I got as a response:

 Cache-Control=private, s-maxage=0, max-age=0, must-revalidate

And indeed I got the 1.21wmf8 meta tag. Since it has max-age=0, the browser stores a copy of the page on the cache, but every time that page is requested, a request is made to the server as Roan said (with the If-Modified-Since header).

Of course, that's the response from the squids, but I'm pretty sure MediaWiki also sends the max-age=0 to the squids. Sending a value as long as 30 days is bizarre and not desired under any circumstance for a page (it should be good for static resources, but not for anything dynamic).

When changing wmf branch, maybe we should update $wgCacheEpoch, and MediaWiki should send as a "Last-modified" header the min value between $wgCacheEpoch and page_touched (ideally, not only page_touched but page_touched of any page linked or transcluded from it).
Comment 43 Bawolff (Brian Wolff) 2013-03-21 20:18:41 UTC
(In reply to comment #42)
> Wait. I can not believe MediaWiki is sending pages with Cache-Control:
> max-age=2592000 (2592000 = 30 days in seconds).
> 
> I've accessed that page as anon and I got as a response:
> 
>  Cache-Control=private, s-maxage=0, max-age=0, must-revalidate
> 
> And indeed I got the 1.21wmf8 meta tag. Since it has max-age=0, the browser
> stores a copy of the page on the cache, but every time that page is
> requested,
> a request is made to the server as Roan said (with the If-Modified-Since
> header).
> 
> Of course, that's the response from the squids, but I'm pretty sure MediaWiki
> also sends the max-age=0 to the squids. Sending a value as long as 30 days is
> bizarre and not desired under any circumstance for a page (it should be good
> for static resources, but not for anything dynamic).
> 
> When changing wmf branch, maybe we should update $wgCacheEpoch, and MediaWiki
> should send as a "Last-modified" header the min value between $wgCacheEpoch
> and
> page_touched (ideally, not only page_touched but page_touched of any page
> linked or transcluded from it).

Squids get longer expires because we purge pages when they get edited. See the cache control logic in OutputPage.php
Comment 44 Roan Kattouw 2013-03-28 22:50:16 UTC
(In reply to comment #43)
> (In reply to comment #42)
> > Wait. I can not believe MediaWiki is sending pages with Cache-Control:
> > max-age=2592000 (2592000 = 30 days in seconds).
> > 
> > I've accessed that page as anon and I got as a response:
> > 
> >  Cache-Control=private, s-maxage=0, max-age=0, must-revalidate
> > 
That's what you saw, but you got that from the Squids. MediaWiki sends s-maxage headers, and Squid obeys those, then munges them to make sure no one else will cache the page downstream.

> Squids get longer expires because we purge pages when they get edited. See
> the
> cache control logic in OutputPage.php
That's right. We send 30-day s-maxage headers and send explicit purges when pages are edited.
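
Roughly, the logic being described looks like this (a simplified picture, not the actual OutputPage.php code, which has many more conditions; $isAnon and $isCacheableView are illustrative):

// Anonymous, cacheable page views get a long s-maxage so Squid/Varnish cache
// them, while browsers still get max-age=0 and must revalidate each time.
if ( $isAnon && $isCacheableView ) {
	header( "Cache-Control: s-maxage=$wgSquidMaxage, must-revalidate, max-age=0" );
} else {
	header( 'Cache-Control: private, must-revalidate, max-age=0' );
}

// On edit, MediaWiki sends explicit purges for the affected URLs, e.g.:
// SquidUpdate::purge( array( $title->getInternalURL() ) );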
Comment 45 Rob Lanphier 2013-04-08 21:36:36 UTC
Per MW Core mtg today
Comment 46 Gerrit Notification Bot 2013-04-09 20:58:05 UTC
Related URL: https://gerrit.wikimedia.org/r/58415 (Gerrit Change I3889f300012aeabd37e228653279ad19b296e4ae)
Comment 47 Andre Klapper 2013-04-15 12:13:08 UTC
(In reply to comment #46)
> Related URL: https://gerrit.wikimedia.org/r/58415

Aaron's three-liner patch is still awaiting review.
Comment 48 Gerrit Notification Bot 2013-04-16 16:02:36 UTC
Related URL: https://gerrit.wikimedia.org/r/59414 (Gerrit Change I3889f300012aeabd37e228653279ad19b296e4ae)
Comment 49 Aaron Schulz 2013-04-16 22:17:21 UTC
(In reply to comment #48)
> Related URL: https://gerrit.wikimedia.org/r/59414 (Gerrit Change
> I3889f300012aeabd37e228653279ad19b296e4ae)

This will apply to all wikis next Wednesday.
