Last modified: 2013-12-24 07:07:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50835, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48835 - Separate Cache-Control header for proxy and client
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.22.0
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
perfsprint-13
Keywords: platformeng
Depends on:
Blocks:

Reported: 2013-05-26 11:58 UTC by Tim Starling
Modified: 2013-12-24 07:07 UTC (History)

Description Tim Starling 2013-05-26 11:58:17 UTC
MediaWiki has traditionally used the Cache-Control header to control the CDN (i.e. Squid reverse proxy), then the Cache-Control header for clients has been specified in Squid configuration. Specifically, when a certain URL regex matches, the Cache-Control header is stripped out and replaced with the configured header.

This is not ideal, as noted by Gabriel in a comment in the original code. It would be better if MediaWiki specified both headers in its response, so that the URL regex and client Cache-Control header do not need to be maintained in the CDN configuration. Originally, this would have required a Squid patch, but now that we are switching to Varnish, the feature can be implemented with VCL.

Specifically, MW should send a Client-Cache-Control header which Varnish will rewrite to Cache-Control as appropriate.
Comment 1 Faidon Liambotis 2013-07-15 21:50:14 UTC
We could do this the other way around and partially implement the semi-standard (semi because it's from the W3C, not IETF) Surrogate-Control header and leave Cache-Control intact for end-users. Fastly, for example, seems to suggest that users use this, so this may be an alternative that is more compatible with the real world.
Comment 2 Tim Starling 2013-11-26 04:56:31 UTC
(In reply to comment #1)
> We could do this the other way around and partially implement the
> semi-standard (semi because it's from the W3C, not IETF) 
> Surrogate-Control header and leave Cache-Control intact for
> end-users. Fastly, for example, seems to suggest that users
> use this, so this may be an alternative that is more compatible
> with the real world.

Varnish uses the Cache-Control header in RFC2616_Ttl(), so I suppose it would be necessary to move the Cache-Control header out to some temporary pseudo-header in vcl_fetch, and to move it back into place in vcl_deliver. While the object is in the cache, Surrogate-Control would be copied into Cache-Control.

Support for pass mode would theoretically be simpler with Surrogate-Control.

Either way, there would have to be some backwards compatible handling in Varnish, to account for the progressive rollout of the new MW code. If Client-Cache-Control/Surrogate-Control is missing, Varnish would have to interpret Cache-Control in the old way.

On the MW side, OutputPage could provide an interface allowing configuration of the mapping of headers:

a) Old $wgUseSquid = false:
  * Client-Cache-Control  -> Cache-Control
  * Surrogate-Control     -> deleted

b) Old $wgUseSquid = true:
  * Surrogate-Control     -> Cache-Control
  * Client-Cache-Control  -> deleted

c) Surrogate-Control scheme:
  * Surrogate-Control     -> Surrogate-Control
  * Client-Cache-Control  -> Cache-Control

d) Client-Cache-Control scheme:
  * Surrogate-Control     -> Cache-Control
  * Client-Cache-Control  -> Client-Cache-Control
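
As a rough illustration of the four mappings above, here is a minimal Python sketch (not MediaWiki code). The scheme names, the SCHEMES table and the map_headers function are hypothetical; only the header names and the mapping rules come from the list above.

```python
from typing import Dict, Optional

# Hypothetical scheme names for mappings (a) to (d) above.
# Each entry maps a logical header to the name actually sent,
# or to None when the header is deleted.
SCHEMES: Dict[str, Dict[str, Optional[str]]] = {
    "no-cdn": {  # (a) old $wgUseSquid = false
        "Client-Cache-Control": "Cache-Control",
        "Surrogate-Control": None,
    },
    "legacy-cdn": {  # (b) old $wgUseSquid = true
        "Surrogate-Control": "Cache-Control",
        "Client-Cache-Control": None,
    },
    "surrogate-control": {  # (c)
        "Surrogate-Control": "Surrogate-Control",
        "Client-Cache-Control": "Cache-Control",
    },
    "client-cache-control": {  # (d)
        "Surrogate-Control": "Cache-Control",
        "Client-Cache-Control": "Client-Cache-Control",
    },
}


def map_headers(scheme: str, client_cc: str, surrogate: str) -> Dict[str, str]:
    """Return the response headers to emit under the given scheme."""
    logical = {"Client-Cache-Control": client_cc, "Surrogate-Control": surrogate}
    out: Dict[str, str] = {}
    for name, value in logical.items():
        target = SCHEMES[scheme][name]
        if target is not None:
            out[target] = value
    return out
```

For example, map_headers("surrogate-control", "private, max-age=0", "max-age=3600") yields one Cache-Control header for the client and one Surrogate-Control header for the CDN, while the two "old" schemes each emit a single Cache-Control header.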
Comment 3 Faidon Liambotis 2013-11-26 06:34:09 UTC
We currently don't have a Client-Cache-Control header at all and I don't think we should introduce it now with that name. Introducing just Surrogate-Control and doing the VCL ping-pong you mentioned sounds more sensible to me. We'd need a temporary header to store the client cache control, so we may end up using Client-Cache-Control internally inside VCL as an interim header, but I don't see a reason for MediaWiki to use it. That is, as I see it, the VCL could just be:

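sub vcl_fetch {
  if (beresp.http.Surrogate-Control) {
    # Park the client-facing value in an interim header
    set beresp.http.Client-Cache-Control = beresp.http.Cache-Control;
    # While cached, Cache-Control carries the surrogate directives
    set beresp.http.Cache-Control = beresp.http.Surrogate-Control;
    unset beresp.http.Surrogate-Control;
  }
}

sub vcl_deliver {
  if (resp.http.Client-Cache-Control) {
    # Restore the client-facing value before the response leaves the cache
    set resp.http.Cache-Control = resp.http.Client-Cache-Control;
    unset resp.http.Client-Cache-Control;
  }
}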

I don't see any handling that we do now to preserve the backwards compatibility you mentioned. Even if we do and I missed it, we can easily implement it as "else" clauses above, no?
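
To make the header shuffle and a possible "else" fallback concrete, here is a small Python model of the two VCL hooks (a sketch, not Varnish code). The function names on_fetch and on_deliver and the legacy_client_cc parameter are hypothetical. The fallback assumes that "the old way" means replacing Cache-Control with a configured client header, as described in the bug description, and it ignores the URL regex.

```python
from typing import Dict, Optional

Headers = Dict[str, str]


def on_fetch(beresp: Headers) -> Headers:
    """Model of vcl_fetch: keep Surrogate-Control as Cache-Control while cached."""
    h = dict(beresp)
    if "Surrogate-Control" in h:
        # Park the client-facing value in the interim header, if there is one.
        if "Cache-Control" in h:
            h["Client-Cache-Control"] = h["Cache-Control"]
        h["Cache-Control"] = h.pop("Surrogate-Control")
    return h


def on_deliver(resp: Headers, legacy_client_cc: Optional[str] = None) -> Headers:
    """Model of vcl_deliver: restore the client value, else fall back."""
    h = dict(resp)
    if "Client-Cache-Control" in h:
        # New-style backend: restore what MediaWiki meant for the client.
        h["Cache-Control"] = h.pop("Client-Cache-Control")
    elif legacy_client_cc is not None:
        # Assumed old behaviour: overwrite with the configured client header.
        h["Cache-Control"] = legacy_client_cc
    return h
```

With a new-style response, on_deliver(on_fetch(...)) hands the client the original Cache-Control value. With an old-style response that lacks Surrogate-Control, the client receives whatever is passed as legacy_client_cc.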

It's a pity that Varnish doesn't natively support Surrogate-Control, indeed. Ironically, Squid 3 does in some form :) (so using it inside MediaWiki may be generally useful). I guess we could provide patches to Varnish for the long-term but VCL hacks seem viable in the short-term.

Note that the standard specifies a Surrogate-Capability request header to signal the capability to handle Surrogate-Control. We could set it in Varnish and MediaWiki could check for it, so we could avoid a configuration option.

Also note that the same Surrogate-Capability/Surrogate-Control mechanism could also be used to signal ESI back and forth (this is defined in the spec). Yuri has used X-Force-ESI (request) and X-Enable-ESI (response) for this purpose in the mobile caches for his ESI testing. We could deprecate those in favor of a unified Surrogate handling by core, especially while we move in the direction of doing ESI.
Comment 4 Tim Starling 2013-12-03 06:03:54 UTC
Immediate applications:

* Normal page views (vcl_deliver in text-frontend.inc.vcl.erb)
* Mobile page views (vcl_deliver in mobile-frontend.inc.vcl.erb)

Also, the use of Cache-Control in vcl_fetch in wikimedia.vcl.erb and in vcl_fetch in text-backend.inc.vcl.erb would have to be updated. Some care would have to be taken to ensure that MW does not accidentally send a public Surrogate-Control on responses with private data, where CC:private is currently sent and assumed to be sufficient. Maybe CC:private should override Surrogate-Control.

Aaron suggests that the feature could be used to allow private caching of resources delivered to logged-in users.

Note that, contrary to what I implied in comment #2, Surrogate-Control does not have the same format as Cache-Control. In particular, http://www.w3.org/TR/edge-arch specifies the use of the no-store token and does not recognise no-cache or private.
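
The format difference can be illustrated with a minimal Python sketch that derives a cache lifetime from Surrogate-Control. The function name surrogate_ttl is hypothetical, targeting suffixes such as ";varnish" are simply dropped, and letting CC:private win is only the "maybe" suggested above, not settled behaviour.

```python
import re
from typing import Optional


def surrogate_ttl(surrogate_control: str, cache_control: str = "") -> Optional[int]:
    """Return a TTL in seconds, 0 for "do not store", or None if unspecified."""
    cc_tokens = {
        t.strip().split("=", 1)[0].lower()
        for t in cache_control.split(",")
        if t.strip()
    }
    if "private" in cc_tokens:
        return 0  # assumption from this thread: CC:private overrides

    ttl: Optional[int] = None
    for directive in surrogate_control.split(","):
        # Drop an optional ";target" suffix; targeting is ignored here.
        token = directive.split(";", 1)[0].strip().lower()
        if token == "no-store":
            return 0
        # max-age=freshness or max-age=freshness+staleness
        m = re.fullmatch(r"max-age=(\d+)(?:\+\d+)?", token)
        if m:
            ttl = int(m.group(1))
        # Tokens such as "private" or "no-cache" are not Surrogate-Control
        # directives, so they fall through and have no effect.
    return ttl
```

Note that surrogate_ttl("private") returns None rather than 0, which is the pitfall described above: a Cache-Control value copied verbatim into Surrogate-Control would not mark the response as uncacheable.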
