Last modified: 2013-12-24 07:07:18 UTC
MediaWiki has traditionally used the Cache-Control header to control the CDN (i.e. the Squid reverse proxy), while the Cache-Control header sent to clients has been specified in the Squid configuration. Specifically, when a URL matches a certain regex, Squid strips the Cache-Control header and replaces it with the configured one. This is not ideal, as Gabriel noted in a comment in the original code. It would be better if MediaWiki specified both headers in its response, so that neither the URL regex nor the client Cache-Control header has to be maintained in the CDN configuration. Originally this would have required a Squid patch, but now that we are switching to Varnish, the feature can be implemented in VCL. Specifically, MW should send a Client-Cache-Control header, which Varnish will rewrite to Cache-Control as appropriate.
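A minimal sketch of what the Varnish side of this could look like, assuming Varnish 3 VCL. Since MediaWiki would keep sending the CDN-oriented Cache-Control, Varnish can keep deriving the TTL from it as it does today, so only the delivered header needs rewriting. The elsif branch stands in for the existing Squid rule on responses that lack Client-Cache-Control; the ^/wiki/ pattern and the header value there are hypothetical placeholders, not the production regex or header.

```vcl
sub vcl_deliver {
    if (resp.http.Client-Cache-Control) {
        # MediaWiki supplied the client-facing header: send it as
        # Cache-Control and drop the internal header.
        set resp.http.Cache-Control = resp.http.Client-Cache-Control;
        unset resp.http.Client-Cache-Control;
    } elsif (req.url ~ "^/wiki/") {
        # Legacy behaviour for responses from old MW code
        # (placeholder regex and value): replace the CDN-oriented
        # Cache-Control with a fixed client header.
        set resp.http.Cache-Control = "private, s-maxage=0, max-age=0, must-revalidate";
    }
}
```

With this layout the URL regex only survives as a fallback during the rollout and can be deleted once every response carries Client-Cache-Control.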
We could do this the other way around: partially implement the semi-standard Surrogate-Control header (semi because it comes from the W3C, not the IETF) and leave Cache-Control intact for end-users. Fastly, for example, seems to recommend this to its users, so it may be an alternative that is more compatible with the real world.
(In reply to comment #1)
> We could do this the other way around and partially implement the
> semi-standard (semi because it's from the W3C, not IETF)
> Surrogate-Control header and leave Cache-Control intact for
> end-users. Fastly, for example, seems to be suggesting users
> to use this, so this may be a more compatible with the real world
> alternative.

Varnish uses the Cache-Control header in RFC2616_Ttl(), so I suppose it would be necessary to move the Cache-Control header out to some temporary pseudo-header in vcl_fetch and to move it back into place in vcl_deliver. While the object is in the cache, Surrogate-Control would be copied into Cache-Control. Support for pass mode would theoretically be simpler with Surrogate-Control.

Either way, there would have to be some backwards-compatible handling in Varnish to account for the progressive rollout of the new MW code: if Client-Cache-Control/Surrogate-Control is missing, Varnish would have to interpret Cache-Control in the old way.

On the MW side, OutputPage could provide an interface allowing configuration of the header mapping:

a) Old $wgUseSquid = false:
   * Client-Cache-Control -> Cache-Control
   * Surrogate-Control -> deleted
b) Old $wgUseSquid = true:
   * Surrogate-Control -> Cache-Control
   * Client-Cache-Control -> deleted
c) Surrogate-Control scheme:
   * Surrogate-Control -> Surrogate-Control
   * Client-Cache-Control -> Cache-Control
d) Client-Cache-Control scheme:
   * Surrogate-Control -> Cache-Control
   * Client-Cache-Control -> Client-Cache-Control
We currently don't have a Client-Cache-Control header at all, and I don't think we should introduce one under that name now. Introducing just Surrogate-Control and doing the VCL ping-pong you mentioned sounds more sensible to me. We'd need a temporary header to store the client Cache-Control, so we may end up using Client-Cache-Control internally in VCL as an interim header, but I don't see a reason for MediaWiki to use it. That is, as I see it, the VCL could simply be:

    sub vcl_fetch {
        if (beresp.http.Surrogate-Control) {
            # Stash the client-facing header and let the surrogate
            # directives take its place while the object is cached.
            set beresp.http.Client-Cache-Control = beresp.http.Cache-Control;
            set beresp.http.Cache-Control = beresp.http.Surrogate-Control;
            unset beresp.http.Surrogate-Control;
        }
    }

    sub vcl_deliver {
        if (resp.http.Client-Cache-Control) {
            # Restore the client-facing header on the way out.
            set resp.http.Cache-Control = resp.http.Client-Cache-Control;
            unset resp.http.Client-Cache-Control;
        }
    }

I don't see anything we do now that would need special handling to preserve the backwards compatibility you mentioned. Even if there is something and I missed it, we can easily implement it as "else" clauses above, no?

It's a pity that Varnish doesn't support Surrogate-Control natively. Ironically, Squid 3 does in some form :) (so using it inside MediaWiki may be generally useful). I guess we could provide patches to Varnish in the long term, but VCL hacks seem viable in the short term.

Note that the standard specifies a Surrogate-Capabilities request header to signal the capability to handle Surrogate-Control. We could set it in Varnish and MediaWiki could check for it, so you could avoid a configuration option. Also note that the same Surrogate-Capabilities/Control mechanism could also be used to signal ESI back and forth (this is defined in the spec). Yuri has used X-Force-ESI (request) and X-Enable-ESI (response) for this purpose in the mobile caches for his ESI testing. We could deprecate those in favor of unified Surrogate handling in core, especially as we move towards using ESI.
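One caveat with the ping-pong, assuming Varnish 3 semantics: beresp.ttl is computed from the backend's Cache-Control/Expires headers before vcl_fetch runs, so copying Surrogate-Control into Cache-Control there does not by itself change the TTL. Below is a possible variant of the vcl_fetch above (vcl_deliver stays the same) that applies Surrogate-Control's max-age explicitly, advertises Surrogate-Capabilities, and enables ESI for responses that mention ESI/1.0. It relies on vmod std's duration() function; the device token "varnish" is a hypothetical choice, and the regex-based parsing is deliberately simplistic.

```vcl
import std;

sub vcl_recv {
    # Advertise Surrogate-Control/ESI support to MediaWiki.
    # "varnish" is a hypothetical device token; the long-string
    # syntax {"..."} allows the embedded double quotes.
    set req.http.Surrogate-Capabilities = {"varnish="ESI/1.0""};
}

sub vcl_fetch {
    if (beresp.http.Surrogate-Control) {
        # The TTL was already derived from the original Cache-Control,
        # so apply Surrogate-Control's max-age explicitly. "\1s" turns
        # the captured number of seconds into a duration string.
        if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+") {
            set beresp.ttl = std.duration(
                regsub(beresp.http.Surrogate-Control,
                       "^.*max-age=([0-9]+).*$", "\1s"),
                0s);
        }
        # Turn on ESI processing when the backend signals ESI/1.0
        # content, instead of the ad-hoc X-Enable-ESI header.
        if (beresp.http.Surrogate-Control ~ "ESI/1\.0") {
            set beresp.do_esi = true;
        }
        # Same header ping-pong as above.
        set beresp.http.Client-Cache-Control = beresp.http.Cache-Control;
        set beresp.http.Cache-Control = beresp.http.Surrogate-Control;
        unset beresp.http.Surrogate-Control;
    }
}
```

Note that import std; has to appear at the top of the VCL file, before any subroutine.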
Immediate applications:

* Normal page views (vcl_deliver in text-frontend.inc.vcl.erb)
* Mobile page views (vcl_deliver in mobile-frontend.inc.vcl.erb)

Also, the uses of Cache-Control in vcl_fetch in wikimedia.vcl.erb and in vcl_fetch in text-backend.inc.vcl.erb would have to be updated.

Some care would have to be taken to ensure that MW does not accidentally send a public Surrogate-Control on responses with private data, where CC:private is currently sent and assumed to be sufficient. Maybe CC:private should override Surrogate-Control.

Aaron suggests that the feature could be used to allow private caching of resources delivered to logged-in users.

Note that, contrary to what I implied in comment #2, Surrogate-Control does not have the same format as Cache-Control. In particular, http://www.w3.org/TR/edge-arch specifies the use of the no-store token and does not recognise no-cache or private.
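A sketch of the "CC:private overrides Surrogate-Control" idea, combined with the no-store token, again assuming Varnish 3 VCL. Varnish concatenates multiple definitions of the same subroutine in the order they appear, so this would have to be placed before the Surrogate-Control swap in order to see MediaWiki's original Cache-Control. The 120s TTL mirrors Varnish 3's built-in vcl_fetch and only controls how long the hit-for-pass decision is remembered, not how long content is cached.

```vcl
sub vcl_fetch {
    # Safety net: CC:private from MediaWiki wins over any
    # Surrogate-Control, so private responses are never stored.
    if (beresp.http.Cache-Control ~ "private") {
        unset beresp.http.Surrogate-Control;
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    # Surrogate-Control forbids storage with no-store (it has no
    # no-cache or private tokens). The client-facing Cache-Control
    # is left untouched.
    if (beresp.http.Surrogate-Control ~ "no-store") {
        unset beresp.http.Surrogate-Control;
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
}
```

Both branches unset Surrogate-Control because returning early skips the later swap, which would otherwise leak the surrogate header to clients.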