Last modified: 2013-11-24 04:03:13 UTC
On beta, https://login.wikimedia.beta.wmflabs.org/ redirect to itself: HTTP/1.1 301 Moved Permanently Server: nginx/1.1.19 Date: Fri, 19 Jul 2013 20:08:24 GMT Content-Type: text/html; charset=iso-8859-1 Content-Length: 249 Connection: keep-alive Location: https://login.wikimedia.beta.wmflabs.org/ X-Varnish: 51865472 51864739, 1498759040 Via: 1.1 varnish, 1.1 varnish Accept-Ranges: bytes Age: 329 X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend miss (0) HTTP/1.1 301 Moved Permanently Server: nginx/1.1.19 Date: Fri, 19 Jul 2013 20:08:24 GMT Content-Type: text/html; charset=iso-8859-1 Content-Length: 249 Connection: keep-alive Location: https://login.wikimedia.beta.wmflabs.org/ X-Varnish: 51865472 51864739, 1498759041 1498759040 Via: 1.1 varnish, 1.1 varnish Accept-Ranges: bytes Age: 329 X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (1) X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (2) X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (3) ... The Apache configuration in /data/project/apache/conf/loginwiki.php has: RewriteEngine On RewriteCond %{HTTP:X-Forwarded-Proto} !https RewriteRule ^/(.*)$ https://login.wikimedia.beta.wmflabs.org/$1 [R=301,L] On beta the protoproxy is local to the varnish cache (i.e. it is on deployment-cache-text1). I guess it does not properly set the "X-Forwarded-Proto: https" header.
Doing a curl query to the HTTPS URL curl -I "https://login.wikimedia.beta.wmflabs.org/?`date +%s`" varnishncsa let you dump any request header using the syntax %{X}-i %{X}o where X is the name of the header and 'i' stand for request header and 'o' for a response header. I got: On the frontend: varnishncsa -n frontend -F 'Proto in: %{X-Forwarded-Proto}i out: %{X-Forwarded-Proto}o "%r" %s %b"' Proto in: https out: - "HEAD http://login.wikimedia.beta.wmflabs.org/?1374484265 HTTP/1.0" 301 0" Proto in: https out: - "HEAD http://login.wikimedia.beta.wmflabs.org/?1374484267 HTTP/1.0" 301 0" On the backend: varnishncsa -F 'Proto in: %{X-Forwarded-Proto}i out: %{X-Forwarded-Proto}o "%r" %s %b"' Proto in: http out: - "GET http://login.wikimedia.beta.wmflabs.org/?1374484331 HTTP/1.1" 301 260" Looking at the remote, the backend hits 127.0.0.1 so that would be put back in varnish text again hence a looping.
The requests do it the apaches: $ tail -f /data/project/logs/apache-access.log |egrep ^login.wikimedia.beta.wmflabs.org login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:47 +0000] "GET /?1374484729 HTTP/1.1" 301 260 login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:48 +0000] "GET /?1374484730 HTTP/1.1" 301 260 login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:50 +0000] "GET /?1374484732 HTTP/1.1" 301 260 login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:51 +0000] "GET /?1374484734 HTTP/1.1" 301 260
I have tweaked the Apache configuration to log the Apache IP and the X-Fowarded-Proto field. /data/project/apache/conf/wmflabs-logging.conf: -CustomLog /data/project/logs/apache-access.log "%v %h %l %u %t \"%r\" %>s %b" +CustomLog /data/project/logs/apache-access.log "%A %v %h %l %u %t \"%r\" %>s %b xfp: %{X-Forwarded-Proto}i"
On varnish, to debug the queries made to login.wikimedia.beta.wmflabs.org, use the varnishlog utility and filter by host header: Doing a curl "https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=`date +%s`" with: varnishlog -n frontend -b -m TxHeader:login.wikimedia.beta.wmflabs 13 BackendOpen b ipv4_127_0_0_1 127.0.0.1 14366 127.0.0.1 3128 13 TxRequest b GET 13 TxURL b /wiki/Main_Page?date=1374487021 13 TxProtocol b HTTP/1.1 13 TxHeader b Host: login.wikimedia.beta.wmflabs.org 13 TxHeader b X-Real-IP: 88.X.X.X 13 TxHeader b User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8x zlib/1.2.5 13 TxHeader b Accept: */* 13 TxHeader b X-Forwarded-For: 127.0.0.1 13 TxHeader b X-Forwarded-Proto: http 13 TxHeader b X-Varnish: 1498894934 13 TxHeader b Accept-Encoding: gzip 13 RxProtocol b HTTP/1.1 13 RxStatus b 301 13 RxResponse b Moved Permanently 13 RxHeader b Server: Apache 13 RxHeader b Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=1374487021 13 RxHeader b Content-Type: text/html; charset=iso-8859-1 13 RxHeader b Content-Length: 279 13 RxHeader b Accept-Ranges: bytes 13 RxHeader b Date: Mon, 22 Jul 2013 09:56:59 GMT 13 RxHeader b X-Varnish: 52212372 13 RxHeader b Age: 0 13 RxHeader b Via: 1.1 varnish 13 RxHeader b Connection: keep-alive 13 RxHeader b X-Cache: deployment-cache-text1 miss (0) 13 Fetch_Body b 4(length) cls 0 mklen 1 13 Length b 279 13 BackendReuse b ipv4_127_0_0_1 Backend: varnishlog -b -m TxHeader:login.wikimedia.beta.wmflabs 15 TxRequest b GET 15 TxURL b /wiki/Main_Page?date=1374486920 15 TxProtocol b HTTP/1.1 15 TxHeader b Host: login.wikimedia.beta.wmflabs.org 15 TxHeader b X-Real-IP: 88.X.X.X 15 TxHeader b User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8x zlib/1.2.5 15 TxHeader b Accept: */* 15 TxHeader b X-Varnish: 1498894871 15 TxHeader b X-Forwarded-For: 127.0.0.1 15 TxHeader b X-Varnish: 52212225 15 TxHeader b Accept-Encoding: gzip 15 RxProtocol b HTTP/1.1 15 RxStatus b 301 15 RxResponse b Moved Permanently 15 RxHeader b Date: Mon, 22 Jul 2013 09:55:18 GMT 15 RxHeader b Server: Apache 15 RxHeader b Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=1374486920 15 RxHeader b Content-Length: 279 15 RxHeader b Content-Type: text/html; charset=iso-8859-1 15 Fetch_Body b 4(length) cls 0 mklen 1 15 Length b 279 15 BackendReuse b ipv4_10_4_0_187 As seen in the frontend log, nginx set "X-Fowarded-Proto: http" header. It is not included in the request made by the varnish backend. From the backend vcl, the header is stripped out whenever the client ip is not part of allow_xff or opera_mini access lists which are: acl allow_xff { "91.198.174.0"/24; "208.80.152.0"/22; "2620:0:860::"/46; "10.0.0.0"/8; } acl opera_mini { // bunch of opera requests } So I guess adding 127.0.0.1 in allow_xff would fix the issue.
Change 75085 had a related patch set uploaded by Hashar: varnish: backends trust 127.0.0.1 for XFF https://gerrit.wikimedia.org/r/75085
Change 75085 merged by Mark Bergsma: varnish: backends trust 127.0.0.1 for XFF https://gerrit.wikimedia.org/r/75085
The above change unlocked the redirect loop. There are still old copy of the pages cached in which will cause a redirect loop but that is related to vhtcpd not processing purges. That issue is bug 51874.
This is not entirely fixed. When querying the http://login.wikimedia.beta.wmflabs.org/ URL, a redirect loop is cached. Purge all pages: echo -e 'http://login.wikimedia.beta.wmflabs.org/\nhttps://login.wikimedia.beta.wmflabs.org/\nMain_Page'|mwscript purgeList.php --wiki=loginwiki Connecting to the https url yields a Location redirect to the main page: $ curl -i https://login.wikimedia.beta.wmflabs.org/ |grep Location Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page The http URL yields a Location redirect to the https root: $ curl -i http://login.wikimedia.beta.wmflabs.org/ |grep Location Location: https://login.wikimedia.beta.wmflabs.org/ But then the https:// is a redirect to itself: $ curl -i https://login.wikimedia.beta.wmflabs.org/ |grep Location Location: https://login.wikimedia.beta.wmflabs.org/ This is always reproducable when clearing the cache then querying the http /.
The login.wikimedia.beta.wmflabs.org virtualhost definition has: RewriteEngine On RewriteCond %{HTTP:X-Forwarded-Proto} !https RewriteRule ^/(.*)$ https://login.wikimedia.beta.wmflabs.org/$1 [R=301,L] Varnish would do a http query which redirect to https: $ curl -s -i -x 127.0.0.1:80 http://login.wikimedia.beta.wmflabs.org/|grep Location Location: https://login.wikimedia.beta.wmflabs.org/ If it has set the X-Fowarded-Proto: https we get the main page: $ curl -s -i -H 'X-Forwarded-Proto: https' -x 127.0.0.1:80 http://login.wikimedia.beta.wmflabs.org/|grep Location Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page After talking with mark, varnish consider both queries (with and without X-Forwarded-Proto) as the same cached content. This is because Apache does not send a Vary: X-Forwarded-Proto header. So we need to fix up our Apache configuration.
In production on mw1177: $ curl -s -i -x 127.0.0.1:80 http://login.wikimedia.org/|grep Location Location: https://login.wikimedia.org/ $ $ curl -s -i -H 'X-Forwarded-Proto: https' -x 127.0.0.1:80 http://login.wikimedia.org/ ... Vary: Accept-Encoding,X-Forwarded-Proto,Cookie X-Vary-Options: ...X-Forwarded-Proto... Location: https://login.wikimedia.org/wiki/Main_Page $ So something is set differently in production.
So finally here is the explanation: In production the squids do not cache the redirects (we see MISS everywhere) whereas Varnish text does cache them. So in production the http:// URL is always redirected to the https:// one and thus the cache is only hit with http:// URL with X-Forwarded-Proto: https. On labs, which uses varnish as a text cache, the redirect are cached. The cache is thus hit with http:// url with or without X-Forwarded-Proto: https set and with or without Vary: X-Forwarded-Proto. When the X-Forwarded-Proto: https is set, the query hit the MediaWiki backend which properly set the Vary header. When X-Forwarded-Proto is missing, the Apache does not set any Vary header. That needs to be fixed.
Change 75583 had a related patch set uploaded by Hashar: Fix up Vary headers on 30x redirects from Apache https://gerrit.wikimedia.org/r/75583
Puppet is broken in labs (bug 51955) so I cant test out mark fix above.
https://gerrit.wikimedia.org/r/75583 did fix the issue \O/