Last modified: 2014-07-15 19:42:49 UTC
Noticed this while digging into another bug. In beta, users are not created on loginwiki. So cross-project login isn't working. The odd thing that I think is happening is that the browser isn't getting all of the cookies that the wiki is setting. Maybe a Varnish issue? In the beta logs, I see the user is redirected to loginwiki: 0.1661 4.2M Start request GET /wiki/Special:CentralLogin/start?token=b4bac89e539642795df51e425a974ffc HTTP HEADERS: HOST: login.wikimedia.beta.wmflabs.org And then (as it should), loginwiki sets the user's cookies: loginwiki-9c1ccc4f: 0.1995 8.8M [cookie] setcookie: "centralauth_Session", "aa2442f2d2e716dc89eb16d1f89f3cfe", "0", "/", "", "", "1" loginwiki-9c1ccc4f: 0.2004 8.8M [cookie] setcookie: "centralauth_User", "TestBug16864'7", "1396574220", "/", "", "", "1" loginwiki-9c1ccc4f: 0.2006 8.8M [cookie] setcookie: "centralauth_Token", "", "1393895820", "/", "", "", "1" However, in my browser, the only Set-Cookie header is: Set-Cookie centralauth_Session=aa2442f2d2e716dc89eb16d1f89f3cfe; path=/; httponly; GeoIP=US:Santa_Clara:37.3541:-121.9552:v4; path=/ When the AutoLogin code runs, loginwiki then has: 0.1973 4.2M Start request GET /wiki/Special:CentralAutoLogin/refreshCookies?type=1x1&wikiid=enwiki&proto=http HTTP HEADERS: HOST: login.wikimedia.beta.wmflabs.org USER-AGENT: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 ACCEPT: image/png,image/*;q=0.8,*/*;q=0.5 ACCEPT-LANGUAGE: en-US,en;q=0.5 DNT: 1 REFERER: http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Main_Page&gettingStartedReturn=true COOKIE: GeoIP=US:Santa_Clara:37.3541:-121.9552:v4; centralauth_Session=aa2442f2d2e716dc89eb16d1f89f3cfe X-VARNISH: 497335809, 1267638667 X-FORWARDED-FOR: 50.136.233.16, 127.0.0.1 ACCEPT-ENCODING: gzip loginwiki-8577f040: 0.1980 4.2M CACHES: MemcachedPhpBagOStuff[main] MemcachedPhpBagOStuff[message] MemcachedPhpBagOStuff[parser] loginwiki-8577f040: 0.2054 5.0M LocalisationCache: using store LCStoreCDB loginwiki-8577f040: 0.2084 5.5M Fully initialised loginwiki-8577f040: 0.2148 5.8M CentralAuthHooks::onUserLoadFromSession: no User cookie, so unable to check for session mismatch So no session because the centralauth_User cookie is missing.
Moving this to labs bugs, since I think this is a config issue with beta, not the extension. This issue is not showing up in production (although hopefully not by some config that is going into production soon!)
This is currently also causing Selenium tests to fail which need to log in. e.g. https://saucelabs.com/jobs/5ca77278cb5e412c8e7a231561f6ba4d
upping the importance here. this seems to be a recent phenomena, if you want to pinpoint a time let me know
also, we seem to have the same issue on test2wiki (and presumably mw.o) The problem started showing up yesterday afternoon SF time. FWIW, just before the automated test displays the "No active login attempt... " message, Firefox shows a message "Transferring data from bits..." and then "Read bits..." And, it seems to be intermittent. Of seven automated tests that login, two (in the most recent run) Mar 5 incurred the "No active login attempt..." message.
(In reply to Chris McMahon from comment #3) > upping the importance here. this seems to be a recent phenomena, if you > want to pinpoint a time let me know Please! (In reply to Chris McMahon from comment #4) > also, we seem to have the same issue on test2wiki (and presumably mw.o) > > The problem started showing up yesterday afternoon SF time. > > FWIW, just before the automated test displays the "No active login > attempt... " message, Firefox shows a message "Transferring data from > bits..." and then "Read bits..." > > And, it seems to be intermittent. Of seven automated tests that login, two > (in the most recent run) Mar 5 incurred the "No active login attempt..." > message. That's a different issue. That is memcache not working right. In the issue on this bug, loginwiki has the token and session, it's a problem with the cookie setting.
Created attachment 14743 [details] central login problem
on mw.o just now: load a page, get a message saying Central login/You are centrally logged in. Reload the page to apply your user settings. Reload the page, notice re-occurs, I am still not logged in.
(In reply to Chris McMahon from comment #7) > on mw.o just now: > > load a page, get a message saying Central login/You are centrally logged in. > Reload the page to apply your user settings. > > Reload the page, notice re-occurs, I am still not logged in. Again, this is a separate issue. Please file separate bugs for them.
Mentioned on IRC: to have multiple login issues show up simultaneously on beta labs and test2wiki/mw.o within roughly a 12-hour period suggests a single root cause. I could be wrong.
I have a manual repro for the login issue, still not sure it's connected to the original report, and apologies for hijacking the BZ ticket on http://en.wikipedia.beta.wmflabs.org clear cache and cookies * Enter username and password * Click "Log in " button and immediately click "Random article" link Get error message "Central user log in/No active login attempt is in progress for your session." Repeat these steps in the same browser session and the second time get the error message: Central user log in/The token does not match with your session username. Someone might be trying to log you in under a malicious account./If you are just trying to log in as another user, please log out first.
Unable to repro manually on test2wiki but see screen shot for evidence that it can happen.
Created attachment 14744 [details] no active login error msg on test2wiki
Created attachment 14753 [details] login script for internal/external comparison We definitely have an issue with the Caching. This script when run internally against the apache directly returns the correct Set-Cookie headers. When run externally, only the Session and GeoIP cookies are set. Further, if you add a -H 'Cookie: centralauth_Session=asdf; GeoIP=asdf' to the final curl call, the correct cookies are returned. It doesn't matter what values they have, but when those 2 cookies are set, then all of the cookies come back through. Since GeoIP is needed for the correct cookies to come back, I'll add ori in case this is related to the GeoIP cookie vcl.
Running script script against Apache and Varnish have the following difference: Apache: Cache-control: private, must-revalidate, max-age=0 Set-Cookie: centralauth_Session=2f8f82190cd8c3a066aa5c8abca47699; path=/; httponly Set-Cookie: centralauth_User=TestBug16864; expires=Sat, 05-Apr-2014 09:09:28 GMT; path=/; httponly Set-Cookie: centralauth_Token=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly Varnish: Age: 0 Cache-Control: private, s-maxage=0, max-age=0, must-revalidate Set-Cookie: centralauth_Session=4dc3572e4bde7a617de112f22d3fa18a; path=/; httponly; GeoIP=FR::48.8600:2.3500:v4; path=/ X-Cache: deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0)
Ori change "Send GeoIP lookup result as 'GeoIP' cookie" is https://gerrit.wikimedia.org/r/#/c/113935/
On deployment-cache-text1.pmtpa.wmflabs I edited /etc/varnish/text-frontend.inc.vcl and commented out the GeoIP snippet added by change 113935: https://gerrit.wikimedia.org/r/#/c/113935/6/templates/varnish/text-frontend.inc.vcl.erb,unified ) Restarted varnish with: /etc/init.d/varnish restart ; /etc/init.d/varnish-frontend restart I then ran Chris script and I received three cookies: < Set-Cookie: centralauth_Session=3428f185da995eaaf9550ae54b4e634f; path=/; httponly < Set-Cookie: centralauth_User=TestBug16864; expires=Sat, 05-Apr-2014 09:34:59 GMT; path=/; httponly < Set-Cookie: centralauth_Token=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly < Location: http://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralLogin/complete?token=ef3afb1dbc82549c1655b6841d8614ae < Age: 0 < X-Cache: deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0) < Cache-Control: private, s-maxage=0, max-age=0, must-revalidate I have no idea why it would work in production and not on labs though :(
Apparently happens on production as well: bug 62288 'Random API issues'
Fixed by Faidon by disabling the cookie based geolocation on text varnish: https://gerrit.wikimedia.org/r/#/c/117004/ The shell script pass and shows three Set-Cookie: headers now.
For completeness, copying from my postmortem: What happens is, manipulation of the Set-Cookie header from VCL was broken. Ori's code is: set_cookie_header = VRT_GetHdr(sp, HDR_RESP, "\013set-cookie:"); if (set_cookie_header == NULL) { // New header VRT_SetHdr(sp, HDR_RESP, "\013Set-Cookie:", cookie_out, vrt_magic_string_end); } else { // Append to existing header VRT_SetHdr(sp, HDR_RESP, "\013Set-Cookie:", set_cookie_header, "; ", cookie_out, vrt_magic_string_end); } While this looks correct on a first glance, it makes you wonder what would happen with multiple occurences of Set-Cookie. What happens is exactly what you observe: VRT_GetHdr() returns only one of them and VRT_SetHdr() sets all of them, essentially discarding all Set-Cookie headers but one. As for going forward: the std vmod has an std.collect() function to collect multiple headers into one, but I don't believe this would work with Set-Cookie. There's https://github.com/varnish/libvmod-header that's designed exactly to address this use case -- its description is "Varnish Module (vmod) for manipulation of duplicated headers (for instance multiple set-cookie headers)". We should probably use this instead.
Zeljkof confirmed the QA job that had the issue is now passing the login step https://wmf.ci.cloudbees.com/job/MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox/91/ https://saucelabs.com/tests/1a52304399624b5a8a76243f32372e63 (Job still fails but for a different reason). Thanks Chris Steipp for the shell script!
*** Bug 62288 has been marked as a duplicate of this bug. ***
Confirmed that beta is once again giving out all the cookies for loginwiki, and so accounts are automatically attached on loginwiki on creation and SUL actually all works now. Thanks for all the help!
Interesting, because we had an occurrence of bug 62288 with the commons api monitor, but it passed on re-run.
The same symptoms have reappeared for the UploadWizard API test: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/389/console Could be a different bug, but the last occurrence of this failure on that particular test led my to this bug.
Gilles: Isn't that bug 63981?
Probably, since it just came back to normal.