Last modified: 2014-11-07 19:23:52 UTC
Email notifications used to be instant; now they take about 20 minutes on en.wiki (with only 60 thousand jobs in the queue). Not only does this point to a problem with the job queue system and a significant regression, it is also very confusing, because notifications reach me when they are already obsolete (for instance because I have already replied).
----
Dear Nemo bis,
The Wikipedia page User talk:Nemo bis has been changed on 13 January 2013 by anonymous user 76.126.142.118, see http://en.wikipedia.org/wiki/User_talk:Nemo_bis for the current revision.
See http://en.wikipedia.org/w/index.php?title=User_talk:Nemo_bis&diff=next&oldid=532890436 to view this change.
----
Received: from imp-3.mail.tiscali.it (10.39.115.235) by mx-3-it.mail.tiscali.it (8.5.148) id 50BF36D0094B0EEF for <redacted>@tiscali.it; Sun, 13 Jan 2013 21:01:44 +0100
Received: from wiki-mail.wikimedia.org ([208.80.152.133]) by imp-3.mail.tiscali.it with id nY1j1k02z2swdko01Y1kqf; Sun, 13 Jan 2013 21:01:44 +0100
x-cnfs-analysis: v=2.0 cv=RYES+iRv c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17 a=eIhxMilvRf8A:10 a=z82XInz0jxkA:10 a=RyZ8rIAjjLkA:10 a=eztASiHJGFwA:10 a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=8pif782wAAAA:8 a=d2uY_mg3cpUA:10 a=nk0ike9KCJb9eP9e8BIA:9 a=QEXdDO2ut3YA:10 a=c7XZu54lUV4A:10 a=9vCFg7g2Nj6V2bzh:21 a=HUl_rzNbRn9v3Gf1:21 a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw8.pmtpa.wmnet ([10.0.11.8]:57845) by mchenry.wikimedia.org with esmtp (Exim 4.69) (envelope-from <wiki@wikimedia.org>) id 1TuTkG-0003E4-Fs for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
Received: from apache by mw8.pmtpa.wmnet with local (Exim 4.76) id 1TuTkG-0008Ux-Bg for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
To: Nemo bis
Subject: Wikipedia page User talk:Nemo bis has been changed by anonymous user 76.126.142.118
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Sun, 13 Jan 2013 20:01:28 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 8bit
Message-ID: <enwiki.50f3129856d1c5.83285442@en.wikipedia.org>
X-Mailer: MediaWiki mailer
Once https://ganglia.wikimedia.org is accessible again, I could even look at the JobQueue graph...
Nemo, is the lag of ~20min still a problem? /me looking at https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+-owner:L10n-bot+message:jobqueue,n,z
Job queue is now under 2000 or so on en.wiki, so it looks like the wrong time to try to reproduce this bug. https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=statistics Anyway, next time you can ask on my user talk and I'll compare the timestamps of the edit and the enotif. :-)
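For anyone who wants to watch the queue length before trying to reproduce this, here is a minimal sketch in Python of reading the job count from the siteinfo statistics query linked above. It assumes the response is requested with `format=json` (not part of the URL above) and that the count is in the `jobs` field of `query.statistics`. The response body in the example is a made-up placeholder, not real data, and no network request is made.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def statistics_url(api=API):
    """Build the siteinfo statistics query URL (format=json is an assumption)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api + "?" + urlencode(params)


def job_count(response_text):
    """Extract the job queue length from a siteinfo statistics JSON response."""
    return json.loads(response_text)["query"]["statistics"]["jobs"]


# Hypothetical response body, trimmed to the single field used here.
sample = '{"query": {"statistics": {"jobs": 2000}}}'

print(statistics_url())
print(job_count(sample))
```

Fetching the URL periodically and logging `job_count` would show when the queue is high enough to retest.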
The severity should probably be raised, because it now takes hours to receive an enotif from mediawiki.org (job queue 0 now, ~20 at 14 CET): 15:22–17:05 in the example.
Received: from wiki-mail.wikimedia.org ([208.80.152.133]) by imp-2.mail.tiscali.it with id B55A1l00w2swdko0155BbL; Wed, 13 Mar 2013 18:05:11 +0100
x-cnfs-analysis: v=2.0 cv=KYdQQHkD c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17 a=gbdniXhMvlMA:10 a=RyZ8rIAjjLkA:10 a=cNjpVsleRgUA:10 a=eztASiHJGFwA:10 a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=4P5xif6CAAAA:8 a=KcaC6ams3nQA:10 a=mdTHgZqYbhYL0A32_hcA:9 a=QEXdDO2ut3YA:10 a=4wRdB16iIHwA:10 a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw1003.eqiad.wmnet ([10.64.0.33]:38380) by mchenry.wikimedia.org with esmtp (Exim 4.69) (envelope-from <wiki@wikimedia.org>) id 1UFnVc-00068m-87 for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
Received: from apache by mw1003.eqiad.wmnet with local (Exim 4.76) id 1UFnVc-00075V-19 for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
To: Nemo bis <redacted>
Subject: MediaWiki page Help:Extension:Translate/Configuration has been changed by Nikerabbit
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Wed, 13 Mar 2013 15:22:28 +0000
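To make the 15:22–17:05 gap easy to recompute from headers like these, here is a rough Python sketch that takes the timestamp after the last `;` of two `Received:` headers and subtracts them. The helper names `received_time` and `hop_delay` are made up for illustration. The two header values are copied from the example above.

```python
from email.utils import parsedate_to_datetime


def received_time(header_value):
    """Return the timezone-aware timestamp after the last ';' of a Received header."""
    _, _, date_part = header_value.rpartition(";")
    return parsedate_to_datetime(date_part.strip())


def hop_delay(first, last):
    """Time elapsed between two Received headers (earlier hop first)."""
    return received_time(last) - received_time(first)


# Header values copied from the example above (address already redacted).
hops = [
    "from apache by mw1003.eqiad.wmnet with local (Exim 4.76) "
    "id 1UFnVc-00075V-19 for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000",
    "from wiki-mail.wikimedia.org ([208.80.152.133]) by imp-2.mail.tiscali.it "
    "with id B55A1l00w2swdko0155BbL; Wed, 13 Mar 2013 18:05:11 +0100",
]

print(hop_delay(hops[0], hops[1]))  # 1:42:43
```

Because the parsed timestamps carry their UTC offsets, the +0100 on the receiving side is accounted for automatically. This only measures time spent after the mail was generated. Any delay between the edit and the `Date:` header has to be checked against the edit timestamp separately.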
If bug 46603 is right, Site requests is the correct component. If it's just a jobqueue problem and mail relay doesn't factor into it, perhaps we just have too much stuff in "high priority"?
Currently it's basically instant, no time (1 s? unless Date is wrong) spent on apaches and about 20 s between mchenry.wikimedia.org and wiki-mail.wikimedia.org. Global jobqueue very low around 100k, will check again when it gets higher. https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large
Closing
Reopening: we have reports that password reminders on en.wiki take 60 minutes to arrive. I can't think of any reason other than this bug; the global job queue is reportedly around 2 million. <https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=month&z=default&jr=&js=&st=1365625056&z=large>
From graphite, none of the job queue push/pop graphs look remarkable over the last 2 months. There are lots of Parsoid jobs though (about 2 million on enwiki).
(In reply to comment #7)
> Reopening: we have reports that password reminders on en.wiki take 60 minutes
> to arrive.
Link(s)?
> I can't think of any reason other than this bug; global job queue is
> reportedly around 2 millions.
There are apparently different queues.
(In reply to comment #9)
> (In reply to comment #7)
> > Reopening: we have reports that password reminders on en.wiki take 60 minutes
> > to arrive.
>
> Link(s)?
Nope. Reported on #wikimedia-tech, relayed from #wikipedia-en-help I think.
> > I can't think of any reason other than this bug; global job queue is
> > reportedly around 2 millions.
>
> There are apparently different queues.
Yes (and it would be good to raise the concurrency for high priority jobs, they're still at 6 and used to be 8 till April IIRC) but this doesn't mean they don't affect each other; it happened in the past e.g. with bug 42614.
Nemo / MZ: Are you aware of any recent issues (as I'm not)? This might end up as WORKSFORME now...
Is anybody aware of any recent issues (as I'm not) or is this WORKSFORME now?
Last call: Is anybody aware of any recent issues (as I'm not) or is this WORKSFORME now?
This bug can only be tested when the job queue is very high.