
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are now handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T46106, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 44106 - Database runs amok after using Special:Import - Apache and MySQL use full CPU
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: JobQueue
Version: 1.21.x
Hardware: All
OS: All
Importance: Normal major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-01-18 14:39 UTC by DaSch
Modified: 2013-12-04 18:46 UTC
CC: 2 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments

Description DaSch 2013-01-18 14:39:26 UTC
Somehow I get the effect that, after importing pages from another wiki with Special:Import, my Apache and MySQL split the CPU 50:50 and only work normally again after restarting MySQL.

Running strace on the Apache process at 50% CPU always shows the same thing:

poll([{fd=82, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(82, "\370\0\0\0\3UPDATE /* JobQueueDB::claim"..., 252) = 252
read(82, "0\0\0\1\0\0\0\3\0\0\0(Rows matched: 0  Cha"..., 16384) = 52

and strace on MySQL shows this (not helpful, I think :/):

getsockname(33, {sa_family=AF_FILE, path="/var/run/mysql"}, [30]) = 0
fcntl(33, F_SETFL, O_RDONLY)            = 0
fcntl(33, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(33, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
setsockopt(33, SOL_IP, IP_TOS, [8], 4)  = -1 EOPNOTSUPP (Operation not supported)
futex(0x2b202ca082a4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b202ca082a0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2b202ca076e0, FUTEX_WAKE_PRIVATE, 1) = 1
select(13, [10 12], NULL, NULL, NULL)   = 1 (in [12])
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept(12, {sa_family=AF_FILE, NULL}, [2]) = 33
fcntl(12, F_SETFL, O_RDWR) 


It seems to me that something is going wrong with the job queue.
Comment 1 DaSch 2013-01-19 13:08:27 UTC
Sometimes I also get this error:
"JobQueueDB::doAck". The database reported the error "1205: Lock wait timeout exceeded; try restarting transaction (localhost)"

But it hides on a "second" page right below the normal wiki page.

I think these two errors somehow belong together.
Comment 2 Andre Klapper 2013-01-21 07:45:22 UTC
Tentatively moving this to JobQueue component.
Comment 3 DaSch 2013-01-25 00:29:17 UTC
The funny thing is that looking at the job queue with maintenance/showJobs.php shows 0, yet the Apache process shows hundreds of these per second:
poll([{fd=82, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(82, "\370\0\0\0\3UPDATE /* JobQueueDB::claim"..., 252) = 252
read(82, "0\0\0\1\0\0\0\3\0\0\0(Rows matched: 0  Cha"..., 16384) = 52
Comment 4 DaSch 2013-05-08 21:18:28 UTC
Does anybody have an idea about this? I'm losing money because my wiki isn't running correctly. I have limited MySQL to 30% with cpulimit, but this also kills the wiki completely. It cannot be right that MySQL uses 100% CPU for hours just because I imported a wiki page.
Comment 5 Aaron Schulz 2013-05-08 22:03:59 UTC
Is showJobs.php always empty or almost empty? Did you do anything to $wgJobTypeConf? JobQueueDB::claim doesn't get called unless a prior SELECT found a job. If $wgJobRunRate is high and there are very many page requests but only a few jobs (at least some), you could maybe get something like this sometimes. I'd be a bit skeptical about that, though. Maybe claimRandom() was in a tight loop, but I don't see how that's possible either.

You can try setting $wgJobRunRate = 0 and daemonizing maintenance/runJobs.php to run in the background instead of running jobs on random page requests.
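
A minimal sketch of that setup (assuming a MediaWiki root of /var/www/wiki, which is a hypothetical path, and a standard cron daemon):

# In LocalSettings.php, stop running jobs on page views:
#     $wgJobRunRate = 0;
# Then drain the queue from cron instead, capping each run so a single
# invocation cannot monopolize the server:
*/5 * * * * /usr/bin/php /var/www/wiki/maintenance/runJobs.php --maxjobs 100 --maxtime 120 >> /var/log/mw-runJobs.log 2>&1

Bounding each run with --maxjobs/--maxtime means one invocation can only do a limited amount of work before it exits and cron starts a fresh one a few minutes later.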
Comment 6 DaSch 2013-05-08 22:17:06 UTC
My job queue has 20k jobs. I set $wgJobRunRate = 0.1; maybe this helps. But with the visits I get, working through the queue would take 3 years. And I'm afraid of killing my server when running runJobs.php.
Comment 7 Aaron Schulz 2013-05-08 22:20:18 UTC
Do you have any long-running transactions (like from scripts doing queries in the background that take hours or days)? Those can cause many, many deleted but not yet purged rows in MySQL, which can make the queue unusable.
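
One way to look for such transactions (a sketch, assuming MySQL with InnoDB; the INFORMATION_SCHEMA.INNODB_TRX table is available since MySQL 5.1 with the InnoDB plugin):

mysql -e "SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
          FROM INFORMATION_SCHEMA.INNODB_TRX
          ORDER BY trx_started;"
# Transactions whose trx_started is hours old are the suspects: InnoDB
# cannot purge old row versions while such a transaction stays open.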
Comment 8 DaSch 2013-05-19 18:45:29 UTC
What are long running transactions? Where can I check this?

Yesterday at around 22:00 UTC I made an import; then, around 12 hours later, the database went up to 30% CPU (the limit I set) and stayed at that level for 10 hours, which made my wiki unavailable for 10 hours!

This is a really nasty problem, because it kills my Google and Bing ranking and makes me lose money.

My $wgJobRunRate is at 0.1 and I have 20,000 jobs.

What can I do to investigate this problem? I really need to solve this. This is the third time this month and it's really annoying.
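
One way to see what is actually sitting in the queue (a sketch; wikidb is a placeholder for the wiki's database name, and the job table with its job_cmd column is part of the standard MediaWiki schema):

mysql wikidb -e "SELECT job_cmd, COUNT(*) AS n FROM job GROUP BY job_cmd ORDER BY n DESC;"
# If showJobs.php reports 0 while this query returns rows, that
# discrepancy itself is worth reporting.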
Comment 9 Aaron Schulz 2013-06-25 21:40:20 UTC
https://gerrit.wikimedia.org/r/#/c/63819/ might help if jobs are still run on page views.
Comment 10 Aaron Schulz 2013-11-06 01:53:44 UTC
Any more on this?
Comment 11 Andre Klapper 2013-11-22 15:29:30 UTC
DaSch: Is this still an issue? Did the patch in comment 9 help?
Comment 12 DaSch 2013-11-22 15:56:26 UTC
That change is merged already. I haven't experienced any problems with the last imports, so it seems to have helped.
Comment 13 Andre Klapper 2013-12-04 18:46:19 UTC
Thanks. Closing as FIXED as per last comment. Please reopen if this happens again.
