Last modified: 2014-09-03 18:32:05 UTC
We should figure out exactly what is going on. For example, I saw this:

2014-03-07 02:40:46 mw1015 wikidatawiki: Update for doc ids: 15087781
2014-03-07 02:40:46 mw1008 wikidatawiki: Update for doc ids: 15087781
2014-03-07 02:40:46 mw1008 wikidatawiki: Update for doc ids: 15087781

These might just be concurrent attempts to update the same document. Chad was talking about just retrying here. If we're updating the same document multiple times, retrying would work. We might also want to use the pool counter to prevent it: we could probably use a shared acquire to notice that another job is already updating the document and, well, not do it. One problem, though, is that updating a page takes a parser lock (I think), so we have to make sure the jobs don't lock each other out. I think.
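The retry idea above can be sketched as optimistic concurrency control: re-read the document's version and retry the write when another job got there first. This is a minimal illustrative sketch with a toy in-memory store; `IndexClient` and `VersionConflictError` are stand-ins, not the real Elasticsearch client API.

```python
class VersionConflictError(Exception):
    """Raised when a write carries a stale version (stand-in for ES's 409)."""


class IndexClient:
    """Toy versioned document store that rejects stale writes."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self.docs.get(doc_id, (0, None))

    def update(self, doc_id, body, expected_version):
        version, _ = self.docs.get(doc_id, (0, None))
        if version != expected_version:
            raise VersionConflictError(doc_id)
        self.docs[doc_id] = (version + 1, body)


def update_with_retry(client, doc_id, body, max_retries=3):
    """Re-read and retry when a concurrent job updated the same doc."""
    for _ in range(max_retries):
        version, _ = client.get(doc_id)
        try:
            client.update(doc_id, body, expected_version=version)
            return True
        except VersionConflictError:
            continue  # another job won the race; re-read and try again
    return False
```

With this shape, two jobs racing on doc id 15087781 both succeed: the loser of the race simply re-reads the new version and writes again instead of failing with a version conflict.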
We do retry now as of Gerrit change #117335.
Cool! I guess I was just going on faith that we were retrying...
I just checked and saw two things:
1. If we try to send 50 updates all at once we might bump against an Elasticsearch queue limit. I'm chunking them to 10 at a time.
2. I _think_ moving a page and leaving behind a redirect can cause that version conflict error. I believe it creates two jobs - one for the new page and one for the redirect. We need to keep the redirect job but might be able to throw away the one for the new page. Worth checking.
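The chunking in point #1 is just splitting a batch into groups of at most 10 before sending, so no single request bumps against the queue limit. A minimal sketch (the chunk size of 10 comes from the comment above; the send step is elided):

```python
def chunk(updates, size=10):
    """Split a batch of updates into chunks of at most `size` items."""
    return [updates[i:i + size] for i in range(0, len(updates), size)]


# Usage: instead of sending all 50 updates in one request,
# send five requests of 10 updates each.
for batch in chunk(list(range(50))):
    pass  # send `batch` to Elasticsearch here
```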
Change 148405 had a related patch set uploaded by Manybubbles: Chunk updates at 10 https://gerrit.wikimedia.org/r/148405
Finishing up the work to skip the second update described in point #2 of comment 3.
Change 148417 had a related patch set uploaded by Manybubbles: On article move only use one job https://gerrit.wikimedia.org/r/148417
Change 148405 merged by jenkins-bot: Chunk updates at 10 https://gerrit.wikimedia.org/r/148405
Change 148417 merged by jenkins-bot: On article move only use one job https://gerrit.wikimedia.org/r/148417
Shifting back to new - we'll have to reevaluate in two weeks or so once these changes hit production and we've churned through the queue.
Dug into these and the vast majority now come from running the same update two or three times concurrently. Noop detection, going out to the Wikipedias tomorrow, should squash most of these.
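The noop detection mentioned above can be sketched as: before writing, compare the new document against what is already indexed and skip the write when nothing changed, so duplicate jobs for the same edit stop racing each other. This is an illustrative sketch only; `should_update` and the fingerprinting scheme are assumptions, not the actual CirrusSearch implementation.

```python
import hashlib
import json


def doc_fingerprint(doc):
    """Stable hash of a document body (illustrative choice of scheme)."""
    return hashlib.sha1(json.dumps(doc, sort_keys=True).encode()).hexdigest()


def should_update(indexed_doc, new_doc):
    """Noop detection: skip the write when the indexed doc already matches."""
    if indexed_doc is None:
        return True  # nothing indexed yet; always write
    return doc_fingerprint(indexed_doc) != doc_fingerprint(new_doc)
```

A duplicate job arriving after the first one has written sees identical content and becomes a noop, so it never issues the conflicting update at all.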