Last modified: 2014-09-03 18:32:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links other than those displaying bug reports and their history may be broken. See T64358, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 62358 - CirrusSearch: We should dig into CirrusSearch-failures.log
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Priority: High
Severity: normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-03-07 02:44 UTC by Nik Everett
Modified: 2014-09-03 18:32 UTC
CC List: 3 users
See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Nik Everett 2014-03-07 02:44:47 UTC
We should figure out exactly what is going on.  For example, I saw this:

2014-03-07 02:40:46 mw1015 wikidatawiki: Update for doc ids: 15087781
2014-03-07 02:40:46 mw1008 wikidatawiki: Update for doc ids: 15087781
2014-03-07 02:40:46 mw1008 wikidatawiki: Update for doc ids: 15087781


These might just be trying to update the same document concurrently.  Chad was talking about just retrying here.  If we're trying to update the same document multiple times we could do that.  We also might want to use the pool counter to prevent it.  We could probably use the shared acquire to notice that another job tried to update and then, well, not do it.  One problem, though, is that updating a page will get a parser lock (I think), so we have to make sure not to lock each other.  I think.
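A minimal sketch of the retry idea, assuming an elasticsearch-py client; the index name, document id, and field values below are hypothetical, and this is not the actual CirrusSearch code:

# Sketch: retry a partial update when two jobs race on the same document
# and one of them fails with a version-conflict error.
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConflictError

es = Elasticsearch()

def update_with_retry(index, doc_id, fields, max_retries=3):
    """Apply a partial update, retrying a few times on version conflicts."""
    for _ in range(max_retries):
        try:
            es.update(index=index, id=doc_id, body={"doc": fields})
            return True
        except ConflictError:
            # Another job updated the same document concurrently; try again.
            continue
    return False

# Hypothetical call mirroring the log lines above (doc id 15087781).
update_with_retry("wikidatawiki_content", 15087781, {"text": "updated page text"})

Elasticsearch's update API also accepts a retry_on_conflict parameter, which handles the simple same-document race server-side without a client loop.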
Comment 1 Chad H. 2014-03-07 02:50:14 UTC
We do retry now as of Gerrit change #117335.
Comment 2 Nik Everett 2014-03-07 02:56:15 UTC
Cool!  I guess I was just going on faith that we were retrying....
Comment 3 Nik Everett 2014-07-22 16:46:14 UTC
I just checked and saw two things:
1.  If we try to send 50 updates all at once we might bump against an Elasticsearch queue limit.  I'm chunking it to 10 at a time (sketched below).
2.  I _think_ moving a page and leaving behind a redirect can cause that version conflict error.  I believe it makes two jobs - one for the new page and one for the redirect.  We need to keep the redirect job but might be able to throw away the one for the new page.  Worth checking.
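A minimal sketch of the chunking in point 1, assuming the updates are pushed with the elasticsearch-py bulk helper; the index name and payload shape are illustrative, not CirrusSearch's:

# Sketch: split pending updates into chunks of 10 so a single bulk request
# never pushes 50 documents at once against the Elasticsearch queue limit.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def chunked(items, size=10):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def send_updates(index, updates):
    # `updates` is a list of (doc_id, fields) pairs; names are hypothetical.
    for chunk in chunked(updates, size=10):
        actions = [
            {"_op_type": "update", "_index": index, "_id": doc_id, "doc": fields}
            for doc_id, fields in chunk
        ]
        helpers.bulk(es, actions)

helpers.bulk also takes a chunk_size argument, so the same effect can be had by passing all of the actions in one call with chunk_size=10.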
Comment 4 Gerrit Notification Bot 2014-07-22 16:48:32 UTC
Change 148405 had a related patch set uploaded by Manybubbles:
Chunk updates at 10

https://gerrit.wikimedia.org/r/148405
Comment 5 Nik Everett 2014-07-22 17:07:38 UTC
Finishing up skipping the second update in point #2 from comment 3.
Comment 6 Gerrit Notification Bot 2014-07-22 17:44:25 UTC
Change 148417 had a related patch set uploaded by Manybubbles:
On article move only use one job

https://gerrit.wikimedia.org/r/148417
Comment 7 Gerrit Notification Bot 2014-07-22 20:21:38 UTC
Change 148405 merged by jenkins-bot:
Chunk updates at 10

https://gerrit.wikimedia.org/r/148405
Comment 8 Gerrit Notification Bot 2014-07-22 20:26:27 UTC
Change 148417 merged by jenkins-bot:
On article move only use one job

https://gerrit.wikimedia.org/r/148417
Comment 9 Nik Everett 2014-07-23 13:07:29 UTC
Shifting back to new - we'll have to reevaluate in two weeks or so once these changes hit production and we've churned through the queue.
Comment 10 Nik Everett 2014-09-03 18:32:05 UTC
Dug into these, and the vast majority of them now come from trying to run the same update two or three times at the same time.  Noop detection, going out to the Wikipedias tomorrow, should squash most of these.
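Noop detection here presumably means skipping writes whose content has not actually changed. Elasticsearch's update API supports this with detect_noop; a hedged sketch with illustrative index and field names, not the CirrusSearch implementation:

# Sketch: with detect_noop, Elasticsearch compares the submitted fields with
# what is already stored and answers "noop" instead of rewriting (and
# re-versioning) an identical document, so a duplicate job causes no conflict.
from elasticsearch import Elasticsearch

es = Elasticsearch()

resp = es.update(
    index="wikidatawiki_content",   # illustrative index name
    id=15087781,
    body={
        "doc": {"text": "same text as before"},
        "detect_noop": True,
    },
)

if resp.get("result") == "noop":
    print("Duplicate update skipped; nothing was written.")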
