Last modified: 2013-08-01 18:08:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T54395, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52395 - CirrusSearch: indexing simplewiki in beta seems to be stuck in some kind of loop....
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nik Everett
QA Contact:
Depends on:
Blocks:

Reported: 2013-08-01 12:59 UTC by Nik Everett
Modified: 2013-08-01 18:08 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Nik Everett 2013-08-01 12:59:12 UTC
It did this:
Indexed 50 pages ending at 29137 at 24/second
Indexed 50 pages ending at 29213 at 24/second
Indexed 50 pages ending at 29278 at 24/second
Indexed 50 pages ending at 29368 at 24/second
Indexed 50 pages ending at 29467 at 24/second
Indexed 50 pages ending at 29542 at 24/second
Indexed 50 pages ending at 29612 at 24/second
Indexed 50 pages ending at 29687 at 24/second
Indexed 50 pages ending at 29764 at 24/second
Indexed 50 pages ending at 29835 at 24/second
Indexed 50 pages ending at 29907 at 24/second
Indexed 50 pages ending at 30016 at 24/second
Indexed 50 pages ending at 29261 at 24/second
Indexed 50 pages ending at 29347 at 24/second
Indexed 50 pages ending at 29449 at 24/second
Indexed 50 pages ending at 29512 at 24/second
Indexed 50 pages ending at 29592 at 24/second
Indexed 50 pages ending at 29660 at 24/second
Indexed 50 pages ending at 29743 at 24/second
Indexed 50 pages ending at 29815 at 24/second
Indexed 50 pages ending at 29888 at 24/second
Indexed 50 pages ending at 29969 at 24/second
Indexed 50 pages ending at 30070 at 24/second
Indexed 50 pages ending at 30143 at 24/second
Indexed 50 pages ending at 30214 at 24/second
Indexed 50 pages ending at 30294 at 24/second
Indexed 50 pages ending at 30355 at 24/second
Indexed 50 pages ending at 30433 at 24/second
Indexed 50 pages ending at 16386 at 24/second
Indexed 50 pages ending at 16464 at 24/second
Indexed 50 pages ending at 16525 at 24/second
Indexed 50 pages ending at 16605 at 24/second
Indexed 50 pages ending at 16694 at 24/second
Indexed 50 pages ending at 16755 at 24/second
Indexed 50 pages ending at 16822 at 24/second
Indexed 50 pages ending at 16889 at 24/second
Indexed 50 pages ending at 16989 at 24/second
Indexed 50 pages ending at 17095 at 24/second
Indexed 50 pages ending at 17157 at 24/second
Indexed 50 pages ending at 17242 at 24/second
Indexed 50 pages ending at 17320 at 24/second
Indexed 50 pages ending at 17422 at 24/second
Indexed 50 pages ending at 17493 at 24/second
Indexed 50 pages ending at 17547 at 24/second
Indexed 50 pages ending at 17654 at 24/second
Indexed 50 pages ending at 17727 at 24/second
Indexed 50 pages ending at 17859 at 24/second
Indexed 50 pages ending at 17940 at 24/second
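The loop is visible in the log itself: the "ending at" page id should only grow, but it drops from 30016 back to 29261 and later from 30433 back to 16386. A minimal sketch of a check for that, assuming the script's progress lines always have exactly the `Indexed N pages ending at ID at R/second` shape shown above (`find_backward_jumps` is a hypothetical helper, not part of CirrusSearch):

```python
import re
from typing import List, Tuple

# Matches progress lines like "Indexed 50 pages ending at 29137 at 24/second".
# Assumption: this is the only progress format the script prints.
PROGRESS_RE = re.compile(r"Indexed (\d+) pages ending at (\d+) at (\d+)/second")


def find_backward_jumps(log: str) -> List[Tuple[int, int]]:
    """Return (previous_id, next_id) pairs where the 'ending at' id decreased.

    A healthy run walks page ids in increasing order, so any decrease means
    the script went back to pages it had already indexed.
    """
    ids = [int(m.group(2)) for m in PROGRESS_RE.finditer(log)]
    return [(prev, nxt) for prev, nxt in zip(ids, ids[1:]) if nxt < prev]


if __name__ == "__main__":
    sample = "\n".join(
        f"Indexed 50 pages ending at {page_id} at 24/second"
        for page_id in (29907, 30016, 29261, 29347)
    )
    print(find_backward_jumps(sample))  # [(30016, 29261)]
```

Fed the full output above, it returns `[(30016, 29261), (30433, 16386)]`.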


After I killed it, the index had only 17349 live documents:
manybubbles@deployment-bastion:~$ curl deployment-es0:9200/simplewiki/page/_count?pretty
{
  "count" : 17349,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  }
}
manybubbles@deployment-bastion:~$
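The same count can be read from a script. A minimal sketch using only the Python standard library; it assumes the `/{index}/{type}/_count` URL form used by the curl call above, which matches the 2013-era Elasticsearch in this report (newer Elasticsearch releases removed mapping types, so there the path would be `/{index}/_count`). `fetch_count` and `parse_count` are hypothetical helpers:

```python
import json
import urllib.request


def parse_count(body: str) -> int:
    """Extract "count" from an Elasticsearch _count response body.

    Raises ValueError if any shard failed, since the count would then be partial.
    """
    data = json.loads(body)
    shards = data["_shards"]
    if shards["failed"]:
        raise ValueError(f"{shards['failed']} of {shards['total']} shards failed")
    return data["count"]


def fetch_count(base_url: str, index: str, doc_type: str) -> int:
    """GET <base_url>/<index>/<doc_type>/_count and return the live document count."""
    url = f"{base_url}/{index}/{doc_type}/_count"
    with urllib.request.urlopen(url) as resp:
        return parse_count(resp.read().decode("utf-8"))


# Usage, assuming the beta host from this report is reachable:
#   fetch_count("http://deployment-es0:9200", "simplewiki", "page")
```

Checking `_shards.failed` matters here: a count taken while a shard is unavailable would understate the index size and could look like the same kind of data loss.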

It looks like the same documents were just being overwritten over and over again:
manybubbles@deployment-bastion:~$ curl -s deployment-es0:9200/simplewiki/_status?pretty | grep deleted | head -n1
        "deleted_docs" : 272
manybubbles@deployment-bastion:~$ 

Note that there aren't a ton of deleted docs sitting around because Elasticsearch purges them when it merges index segments.
Comment 1 Nik Everett 2013-08-01 13:49:52 UTC
This seems to be caused by forceSearchIndex.php hitting a redirect. It is supposed to filter out redirects (using the page_is_redirect column), but that doesn't seem to work 100% of the time. When it does hit one, the code that indexes the redirect's target confuses the code that works out where to continue indexing from. I have a solution I'm testing locally now.
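A toy model of the failure mode described above, under stated assumptions; it is not the actual forceSearchIndex.php code, which is PHP and is not shown in this report. It assumes the script pages through rows ordered by page id, indexes a redirect's target in place of the redirect, and resumes after the id of the last document it indexed. When a batch ends on a redirect whose target has a lower id, that resume point moves backwards and earlier pages are indexed again, which matches the repeated overwrites seen above. Resuming after the last row fetched always moves forward; that is one possible fix, not necessarily the one in the linked Gerrit change. All names (`Page`, `fetch_batch`, `resolve`, `run`) are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Page:
    page_id: int
    redirect_target: Optional[int] = None  # target page id if this page is a redirect


def fetch_batch(pages: Dict[int, Page], after_id: int, size: int) -> List[Page]:
    """Stand-in for "SELECT ... WHERE page_id > after_id ORDER BY page_id LIMIT size"."""
    ids = sorted(pid for pid in pages if pid > after_id)[:size]
    return [pages[pid] for pid in ids]


def resolve(pages: Dict[int, Page], page: Page) -> Page:
    """Index the redirect target instead of the redirect itself."""
    if page.redirect_target is not None:
        return pages[page.redirect_target]
    return page


def run(pages: Dict[int, Page], size: int, resume_from_indexed: bool,
        max_batches: int = 100) -> List[int]:
    """Index everything in batches; return the resume cursor after each batch.

    resume_from_indexed=True models the bug: the cursor is the id of the last
    *indexed* document, which can be a redirect target with a lower id.
    resume_from_indexed=False resumes after the last *fetched* row instead.
    """
    cursor, cursors = 0, []
    for _ in range(max_batches):  # guard: the buggy variant can loop forever
        batch = fetch_batch(pages, cursor, size)
        if not batch:
            break
        docs = [resolve(pages, p) for p in batch]
        # (sending docs to the search index is omitted)
        cursor = docs[-1].page_id if resume_from_indexed else batch[-1].page_id
        cursors.append(cursor)
    return cursors


if __name__ == "__main__":
    # Pages 1..20, where page 10 is a redirect to page 3.
    pages = {i: Page(i) for i in range(1, 21)}
    pages[10] = Page(10, redirect_target=3)
    print(run(pages, 5, resume_from_indexed=True))   # [5, 3, 8, 13, 18, 20]
    print(run(pages, 5, resume_from_indexed=False))  # [5, 10, 15, 20]
```

If the redirect's target happens to sit exactly one batch behind it (for example page 10 redirecting to page 5 with batches of 5), the buggy variant never advances at all, which is why `run` has a `max_batches` guard.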
Comment 2 Nik Everett 2013-08-01 13:57:09 UTC
Patch here: https://gerrit.wikimedia.org/r/#/c/77127
Comment 3 Nik Everett 2013-08-01 18:08:07 UTC
This has been merged, and I'm rebuilding the search indices now to make sure nothing was skipped.


