Last modified: 2013-12-26 21:09:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T58968, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 56968 - CirrusSearch: We think CirrusSearch is doing two page parses on a page update and don't like that
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch (Other open bugs)
Version: unspecified
Hardware: All  OS: All
Priority: High  Severity: normal (vote)
Target Milestone: ---
Assigned To: Chad H.
Depends on:
Blocks:
Reported: 2013-11-12 22:24 UTC by Nik Everett
Modified: 2013-12-26 21:09 UTC (History)
CC: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Nik Everett 2013-11-12 22:24:30 UTC
We think CirrusSearch is doing two page parses on a page update and don't like that.  We should do something about that but we're not sure what yet.
Comment 1 Chad H. 2013-11-21 18:36:40 UTC
So I think I more or less tracked it down. It can happen in two places:

1) When we're called in SearchUpdate during a page edit, we're not making use of the already parsed version of the page. This makes page edits slower. It should be basically a non-issue, though, if we fix bug 57316.

2) For page redirects, when calling updateFromTitleAndText() and then updateFromTitle(), we're likely throwing away any parser output we have on hand.

To fix this once and for all, I think I've got a solution in mind: I'm going to write a class that handles grabbing parser output and the parser cache so Cirrus won't have to care.

At the very least, we'd only end up with one parse per page in a given process.
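The class described above can be sketched roughly as the following memoization pattern. This is an illustrative sketch only, in Python rather than MediaWiki PHP; the names (ParserOutputFetcher, parser_cache, parse_page) are hypothetical and are not actual MediaWiki or CirrusSearch APIs:

```python
# Hypothetical sketch of "one parse per process": consult a per-process memo,
# then the shared parser cache, and only parse as a last resort.

class ParserOutputFetcher:
    def __init__(self, parser_cache, parse_page):
        self._cache = parser_cache  # shared cache (dict-like stand-in here)
        self._parse = parse_page    # the expensive parse function
        self._seen = {}             # per-process memo: title -> parser output

    def get(self, title):
        # 1) Already parsed in this process? Reuse it.
        if title in self._seen:
            return self._seen[title]
        # 2) Present in the shared parser cache? Reuse that.
        output = self._cache.get(title)
        if output is None:
            # 3) Fall back to a single parse, then populate the shared cache.
            output = self._parse(title)
            self._cache[title] = output
        self._seen[title] = output
        return output
```

With callers going through a single fetcher like this, a page is parsed at most once per process no matter how many update paths ask for its output.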
Comment 2 Nik Everett 2013-11-21 18:39:57 UTC
I like the idea.  I think it'll be less important when we get bug 57316 because there will be fewer paths to take into the update process.
Comment 3 Nik Everett 2013-11-21 18:40:31 UTC
Oh, I still advocate doing it even though we're doing bug 57316.
Comment 4 Chad H. 2013-12-26 21:08:32 UTC
So, the situation is far better than it was now that bug 57316 is fixed. The large amount of refactoring for that and other things has improved this.

We pass ParserOutput around everywhere now, and don't stampede the cache like we used to. We still toss the EditPage's output like I said in comment 1 (1), but we don't care anymore since we're using the job queue (and passing the output through the job queue would be bad).
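The job-queue point above can be illustrated with a minimal sketch. This is Python, not MediaWiki's actual JobQueue API; on_page_edit, run_worker, and the dict-shaped jobs are hypothetical. The key idea is that only the title crosses the queue, and the single parse happens inside the job:

```python
# Hypothetical sketch: an edit handler enqueues only the page title (not the
# bulky parser output), and the job worker performs the one parse itself.
from collections import deque

job_queue = deque()

def on_page_edit(title):
    # Passing parser output through the queue would mean serializing a large
    # object; enqueue just the title and let the worker re-derive the output.
    job_queue.append({"type": "index", "title": title})

def run_worker(parse, index):
    while job_queue:
        job = job_queue.popleft()
        output = parse(job["title"])  # exactly one parse, inside the job
        index(job["title"], output)
```

Tossing the edit's parser output is acceptable in this model because the cost moves off the edit path and each job still parses only once.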

Comment 1 (2) is fixed now

Other than bulk indexing jobs (where we bump the batch size to 10-ish), we shouldn't be doing any more than 1 parse per job now, which was the goal.

Marking this FIXED since we don't have a MOSTLY-FIXED-BUT-DONT-CARE :)
Comment 5 Chad H. 2013-12-26 21:09:40 UTC
(In reply to comment #4)
> Comment 1 (2) is fixed now
> 

Pressed enter too soon. Meant to say:

Comment 1 (2) is fixed now with the job queue as well. We still index recursively, but since it's in the job queue we're not doing multiple in-process parses.
