Last modified: 2013-12-26 21:09:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T58968, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 56968 - CirrusSearch: We think CirrusSearch is doing two page parses on a page update and don't like that
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch (Other open bugs)
Version: unspecified
Hardware: All  OS: All
Priority: High  Severity: normal (vote)
Target Milestone: ---
Assigned To: Chad H.
Depends on:
Blocks:
Reported: 2013-11-12 22:24 UTC by Nik Everett
Modified: 2013-12-26 21:09 UTC (History)
CC: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Nik Everett 2013-11-12 22:24:30 UTC
We think CirrusSearch is doing two page parses on a page update and don't like that.  We should do something about that but we're not sure what yet.
Comment 1 Chad H. 2013-11-21 18:36:40 UTC
So I think I more or less tracked it down. It can happen in two places:

1) When we're called in SearchUpdate during a page edit, we're not making use of the already parsed version of the page. This makes page edits slower. It should be basically a non-issue, though, if we fix bug 57316.

2) For page redirects, when calling updateFromTitleAndText() and then updateFromTitle(), we're likely throwing away any parser output we have on hand.

To fix this once and for all, I think I've got a solution in mind: I'm going to write a class that handles grabbing parser output and the parser cache so Cirrus won't have to care.

At the very least, we'd only end up with one parse per page in a given process.
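The class described above can be sketched roughly as the following memoization pattern. This is an illustrative sketch only, in Python rather than MediaWiki PHP; the names (ParserOutputFetcher, parser_cache, parse_page) are hypothetical and are not actual MediaWiki or CirrusSearch APIs:

```python
# Hypothetical sketch of "one parse per process": consult a per-process memo,
# then the shared parser cache, and only parse as a last resort.

class ParserOutputFetcher:
    def __init__(self, parser_cache, parse_page):
        self._cache = parser_cache  # shared cache (dict-like stand-in here)
        self._parse = parse_page    # the expensive parse function
        self._seen = {}             # per-process memo: title -> parser output

    def get(self, title):
        # 1) Already parsed in this process? Reuse it.
        if title in self._seen:
            return self._seen[title]
        # 2) Present in the shared parser cache? Reuse that.
        output = self._cache.get(title)
        if output is None:
            # 3) Fall back to a single parse, then populate the shared cache.
            output = self._parse(title)
            self._cache[title] = output
        self._seen[title] = output
        return output
```

With callers going through a single fetcher like this, a page is parsed at most once per process no matter how many update paths ask for its output.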
Comment 2 Nik Everett 2013-11-21 18:39:57 UTC
I like the idea.  I think it'll be less important when we get bug 57316 because there will be fewer paths to take into the update process.
Comment 3 Nik Everett 2013-11-21 18:40:31 UTC
Oh, I still advocate doing it even though we're doing bug 57316.
Comment 4 Chad H. 2013-12-26 21:08:32 UTC
So, the situation is far better than it was now that bug 57316 is fixed. The large amount of refactoring for that and other things has improved this.

We pass ParserOutput around everywhere now, and don't stampede the cache like we used to. We still toss the EditPage's output like I said in comment 1 (1), but we don't care anymore since we're using the job queue (and passing the output through the job queue would be bad).
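The job-queue point above can be illustrated with a minimal sketch. This is Python, not MediaWiki's actual JobQueue API; on_page_edit, run_worker, and the dict-shaped jobs are hypothetical. The key idea is that only the title crosses the queue, and the single parse happens inside the job:

```python
# Hypothetical sketch: an edit handler enqueues only the page title (not the
# bulky parser output), and the job worker performs the one parse itself.
from collections import deque

job_queue = deque()

def on_page_edit(title):
    # Passing parser output through the queue would mean serializing a large
    # object; enqueue just the title and let the worker re-derive the output.
    job_queue.append({"type": "index", "title": title})

def run_worker(parse, index):
    while job_queue:
        job = job_queue.popleft()
        output = parse(job["title"])  # exactly one parse, inside the job
        index(job["title"], output)
```

Tossing the edit's parser output is acceptable in this model because the cost moves off the edit path and each job still parses only once.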

Comment 1 (2) is fixed now

Other than bulk indexing jobs (where we bump the batch size to 10-ish), we shouldn't be doing any more than 1 parse per job now, which was the goal.

Marking this FIXED since we don't have a MOSTLY-FIXED-BUT-DONT-CARE :)
Comment 5 Chad H. 2013-12-26 21:09:40 UTC
(In reply to comment #4)
> Comment 1 (2) is fixed now
> 

Pressed enter too soon. Meant to say:

Comment 1 (2) is fixed now with the job queue as well. We still index recursively, but since it's in the job queue we're not doing multiple in-process parses.
