Last modified: 2013-10-15 19:42:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57592, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55592 - CirrusSearch renders every page in the search results probably just to tell the user how many bytes are in it
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nik Everett
Depends on: 55590
Blocks:
Reported: 2013-10-10 21:07 UTC by Nik Everett
Modified: 2013-10-15 19:42 UTC (History)

Description Nik Everett 2013-10-10 21:07:47 UTC
+++ This bug was initially created as a clone of Bug #55590 +++

Bug 55590 was "discovered" by CirrusSearch's overzealous rendering, so I'm cloning it to make this one.  That bug causes crashes, which is bad, and the crashes happen even without CirrusSearch.  It's just that CirrusSearch casts a wide net (due to this bug) and 55590 throws a bomb in the net, so the search results page blows up.

The part of the backtrace that matters here:
#17 /usr/local/apache/common-local/php-1.22wmf21/includes/search/SearchEngine.php(868): CirrusSearch->getTextFromContent(Object(Title), Object(WikitextContent))
#18 /usr/local/apache/common-local/php-1.22wmf21/includes/search/SearchEngine.php(954): SearchResult->initText()
#19 /usr/local/apache/common-local/php-1.22wmf21/includes/specials/SpecialSearch.php(651): SearchResult->getByteSize()
#20 /usr/local/apache/common-local/php-1.22wmf21/includes/specials/SpecialSearch.php(543): SpecialSearch->showHit(Object(CirrusSearchResult), Array)
Comment 1 Nik Everett 2013-10-10 21:08:03 UTC
This makes showing results really really really slow.
Comment 2 Nik Everett 2013-10-10 21:47:22 UTC
Got started on this but I have to stop for the night.  We already have the number of bytes in the article in Elasticsearch (called textLen), but it isn't a stored field, so it has to be retrieved from the source, which slows down queries.  I'd like to store both the number of bytes and the number of words directly in Elasticsearch.  I think it is worth overriding the result's byte-size and word-count methods to stop the rendering and return textLen for both, then deprecating textLen and replacing it with text_bytes and text_words.  The next step would be to reindex.  After that, stop using textLen and stop writing it.  On the following reindex it won't be recreated.
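
A rough sketch of that migration, written in Python rather than the extension's PHP purely for illustration. The class name `StoredSizeResult` and its method names are hypothetical; only the field names textLen, text_bytes, and text_words come from the comment above. The idea is that both sizes are read from fields returned with the hit, falling back to the old textLen until the reindex has populated the new fields, so the page is never rendered.

```python
class StoredSizeResult:
    """Search hit that reports sizes from indexed fields (hypothetical class)."""

    def __init__(self, fields):
        # `fields` is the dict of field values returned with the hit.
        self.fields = fields

    def get_byte_size(self):
        # Prefer the new field; fall back to the deprecated textLen
        # for documents indexed before the reindex.
        return self.fields.get("text_bytes", self.fields.get("textLen"))

    def get_word_count(self):
        # textLen is only a stand-in here until text_words is populated.
        return self.fields.get("text_words", self.fields.get("textLen"))


old_hit = StoredSizeResult({"textLen": 1200})
new_hit = StoredSizeResult({"text_bytes": 1200, "text_words": 210})
```

Once every document carries text_bytes and text_words, the textLen fallback can be dropped, which matches the "stop using textLen and stop writing it" step.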

Ultimately I'd like to let Elasticsearch figure out the word count on its own, but I'm not sure how to do that at this point.  str_word_count will have to do for now.
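
For illustration only, here is how the two values might be computed once at index time instead of at search time. This is a Python sketch, not the extension's code: `size_fields` is a hypothetical helper, and the regular expression only loosely imitates what PHP's str_word_count treats as a word (runs of letters, apostrophes, and hyphens).

```python
import re


def size_fields(text):
    """Build the size fields for one document at index time (hypothetical helper)."""
    return {
        # Bytes, not characters: encode before measuring.
        "text_bytes": len(text.encode("utf-8")),
        # Crude word count: runs of ASCII letters, apostrophes and hyphens.
        "text_words": len(re.findall(r"[A-Za-z'-]+", text)),
    }


fields = size_fields("CirrusSearch shouldn't render every page")
```

A count like this is approximate, especially for non-English text, which is one reason to eventually let the search backend's own analysis produce it.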
Comment 3 Gerrit Notification Bot 2013-10-15 14:47:05 UTC
Change 89832 had a related patch set uploaded by Manybubbles:
Include wordCount and byteSize in result

https://gerrit.wikimedia.org/r/89832
Comment 4 Gerrit Notification Bot 2013-10-15 19:37:27 UTC
Change 89832 merged by jenkins-bot:
Include wordCount and byteSize in result

https://gerrit.wikimedia.org/r/89832
