Last modified: 2013-10-29 02:12:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T34871, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 32871 - Search indexes limited to first 100k words (MAX_FIELD_LENGTH)
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody
Depends on:
Blocks:
Reported: 2011-12-08 10:19 UTC by tagishsimon
Modified: 2013-10-29 02:12 UTC

Description tagishsimon 2011-12-08 10:19:54 UTC
There's a suggestion currently at http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Web_scraping_tool_for_article_research_.28list_expansion.29 that the search indexes only the first 100k words in a page. 

This means that important stuff at the bottom of a very long page is not included in the index, which is a bad thing.

Is there any possibility this restriction - if it exists - could be lifted such that all of the text is indexed?
Comment 2 Robert Stojnic 2011-12-08 10:23:52 UTC
http://svn.wikimedia.org/viewvc/mediawiki/trunk/lucene-search-2/src/org/wikimedia/lsearch/index/WikiIndexModifier.java?revision=63824&view=markup

static public final int MAX_FIELD_LENGTH = 100000;

It could be increased; however, I don't remember offhand what the issues with increasing this number were.
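
For illustration only, here is a minimal sketch of how a per-field token cap of this kind hides text at the end of a long page. It does not use the Lucene API and is not the lucene-search-2 code: the class FieldLengthSketch and the helpers indexedTokens and isSearchable are hypothetical, tokenization is simplified to whitespace splitting, and it assumes the constant acts as a limit on the number of tokens indexed per field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of lucene-search-2 or Lucene.
class FieldLengthSketch {
    // Same value as the constant quoted from WikiIndexModifier.java.
    public static final int MAX_FIELD_LENGTH = 100000;

    // Assumed behaviour: keep at most maxTokens whitespace-separated
    // tokens and silently drop everything after them.
    public static List<String> indexedTokens(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            if (kept.size() >= maxTokens) {
                break; // limit reached: the rest of the page is not indexed
            }
            kept.add(token);
        }
        return kept;
    }

    // A word can only be found if it survived the truncation above.
    public static boolean isSearchable(String text, String word, int maxTokens) {
        return indexedTokens(text, maxTokens).contains(word);
    }

    public static void main(String[] args) {
        // Build a page with exactly MAX_FIELD_LENGTH filler words
        // followed by one distinctive word at the very end.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < MAX_FIELD_LENGTH; i++) {
            page.append("filler ");
        }
        page.append("needle");

        // With the cap, the last word falls outside the indexed range.
        System.out.println(isSearchable(page.toString(), "needle", MAX_FIELD_LENGTH)); // false
        // Without an effective cap, the same word is found.
        System.out.println(isSearchable(page.toString(), "needle", Integer.MAX_VALUE)); // true
    }
}
```

Under these assumptions, raising the cap trades index size and indexing cost for coverage of very long pages; the sketch says nothing about what the actual issues with raising it were.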
Comment 4 Andre Klapper 2013-03-26 11:20:27 UTC
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]
Comment 5 Chad H. 2013-10-29 02:12:27 UTC
lsearchd has reached end of life and will not be improved further. Marking this WONTFIX as a result.

We don't have this limit in CirrusSearch.
