Last modified: 2013-10-29 02:12:27 UTC
There's a suggestion currently at http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Web_scraping_tool_for_article_research_.28list_expansion.29 that the search indexes only the first 100k words of a page. If so, content near the bottom of a very long page is left out of the index and cannot be found by searching, which is a bad thing. If this restriction exists, could it be lifted so that all of the text is indexed?
Section was renamed on village pump. Permalink: http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=464743644#Web_scraping_tool_for_article_research_.28list_expansion.29
http://svn.wikimedia.org/viewvc/mediawiki/trunk/lucene-search-2/src/org/wikimedia/lsearch/index/WikiIndexModifier.java?revision=63824&view=markup
static public final int MAX_FIELD_LENGTH = 100000;
It could be increased; however, I don't remember offhand what the issues with increasing this number were.
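To illustrate the effect of such a cap, here is a minimal sketch of term-count truncation. It is not the lsearchd code: the class name FieldLimitSketch, the helper indexedTerms, and the whitespace tokenization are all assumptions made for the example. Only the constant's name and value come from WikiIndexModifier.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of lucene-search-2.
class FieldLimitSketch {
    // Same name and value as the constant quoted from WikiIndexModifier.java.
    static final int MAX_FIELD_LENGTH = 100000;

    // Keeps at most maxTerms whitespace-separated terms; later terms are dropped.
    // Whitespace splitting is an assumption standing in for the real analyzer.
    static List<String> indexedTerms(String text, int maxTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty input yields one empty token; skip it
            }
            if (kept.size() >= maxTerms) {
                break; // cap reached: the rest of the page never enters the index
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a page with exactly MAX_FIELD_LENGTH filler words, then one more word.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < MAX_FIELD_LENGTH; i++) {
            page.append("filler ");
        }
        page.append("needle");

        List<String> terms = indexedTerms(page.toString(), MAX_FIELD_LENGTH);
        System.out.println(terms.size());             // 100000
        System.out.println(terms.contains("needle")); // false
    }
}
```

In this sketch the word "needle" sits just past the cap, so a search for it would find nothing even though it is on the page, which matches the behaviour described in the original complaint.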
I got the link to the VP wrong - try http://en.wikipedia.org/w/index.php?title=Wikipedia%3AVillage_pump_%28technical%29&action=historysubmit&diff=464743644&oldid=464742465#Archive_search_bug.3F for the original complaint.
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]
lsearchd has reached end of life and will not be improved further. Marking this WONTFIX as a result. We don't have this limit in CirrusSearch.