Last modified: 2014-06-26 21:24:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T69157, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67157 - CirrusSearch: Failing to reindex Meta
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nik Everett
Reported: 2014-06-26 18:51 UTC by Nik Everett
Modified: 2014-06-26 21:24 UTC (History)

Description Nik Everett 2014-06-26 18:51:27 UTC
We're having trouble reindexing Meta because we're hitting a page with an external link that contains invalid UTF-8:

[2014-06-26 18:43:29,960][DEBUG][action.bulk              ] [elastic1018] [metawiki_general_1403807864][5] failed to execute bulk item (index) index {[metawiki_general_1403807864][page][661035], source[{"namespace":2,"namespace_text":"User","title":"COIBot/Local/selftrans.narod.ru","timestamp":"2011-10-11T04:19:40Z","category":["Pages where template include size is exceeded","Noindexed pages","COIBot Local Reports"],"external_link":["//wikipediatools.appspot.com/linksearch.jsp?set=top20&link=selftrans.narod.ru","//wikipediatools.appspot.com/linksearch.jsp?set=top40&link=selftrans.narod.ru","//wikipediatools.appspot.com/linksearch.jsp?set=major&link=selftrans.narod.ru","http://www.google.com/search?num=10&hl=en&rls=en&q=selftrans.narod.ru","//www.google.com/search?num=100?h1=en&rls=en&q=selftrans.narod.ru+site:en.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:fr.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:de.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:meta.wikimedia.org","http://siteexplorer.search.yahoo.com/advsearch?p=selftrans.narod.ru&bwm=i&bwmf=d&bwms=p","//toolserver.org/~erwin85/xwiki.php?report=User:COIBot/LinkReports/selftrans.narod.ru&forcelive=1","//toolserver.org/~erwin85/xwiki.php?report=User:COIBot/Local/selftrans.narod.ru&forcelive=1","//tools.wmflabs.org/searchsbl/?url=selftrans.narod.ru","http://whois.domaintools.com/selftrans.narod.ru","http://www.aboutus.org/selftrans.narod.ru","http://www.malwaredomainlist.com/mdl.php?search=selftrans.narod.ru&colsearch=Domain&quantity=50","http://www.alexa.com/data/details/main?url=selftrans.narod.ru","http://213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=top20&link=213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=top40&link=213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=major&link=213.180.199.13","http://www.google.com/search?num=10&hl=e
n&rls=en&q=213.180.199.13","//www.google.com/search?num=100?h1=en&rls=en&q=213.180.199.13+site:en.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:fr.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:de.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:meta.wikimedia.org","http://siteexplorer.search.yahoo.com/advsearch?p=213.180.199.13&bwm=i&bwmf=d&bwms=p","//tools.wmflabs.org/searchsbl/?url=213.180.199.13","http://whois.domaintools.com/213.180.199.13","http://www.aboutus.org/213.180.199.13","http://www.malwaredomainlist.com/mdl.php?search=213.180.199.13&colsearch=Domain&quantity=50","http://www.alexa.com/data/details/main?url=213.180.199.13","http://uk.wikipedia.org/wiki/Mediawiki:Spam-whitelist","http://www.google.com/search?q=%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3

...

java.lang.IllegalArgumentException: Document contains at least one immense term in field="external_link" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[68 74 74 70 3a 2f 2f 77 77 77 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 2f 73 65 61 72 63 68 3f 71]...'


I'm not sure whether this is new behavior in Elasticsearch 1.2.1 or something else.
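
The limit in the exception is measured in bytes of the UTF-8 encoding, not in characters. As a minimal sketch of the kind of guard that would avoid the failure (not the fix that was actually deployed), a document builder could separate out links over the limit before sending the bulk request. The helper name `split_indexable_links` and the sample data are hypothetical; only the 32766 figure comes from the error message above.

```python
# Maximum UTF-8 length of a single term, taken from the error message above.
MAX_TERM_BYTES = 32766


def split_indexable_links(links, max_bytes=MAX_TERM_BYTES):
    """Partition links into (indexable, too_long) by UTF-8 byte length.

    Hypothetical helper: illustrates the check only, not CirrusSearch's code.
    """
    indexable, too_long = [], []
    for link in links:
        # The limit applies to encoded bytes, so measure after encoding.
        if len(link.encode("utf-8")) <= max_bytes:
            indexable.append(link)
        else:
            too_long.append(link)
    return indexable, too_long


if __name__ == "__main__":
    sample = [
        "http://whois.domaintools.com/selftrans.narod.ru",
        # Stand-in for the runaway percent-encoded search URL.
        "http://www.google.com/search?q=" + "%C3%83%C2%83" * 4000,
    ]
    ok, skipped = split_indexable_links(sample)
    print(len(ok), "indexable;", len(skipped), "skipped")
```

Whether an oversized link should be dropped, truncated, or logged is a separate decision; the sketch only shows where the byte-length check would sit.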
Comment 1 Nik Everett 2014-06-26 18:53:20 UTC
index: /metawiki_general_1403807864/page/661035 caused IllegalArgumentException[Document contains at least one immense term in field="external_link" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[68 74 74 70 3a 2f 2f 77 77 77 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 2f 73 65 61 72 63 68 3f 71]...']
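
The exception only prints the first bytes of the offending term as hex. A quick way to see which link it refers to is to decode that prefix back to text. This is a small sketch; the function name `decode_term_prefix` is made up for illustration.

```python
def decode_term_prefix(hex_dump):
    """Turn a '[68 74 ...]' byte dump from the exception into text."""
    hex_bytes = hex_dump.strip().strip("[]").split()
    raw = bytes(int(b, 16) for b in hex_bytes)
    # errors="replace" keeps the helper usable if the prefix cuts a
    # multi-byte character in half.
    return raw.decode("utf-8", errors="replace")


# Prefix copied from the exception message above.
PREFIX = (
    "[68 74 74 70 3a 2f 2f 77 77 77 2e 67 6f 6f 67 6c 65 "
    "2e 63 6f 6d 2f 73 65 61 72 63 68 3f 71]"
)

if __name__ == "__main__":
    print(decode_term_prefix(PREFIX))
```

The prefix decodes to http://www.google.com/search?q, which matches the Google search link with the very long percent-encoded query string in the document source.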
Comment 2 Nik Everett 2014-06-26 18:57:10 UTC
Looks like there are more such issues:
cirrus_log/arwikisource.reindex.log:Warning: Search backend error during reindex.  Error message is:  No enabled connection [Called from CirrusSearch\UpdateOneSearchIndexConfig::reindexInternal in /usr/local/apache/common-local/php-1.24wmf10/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php at line 794] in /usr/local/apache/common-local/php-1.24wmf10/includes/debug/Debug.php on line 303
cirrus_log/commonswiki.reindex.log:Warning: Search backend error during sending 10 documents to the file index after 49.  Regex syntax error:  failed to execute script [Called from CirrusSearch\ElasticsearchIntermediary::failure in /usr/local/apache/common-local/php-1.24wmf10/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php at line 98] in /usr/local/apache/common-local/php-1.24wmf10/includes/debug/Debug.php on line 303
cirrus_log/commonswiki.reindex.log:Warning: Search backend error during sending 8 documents to the file index after 89.  Regex syntax error:  failed to execute script [Called from CirrusSearch\ElasticsearchIntermediary::failure in /usr/local/apache/common-local/php-1.24wmf10/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php at line 98] in /usr/local/apache/common-local/php-1.24wmf10/includes/debug/Debug.php on line 303
cirrus_log/ltwiktionary.reindex.log:Warning: Search backend error during sending 1 documents to the general index after 75.  Regex syntax error:  failed to execute script [Called from CirrusSearch\ElasticsearchIntermediary::failure in /usr/local/apache/common-local/php-1.24wmf10/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php at line 98] in /usr/local/apache/common-local/php-1.24wmf10/includes/debug/Debug.php on line 303
cirrus_log/metawiki.reindex.log:Warning: Search backend error during reindex.  Error message is:  Error in one or more bulk request actions:


It isn't clear what the underlying errors are, though, because of the broken syntax checker that we just fixed.
Comment 3 Nik Everett 2014-06-26 19:52:27 UTC
OK! Those error messages (the ones about regex syntax errors) will stop masking the real errors tonight. They are caused by update errors. That's simple enough to fix, and I'll put it in the same patch that fixes Meta's problem.
Comment 4 Nik Everett 2014-06-26 19:53:13 UTC
arwikisource is different. I'm not sure what is up with it. It errors out every time with:
Warning: Search backend error during reindex.  Error message is:  No enabled connection [Called from CirrusSearch\UpdateOneSearchIndexConfig::reindexInternal in /usr/local/apache/common-local/php-1.24wmf10/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php at line 794] in /usr/local/apache/common-local/php-1.24wmf10/includes/debug/Debug.php on line 303

That means it got multiple HTTP failures.
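
To illustrate why "No enabled connection" implies more than one HTTP failure, here is a toy model. It assumes the client disables a connection after a failed request and gives up only once none are left. The class, function, and host names are hypothetical, and this is not the client library's actual code.

```python
class NoEnabledConnection(Exception):
    """Raised when every connection in the pool has been disabled."""


class Connection:
    def __init__(self, host):
        self.host = host
        self.enabled = True


def request_with_failover(connections, send):
    """Try each enabled connection in turn, disabling any that fails.

    `send` is a caller-supplied function that performs the HTTP request
    and raises ConnectionError on failure.
    """
    for conn in connections:
        if not conn.enabled:
            continue
        try:
            return send(conn)
        except ConnectionError:
            # One HTTP failure only disables this connection.
            conn.enabled = False
    # Reached only after every connection has failed or was already disabled.
    raise NoEnabledConnection("No enabled connection")


if __name__ == "__main__":
    def always_fail(conn):
        raise ConnectionError("timeout talking to " + conn.host)

    # Hypothetical host names.
    pool = [Connection("elastic-a"), Connection("elastic-b")]
    try:
        request_with_failover(pool, always_fail)
    except NoEnabledConnection as exc:
        print("gave up:", exc)
```

Under this model a single failed request is absorbed by failover, so the error surfacing every time suggests that requests to all configured hosts are failing.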
Comment 5 Gerrit Notification Bot 2014-06-26 20:54:52 UTC
Change 142404 had a related patch set uploaded by Manybubbles:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142404
Comment 6 Gerrit Notification Bot 2014-06-26 21:02:15 UTC
Change 142404 merged by jenkins-bot:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142404
Comment 7 Gerrit Notification Bot 2014-06-26 21:02:36 UTC
Change 142412 had a related patch set uploaded by Manybubbles:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142412
Comment 8 Gerrit Notification Bot 2014-06-26 21:03:07 UTC
Change 142413 had a related patch set uploaded by Manybubbles:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142413
Comment 9 Gerrit Notification Bot 2014-06-26 21:08:34 UTC
Change 142413 merged by jenkins-bot:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142413
Comment 10 Gerrit Notification Bot 2014-06-26 21:08:40 UTC
Change 142412 merged by jenkins-bot:
Fix rare-ish errors

https://gerrit.wikimedia.org/r/142412
