Last modified: 2014-09-24 20:53:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73233, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71233 - Tables not being indexed
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: master
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nik Everett
Reported: 2014-09-24 13:36 UTC by Jeremy
Modified: 2014-09-24 20:53 UTC (History)


Description Jeremy 2014-09-24 13:36:43 UTC
CirrusSearch does not appear to be indexing tables, not even into "auxiliary_text". In fact, the "auxiliary_text" field in my index appears to be empty for all entries, and pages whose content consists of just tables are not indexed at all; even the title is not there. 977e3f9 branch with Elasticsearch 1.3.2.
Comment 1 Nik Everett 2014-09-24 13:42:38 UTC
It's certainly _supposed_ to index them in the auxiliary text. What is in the index for pages with just tables? Can you run ?action=cirrusdump on one? Can you null edit a page and see if anything interesting is logged when the index is updated?
Comment 2 Jeremy 2014-09-24 15:33:27 UTC
Performing action=cirrusdump shows "auxiliary_text":[]

The actual table is indexed in the "source_text" field.

Also, it appears that the page *did* get indexed, I just couldn't find it in Sense by title search.

Nothing gets logged by Elasticsearch when the index runs (logging set to "DEBUG"). I also did a regular edit, then checked ?action=cirrusdump and confirmed that the new text is there (not in a table, of course).

I see that it's working on Wikipedia, so could it just be that my index is bad?
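
For anyone repeating this check, here is a minimal Python sketch of scanning a dumped document for empty fields. It assumes the ?action=cirrusdump output has already been saved and reduced to a single JSON object mapping field names to values; the exact shape of the dump is an assumption, and `empty_fields` and the sample document are hypothetical, not part of CirrusSearch.

```python
import json

# Hypothetical sample in the spirit of the report above: the table markup is
# present in source_text, but auxiliary_text is an empty list.
SAMPLE_DUMP = (
    '{"title": "Table page", "text": "Intro sentence.", '
    '"auxiliary_text": [], "source_text": "{|\\n| cell\\n|}"}'
)


def empty_fields(doc, fields=("text", "auxiliary_text", "source_text")):
    """Return the names of the given fields that are missing or empty in doc."""
    return [name for name in fields if not doc.get(name)]


doc = json.loads(SAMPLE_DUMP)
print(empty_fields(doc))
```

If only "auxiliary_text" is reported while "source_text" holds the table markup, the page was indexed but the table text never reached the auxiliary field, which matches the symptom described here.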
Comment 3 Nik Everett 2014-09-24 15:41:02 UTC
If the JSON comes back with "auxiliary_text":[] that means Cirrus is sending the table empty. Do you have any auxiliary text in the index at all? Maybe it needs tidy or something. What version of MediaWiki and PHP are you using?
Comment 4 Jeremy 2014-09-24 16:59:23 UTC
As far as I can tell auxiliary_text is completely empty in the index.

I'm on MW 1.23.3, PHP 5.3.10.
Comment 5 Nik Everett 2014-09-24 17:38:02 UTC
Ok. It won't work properly right now with MW versions older than 1.24wmf10. That version has a change where the HtmlFormatter can return the text that it filtered out. This is how auxiliary text works for us. Let me see if I can work around that.
Comment 6 Gerrit Notification Bot 2014-09-24 17:56:24 UTC
Change 162653 had a related patch set uploaded by Manybubbles:
Don't remove auxiliary text if mw is too old

https://gerrit.wikimedia.org/r/162653
Comment 7 Nik Everett 2014-09-24 17:59:00 UTC
I've uploaded a patch to Cirrus that should leave the table text in the "text" field if MediaWiki doesn't yet support what we need to build the auxiliary text properly.
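
The fallback described in this comment can be illustrated with a short Python sketch. This is not the code from the Gerrit change; `split_for_index`, the `(kind, text)` segment representation, and the `can_return_removed` flag are all hypothetical stand-ins for "does this MediaWiki let the formatter hand back the text it filtered out".

```python
def split_for_index(segments, can_return_removed):
    """Build the "text" and "auxiliary_text" fields from page segments.

    segments: list of (kind, text) pairs, where kind is "body" or "table"
              (a hypothetical representation, for illustration only).
    can_return_removed: True if the filtered-out text can be recovered,
              so it is safe to move table text out of the main field.
    """
    if not can_return_removed:
        # Too-old MediaWiki: do not filter anything out, so table text
        # stays searchable in the main "text" field.
        return {
            "text": " ".join(text for _, text in segments),
            "auxiliary_text": [],
        }
    # Newer MediaWiki: table text goes to the auxiliary field instead.
    return {
        "text": " ".join(text for kind, text in segments if kind == "body"),
        "auxiliary_text": [text for kind, text in segments if kind == "table"],
    }


segments = [("body", "Intro."), ("table", "cell one cell two")]
print(split_for_index(segments, can_return_removed=False))
print(split_for_index(segments, can_return_removed=True))
```

The point of the fallback is that text is never dropped: on an older MediaWiki the table contents land in "text" rather than disappearing from the index.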
Comment 8 Jeremy 2014-09-24 19:00:04 UTC
Thanks for the quick turnaround! I confirmed that editing a page with tables now causes the table contents to be indexed into the "text" field. I will rebuild the index to take care of the rest.

Thanks again!
Comment 9 Nik Everett 2014-09-24 19:03:26 UTC
Glad it worked!
Comment 10 Gerrit Notification Bot 2014-09-24 20:45:03 UTC
Change 162653 merged by jenkins-bot:
Don't remove auxiliary text if mw is too old

https://gerrit.wikimedia.org/r/162653
