Last modified: 2014-09-28 06:06:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68011, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 66011 - Search index not updating on en.wikipedia
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: wmf-deployment
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Chad H.
cirrus-fixed
Duplicates: 70984
Depends on:
Blocks:
Reported: 2014-06-02 02:37 UTC by SpontaneousGrumbler
Modified: 2014-09-28 06:06 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description SpontaneousGrumbler 2014-06-02 02:37:49 UTC
Search for "2014 ATP World Tour is the global elite" – the results show that the "2014 ATP World Tour" article was indexed early on 29 May 2014, but its history shows it was updated several times later that day and over the next several days.
Comment 1 Andre Klapper 2014-06-02 10:46:46 UTC
Confirming.  The correct date is shown when using the CirrusSearch backend, hence likely WONTFIX:
https://en.wikipedia.org/w/index.php?search=2014+ATP+World+Tour+is+the+global+elite&title=Special%3ASearch&go=Go&srbackend=CirrusSearch
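(Editorial sketch, not part of the original comment.) The link above selects the search backend through the `srbackend` query parameter. A minimal Python sketch of how such a URL can be assembled follows; the helper name `search_url` is hypothetical, the parameter names are copied from the link above, and whether the live site still honors `srbackend` is an assumption that is not checked here.

```python
from urllib.parse import urlencode

# Base endpoint copied from the link in the comment above.
BASE = "https://en.wikipedia.org/w/index.php"


def search_url(query, backend=None):
    """Build a Special:Search URL, optionally forcing a search backend.

    `search_url` is a hypothetical helper for illustration only.
    """
    params = {"search": query, "title": "Special:Search", "go": "Go"}
    if backend is not None:
        # Parameter name taken from the link above; its current
        # behaviour on the live site is not verified.
        params["srbackend"] = backend
    # urlencode encodes spaces as "+" and ":" as "%3A".
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("2014 ATP World Tour is the global elite", "CirrusSearch"))
```

Calling the helper without a `backend` argument yields the same URL minus `srbackend`, which is what a side-by-side comparison of the default backend and CirrusSearch would need.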
Comment 2 SpontaneousGrumbler 2014-06-02 13:47:17 UTC
So, you are saying that we will not update the production search index until CirrusSearch goes into production? CirrusSearch is not ready for prime time.
Comment 3 Nik Everett 2014-06-02 14:08:18 UTC
(In reply to SpontaneousGrumbler from comment #2)
> So, you are saying that we will not update the production search index until
> CirrusSearch goes into production? CirrusSearch is not ready for prime time.

It's not as cut and dried as that.  The production search index is updating, but slowly and, so far as I can tell, spottily.  We're not going to fix it because every time we try, it causes more problems.  We don't have the expertise to even deploy an update any more, much less track down what's going on in this case.

So we focus on Cirrus because the hard bit (Elasticsearch) has significantly more folks using it so the knowledge gap isn't so bad.  Also the documentation is better.

As far as Cirrus not being ready for prime time, are you referring to any particular feature shortcoming or "just" the performance aspect?  I'd love to know so I can, well, fix it.
Comment 4 SpontaneousGrumbler 2014-06-02 15:06:17 UTC
Chris the speller has posted on mediawiki.org/wiki/Talk:Search about hyphens being ignored. Also, I have seen many cases where some hiccup caused CirrusSearch to miss an update; these apparently will never get fixed, whereas the current updater for lsearchd will provide a completely updated index whenever it runs to completion, even if that is not every day, as it should be.
Comment 5 Nemo 2014-06-03 10:39:45 UTC
Yes, there are some bugs, which will be solved. Cf. https://www.mediawiki.org/wiki/Thread:Talk:Search/LiquidThreads_archive/%27Old_search%27_is_better

(In reply to SpontaneousGrumbler from comment #4)
> Also, I have seen many cases

{{vague}}

> where some hiccup caused
> CirrusSearch to miss an update; these apparently will never get fixed,

It's enough to dummy edit the page (perhaps even null edit, I don't remember).

> whereas the current updater for lsearchd will provide a completely updated
> index whenever it runs to completion, even if that is not every day, as it
> should be.

Sounds like "whenever the gates of heaven open on earth, even if that is not every day, as it should be". AFAIK this has not happened in years.

It's useless for us editors to pretend otherwise, this bug will not be fixed in this component (lsearchd). Please keep testing CirrusSearch, feedback and criticism are very useful to the devs.
Comment 6 Bawolff (Brian Wolff) 2014-06-04 16:40:05 UTC
If we're going to wontfix this bug, it seems like we should really have cirrus deployed as primary. Or at least deployed in the very near future...

----

For reference, people at commons are also complaining that new files (within the last 4 days) aren't being indexed by Lucene.
Comment 7 Nemo 2014-06-04 16:55:17 UTC
(In reply to Bawolff (Brian Wolff) from comment #6)
> For reference, people at commons are also complaining that new files (within
> the last 4 days) aren't being indexed by Lucene.

They should probably have a brief discussion at the village pump and then file a site request for Cirrus to be primary.
There is a timeline at https://www.mediawiki.org/wiki/Search#Wikis but it's slightly out of date.
Comment 8 Andre Klapper 2014-09-18 09:14:20 UTC
*** Bug 70984 has been marked as a duplicate of this bug. ***
Comment 9 John C. Watson 2014-09-18 10:47:31 UTC
Pardon me, as I am not particularly technically inclined.  As I understand it, the current primary search engine/search engine backend for English and other Wikipedias is buggy, the bug(s) is/are not going to be corrected any time soon if at all, a new search engine/search engine backend is supposed to be made the primary sometime soon, and in the meantime the users and editors of English Wikipedia have to wait.  Oh, and the update time for the search engine index is at least four days.  Is that a fair assessment of the situation?
Comment 10 Bartosz Dziewoński 2014-09-18 11:16:32 UTC
It looks almost fair to me:

* As for "English and other Wikipedias", there are only four projects left
  where CirrusSearch (the new search engine, which *doesn't* have problems
  with updates stalling and *doesn't* need days to update) isn't default:
  de.wp, en.wp, fr.wp and zh.wp. The remaining 881 (sic) are using the new
  search engine already.

* As for "soon", [[mw:Search#Timeline]] says that "Our general goal is to
  deploy CirrusSearch as the primary search backend for all wikis by the
  end of September 2014", and this seems realistic to me based on the
  table of deployments completed there. Two more weeks of waiting sounds
  reasonable to me.
Comment 11 Nemo 2014-09-18 11:17:31 UTC
(In reply to John C. Watson from comment #9)
> Is that a fair assessment of
> the situation?

The general idea is correct. On the bright side, [[mw:Search#Wikis]] is now up to date and it's probably just a matter of weeks before they're able to enable Cirrus on the last 4 wikis.
Comment 12 SpontaneousGrumbler 2014-09-24 16:30:07 UTC
Well, LuceneSearch's index got updated today (24 September 2014). Thanks to the person who got it running!
Comment 13 Nik Everett 2014-09-24 17:15:51 UTC
Chad got it!  He fixed it yesterday but it takes a day to really be sure it worked.
Comment 14 John C. Watson 2014-09-27 05:49:28 UTC
How long should it take for the index of the new primary search engine to update?  (I made some edits over 24 hours ago, but the searches I just made have yet to "notice" them.)
Comment 15 Chad H. 2014-09-27 14:31:03 UTC
I'm not entirely sure as I hadn't managed to track down *all* of the problems with indexing, just some. There are many :(

Some pages are definitely being updated: [[2014]] is now showing the version from the 27th of September (it was stuck at the 11th!), other articles at a cursory glance do have some recent timestamps as well.

So my advice, much as it's lame, is to be patient and things will continue to slowly update.
Comment 16 John C. Watson 2014-09-28 06:06:53 UTC
Okay—understood.  (This issue is important to me because I "patrol" a list of frequently/occasionally misspelled words, some instances of which are valid for one reason or another, so a frequently updated index is very helpful in finding new mistakes and making sure that I really did correct old ones.)
