Last modified: 2013-10-23 23:48:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T34026, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 32026 - Lucene results order varies when lucene scores are equivalent
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Normal major with 1 vote
Target Milestone: ---
Assigned To: Robert Stojnic
URL: http://en.wikipedia.org/w/api.php?srp...
Keywords:
Depends on:
Blocks:

Reported: 2011-10-28 22:28 UTC by Joe Osowski
Modified: 2013-10-23 23:48 UTC (History)

Description Joe Osowski 2011-10-28 22:28:56 UTC
As per this discussion:

http://lists.wikimedia.org/pipermail/mediawiki-api/2011-October/002416.html

Make a call:

http://en.wikipedia.org/w/api.php?srprop=sectiontitle&srlimit=25&srsearch=10.1371%2Fjournal.pone.0008776&action=query&format=xml&list=search&sroffset=0&srwhat=text

Click refresh a couple of times, and the result order changes.

You can't page through the results unless the order is deterministic.
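
To illustrate why paging breaks, here is a minimal Python sketch. It does not call the API. The two orderings are the four tied titles as reported for search1 and search4 in the comments below, and `page` is a hypothetical helper that only mimics `sroffset`/`srlimit`. The assumption is that page 1 and page 2 happen to be served by different backends.

```python
# Illustration only: the same four tied results, in the order each
# backend returned them (taken from the comments in this report).
backend_a = ["Atlanta_lesueurii", "Tenagodus_barbadensis",
             "Atlanta_brunnea", "Littoraria_irrorata"]
backend_b = ["Atlanta_brunnea", "Littoraria_irrorata",
             "Atlanta_lesueurii", "Tenagodus_barbadensis"]


def page(results, offset, limit):
    """Hypothetical stand-in for sroffset/srlimit on an ordered result list."""
    return results[offset:offset + limit]


# Assumption: page 1 is answered by backend A, page 2 by backend B.
seen = page(backend_a, 0, 2) + page(backend_b, 2, 2)

# Titles the client received twice, and titles it never received.
duplicates = sorted({t for t in seen if seen.count(t) > 1})
missing = sorted(set(backend_a) - set(seen))

print("duplicates:", duplicates)
print("missing:", missing)
```

With these inputs the client sees two titles twice and never sees the other two, even though every individual response looked valid.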
Comment 1 Robert Stojnic 2011-10-28 22:39:21 UTC
The indexes are currently up to date. The problem here is that multiple results have exactly the same score:

689
1.1514114 0 Flamingo_tongue_snail
1.14319 0 Oxygyrus_keraudrenii
1.1418508 0 Atlanta_lesueurii
1.1418508 0 Tenagodus_barbadensis
1.1418508 0 Atlanta_brunnea
1.1418508 0 Littoraria_irrorata
1.1410456 0 Atlanta_pulchella
1.1410456 0 Caecum_clava
1.1410456 0 Hypselodoris_acriba
1.1410456 0 Calliostoma_adelae
1.1410456 0 Alvania_dejongi
1.1410456 0 Benthonellania_xanthias
1.1410456 0 Cardiapoda_placenta
1.1410456 0 Bursa_rhodostoma
1.1410456 0 Capulus_subcompressus
1.1410456 0 Cerithiopsis_lata
1.1410456 0 Cerithiopsis_flava
1.1410456 0 Cerithium_guinaicum
1.1410456 0 Cerithiopsis_merida
1.1410456 0 Cerithiopsis_georgiana

For some reason, the results in some cases have their score rounded to 5 decimals, and in some cases to 7 decimals. Not sure why this is happening, since the scores are calculated on the same machine. Could be some lucene query caching weirdness.
Comment 2 Robert Stojnic 2011-10-28 22:46:47 UTC
So it seems that on the same machine the order is always the same, although the scores are not always rounded the same way. The previous comment showed results from search1; this is the same search run on search4 a couple of times:

curl http://search4:8123/search/enwiki/10.1371%2Fjournal.pone.0008776?version=2
689
1.1514115 0 Flamingo_tongue_snail
1.1431901 0 Oxygyrus_keraudrenii
1.141851 0 Atlanta_brunnea
1.141851 0 Littoraria_irrorata
1.141851 0 Atlanta_lesueurii
1.141851 0 Tenagodus_barbadensis
1.1410457 0 Bulbus_carcellesi
1.1410457 0 Caecum_circumvolutum
1.1410457 0 Caecum_multicostatum
1.1410457 0 Calliostoma_javanicum
1.1410457 0 Alvania_verrilli
1.1410457 0 Bursa_natalensis
1.1410457 0 Bursa_corrugata
1.1410457 0 Caecum_insularum
1.1410457 0 Cerithiopsis_academicorum
1.1410457 0 Cerithioclava_garciai
1.1410457 0 Cerithiopsis_fuscoflava
1.1410457 0 Cerithiopsis_guitarti
1.1410457 0 Copulabyssia_riosi
1.1410457 0 Crucibulum_auricula

curl http://search4:8123/search/enwiki/10.1371%2Fjournal.pone.0008776?version=2
689
1.1514114 0 Flamingo_tongue_snail
1.14319 0 Oxygyrus_keraudrenii
1.1418508 0 Atlanta_brunnea
1.1418508 0 Littoraria_irrorata
1.1418508 0 Atlanta_lesueurii
1.1418508 0 Tenagodus_barbadensis
1.1410456 0 Bulbus_carcellesi
1.1410456 0 Caecum_circumvolutum
1.1410456 0 Caecum_multicostatum
1.1410456 0 Calliostoma_javanicum
1.1410456 0 Alvania_verrilli
1.1410456 0 Bursa_natalensis
1.1410456 0 Bursa_corrugata
1.1410456 0 Caecum_insularum
1.1410456 0 Cerithiopsis_academicorum
1.1410456 0 Cerithioclava_garciai
1.1410456 0 Cerithiopsis_fuscoflava
1.1410456 0 Cerithiopsis_guitarti
1.1410456 0 Copulabyssia_riosi
1.1410456 0 Crucibulum_auricula
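
One way an ordering like this could be made deterministic is to break score ties on a stable secondary key. The Python sketch below is not lsearchd's code and not a Lucene API. `stable_order` is a hypothetical helper, and rounding to 5 decimals is an assumption chosen only because the scores above differ in the last printed digits between runs. The input rows are the first six results from search1 and search4 as listed in this report.

```python
def stable_order(hits, decimals=5):
    """Sort (score, title) pairs by rounded score, highest first, then by title.

    Rounding is an assumption: it makes scores that differ only in the
    last printed digits compare as equal, so the title decides the order.
    """
    return sorted(hits, key=lambda h: (-round(h[0], decimals), h[1]))


# First six rows as reported for search1 and search4.
search1 = [
    (1.1514114, "Flamingo_tongue_snail"),
    (1.14319, "Oxygyrus_keraudrenii"),
    (1.1418508, "Atlanta_lesueurii"),
    (1.1418508, "Tenagodus_barbadensis"),
    (1.1418508, "Atlanta_brunnea"),
    (1.1418508, "Littoraria_irrorata"),
]
search4 = [
    (1.1514115, "Flamingo_tongue_snail"),
    (1.1431901, "Oxygyrus_keraudrenii"),
    (1.141851, "Atlanta_brunnea"),
    (1.141851, "Littoraria_irrorata"),
    (1.141851, "Atlanta_lesueurii"),
    (1.141851, "Tenagodus_barbadensis"),
]

titles1 = [title for _, title in stable_order(search1)]
titles4 = [title for _, title in stable_order(search4)]
print(titles1 == titles4)
```

Two caveats apply. Rounding can still split scores that sit on either side of a rounding boundary, and the tie-break only helps if it is applied before the result list is cut into pages: re-sorting a single page on the client cannot recover tied results that a backend placed on a different page.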
Comment 3 Joe Osowski 2011-10-31 21:15:58 UTC
Is there a way for me (on the other side of the load balancer) to keep using the same server?
Comment 4 Platonides 2011-12-04 21:11:05 UTC
Are the machine architectures and program versions the same? That reminds me of the issue with PHP whose behavior depended on whether it was running on 32-bit or 64-bit.
Comment 5 Joe Osowski 2012-02-08 16:30:23 UTC
Has this been resolved? I can no longer reproduce it.
Comment 6 Joe Osowski 2012-02-22 21:37:16 UTC
Never mind, it is still happening.
Comment 7 Andre Klapper 2013-03-06 14:43:35 UTC
This is still an issue, and the thread at http://lists.wikimedia.org/pipermail/mediawiki-api/2011-October/002420.html and its follow-ups imply that some indexes could be out of sync. Or not.
Comment 8 Quim Gil 2013-03-13 19:03:53 UTC
Is there a way to reproduce this from the regular Special:Search interface, the way a regular user would run into it?

I'm asking because I volunteered to write a test automation scenario to keep watching for this problem if/when it gets fixed, in the context of https://www.mediawiki.org/wiki/QA/Browser_testing/Search_features
Comment 9 jeremyb 2013-03-13 19:07:26 UTC
Just rerun comment 2 against multiple backends and check if it varies or not?
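
A rough Python sketch of that check follows. It assumes the response format seen in comment 2: a first line with the hit count, then one `score namespace title` line per result (reading the middle column as the namespace is a guess). The host names and port come from comment 2 and are only reachable from inside the cluster. `parse_titles`, `fetch`, and `first_difference` are hypothetical helper names.

```python
from urllib.request import urlopen


def parse_titles(body):
    """Return titles from 'score namespace title' lines; other lines are skipped."""
    titles = []
    for line in body.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            titles.append(parts[2])
    return titles


def fetch(host, query, port=8123, wiki="enwiki"):
    """Fetch raw results from one backend (URL shape copied from comment 2)."""
    url = f"http://{host}:{port}/search/{wiki}/{query}?version=2"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def first_difference(a, b):
    """Index of the first position where two title lists differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Offline demo with two rows each from the search1 and search4 output above.
sample_a = "689\n1.1418508 0 Atlanta_lesueurii\n1.1418508 0 Tenagodus_barbadensis\n"
sample_b = "689\n1.141851 0 Atlanta_brunnea\n1.141851 0 Littoraria_irrorata\n"
print(first_difference(parse_titles(sample_a), parse_titles(sample_b)))
```

Against live backends, the comparison would be something like `first_difference(parse_titles(fetch("search1", q)), parse_titles(fetch("search4", q)))` with `q` set to the URL-encoded query from comment 2. A result of `None` means both backends returned the same order.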
Comment 10 Andre Klapper 2013-07-25 18:53:53 UTC
(Nothing here to fix by ops but in Lucene code instead. Removing "ops" keyword.)
Comment 11 Chad H. 2013-10-23 23:48:17 UTC
Won't be fixing this; lsearchd has reached its end of life.
