Last modified: 2014-06-18 14:54:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T67803, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 65803 - [Regression] CirrusSearch: Excerpts should not show normalised version (all lowercase and no punctuation)
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Chad H.
Keywords: code-update-regression
Reported: 2014-05-27 12:19 UTC by Krinkle
Modified: 2014-06-18 14:54 UTC (History)

Description Krinkle 2014-05-27 12:19:30 UTC
I believe this change happened fairly recently. When searching for a phrase, the excerpts shown on Special:Search now seem to expose the normalised version of the text (e.g. stripped of all parentheses, character casing and other variants, presumably accents as well).

It doesn't happen consistently, though. Presumably this is more than simple character stripping/replacement and is something more fine-tuned for language. So maybe the recent regression was not normalisation being turned on for excerpts, but the normalisation itself being changed.

User facing issue:

Search for "WisReden" on nl.wikipedia.org.

Results:

1. Gebruiker:Chaemera/monobook.js

   'gebruiker erwin blockmsg.js ; importscript 'wikipedia wisreden' ;

   130 B (11 woorden) - 21 nov 2007 16:26

2. Gebruiker:Emmelie/monobook.js

   'gebruiker warddr qpreview.js ; importscript 'wikipedia wisreden' ; document.write ' ' ; version 1.beta.4 zeus_head_thumb-zanaq

   4 kB (563 woorden) - 25 mrt 2008 17:18

3. Gebruiker:Oliphaunt/monobook.js

   importscript 'wikipedia wisreden' ; importscript 'en wikipedia wikiproject user scripts scripts add

   2 kB (190 woorden) - 23 jul 2008 10:19


Actual page content:

- Gebruiker:Chaemera/monobook.js:


   // [[Gebruiker:Erwin/blockmsg.js]]
   importScript('Gebruiker:Erwin/blockmsg.js');
   importScript('Wikipedia:WisReden');

- excerpt:

   'gebruiker erwin blockmsg.js ; importscript 'wikipedia wisreden' ;

Rather weird that it:
* Converted everything to lower case.
* Added a space before the semi-colon.
* Turned the quoted text into one quotation instead of two, but preserved the semi-colon.
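
The symptoms above can be checked mechanically. The following is a minimal Python sketch, not anything taken from CirrusSearch: `tokens` and `classify_excerpt` are hypothetical helpers, and the tokenisation rule (lowercase, split on anything that is not an ASCII letter or digit) is only an assumption about roughly what the normalisation does.

```python
import re


def tokens(text):
    # Assumed normalisation: lowercase, keep only runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_excerpt(source, excerpt):
    """Return 'verbatim', 'normalised' or 'unrelated' for an excerpt of source."""
    if excerpt.strip() in source:
        return "verbatim"
    src, exc = tokens(source), tokens(excerpt)
    # Normalised if the excerpt's tokens appear as one contiguous run in the source.
    for i in range(len(src) - len(exc) + 1):
        if exc and src[i:i + len(exc)] == exc:
            return "normalised"
    return "unrelated"


# Page content and excerpt as quoted in this report.
page = (
    "// [[Gebruiker:Erwin/blockmsg.js]]\n"
    "importScript('Gebruiker:Erwin/blockmsg.js');\n"
    "importScript('Wikipedia:WisReden');\n"
)
shown = "'gebruiker erwin blockmsg.js ; importscript 'wikipedia wisreden' ;"
print(classify_excerpt(page, shown))
```

With the page content and excerpt quoted in this report, the sketch prints `normalised`: the excerpt is not a substring of the page, but its tokens match the page's tokens once case and punctuation are ignored.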



As for inconsistency, here is a search for "addOnloadhook" on nl.wikipedia.org:

1. Gebruiker:Aleichem/monobook.js

   addonloadhook stats ; document.write ' ' ;

2. Wikipedia:WisReden

   location.href.indexOf("action=delete")!=-1) addOnloadHook(WisReden); //

Result #1 has a normalised excerpt, result #2 has original case preserved.

Comment 1 Nik Everett 2014-06-02 18:51:04 UTC
Chad, can you look at this? I think you have more experience with the normalization stuff than I do. It looks like we're normalizing on the way in somewhere in Cirrus. I think we should let Elasticsearch do the normalization, for the most part.

This example from the bottom of the description is good:
https://nl.wikipedia.org/w/index.php?title=Speciaal%3AZoeken&profile=all&search=addOnloadhook&fulltext=Search
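
To illustrate the idea of normalising only for matching while leaving the stored text alone, here is a minimal Python sketch. It is not CirrusSearch or Elasticsearch code: `analyze` and `excerpt` are hypothetical names, and the regex tokenizer is an assumption chosen for illustration.

```python
import re


def analyze(text):
    """Yield (normalised_token, start, end), with offsets into the original text."""
    for m in re.finditer(r"\w+", text):
        yield m.group().lower(), m.start(), m.end()


def excerpt(original, query, context=20):
    """Match on normalised tokens, but slice the excerpt from the original text."""
    wanted = query.lower()
    for token, start, end in analyze(original):
        if token == wanted:
            lo = max(0, start - context)
            hi = min(len(original), end + context)
            return original[lo:hi]
    return None


# The match is case-insensitive, yet the excerpt keeps case and punctuation.
print(excerpt("importScript('Wikipedia:WisReden');", "wisreden", context=10))
```

Because the excerpt is cut from the original string using token offsets, the normalised form is only ever used for comparison and never shown to the user.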

Comment 2 Nik Everett 2014-06-18 14:54:19 UTC
Chad told me earlier that he believed this was something we'd fixed and that reindexing the scripts should have fixed it. I reindexed the whole wiki and poked around and couldn't find any more scripts that looked bad. I'm marking this verified. If you see something still broken, reopen and we'll dig into it.
