Last modified: 2014-10-06 13:54:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T66473, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 64473 - Text contained in a URL is not returned via fulltext search
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: OTRS
Version: wmf-deployment
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-04-26 07:48 UTC by Patrik (pajz)
Modified: 2014-10-06 13:54 UTC
CC List: 5 users


Description Patrik (pajz) 2014-04-26 07:48:06 UTC
PROBLEM DESCRIPTION
===================
Searching for a term within a URL does not return a result even though a result
is returned if the same term is contained in some normal (non-URL) text.


STEP-BY-STEP DESCRIPTION TO REPRODUCE THE PROBLEM
=================================================
1) Create a ticket with the following content:

   https://commons.wikimedia.org/wiki/File:Test_carnival.jpg
   File:Test_carnival2.jpg

2) Perform a fulltext search for Test_carnival2.jpg and notice that the created
ticket is returned.

3) Perform a fulltext search for Test_carnival.jpg and notice that the created
ticket is NOT returned.


STATUS
======
This was originally reported upstream but please see http://bugs.otrs.org/show_bug.cgi?id=10393#c5.
Comment 1 Patrik (pajz) 2014-06-29 13:37:12 UTC
It seems likely that this is related to the WordLengthMax property in Ticket::SearchIndex::Attribute (Ticket -> Core::FulltextSearch). This is currently set to the default value of 30, a limit exceeded by almost all URLs.

As an example, a ticket with the following content:

https://de.wikipedia.org/wiki/File:1XXY.jpg
https://de.wi.org/wiki/File:2XXY.jpg
https://de.w.org/File:3XXY.jpg

is returned after a search for 3XXY, but not after one for 1XXY or 2XXY (Ticket# 2014052710014793).
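
The observation above is consistent with a length filter applied to whole whitespace-separated words. The following is only a simplified model of that idea, not the OTRS indexing code: the function names `index_words` and `matches` are hypothetical, and it assumes that a URL is indexed as a single word, that words longer than the limit are dropped, and that a search term matches as a substring of an indexed word.

```python
def index_words(text, word_length_max=30):
    """Return the whitespace-separated words that survive the length filter.

    Simplified model (assumption): a word is kept only if it is at most
    word_length_max characters long; nothing else is filtered.
    """
    return [w.lower() for w in text.split() if len(w) <= word_length_max]


def matches(text, term, word_length_max=30):
    """Return True if term occurs as a substring of any indexed word."""
    term = term.lower()
    return any(term in w for w in index_words(text, word_length_max))


# Ticket body from the example above.
TICKET = """\
https://de.wikipedia.org/wiki/File:1XXY.jpg
https://de.wi.org/wiki/File:2XXY.jpg
https://de.w.org/File:3XXY.jpg
"""

if __name__ == "__main__":
    for term in ("1XXY", "2XXY", "3XXY"):
        found_default = matches(TICKET, term)                      # limit 30
        found_raised = matches(TICKET, term, word_length_max=200)  # raised limit
        print(term, found_default, found_raised)
```

Under these assumptions the three URLs are 43, 36 and 30 characters long, so only the third passes a limit of 30, and all three pass once the limit is raised well above typical URL lengths.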

It remains to be checked whether, after raising that limit, a rebuild of the fulltext db (RebuildFulltextIndex.pl) is feasible. If it is not, or until it has been done, it should also be checked whether the value can simply be raised without re-indexing all existing articles, so that the bug is fixed at least for all new articles.

(This bug is not low-priority: fulltext search is a critical feature for the permissions team. If they can't properly search tickets, and specifically file URLs, they aren't able to find permission emails, and the corresponding files get deleted for copyright reasons.)
Comment 2 Patrik (pajz) 2014-08-15 17:44:44 UTC
After consulting Jeff Green, I've set WordLengthMax to 200. The bug does appear to be fixed now for all new tickets; see ticket#2014081510013908, which basically reproduces the above example.

Jeff is currently investigating the possibility of rebuilding the fulltext db.
Comment 3 Patrik (pajz) 2014-10-06 13:54:55 UTC
The search index was rebuilt (thanks Jeff), so the fix now also applies to all existing tickets. => All done.
