Last modified: 2014-02-20 21:50:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T56503, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 54503 - CirrusSearch doesn't find terms that appear in wikitext but not in rendered text
Status: RESOLVED DUPLICATE of bug 43652
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
 
Reported: 2013-09-24 15:11 UTC by p858snake
Modified: 2014-02-20 21:50 UTC (History)




Description p858snake 2013-09-24 15:11:41 UTC
+++ This bug was initially created as a clone of Bug #54502 +++

> <p858snake|l> ^d: search isn't looking inside templates anymore? https://www.mediawiki.org/w/index.php?search=AllowImageTag&button=&title=Special%3ASearch (used to find a result)
> <p858snake|l> also it doesn't find the page, unless you search its exact title either... https://www.mediawiki.org/w/index.php?search=wgAllowImageTag&button=&title=Special%3ASearch
> <^d> Hmm, should. Let's have a looksee.
> <p858snake|l> it should find https://www.mediawiki.org/wiki/Manual:$wgAllowExternalImages#See_also at least
> <^d> Right. File a bug, that looks...wrong.
Comment 1 Nik Everett 2013-09-24 20:31:17 UTC
Works for me:
Searching for https://www.mediawiki.org/w/index.php?search=wgAllowImageTag&button=&title=Special%3ASearch finds https://www.mediawiki.org/wiki/Manual:$wgAllowImageTag as expected.

Please reopen if it stops working again.
Comment 2 p858snake 2013-09-24 20:34:10 UTC
(In reply to comment #1)
> Works for me:
> Searching for
> https://www.mediawiki.org/w/index.php?search=wgAllowImageTag&button=&title=Special%3ASearch
> finds https://www.mediawiki.org/wiki/Manual:$wgAllowImageTag as expected.
> 
> Please reopen if it stops working again.

Well, of course it does if you search for its full name minus the namespace.

Try:

https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=AllowImageTag&fulltext=Search
Comment 3 Nik Everett 2013-09-24 21:16:37 UTC
Sorry, I had trouble parsing the IRC conversation into a bug. If you search with LuceneSearch (https://www.mediawiki.org/w/index.php?search=AllowImageTag&button=&title=Special%3ASearch&srbackend=LuceneSearch), you get two results: Manual:$wgAllowExternalImages and Manual:$wgAllowImageTag. With CirrusSearch you get no results.

LuceneSearch finds both pages because the term AllowImageTag exists inside a template.  In both cases AllowImageTag is a parameter passed to the wg template.  CirrusSearch doesn't find either one because when the template is expanded it doesn't contain the string AllowImageTag.  This is the expected behaviour for both extensions.
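
The difference can be illustrated with a toy sketch in Python. It is not how either extension works internally. The tokenizer is a crude stand-in for a search analyzer, and the template output is an assumption: the sketch supposes that `{{wg|AllowImageTag}}` renders with a `$wg` prefix fused onto the parameter, so that the parameter no longer appears as a standalone term in the rendered text. The names `tokenize` and `expand_wg` are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (toy stand-in for a search analyzer)."""
    return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)}


def expand_wg(wikitext):
    """Toy expansion of {{wg|Name}} into $wgName (assumed template output)."""
    return re.sub(r"\{\{wg\|([^}|]+)\}\}", r"$wg\1", wikitext)


wikitext = "See also {{wg|AllowImageTag}}."
rendered = expand_wg(wikitext)

query = "allowimagetag"

# Indexing the raw wikitext keeps the template parameter as its own token.
found_in_wikitext = query in tokenize(wikitext)

# Indexing the rendered text only sees the fused token "wgallowimagetag".
found_in_rendered = query in tokenize(rendered)

print(found_in_wikitext, found_in_rendered)
```

Under these assumptions the query matches a token in the wikitext index but not in the rendered-text index, which mirrors the two results from LuceneSearch versus none from CirrusSearch.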

I think the part of the IRC conversation about CirrusSearch not finding pages with an exact title is really describing this behaviour.

I'm not really sure how to prioritize this given that searching the rendered version of the article rather than the wikitext that built it is an explicit design goal and one that lots of folks are excited about.  CirrusSearch certainly won't switch that off.

We could solve this bug in a bunch of ways, none of which I'm particularly happy with:
1.  Parse camelCase text specially, splitting terms on case change.  This would fix this particular problem and may even be useful for MW.org but doesn't solve all template contents searching problems and doesn't make sense outside of MW.org.
2.  Index wikitext in another field and query against it when querying against text.  Only show wikitext highlights if there aren't any matches in the rendered text.  This has a ton of overhead because we're doubling up on the largest field and would create confusing highlighting for readers but only in the case where the search wouldn't have found anything anyway.
3.  Index template text next to its expanded version in wiki.  This would make highlighting look hideous and would be really hard to implement and would bloat the page text mightily.
4.  Allow term fragmenting hints to be inserted into wikitext that aren't rendered but are passed to search.  This creates a bunch of work for the community and a bunch of development work but provides a general solution to the problem.  It certainly could be abused though.
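
As a rough illustration of option 1, the following Python sketch splits terms on case changes and then matches the query parts as a contiguous run inside the indexed parts. It is only a sketch under stated assumptions, not CirrusSearch code: a real implementation would live in the search backend's analysis chain, and the helper names `split_camel_case` and `phrase_match` are hypothetical.

```python
import re

# Pieces of a camelCase term: an acronym before a capitalized word,
# an optionally capitalized lowercase run, a trailing acronym, or digits.
_CAMEL_PARTS = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_camel_case(term):
    """Split a term on case changes and return the lowercase parts."""
    return [part.lower() for part in _CAMEL_PARTS.findall(term)]


def phrase_match(query_parts, doc_parts):
    """Return True if query_parts occurs as a contiguous run in doc_parts."""
    n = len(query_parts)
    if n == 0:
        return False
    return any(
        doc_parts[i:i + n] == query_parts
        for i in range(len(doc_parts) - n + 1)
    )


indexed = split_camel_case("$wgAllowImageTag")
query = split_camel_case("AllowImageTag")

print(indexed)
print(phrase_match(query, indexed))
```

With this splitting, a search for AllowImageTag would match the rendered term $wgAllowImageTag. As noted in option 1, this only helps with camelCase terms and does nothing for other template contents that disappear during expansion.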
Comment 4 Nik Everett 2013-11-04 21:44:37 UTC
Moving to low with feature requests.  I know this is a parity thing, but it is the opposite of the core "we index the expanded templates" feature of CirrusSearch.
Comment 5 Chad H. 2014-02-20 21:50:38 UTC
I'm going to dupe this to bug 43652 for allowing free querying of the search index (which will also contain unparsed wikitext).

*** This bug has been marked as a duplicate of bug 43652 ***


