Last modified: 2014-06-30 02:22:55 UTC
1. Note that https://test2.wikipedia.org/w/index.php?title=Birch_beer&oldid=57684 includes a link to growstuff.org.
2. Search test2wiki for "growstuff.org" - https://test2.wikipedia.org/w/index.php?search=growstuff.org&title=Special%3ASearch
3. Empty results set.
What is the desired behavior here? If a page does not *mention* growstuff.org but does *link* to it, should we include it in the results set?
I _think_ we should include it. One way to think of this is, if we did include it, how would you like it highlighted? Another thing to consider is that we're mostly optimized for searching for words and might not be able to recognize a URL in the token stream, so we could end up splitting it or (heaven forbid) stemming it. As with Bug 53013, my gut says to set the priority to low because we're mostly concerned with searching words. So I'm setting the priority to low. We should revisit this once we're comfortable with other issues.
I've been pondering this, and I'm not convinced we should index it. I can't think of a sane way of doing so, or how to reinsert it into the content (which we've already stripped of all wikitext and html). We have Special:LinkSearch, does it not work?
*** Bug 59205 has been marked as a duplicate of this bug. ***
Bug 59205 showed us that folks do expect link searches to work. Options:
0. Do nothing.
1. Detect a link in the search and send people to Special:LinkSearch. If folks are searching for full URIs without extra terms this would probably work.
2. Index links in their own multivalued field, like section headings, but with a URI or non-splitting analyzer, and display them like file contents matches. Search them all the time. This would find links to places in the results.
3. Same as #2, but only search them with terms that "look like" URIs. This one makes more sense if users are searching for whole URIs AND other terms at the same time.
4. Figure out some way to get the URIs back into the text but strip them out on matches for which they were not explicitly searched. This would produce results similar to what works now but is technically more difficult (changes to how we get parsed output, changes to Cirrus, probably changes to Elasticsearch to strip the URIs during the highlighting phase).
Chad got us indexing the links: https://gerrit.wikimedia.org/r/#/c/104986/ Now I'll grab searching them.
I'm going to shoot for option #3 in comment 4. So we'll only look in the link field if one of the terms looks like a URI.
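For illustration, a rough sketch of what "looks like a URI" could mean. This is not the check used in the actual patch; the regex and the names `URI_LIKE`, `looks_like_uri` and `uri_terms` are assumptions made up for this example. It treats a term as URI-like only when it starts with a scheme followed by "://".

```python
import re

# Hypothetical heuristic (an assumption, not the real CirrusSearch check):
# a term is URI-like if it is "<scheme>://<something without whitespace>".
URI_LIKE = re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)


def looks_like_uri(term: str) -> bool:
    """Return True if the term starts with a scheme and '://'."""
    return bool(URI_LIKE.match(term))


def uri_terms(query: str) -> list:
    """Split the query on whitespace and keep only the URI-like terms."""
    return [term for term in query.split() if looks_like_uri(term)]
```

With a heuristic this strict, `uri_terms("birch beer http://growstuff.org")` picks out only the last term, and a bare host name such as `growstuff.org` is not treated as a URI at all.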
An important point I didn't realize at first: if a term "looks like" a link, we can't just search the links. We have to OR that together with searching the text. No big deal, just more syntax we have to send to Elasticsearch.
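A sketch of the OR described above, written as an Elasticsearch `bool` query with `should` clauses. The field names `text` and `outgoing_link` and the helper `build_query` are placeholders invented for this example; the query Cirrus actually sends may look quite different.

```python
def build_query(query, uri_like_terms, text_field="text", link_field="outgoing_link"):
    """Build an Elasticsearch query body as a plain dict.

    Field names are hypothetical. If no term looks like a URI, only the
    text is searched; otherwise the text search is ORed with an exact
    match on the link field for each URI-like term.
    """
    text_clause = {"match": {text_field: query}}
    if not uri_like_terms:
        return text_clause

    link_clauses = [{"term": {link_field: term}} for term in uri_like_terms]
    return {
        "bool": {
            # "should" plus minimum_should_match=1 behaves like an OR:
            # a page matches if the text OR any link clause matches.
            "should": [text_clause] + link_clauses,
            "minimum_should_match": 1,
        }
    }
```

Keeping the text clause in the `should` list means that a page which mentions the URI in its prose still matches even when it does not link to it.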
Another point: Sumana's original query still wouldn't find her growstuff link. You'd have to search for it as http://growstuff.org. Still, we're better off than we were.
Change 105202 had a related patch set uploaded by Manybubbles: Search links https://gerrit.wikimedia.org/r/105202
(In reply to Nik Everett from comment #8)
> Another point: Sumana's original query still wouldn't find her growstuff
> link. You'd have to search for it as http://growstuff.org. Still, we're
> better off than we were.
Just a note that being able to search for partial URL strings is quite useful when trying to combat spam, or to update links to sites that reorganized their directory structure without leaving proper redirects. Hence, option 1 from comment #4 might be a good addition. Thanks!
Change 105202 abandoned by Manybubbles: Search links https://gerrit.wikimedia.org/r/105202
The patch was abandoned as it wasn't relevant. We will possibly redirect users to [[Special:LinkSearch]] if they type a URL into the search box, as it will serve the user's needs.
Now that insource: is available, it is at least possible to find the desired content. E.g. https://test2.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=insource%3Agrowstuff.org&fulltext=Search Perhaps we could somehow add "insource:" as an option (or text-hint) at Advanced Search, in order to remind editors of that feature? (Because only crazy people like me are actually going to hunt their way to [[mw:Help:CirrusSearch#insource:]] ;)
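For anyone who wants to script this, a small sketch that builds the same kind of insource: search URL. The helper name `insource_search_url` is made up for this example; the query parameters are simply the ones visible in the URL above.

```python
from urllib.parse import urlencode


def insource_search_url(needle, base="https://test2.wikipedia.org/w/index.php"):
    """Build a Special:Search URL that searches the wikitext source for `needle`.

    Hypothetical helper: it only mirrors the parameters seen in the
    example URL and does not call any MediaWiki API.
    """
    params = {
        "title": "Special:Search",
        "profile": "default",
        "search": f"insource:{needle}",
        "fulltext": "Search",
    }
    # urlencode percent-encodes the colons as %3A, as in the example URL.
    return f"{base}?{urlencode(params)}"
```

Calling `insource_search_url("growstuff.org")` should reproduce the link given in this comment.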