Last modified: 2014-05-16 10:36:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T37701, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 35701 - Clustering for image searches
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: master
Hardware: All All
Importance: Low enhancement with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: design
Depends on:
Blocks:
Reported: 2012-04-04 17:28 UTC by Rd232
Modified: 2014-05-16 10:36 UTC


Description Rd232 2012-04-04 17:28:44 UTC
Problems with image searching (particularly an issue for Commons, but elsewhere too):
1. The current search matches on every keyword and can return surprising results. For example, a search for "cucumber" returns not only images of cucumbers but also images of one being used as a sex toy.
2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

The basic idea is to improve on this by clustering related search results. Roughly, this could work like this:
1. The search works as usual and grabs all results by keyword.
2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
3. Instead of showing a list of images it would display these groups, which can be expanded.
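
The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not an existing search feature: the category data is a made-up strict tree (real Commons categories form a graph), each hit carries a single category, and the names `PARENT`, `path_from_root`, `common_prefix`, `cluster`, and all file titles are hypothetical.

```python
from collections import defaultdict

# Hypothetical category tree: child -> parent (None marks the root).
PARENT = {
    "Topics": None,
    "Nature": "Topics",
    "Insects": "Nature",
    "Butterflies": "Insects",
    "Danaus plexippus": "Butterflies",
    "Society": "Topics",
    "Monarchy": "Society",
    "Monarchs": "Monarchy",
    "Coronations": "Monarchy",
}

def path_from_root(category):
    """Return the chain of categories from the root down to `category`."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path[::-1]

def common_prefix(paths):
    """Longest shared leading segment of several root-to-category paths."""
    prefix = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return prefix

def cluster(results):
    """Group (title, category) hits by top-level branch of the tree, then
    label each group with the lowest category containing all its members."""
    branches = defaultdict(list)
    for title, category in results:
        path = path_from_root(category)
        # The category directly under the root identifies the branch.
        branch = path[1] if len(path) > 1 else path[0]
        branches[branch].append((title, path))
    clusters = {}
    for members in branches.values():
        label = common_prefix([p for _, p in members])[-1]
        clusters[label] = [title for title, _ in members]
    return clusters

# Step 1 is assumed to have happened already: these are keyword hits for "monarch".
results = [
    ("Monarch on milkweed.jpg", "Danaus plexippus"),
    ("Monarch wings.jpg", "Butterflies"),
    ("Queen Victoria.jpg", "Monarchs"),
    ("Coronation 1953.jpg", "Coronations"),
]
print(cluster(results))
```

With this sample data the hits fall into two expandable groups, labeled "Butterflies" and "Monarchy". Grouping by the branch directly under the root is just one possible way to decide what counts as "different parts of the category tree"; files with several categories, or a category graph with cycles, would need a different rule.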

Clustered search would not only be much more useful, but it would also solve (in relation to searching) the problem which the WMF's image filter is supposed to address - but without any need to specially classify or tag individual images.

There is a more detailed explanation (and an image mockup) at https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clustering_for_search_results_on_Commons

Bugzilla is not a good format to discuss this idea (it doesn't even have a Preview button!!), but we'd like to put it on developers' radar, and get some feedback if possible. Please feel free to leave comments on Meta in addition to Bugzilla.
Comment 1 Rd232 2013-04-29 15:10:00 UTC
Well it's been a year. Is there any sign of ... anything?
Comment 2 Andre Klapper 2013-04-30 10:49:02 UTC
Rd232: No. There has been no comment here, and this is a low-priority enhancement request, which makes it rather unlikely to get fixed unless somebody contributes a patch. Search also has quite a few high-priority issues that are more important, and it is only in the last two or three months that somebody (Ram) has been actively working on the Wikimedia Search code again.
Comment 3 Chad H. 2014-02-13 23:43:34 UTC
(In reply to Rd232 from comment #0)
> 2. When terms collide you won't find what you want to find. For example: If
> you search for "monarch" you will get hundreds of images of a butterfly, but
> very few results concerning monarchy.
> 

This is annoying.

> The basic idea is to improve on this by clustering related search results.
> Roughly, this could work like this:
> 1. The search works as usual and grabs all results by keyword.
> 2. It looks at the categories of the results. If it finds multiple images
> from different parts of the category tree it will split the results into
> groups, labeling them after the lowest parent category. This means that it
> would form clusters using the categories to group the results.
> 3. Instead of showing a list of images it would display these groups, which
> can be expanded.
> 

This could be a very cool idea, although the implementation details sound hairy at the moment. Let's repurpose this into a Cirrus bug though :)
