Last modified: 2013-10-23 18:17:09 UTC

Wikimedia migrated from Bugzilla to Phabricator, and bug reports are now handled in Wikimedia Phabricator. This page is a read-only historical copy. See T35614, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 33614 - $wgUseCategoryBrowser generates many dupes
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: 1.18.x
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch, patch-reviewed
Depends on:
Blocks:

Reported: 2012-01-09 20:47 UTC by steevithak
Modified: 2013-10-23 18:17 UTC
CC List: 8 users

Description steevithak 2012-01-09 20:47:11 UTC
I turned on $wgUseCategoryBrowser and discovered it displays a very large number of duplicate entries. I'm using this on a large wiki (Camera-Wiki.org) with several thousand pages and hundreds of categories. In some cases it displays the top-level category entry as many as 10 or 20 times, and many categories are displayed 3 to 5 times.

Seems like a simple fix to add code to filter out duplicates. If someone can point me to the appropriate piece of code I'd be happy to provide a patch.

Here's a typical display from the bottom of one page in our wiki:

Root category
Root category
Root category
Root category
Root category
Root category
Root category
Root category
Root category > Cameras
Root category > Cameras
Root category > Cameras > Cameras by first letter > B
Root category > Cameras > Cameras by first letter > C
Root category > Cameras > Medium format > 127 film
Root category > Companies > Camera makers
Root category > Countries > Italy
Root category > Countries > Italy > Bencini
Root category > Imaging media > Film > Film formats
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Hidden categories > Image by AWCam
Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
Root category > Templates > Wiki > Hidden categories > Image by jgs4309976
Comment 1 steevithak 2012-01-09 21:08:19 UTC
Found a fairly trivial fix for this. In Skin.php, I wrapped the explode() call in array_unique(). The line was:

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

I changed it to:

$tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );
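
For illustration, here is a minimal standalone sketch of what that change does to the exploded lines. The sample string is made up to resemble the output above, and dedupeLines() is a hypothetical helper name, not something that exists in Skin.php. One detail worth knowing: array_unique() keeps the first occurrence of each value and preserves the original keys, so the sketch reindexes with array_values().

```php
<?php
// Hypothetical helper: split rendered breadcrumb output into lines and
// drop repeated lines, keeping the first occurrence of each.
function dedupeLines( $rendered ) {
    $lines = explode( "\n", $rendered );
    // array_unique() preserves keys, so reindex to get a clean 0..n-1 list.
    return array_values( array_unique( $lines ) );
}

// Made-up sample resembling the category browser output.
$rendered = "Root category\nRoot category\nRoot category > Cameras\nRoot category > Cameras";
print_r( dedupeLines( $rendered ) );
```

In the real skin code the lines contain HTML links rather than plain text, but identical lines would still compare equal as strings.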


The only drawback now is that it still displays hidden categories, which doesn't seem right. Probably a separate bug however.

Here's the current output from the same page as shown in the initial comment:

Root category
Root category > Cameras
Root category > Cameras > Cameras by first letter > B
Root category > Cameras > Cameras by first letter > C
Root category > Cameras > Medium format > 127 film
Root category > Companies > Camera makers
Root category > Countries > Italy
Root category > Countries > Italy > Bencini
Root category > Imaging media > Film > Film formats
Root category > Special categories
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Hidden categories > Image by AWCam
Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
Root category > Templates > Wiki > Hidden categories > Image by jgs4309976
Comment 2 steevithak 2012-01-09 22:24:30 UTC
Upon further thought, there's still redundancy here. For example:

If the page is in:

A > B > C > D

There's really no point in also displaying these lines:

A
A > B
A > B > C

As they're all included in D. They're not really paths to the given page anyway. Really what's wanted is a list of unique paths through the hierarchy to the given page. There's no need to provide additional paths to each point along the way. If that makes sense.
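
A rough sketch of that idea, assuming the paths are available as plain strings joined by " > ": keep a path only if no other path extends it. keepLongestPaths() is a hypothetical name, and the approach would misbehave if a category name itself contained the separator.

```php
<?php
// Hypothetical helper: keep only paths that are not a proper prefix of
// another path in the list. Paths are plain strings joined by $sep.
function keepLongestPaths( array $paths, $sep = ' > ' ) {
    $paths = array_values( array_unique( $paths ) );
    $kept = array();
    foreach ( $paths as $candidate ) {
        $isPrefix = false;
        foreach ( $paths as $other ) {
            // "A > B" is redundant if some other path starts with "A > B > ".
            if ( $other !== $candidate && strpos( $other, $candidate . $sep ) === 0 ) {
                $isPrefix = true;
                break;
            }
        }
        if ( !$isPrefix ) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

print_r( keepLongestPaths( array( 'A', 'A > B', 'A > B > C', 'A > B > C > D' ) ) );
```

Note that this also drops a line such as "A > B" when the page is tagged directly with B as well as with D, which may or may not be wanted.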
Comment 3 Mark A. Hershberger 2012-01-11 17:37:15 UTC
Adding patch keyword for solution in comment #1
Comment 4 Antoine "hashar" Musso (WMF) 2012-01-12 22:38:39 UTC
> If the page is in:
>
> A > B > C > D

Well, the whole idea of the category browser is to put the article in category D and skip A, B, C :-b

array_unique() works there, but it only filters at display time. We should be able to filter before rendering, i.e. when building the category tree.
Comment 5 Technical 13 2012-03-12 02:26:55 UTC
I've also noticed that the hidden categories are displayed regardless of the status of the Show Hidden Categories checkbox in user preferences. We need a way to actually hide the hidden cats.
Comment 6 Ken 2012-03-12 02:43:18 UTC
Thanks for this bug report! I thought for sure I had something in my wiki configured incorrectly.

I ended up hacking my 1.17 wiki to fix this. I replaced this line from includes/Skin.php:

$tempout = array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ));

with this:

if ( $wgUser->getBoolOption( 'showhiddencats' ) ) {
    $tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );
} else {
    // Drop every line that mentions "Hidden categories".
    $tempout = preg_grep( "/Hidden categories/", array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) ), PREG_GREP_INVERT );
}
Comment 7 Ken 2012-03-12 02:49:35 UTC
Sorry, I pasted the wrong line for the "original" line. The original line is this (it does not have array_unique in it):

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );
Comment 8 Bawolff (Brian Wolff) 2012-03-12 14:10:18 UTC
(In reply to comment #6)
> Thanks for this bug report! I thought for sure I had something in my wiki
> configured incorrectly.
> 
> I ended up hacking my 1.17 wiki to fix this. I replaced this line from
> includes/Skin.php:

Glad to hear you got this working on your wiki.

(In response more to the patch keyword added by others than to your comment) We can't directly incorporate your code into core MediaWiki since there is no guarantee that the hidden category's name is actually "Hidden categories" (with i18n and all).

Ideally this filtering would be done when querying the db/building the list of categories, as opposed to after the fact.
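
A minimal sketch of filtering at the data level rather than on rendered text. It assumes the page's categories are held in an array keyed by category name and that the list of hidden category names has already been looked up; how that lookup is done is not shown, and both dropHiddenCategories() and $hiddenNames are hypothetical.

```php
<?php
// Hypothetical helper: remove hidden categories from an array keyed by
// category name, unless the user asked to see them.
function dropHiddenCategories( array $tree, array $hiddenNames, $showHidden = false ) {
    if ( $showHidden ) {
        return $tree;
    }
    // array_flip() turns the list of names into keys so that
    // array_diff_key() can remove the matching entries from $tree.
    return array_diff_key( $tree, array_flip( $hiddenNames ) );
}

// Made-up data: each category maps to an array of its parents.
$tree = array(
    'Cameras' => array( 'Root category' => array() ),
    'Image by AWCam' => array( 'Hidden categories' => array() ),
);
print_r( array_keys( dropHiddenCategories( $tree, array( 'Image by AWCam' ) ) ) );
```

Because the comparison is on category names known to be hidden, it does not depend on what the "Hidden categories" category is called in a given language.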
Comment 9 Technical 13 2012-03-12 14:35:38 UTC
Correct me if I am wrong, but wouldn't it be feasible to replace:

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

With this:

if ( $wgUser->getBoolOption( 'showhiddencats' ) ) {
    $tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );
} else {
    $tempout = preg_grep( "/MediaWiki:Hidden-categories/", array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) ), PREG_GREP_INVERT );
}

That way, instead of hard-coding the hidden category's name as "Hidden categories", the code would refer to the MediaWiki page on which the name is actually set?
Comment 10 Mark A. Hershberger 2012-03-14 17:50:21 UTC
>    $tempout = preg_grep( "/MediaWiki:Hidden-categories/",

can't be used:

> We can't directly incorporate your code into core MediaWiki
> since there is no guarantee that the hidden category's name is
> actually "Hidden categories" (with i18n and all).
Comment 11 Bawolff (Brian Wolff) 2012-03-14 18:00:00 UTC
preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ), "/" ) ."/", ...

Would work, which is what I believe you were trying to get at (In theory anyways, I haven't tested it). However, I think it would be preferable to look for the cat_hidden prop in page_props table when doing the actual db query.
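
As a standalone sketch of the preg_quote() approach: the localized name is passed in as a plain string here, so nothing MediaWiki-specific is called, and dropLinesMentioning() is a hypothetical helper. It only shows how the pattern is built and how PREG_GREP_INVERT behaves.

```php
<?php
// Hypothetical helper: remove every line that contains $name literally.
function dropLinesMentioning( array $lines, $name ) {
    // preg_quote() escapes regex metacharacters in the name; the second
    // argument also escapes the "/" delimiter used below.
    $pattern = '/' . preg_quote( $name, '/' ) . '/';
    // PREG_GREP_INVERT returns the lines that do NOT match the pattern.
    // preg_grep() preserves keys, so reindex the result.
    return array_values( preg_grep( $pattern, $lines, PREG_GREP_INVERT ) );
}

// Made-up lines; the name deliberately contains regex metacharacters.
$lines = array(
    'Root category > Cameras',
    'Root category > Hidden (cats) > Image by AWCam',
);
print_r( dropLinesMentioning( $lines, 'Hidden (cats)' ) );
```

Since this is a literal substring match, a strpos() check per line would do the same job without a regular expression.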
Comment 12 Technical 13 2012-03-14 18:12:54 UTC
(In reply to comment #11)
> preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ),
> "/" ) ."/", ...
> 
> Would work, which is what I believe you were trying to get at (In theory
> anyways, I haven't tested it). However, I think it would be preferable to look
> for the cat_hidden prop in page_props table when doing the actual db query.

That was what I was trying to get at. That way, it wouldn't matter what the hiddencat name actually was, as it would be defined correctly in all instances on that page anyway.
Comment 13 balano 2012-06-20 19:24:35 UTC
I was also seeing duplicates in my small (1000 non-stubs) MW 1.18.2 wiki. I believe at least some of the duplicates arise because the category browser drops the bottom-level category from some entries. I've documented this behavior at

http://www.mediawiki.org/w/index.php?title=Help:Categories&stable=0&shownotice=1&fromsection=Adding_a_page_to_a_category#Adding_a_page_to_a_category

and I repeat it here:

(At least in MediaWiki 1.18.2) if a category is a subcategory of more than one parent, both hierarchies will be listed, but the tagged category will be stripped off all but one of these. This creates the potential for what appear to be duplicate entries if a category with multiple parents and one of its parents are both tagged on a page. For example suppose Maryanne is a subcategory of both Mary and Anne. If a page tags categories Maryanne and Anne then the Category breadcrumbs will show

Anne 
Anne
Mary -> Maryanne

"Anne" appears to be duplicated, but what is meant is

Anne 
Anne -> Maryanne
Mary -> Maryanne
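
To make the intended result concrete, here is a sketch of a walk that carries the tagged category down every parent branch, so each branch yields a complete path. It assumes the parent tree is a nested array mapping each category to an array of its parents (empty for a top-level category); that shape, the sample data, and the name collectPaths() are assumptions for illustration, not the actual skin code.

```php
<?php
// Hypothetical helper: return one root-first path string per branch of a
// nested "category => parents" array.
function collectPaths( array $tree, array $suffix = array(), $sep = ' > ' ) {
    $paths = array();
    foreach ( $tree as $category => $parents ) {
        // Prepend this category to the child categories seen so far.
        // The cast guards against numeric names that PHP stores as int keys.
        $path = array_merge( array( (string)$category ), $suffix );
        if ( is_array( $parents ) && count( $parents ) > 0 ) {
            // Every parent branch receives the same suffix.
            $paths = array_merge( $paths, collectPaths( $parents, $path, $sep ) );
        } else {
            // Top-level category reached: the path is complete.
            $paths[] = implode( $sep, $path );
        }
    }
    return $paths;
}

// The example above: the page is tagged with Anne and Maryanne, and
// Maryanne is a subcategory of both Anne and Mary.
$tree = array(
    'Anne' => array(),
    'Maryanne' => array( 'Anne' => array(), 'Mary' => array() ),
);
print_r( collectPaths( $tree, array(), ' -> ' ) );
```

With this sample data the walk produces the three lines listed under "what is meant" above, with no apparent duplicate.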
