Last modified: 2013-10-23 18:17:09 UTC
I turned on $wgUseCategoryBrowser and discovered it displays a very large number of duplicate entries. I'm using this on a large wiki (Camera-Wiki.org) with several thousand pages and hundreds of categories. In some cases it displays the top-level category entry as many as 10 or 20 times, and many categories are displayed 3 to 5 times. It seems like a simple fix to add code that filters out duplicates. If someone can point me to the appropriate piece of code, I'd be happy to provide a patch.

Here's a typical display from the bottom of one page in our wiki:

    Root category
    Root category
    Root category
    Root category
    Root category
    Root category
    Root category
    Root category
    Root category > Cameras
    Root category > Cameras
    Root category > Cameras > Cameras by first letter > B
    Root category > Cameras > Cameras by first letter > C
    Root category > Cameras > Medium format > 127 film
    Root category > Companies > Camera makers
    Root category > Countries > Italy
    Root category > Countries > Italy > Bencini
    Root category > Imaging media > Film > Film formats
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Special categories
    Root category > Templates > Wiki > Flickr image
    Root category > Templates > Wiki > Flickr image
    Root category > Templates > Wiki > Flickr image
    Root category > Templates > Wiki > Flickr image
    Root category > Templates > Wiki > Hidden categories > Image by AWCam
    Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
    Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
    Root category > Templates > Wiki > Hidden categories > Image by jgs4309976
Found a fairly trivial fix for this. In Skin.php, I wrapped the explode() in an array_unique(). The line was:

    $tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

I changed it to:

    $tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );

The only drawback now is that it still displays hidden categories, which doesn't seem right. That's probably a separate bug, however. Here's the current output from the same page as shown in the initial comment:

    Root category
    Root category > Cameras
    Root category > Cameras > Cameras by first letter > B
    Root category > Cameras > Cameras by first letter > C
    Root category > Cameras > Medium format > 127 film
    Root category > Companies > Camera makers
    Root category > Countries > Italy
    Root category > Countries > Italy > Bencini
    Root category > Imaging media > Film > Film formats
    Root category > Special categories
    Root category > Templates > Wiki > Flickr image
    Root category > Templates > Wiki > Hidden categories > Image by AWCam
    Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
    Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
    Root category > Templates > Wiki > Hidden categories > Image by jgs4309976
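To illustrate what the array_unique() change does, here is a minimal sketch that uses a made-up string in place of the real drawCategoryBrowser() output (the sample text is an assumption, not actual skin output). array_unique() keeps the first occurrence of each value and preserves the original keys, so the leading empty entry keeps index 0.

```php
<?php
// Hypothetical stand-in for drawCategoryBrowser() output: one breadcrumb
// per line, starting with an empty line, with repeated entries.
$rendered = "\nRoot category\nRoot category > Cameras\nRoot category\nRoot category > Cameras";

// Original behaviour: every line is kept, duplicates included.
$before = explode( "\n", $rendered );

// Patched behaviour: repeated lines are dropped; the keys of the first
// occurrences (0, 1 and 2 here) are preserved.
$tempout = array_unique( explode( "\n", $rendered ) );

echo count( $before ), " lines before, ", count( $tempout ), " after\n";
echo implode( "\n", $tempout ), "\n";
```

Because the keys are preserved rather than renumbered, any code after this line that addresses entries by index sees the same indexes as before, minus the duplicates.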
Upon further thought, there's still redundancy here. For example, if the page is in:

    A > B > C > D

there's really no point in also displaying these lines:

    A
    A > B
    A > B > C

as they're all included in the path to D. They're not really paths to the given page anyway. What's really wanted is a list of unique paths through the hierarchy to the given page. There's no need to provide additional paths to each point along the way, if that makes sense.
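If one wanted the reduction proposed here, a minimal sketch is to drop every breadcrumb that is the leading part of a longer breadcrumb. The function name removePrefixPaths() and the " > " separator are assumptions for illustration; this is not existing MediaWiki code.

```php
<?php
// Hypothetical helper (not part of MediaWiki): drop every breadcrumb that is
// a leading part of a longer one, e.g. "A > B" when "A > B > C" exists.
function removePrefixPaths( array $paths, string $sep = ' > ' ): array {
    $paths = array_values( array_unique( $paths ) );
    $kept = [];
    foreach ( $paths as $path ) {
        $isPrefix = false;
        foreach ( $paths as $other ) {
            // Redundant if another path continues past this one after a
            // separator; requiring the separator avoids matching
            // "Cameras" against "Cameras by first letter".
            if ( strpos( $other, $path . $sep ) === 0 ) {
                $isPrefix = true;
                break;
            }
        }
        if ( !$isPrefix ) {
            $kept[] = $path;
        }
    }
    return $kept;
}

print_r( removePrefixPaths( [ 'A', 'A > B', 'A > B > C', 'A > B > C > D' ] ) );
```

The pairwise comparison is quadratic in the number of breadcrumbs, which should be acceptable for the few dozen lines a page typically shows.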
Adding patch keyword for solution in comment #1
> If the page is in:
>
> A > B > C > D

Well, the whole idea of the category browser is to put the article in category D and skip A, B, C :-b

array_unique() works there, but it operates on the display output. We should be able to filter before rendering, i.e. when building the category tree.
I've also noticed that the hidden categories display regardless of the status of the "Show hidden categories" checkbox in user preferences. We need a way to actually hide the hidden cats.
Thanks for this bug report! I thought for sure I had something in my wiki configured incorrectly. I ended up hacking my 1.17 wiki to fix this. I replaced this line from includes/Skin.php:

    $tempout = array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ));

with this:

    if ($wgUser->getBoolOption( 'showhiddencats' )) {
        $tempout = array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ));
    } else {
        $tempout = preg_grep( "/Hidden categories/", array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) )), PREG_GREP_INVERT );
    }
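To show what the preg_grep() call in this hack does, here is a small sketch on plain-text stand-ins for the breadcrumb lines (sample lines from this bug; the real rendered lines are simplified here). With PREG_GREP_INVERT, preg_grep() returns the entries that do not match the pattern and keeps their original keys. Since it is a substring match on rendered text, any breadcrumb that merely contains "Hidden categories" is dropped as well.

```php
<?php
// Plain-text stand-ins for breadcrumb lines (sample data from this bug).
$lines = [
    'Root category > Countries > Italy',
    'Root category > Templates > Wiki > Flickr image',
    'Root category > Templates > Wiki > Hidden categories > Image by AWCam',
];

// PREG_GREP_INVERT returns the entries that do NOT match the pattern;
// the original array keys are preserved (here 0 and 1).
$visible = preg_grep( '/Hidden categories/', $lines, PREG_GREP_INVERT );

echo implode( "\n", $visible ), "\n";
```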
Sorry, I pasted the wrong line as the "original" line. The original line is this (it does not have array_unique in it):

    $tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );
(In reply to comment #6)
> Thanks for this bug report! I thought for sure I had something in my wiki
> configured incorrectly.
>
> I ended up hacking my 1.17 wiki to fix this. I replaced this line from
> includes/Skin.php:

Glad to hear you got this working on your wiki.

(In response more to the patch keyword added by others than to your comment) We can't directly incorporate your code into core MediaWiki, since there is no guarantee that the hidden category's name is actually "Hidden categories" (with i18n and all). Ideally this filtering would be done when querying the DB/building the list of categories, as opposed to after the fact.
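Here is a minimal sketch of filtering while building the list, as suggested above, instead of matching on rendered text. It assumes the parent tree is a nested array of category name => array of parent categories, and that the names of the page's hidden categories are already known. The function pruneHiddenCategories() is hypothetical, and in a real fix the hidden list would come from the database rather than being hard-coded.

```php
<?php
// Hypothetical sketch: drop the page's hidden categories from a parent tree
// (category => array of its parents) before any breadcrumbs are drawn.
// $hidden is a plain list of category names; a real fix would obtain it
// from the database instead of hard-coding it.
function pruneHiddenCategories( array $tree, array $hidden ): array {
    $hiddenSet = array_flip( $hidden );
    return array_filter(
        $tree,
        function ( $category ) use ( $hiddenSet ) {
            return !isset( $hiddenSet[$category] );
        },
        ARRAY_FILTER_USE_KEY
    );
}

$tree = [
    'Italy' => [ 'Countries' => [ 'Root category' => [] ] ],
    'Image by AWCam' => [ 'Hidden categories' => [ 'Wiki' => [ 'Templates' => [ 'Root category' => [] ] ] ] ],
];
print_r( array_keys( pruneHiddenCategories( $tree, [ 'Image by AWCam' ] ) ) );
```

Because the decision is made on category names in the tree rather than on the displayed text, it does not depend on what the hidden-categories container is called in a given language.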
Correct me if I am wrong, but wouldn't it be feasible to replace:

    $tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

with this:

    if ($wgUser->getBoolOption( 'showhiddencats' )) {
        $tempout = array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ));
    } else {
        $tempout = preg_grep( "/MediaWiki:Hidden-categories/", array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) )), PREG_GREP_INVERT );
    }

so that, instead of hard-coding "Hidden categories" as the hidden category's name, it refers to the MediaWiki page where that name is actually set?
> $tempout = preg_grep( "/MediaWiki:Hidden-categories/",

can't be used:

> We can't directly incorporate your code into core MediaWiki
> since there is no guarantee that the hidden category's name is
> actually "Hidden categories" (with i18n and all).
    preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ), "/" ) . "/", ...

would work, which is what I believe you were trying to get at (in theory, anyway; I haven't tested it). However, I think it would be preferable to look for the cat_hidden prop in the page_props table when doing the actual DB query.
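The preg_quote() call matters because a localized category name can contain regex metacharacters. This small sketch uses a hypothetical name with parentheses to show the difference; preg_quote( $str, '/' ) escapes the metacharacters and the "/" delimiter so the name is matched literally.

```php
<?php
// Hypothetical category name containing regex metacharacters.
$name = 'Hidden categories (wiki)';
$line = 'Root category > Hidden categories (wiki) > Image by AWCam';

// Unquoted, the parentheses form a group, so the pattern looks for
// "Hidden categories wiki" and misses the literal text.
$unquoted = preg_match( '/' . $name . '/', $line );

// preg_quote() escapes the metacharacters (and the "/" delimiter passed
// as its second argument), so the name is matched literally.
$quoted = preg_match( '/' . preg_quote( $name, '/' ) . '/', $line );

var_dump( $unquoted, $quoted );
```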
(In reply to comment #11)
> preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ),
> "/" ) ."/", ...
>
> Would work, which is what I believe you were trying to get at (In theory
> anyways, I haven't tested it). However, I think it would be preferable to look
> for the cat_hidden prop in page_props table when doing the actual db query.

That was what I was trying to get at. That way, it wouldn't matter what the hidden category's name actually was, as it would be defined correctly in every instance on that page anyway.
I was also seeing duplicates in my small wiki (1,000 non-stub pages) running MW 1.18.2. I believe at least some of the duplicates appear because the category browser drops the bottom-level category off some entries. I've documented this behavior at http://www.mediawiki.org/w/index.php?title=Help:Categories&stable=0&shownotice=1&fromsection=Adding_a_page_to_a_category#Adding_a_page_to_a_category and I repeat it here:

(At least in MediaWiki 1.18.2) if a category is a subcategory of more than one parent, both hierarchies will be listed, but the tagged category will be stripped off all but one of them. This creates the potential for what appear to be duplicate entries if a category with multiple parents and one of its parents are both tagged on a page.

For example, suppose Maryanne is a subcategory of both Mary and Anne. If a page tags categories Maryanne and Anne, then the category breadcrumbs will show:

    Anne
    Anne
    Mary -> Maryanne

"Anne" appears to be duplicated, but what is meant is:

    Anne
    Anne -> Maryanne
    Mary -> Maryanne
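The symptom described here is what you would get if the renderer appended the child category only to the last of the lines produced for its parents. A minimal sketch of flattening a parent tree so that every parent path gets the child appended is below. It assumes a nested array of category => array of parents (empty for a top-level category); categoryPaths() is a hypothetical helper, not MediaWiki's API.

```php
<?php
// Hypothetical helper (not MediaWiki's API). $tree maps each category to an
// array of its parent categories, with an empty array for a top-level one.
// Every full path is returned, root first, so a category with two parents
// yields two complete breadcrumbs.
function categoryPaths( array $tree, string $sep = ' -> ' ): array {
    $paths = [];
    foreach ( $tree as $category => $parents ) {
        if ( empty( $parents ) ) {
            // Top-level category: the breadcrumb is just its own name.
            $paths[] = (string)$category;
            continue;
        }
        // Append this category to every parent path, not only the last one.
        foreach ( categoryPaths( $parents, $sep ) as $prefix ) {
            $paths[] = $prefix . $sep . $category;
        }
    }
    return $paths;
}

// The Maryanne example: the page tags Maryanne (parents Anne and Mary) and Anne.
$tree = [
    'Maryanne' => [ 'Anne' => [], 'Mary' => [] ],
    'Anne' => [],
];
print_r( categoryPaths( $tree ) );
```

For the Maryanne example this yields the three breadcrumbs listed above as what is meant (Anne -> Maryanne, Mary -> Maryanne, Anne), only in a different order.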