Last modified: 2014-02-12 23:35:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T3497, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 1497 - Hierarchical category system is urgently needed


Summary:	Hierarchical category system is urgently needed

Status:	REOPENED

Product:	MediaWiki
Classification:	Unclassified
Component:	Categories
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement with 2 votes
Target Milestone:	---
Assigned To:	Brion Vibber

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-02-08 18:41 UTC by Matthias Kleine
Modified:	2014-02-12 23:35 UTC
CC List:	8 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Matthias Kleine 2005-02-08 18:41:48 UTC

1. Categories in Wikipedia are chaos.
2. The reason: the system does not work hierarchically.
3. Example: When I add an article to the category "Cat", it should also _automatically_ belong to the categories "mammal", "animal" and "creature". When I then browse through the category "animal", I should find the article. This is not the case in the current system. The result is chaos.
4. Much work is now spent on sorting out the chaos in the current category system. Much of it could be saved if there were a sound technical foundation for a _true_ category system.
5. I discussed this issue with several active Wikipedia authors and administrators on the German Wikipedia. They all agree that this would be desirable.

Best regards
Matthias Kleine
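
To make item 3 of the description concrete, here is a minimal sketch of inherited membership. It is only an illustration, not MediaWiki code: the `PARENTS` mapping and the function name `implied_categories` are assumptions made up for this example.

```python
# Hypothetical data: each category maps to its direct parent categories.
PARENTS = {
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": ["Creature"],
    "Creature": [],
}


def implied_categories(category, parents):
    """Return the category itself plus every category reachable via parent links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already handled; also guards against loops
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


# An article filed under "Cat" would implicitly belong to all of these.
print(sorted(implied_categories("Cat", PARENTS)))
# -> ['Animal', 'Cat', 'Creature', 'Mammal']
```

Under this reading, browsing "Animal" would list the article even though it was only tagged with "Cat".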

Comment 1 Brion Vibber 2005-02-09 02:31:04 UTC

Discussed this with Matthias a bit on IRC, will implement his plans once we've got a firmer idea how best to do this.

Comment 2 Peter Gervai (grin) 2005-02-16 21:31:13 UTC

Please see http://meta.wikimedia.org/wiki/Category_flatten

I am no PHP coder, but I don't think it would be really tough to get that done. Opinions are most welcome. And yes, we badly need that.

Comment 3 Brion Vibber 2005-02-16 21:36:04 UTC

The hard part is not proposing a flattened membership table to speed up reads,
but implementing it efficiently. Not only reads but also writes must be taken
into account; if a major category hierarchy is rearranged (and this can be done
with a simple edit to a single page), this must be handled without killing the
wiki for an hour while the flattened membership table is rewritten.
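
The write cost described in this comment can be illustrated with a small sketch. Everything in it is assumed for the example (the data, the category names, the function `flatten`); it models the flattened table as a set of (article, category) rows and is not how MediaWiki stores anything.

```python
def flatten(article_cats, parents):
    """Build a flattened table: one (article, category) row for every
    direct or inherited membership."""
    rows = set()
    for article, cats in article_cats.items():
        stack = list(cats)
        seen = set()
        while stack:
            cat = stack.pop()
            if cat in seen:
                continue
            seen.add(cat)
            rows.add((article, cat))
            stack.extend(parents.get(cat, []))
    return rows


# Hypothetical data: 1000 articles filed directly under "Cat".
articles = {f"Article {i}": ["Cat"] for i in range(1000)}
before = {"Cat": ["Mammal"], "Mammal": ["Animal"], "Reptile": ["Animal"], "Animal": []}
# A single edit to one category page moves "Cat" from "Mammal" to "Reptile".
after = dict(before, Cat=["Reptile"])

old_rows = flatten(articles, before)
new_rows = flatten(articles, after)
changed = old_rows ^ new_rows  # rows to delete plus rows to insert

print(len(old_rows), len(changed))  # -> 3000 2000
```

One edit to one page touches two thirds of the table in this toy case; on a large wiki the same effect would apply to every article below the moved category.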

Comment 4 Matthias Kleine 2005-02-16 22:57:26 UTC

Anybody who is interested in finding an efficient solution for this problem may also take a look at 

http://en.wikipedia.org/wiki/User_talk:Brion_VIBBER#Categories

Regards, Matthias

Comment 5 Richard J. Holton 2005-02-17 00:11:10 UTC

Does this presume a change to a tree-structure for categories? If not it seems
like you could end up with a situation where adding an article to a category
could add that article to virtually every category on the system. Is this what
we want?

Or, if we are talking about a true tree-structured category system, is ''that''
what we want? It would be a significant change to the current system's behavior.
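
The concern raised here can be sketched as follows. The category names and both helper functions are hypothetical; the point is only that once categories may have several parents or form loops, inherited membership can spread to nearly every category.

```python
def reachable(start, parents):
    """All categories reachable from `start` by following parent links."""
    seen = set()
    stack = list(parents.get(start, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_forest(parents):
    """True if every category has at most one parent and there are no loops."""
    for cat in parents:
        seen = set()
        current = cat
        while current is not None:
            if current in seen:
                return False  # loop
            seen.add(current)
            ups = parents.get(current, [])
            if len(ups) > 1:
                return False  # more than one parent
            current = ups[0] if ups else None
    return True


# Hypothetical non-tree structure with a loop among the upper categories.
GRAPH = {
    "Computers": ["Technology", "Society"],
    "Technology": ["Science"],
    "Science": ["Society"],
    "Society": ["Culture"],
    "Culture": ["Science"],  # closes the loop Science -> Society -> Culture -> Science
}
TREE = {"Cat": ["Mammal"], "Mammal": ["Animal"], "Animal": []}

print(sorted(reachable("Computers", GRAPH)))
print(is_forest(GRAPH), is_forest(TREE))  # -> False True
```

In `GRAPH`, an article filed under "Computers" inherits every other category in the system, which is the situation the question warns about.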

Comment 6 Matthias Kleine 2005-02-17 00:46:03 UTC

It's just how categories work in the human mind. All kinds of cognitive science support that view of categories: start with Piaget's studies of cognitive development, take a look at modern cognitive psychology, look at how artificial intelligence research deals with the problem ... categories are structured like trees, not like lists. Look at how scientific fields are structured. How are the books in your library sorted? (I hope rather differently from the articles in Wikipedia ...) It's just a natural way of dealing with topics to say "this topic belongs to this broader topic, and this broader topic itself belongs to a more general topic ...".

I admit that this would not be a minor change in how things are done in Wikipedia. Therefore, I appreciate the discussion. We should be aware that even if we keep the category system as a list, as it is now, users will continue to treat it like a tree, not knowing that the system behaves differently from what they expect.

Have you ever seen people create very specific categories like [[categorie:mysmallhometown]] and change the links in dozens of articles? It's only a question of time until somebody even more eccentric creates [[categorie:mysmallhometown (westside)]] and changes the links again, so that [[categorie:mysmallhometown]] loses quite a few of its articles. In fact, this is what happens every day in the current system ...

Regards Matthias Kleine

Comment 7 Joern Schimmelpfeng 2005-02-19 23:18:56 UTC

Ever thought about creating an ontology?

One major problem I see is detecting whether a subcategory or an article in a category belongs to a top-level category. For example, say you have top-level categories "A" and "B", subcategories "A1" and "B1", as well as "AB". Let's assume "AB" is a subcategory of both "A" and "B", and there is a loose containment relationship from "AB" to "B1".

A       B
| \   / |
|  AB   |
|    \  |
A1     B1

So logically this means "B1" is a subcategory of "A". But semantically that is not necessarily so. With articles the problem is much worse, because there we sometimes have very loose relationships. Examples of that are "Computer Science", "Social Science" and "Computers and Society".
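
The inference drawn from the diagram can be traced with a short sketch. The edge list is an assumed reading of the diagram (child pointing to parents, with the loose AB/B1 link treated like any other containment link), and `path_to_ancestor` is a name invented for this example.

```python
from collections import deque

# Assumed reading of the diagram as "child -> parents".
# The B1 -> AB entry is the loose containment link.
PARENTS = {
    "A1": ["A"],
    "AB": ["A", "B"],
    "B1": ["B", "AB"],
    "A": [],
    "B": [],
}


def path_to_ancestor(start, target, parents):
    """Breadth-first search upward; return the chain of categories, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


print(path_to_ancestor("B1", "A", PARENTS))  # -> ['B1', 'AB', 'A']
print(path_to_ancestor("A1", "B", PARENTS))  # -> None
```

If all links are treated alike, "B1" ends up under "A" purely because of the loose link through "AB", which is the purely logical conclusion questioned above.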

So one idea is to create an ontology. This means that relationships are semantically well defined (e.g. containment relationship, similarity relationship, "is part of" relationship, ...). That way you are able to "understand" what kind of relationship two categories or articles have, and whether it is strong or just informational.

We could adopt the Semantic Web approach (RDF/OWL) for that. I don't think that we should use it directly, though, because of the complexity of RDF.

What do you think?

(In reply to comment #0)
> 1. Categories in wikipedia are chaos.
> 2. The reason is: The system does not work hierarchically.
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories
> "mammal", "animal" and "creature". When I now browse through the categorie "animal", I will find the
> article. This is not the case in the current system. The result is chaos.
> 4. Much work is now spent to solve the chaos in the current category system. Much work could be saved if
> there would be a sound technical foundation for a _true_ category system.
> 5. I discussed this issue with several engaged wikipedia authors and administrators in the german
> wikipedia. They all agree that this would be a desirable issue.
>
> Best regards
> Matthias Kleine

Comment 8 Matthias Kleine 2005-02-19 23:34:28 UTC

> A       B
> | \   / |
> |  AB   |
> |    \  |
> A1     B1
> 
> So logically this means "B1" is subcategory of "A". But semantically it is not neccessarily. 

In my eyes, this is clearly a problem at the user level. No architecture will prevent a user from "editing" the category tree in a way that is semantic nonsense (e.g. classifying a car as an animal or something). Sure enough, there are a couple of models for knowledge representation that might be even better than a category tree (in my opinion, Minsky's frame logic would be quite fine, but this relies on a tree structure, too). However, that goal is too far off to achieve. A simple tree would be three steps forward and might be realizable in the foreseeable future.

Comment 9 Joern Schimmelpfeng 2005-02-20 10:37:56 UTC

 
> In my eyes, this is clearly a problem of the user level. No architecture will prevent that a user "edits" the category
> tree in a way that semantically is nonsense (i.e. classifying a car as animal or something).

The point is that I don't believe it is always nonsense. There are good reasons why a category may belong to multiple top-level categories. But there are different types of relationships that you cannot model today.

> Surely enough, there are a
> couple of models for knowledge representation, which might be even better than a category tree (in my opinion, Minsky's
> frame logic would be quite fine, but this relies on a tree structure, too). However, this aim is too far to achieve. A
> simple tree would be three steps forward and might be realizable in quite a foreseeable time.

I think giving a relationship a semantic definition is not hard to implement and not too confusing to use. Just two different types of relationships (isPartOf and isRelatedTo) would help a lot. One of them must be strictly hierarchical; the other one is a graph. This makes it possible to classify articles and categories automatically. One interesting use case, for instance, is to use a clustering algorithm to detect whether a category makes sense at all or whether you should split it.
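
A minimal sketch of the two relationship types might look like this. The relation names isPartOf and isRelatedTo come from the comment; the link data, the tuple layout and both function names are assumptions for illustration only.

```python
IS_PART_OF = "isPartOf"
IS_RELATED_TO = "isRelatedTo"

# Hypothetical typed links: (source, relation, target).
LINKS = [
    ("Computers and Society", IS_PART_OF, "Social Science"),
    ("Computers and Society", IS_RELATED_TO, "Computer Science"),
    ("Social Science", IS_PART_OF, "Science"),
    ("Computer Science", IS_PART_OF, "Science"),
]


def strict_ancestors(category, links):
    """Follow only isPartOf links; isRelatedTo never propagates membership."""
    parents = {}
    for source, relation, target in links:
        if relation == IS_PART_OF:
            parents.setdefault(source, []).append(target)
    found = set()
    stack = list(parents.get(category, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def related(category, links):
    """Direct isRelatedTo neighbours, treated as an undirected graph."""
    neighbours = set()
    for source, relation, target in links:
        if relation != IS_RELATED_TO:
            continue
        if source == category:
            neighbours.add(target)
        elif target == category:
            neighbours.add(source)
    return neighbours


print(sorted(strict_ancestors("Computers and Society", LINKS)))
print(sorted(related("Computer Science", LINKS)))
```

Here "Computers and Society" inherits "Social Science" and "Science" through isPartOf, while its tie to "Computer Science" stays informational and does not pull in further categories.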

Is there a way to discuss this offline?

Comment 10 Peter Gervai (grin) 2005-02-21 17:16:30 UTC

I do think that discussions probably should not run in Bugzilla. Why not move it to http://meta.wikimedia.org/w/index.php?title=Talk:Category_flatten ?

Comment 11 Antoine "hashar" Musso (WMF) 2005-03-27 20:40:49 UTC

Continue the discussion on Meta:
http://meta.wikimedia.org/wiki/Category_flatten

Closing as LATER.

Comment 12 Helder 2010-11-20 12:47:38 UTC

My apologies for not understanding, but why was this changed from LATER to FIXED?

Does MediaWiki currently have lists of "pages in a category and its subcategories"? How can that be used? Specifically: how was the problem exemplified in item (3) of comment #0 fixed?
(In reply to comment #0)
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories 
> "mammal", "animal" and "creature". When I now browse through the categorie "animal", I will find the 
> article. This is not the case in the current system. The result is chaos.

Comment 13 Bawolff (Brian Wolff) 2011-04-04 21:41:39 UTC

Good question. Reopened.

Comment 14 Brett Zamir 2014-01-08 01:58:05 UTC

This would be very compelling for Special:RandomInCategory, as one could essentially get the same enjoyable variety one gets from one's favorite television or radio station, being exposed to, say, new Science articles without having to specify exactly which field one is interested in.

(Speaking of radio, it would be interesting if one could ask for random sound files in a category and have the pages load, play, and then load another random one in sequence; likewise for videos. Scrolling through random images in a category à la Google Images would be cool too.)
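
As a rough sketch of the idea, a random pick over a category and all of its subcategories could work like this. It is not how Special:RandomInCategory is implemented; the `CHILDREN` and `PAGES` data and both function names are made up for the example.

```python
import random

# Hypothetical data: direct subcategories and directly filed pages.
CHILDREN = {
    "Science": ["Physics", "Biology"],
    "Physics": [],
    "Biology": ["Zoology"],
    "Zoology": [],
}
PAGES = {
    "Science": ["Scientific method"],
    "Physics": ["Quark"],
    "Biology": ["Cell"],
    "Zoology": ["Cat"],
}


def pages_in_tree(category, children, pages):
    """Collect pages filed under the category or any of its subcategories."""
    collected = set()
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        collected.update(pages.get(current, []))
        stack.extend(children.get(current, []))
    return collected


def random_in_category(category, children, pages, rng=random):
    """Pick one page at random from the whole subtree, or None if it is empty."""
    candidates = sorted(pages_in_tree(category, children, pages))
    return rng.choice(candidates) if candidates else None


print(random_in_category("Science", CHILDREN, PAGES))
```

Asking for "Science" can then return "Cat" or "Quark" without the reader naming a field, which is the variety described above.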
