Last modified: 2014-02-12 23:35:41 UTC
1. Categories in Wikipedia are chaos.
2. The reason is that the system does not work hierarchically.
3. Example: When I add an article to category "Cat", it should also _automatically_ belong to the categories "mammal", "animal" and "creature". When I then browse the category "animal", I should find the article. This is not the case in the current system. The result is chaos.
4. Much work is now spent on sorting out the chaos in the current category system. Much of that work could be saved if there were a sound technical foundation for a _true_ category system.
5. I discussed this issue with several engaged Wikipedia authors and administrators in the German Wikipedia. They all agree that this would be desirable.
Best regards
Matthias Kleine
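To make item 3 concrete, here is a minimal sketch of the requested behaviour, not of anything MediaWiki actually does. The `PARENTS` map and the function name are hypothetical and exist only for illustration; the category names come from the example above.

```python
# Hypothetical parent map: each category lists its direct parent categories.
PARENTS = {
    "Cat": ["mammal"],
    "mammal": ["animal"],
    "animal": ["creature"],
    "creature": [],
}


def implied_categories(category, parents):
    """Return the category itself plus every ancestor reachable via parent links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already handled; also guards against loops in the data
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


# An article filed under "Cat" would then also show up under these categories.
print(sorted(implied_categories("Cat", PARENTS)))
```

Under this reading, browsing "animal" would list the article because "animal" is among the implied categories of "Cat".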
Discussed this with Matthias a bit on IRC, will implement his plans once we've got a firmer idea how best to do this.
Please see http://meta.wikimedia.org/wiki/Category_flatten . I am no PHP coder, but I think it's not really tough to get that done. Opinions are most welcome. And yes, we badly need that.
The hard part is not proposing a flattened membership table to speed up reads, but implementing it efficiently. Writes must be taken into account as well as reads: if a major category hierarchy is rearranged (and this can be done with a simple edit to a single page), this must be handled without killing the wiki for an hour while the flattened membership table is rewritten.
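The write cost can be illustrated with a small in-memory sketch. It assumes the flattened table is simply a set of (category, article) rows, which is an assumption made for illustration and not the schema proposed on meta; all names and the figure of 1000 articles are hypothetical.

```python
def ancestors(category, parents):
    """All categories reachable from `category` through parent links."""
    seen, stack = set(), list(parents.get(category, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def flatten(direct_members, parents):
    """Build the flattened table as a set of (category, article) rows."""
    rows = set()
    for category, articles in direct_members.items():
        targets = {category} | ancestors(category, parents)
        for article in articles:
            for target in targets:
                rows.add((target, article))
    return rows


parents = {"Cat": ["mammal"], "mammal": ["animal"], "animal": [], "fictional": []}
direct_members = {"Cat": [f"cat article {i}" for i in range(1000)]}

before = flatten(direct_members, parents)
parents["Cat"] = ["fictional"]  # one edit to a single category page
after = flatten(direct_members, parents)

# Rows that would have to be deleted or inserted because of that one edit.
rows_to_rewrite = before ^ after
print(len(before), len(after), len(rows_to_rewrite))
```

In this toy case one edit touches as many rows as the table originally held, which is the scaling problem described above; a real implementation would need some incremental or deferred update strategy.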
Anybody who is interested in finding an efficient solution for this problem may also take a look at http://en.wikipedia.org/wiki/User_talk:Brion_VIBBER#Categories Regards, Matthias
Does this presume a change to a tree-structure for categories? If not it seems like you could end up with a situation where adding an article to a category could add that article to virtually every category on the system. Is this what we want? Or, if we are talking about a true tree-structured category system, is ''that'' what we want? It would be a significant change to the current system's behavior.
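Whether a given category graph is actually a tree can be checked mechanically. The sketch below is only an illustration with made-up data: it lists categories that have more than one parent and detects loops, the two ways in which a graph stops being a tree and membership can spread much further than intended.

```python
def categories_with_several_parents(parents):
    """Categories that break a strict tree by having more than one parent."""
    return sorted(c for c, ps in parents.items() if len(set(ps)) > 1)


def has_cycle(parents):
    """Depth-first search over parent links; True if a category is its own ancestor."""
    in_progress, done = set(), set()

    def visit(category):
        in_progress.add(category)
        for parent in parents.get(category, []):
            if parent in in_progress:
                return True  # we walked back into the current path
            if parent not in done and visit(parent):
                return True
        in_progress.discard(category)
        done.add(category)
        return False

    return any(c not in done and visit(c) for c in parents)


# Hypothetical data for illustration only.
tree = {"Cat": ["mammal"], "mammal": ["animal"], "animal": []}
graph = {
    "Hamburg": ["Cities in Germany", "Ports"],
    "Cities in Germany": ["Germany"],
    "Ports": ["Transport"],
    "Germany": [],
    "Transport": [],
}
loop = {"A": ["B"], "B": ["A"]}

print(categories_with_several_parents(graph), has_cycle(loop))
```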
It's just how categories work in the human mind. All kinds of cognitive science support that view of categories: start with Piaget's studies of cognitive development, take a look at modern cognitive psychology, look at how artificial intelligence research deals with the problem ... categories are structured like trees, not like lists. Look at how scientific fields are structured. How are the books in your library sorted (I hope rather differently than the articles in Wikipedia ...)? It's just a natural way of dealing with issues to say "this issue belongs to this broader issue, and this broader issue itself belongs to a more general issue ...". I admit that this would not be a minor change in how things are done in Wikipedia. Therefore, I appreciate the discussion. We should be aware that even if we keep the category system as a list, as it is now, users will continue to treat it like a tree, not knowing that the system behaves differently from what they expect. Did you ever observe people creating very specific categories like [[categorie:mysmallhometown]] and changing the links in dozens of articles? It's only a question of time until somebody even weirder creates [[categorie: mysmallhometown (westside)]], changing the links again, so that [[categorie:mysmallhometown]] loses quite a lot of its articles. In fact, this is what happens every day in the current system ... Regards Matthias Kleine
Ever thought about creating an ontology? One major problem I see is detecting whether a subcategory, or an article in a category, belongs to a top-level category. For example, you have top-level categories "A" and "B", and you have subcategories "A1" and "B1" as well as "AB". Let's assume "AB" is a subcategory of both "A" and "B", and there is a loose containment relationship from "AB" to "B1".
A       B
| \   / |
|  AB   |
|    \  |
A1     B1
So logically this means "B1" is a subcategory of "A". But semantically it is not necessarily one. With articles the problem is much worse, because we sometimes have very loose relationships there. Examples are "Computer Science", "Social Science" and "Computers and Society". So one idea is to create an ontology. This means that relationships are semantically well defined (e.g. containment relationship, similarity relationship, "is part of" relationship, ...), so that you can "understand" what kind of relationship two categories or articles have, whether it is strong or just informational. We could adopt the Semantic Web approach (RDF/OWL) for that. I don't think we should use it directly, because of the complexity of RDF. What do you think?
(In reply to comment #0)
> 1. Categories in wikipedia are chaos.
> 2. The reason is: The system does not work hierarchically.
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories
> "mammal", "animal" and "creature". When I now browse through the categorie "animal", I will find the
> article. This is not the case in the current system. The result is chaos.
> 4. Much work is now spent to solve the chaos in the current category system. Much work could be saved if
> there would be a sound technical foundation for a _true_ category system.
> 5. I discussed this issue with several engaged wikipedia authors and administrators in the german
> wikipedia. They all agree that this would be a desirable issue.
>
> Best regards
> Matthias Kleine
> A       B
> | \   / |
> |  AB   |
> |    \  |
> A1     B1
>
> So logically this means "B1" is subcategory of "A". But semantically it is not neccessarily.
In my eyes, this is clearly a problem at the user level. No architecture will prevent a user from "editing" the category tree in a way that is semantic nonsense (e.g. classifying a car as an animal or something). Surely enough, there are a couple of models for knowledge representation which might be even better than a category tree (in my opinion, Minsky's frame logic would be quite fine, but it relies on a tree structure, too). However, that goal is too far away to reach. A simple tree would be three steps forward and might be realizable in the foreseeable future.
> In my eyes, this is clearly a problem of the user level. No architecture will prevent that a user "edits" the category
> tree in a way that semantically is nonsense (i.e. classifying a car as animal or something).
The point is that I don't believe it is always nonsense. There are good reasons why a category may belong to multiple top-level categories. But there are different types of relationships that you cannot model today.
> Surely enough, there are a
> couple of models for knowledge represantation, which might be even better than a category tree (in my opinion, Minsky's
> frame logic would be quite fine, but this relies on a tree structure, too). However, this aim is too far to achieve. A
> simple tree would be three steps forward and might be realizable in quite a foreseeable time.
I think giving a relationship a semantic definition is not hard to implement and not too confusing to use. Just two different types of relationships (isPartOf and isRelatedTo) would help a lot. One of them must be strictly hierarchical, the other one is a graph. This makes it possible to classify articles and categories automatically. One interesting use case, for instance, is to use a clustering algorithm to detect whether a category makes sense at all or should be split. Is there a way to discuss this offline?
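As a rough sketch of the two relationship types, the A/B/AB/A1/B1 example from earlier in this thread can be written as typed edges. The edge list and the function below are hypothetical; in particular, treating the links from "AB" to "A" and "B" as isPartOf and the loose link between "AB" and "B1" as isRelatedTo is an assumption, and the sketch does not enforce that isPartOf forms a strict hierarchy.

```python
IS_PART_OF = "isPartOf"
IS_RELATED_TO = "isRelatedTo"

# (child, parent, relationship type); an assumed encoding of the diagram.
EDGES = [
    ("A1", "A", IS_PART_OF),
    ("AB", "A", IS_PART_OF),
    ("AB", "B", IS_PART_OF),
    ("B1", "B", IS_PART_OF),
    ("B1", "AB", IS_RELATED_TO),  # the loose containment link
]


def ancestors(category, edges, follow):
    """Ancestors of `category`, following only edges whose type is in `follow`."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for child, parent, kind in edges:
            if child == current and kind in follow and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Treating every link alike makes "B1" a descendant of "A".
untyped = ancestors("B1", EDGES, {IS_PART_OF, IS_RELATED_TO})
# Propagating membership only along isPartOf keeps "B1" under "B" alone.
typed = ancestors("B1", EDGES, {IS_PART_OF})
print(sorted(untyped), sorted(typed))
```

The design choice illustrated here is that only the hierarchical relationship is used for automatic classification, while isRelatedTo stays available for navigation without widening membership.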
I do think that discussions probably should not run in Bugzilla. Why not move it to http://meta.wikimedia.org/w/index.php?title=Talk:Category_flatten ?
Continue discussion on meta: http://meta.wikimedia.org/wiki/Category_flatten Closing as LATER.
My apologies for not understanding, but why was this changed from LATER to FIXED? Does MediaWiki currently have lists of "pages in a category and its subcategories"? How can that be used? Specifically: how was the problem exemplified in item (3) of comment #0 fixed?
(In reply to comment #0)
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories
> "mammal", "animal" and "creature". When I now browse through the categorie "animal", I will find the
> article. This is not the case in the current system. The result is chaos.
Good question. Re-opened.
This would be very compelling for Special:RandomInCategory, as one could essentially get the same enjoyable variety one gets from one's favorite television or radio station, being exposed, say, to new Science articles without having to specify exactly which field one is interested in. (Speaking of radio, it would be interesting if one could ask for random sound files in a category and have the pages load, play, and then load another random one in sequence; likewise for videos. Scrolling through random images in a category à la Google Images would be cool too.)
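A minimal sketch of what "random page from a category and its subcategories" could mean is given below. It is not how Special:RandomInCategory is implemented; the `SUBCATEGORIES` and `PAGES` maps and both function names are hypothetical stand-ins for whatever storage a real implementation would use.

```python
import random

# Hypothetical data: direct subcategories and direct member pages per category.
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics"],
    "Biology": [],
    "Optics": [],
}
PAGES = {
    "Science": ["Scientific method"],
    "Physics": ["Momentum"],
    "Biology": ["Cell", "Enzyme"],
    "Optics": ["Lens"],
}


def pages_in_tree(category, subcategories, pages):
    """Collect pages filed directly under the category or under any descendant."""
    seen, found, stack = set(), set(), [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a subcategory reachable twice is only visited once
        seen.add(current)
        found.update(pages.get(current, []))
        stack.extend(subcategories.get(current, []))
    return found


def random_in_category(category, subcategories, pages, rng=random):
    """Pick one page uniformly from the whole subtree, or None if it is empty."""
    candidates = sorted(pages_in_tree(category, subcategories, pages))
    return rng.choice(candidates) if candidates else None


print(random_in_category("Science", SUBCATEGORIES, PAGES))
```

Asking for "Science" here can return "Lens" even though that page is only filed under "Optics", which is the variety described above.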