Last modified: 2012-08-13 17:27:38 UTC
On the New Pages Feed list (formerly known as Page Triage), the red "Orphan" label is displayed, even for pages that have links, as shown here on testing: http://test.wikipedia.org/wiki/Special:NewPagesFeed Here are some of the pages that have links to other pages, but still show an 'Orphan' label on testing -- and prototype: http://test.wikipedia.org/wiki/Rue_%28The_Hunger_Games%29 http://ee-prototype.wmflabs.org/wiki/Midway%2C_Nevada_County%2C_Arkansas What I would have expected is that the 'Orphan' label would NOT be shown if there are links in the body of the article, even if these links are added after the article was created. Ian and Roan tell me that it may not be easy to display this kind of information if the links are added after creation. If that is not feasible to implement in this release, I would recommend we stop using the "Orphan" label on the New Pages Feed, because it would be both inaccurate and unfair.
If an article X is not referenced by any other wiki article, it is considered an "Orphan" article. Adding a link to the body of article X won't remove its 'Orphan' status; I think this is how we define "Orphan": by incoming links rather than outgoing links. To get accurate, up-to-date data, we would need to compile all the articles referenced in an article during a save. In this case, we would need to compile all the articles referenced in Rue_(The_Hunger_Games), which is bad if thousands of articles are referenced there. I am also wondering whether 'Orphan' is useful here: new articles would most likely be orphans, since no one would really reference them.
Thanks, Benny, you make a good point that most articles will be orphaned by default, at least initially (unless they were created as a result of clicking on a red link). Note that I was able to confirm on prototype and testing that if I add a link in article X to article Y, both article X and article Y remain labeled as orphans. You are correct that article X should remain an orphan, but I would have expected article Y to lose its orphan label. As of yesterday, this was not happening. If this is really hard to do, we should probably postpone this feature -- and remove the word 'orphan' from the user interface until we can properly back it up.
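To make the expected behavior concrete, here is a minimal sketch of the "incoming links" definition discussed above. It is not the extension's actual code: the `find_orphans` helper and the in-memory `links` dictionary are hypothetical, and ignoring self-links is an assumption made for the illustration.

```python
from collections import defaultdict


def find_orphans(outgoing_links):
    """Return the set of pages that no other page links to.

    outgoing_links maps each page title to the set of titles it links to
    (a hypothetical stand-in for the wiki's link data).
    """
    incoming = defaultdict(set)
    for source, targets in outgoing_links.items():
        for target in targets:
            if target != source:  # assumption: self-links do not count
                incoming[target].add(source)
    # A page is an orphan when nothing else points at it.
    return {page for page in outgoing_links if not incoming[page]}


links = {"X": set(), "Y": set()}
before = find_orphans(links)  # X and Y both start out as orphans
links["X"].add("Y")           # add a link in article X to article Y
after = find_orphans(links)   # X is still an orphan, Y no longer is
```

Under this definition, X keeps its label because it gained only an outgoing link, while Y should lose its label as soon as the data is recomputed.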
The purpose of the orphan label isn't really to indicate a problem with the article itself; rather, it lets the reviewer know that there is a high probability that the article is not on a notable subject (as none of the 4 million other articles on Wikipedia have tried to link to that topic). If you look at the NewPagesFeed on en.wiki, you'll see that only about 1 in every 4 or 5 new articles is an orphan. Making this information completely accurate, however, is difficult, as you would have to constantly monitor every edit to all 4 million articles to see if any of them had added a link to any new articles. This would be an extra quarter million queries per day with little benefit. The way it currently works is to look for incoming links every time the new article is saved and to update the metadata then if necessary. This is probably good enough for most purposes, but it won't always be up to date.
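The save-time update described above can be modeled roughly as follows, which also shows why article Y keeps a stale label in the X/Y test. This is only a toy sketch: the `FeedMetadata` class and its methods are hypothetical names, not PageTriage's real implementation, and the link data is held in memory for illustration.

```python
class FeedMetadata:
    """Toy model: the orphan flag is cached per page and refreshed
    only when that same page is saved."""

    def __init__(self):
        self.links = {}      # page -> set of pages it links to
        self.is_orphan = {}  # cached flag shown in the feed

    def _has_incoming(self, page):
        # True if any other page currently links to this one.
        return any(
            page in targets
            for source, targets in self.links.items()
            if source != page
        )

    def save(self, page, targets):
        self.links[page] = set(targets)
        # Only the saved page's own flag is recomputed here; flags of
        # the pages it links to are left untouched.
        self.is_orphan[page] = not self._has_incoming(page)


feed = FeedMetadata()
feed.save("X", [])
feed.save("Y", [])
feed.save("X", ["Y"])        # X now links to Y
stale = feed.is_orphan["Y"]  # still True: Y has not been saved since
feed.save("Y", [])           # the next save of Y refreshes its flag
fresh = feed.is_orphan["Y"]  # now False
```

In this model Y's label is corrected only when Y itself is saved again, which matches the "won't always be up to date" caveat.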
I'm afraid we don't have any realistic solution for this bug other than decreasing the data cache time (which we'll probably do eventually once we know that nothing is going to explode). In the meantime, I think having the orphan label is useful for judging the likelihood that an article is needed.