Last modified: 2014-01-24 13:23:41 UTC
On some of our wikis, some properties appear twice on Special:Properties (see attachment): once with "0 uses" and once with "X uses" (X>0, the real usage count). Both Property:XYZ links go to the same page with the property definition, whether or not it has been defined (red link). Depending on which of the two occurrences ("0 uses" or "X uses") appears first on Special:Properties, we don't get any values when #asking for this property. This happens for properties of any type. Sometimes a full refresh (SMW_refreshData -ftpv, then another -v) fixes the issue, sometimes it doesn't. This is difficult to reproduce: we have several wikis with the same versions (OS, MW, SMW), the same configuration and (partially) the same content, and the issue can occur in one but not the others. Content is exported/imported or synced using Special:Push, with no copy-paste or manual sync. We're currently using MW 1.19.3 and SMW 1.8.0.4 on CentOS 6.2 (PHP 5.3.3, MySQL 5.1.61).
Created attachment 12369 [details] Duplicated properties screenshot
It happens when the edited article is accessed under two different names considered the same due to database collation and capitalisation.
Well, that could be it, because one of the things I noticed while trying to diagnose this issue and bug 48707 is that we have different charsets and collations on some of our wikis (they were created at different times with different MW versions, but if the default changes, shouldn't it be updated when running update.php? Oh well, that's another issue anyway.) I'll try to unify all our charsets/collations when I get the time; it's something I wanted to look into anyway. In any case, if this really was the origin of the duplicates, why does the issue sometimes not go away after a SMW_refreshData -ftpv? If I got your reasoning right, the dup gets to the DB the moment someone accesses an article with a different - but equal according to the collation - name. But right after a refresh, shouldn't everything be OK then?
> the dup gets to the DB the moment someone accesses an article with a
> different - but equal according to the collation - name.
Not just accesses but edits. SMW_refreshData.php -ftpv is wrong. It purges type and property pages but after that you should run SMW_refreshData.php -fv to rebuild the pages themselves.
(In reply to comment #2)
> It happens when the edited article is accessed under two different names
> considered the same due to database collation and capitalisation.
Are you talking about redirects? Normally, an article has one specific name, so how can I access an article under two different names unless it is a redirect?
(In reply to comment #4)
> Not just accesses but edits.
Well, ok, but my point remains: right after a full refresh, shouldn't the DB be clean of dupes?
> SMW_refreshData.php -ftpv is wrong. It purges type and property pages but
> after
> that you should run SMW_refreshData.php -fv to rebuild the pages themselves.
Yes, that's what we do, -ftpv and then -v.
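For reference, the two-pass refresh discussed above can be written down as a small Python sketch. The flags (-ftpv, then -v or -fv) are the ones mentioned in this thread; the script path and the helper name refresh_commands are assumptions for illustration, and the sketch only builds and prints the command lines without running anything.

```python
# Sketch only: builds the two command lines for the refresh sequence
# described in this thread. It does not execute them.

# Assumed location of the script relative to the wiki root; adjust as needed.
DEFAULT_SCRIPT = "extensions/SemanticMediaWiki/maintenance/SMW_refreshData.php"


def refresh_commands(script=DEFAULT_SCRIPT, second_pass_flags="-v"):
    """Return the two passes as argument lists.

    First pass: -ftpv (flags quoted from the thread).
    Second pass: -v as the reporter runs it, or -fv as suggested in comment #4.
    """
    return [
        ["php", script, "-ftpv"],
        ["php", script, second_pass_flags],
    ]


if __name__ == "__main__":
    for cmd in refresh_commands(second_pass_flags="-fv"):
        print(" ".join(cmd))
```

The second-pass flags are left as a parameter because the thread mentions both -v and -fv.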
(In reply to comment #5)
> Are you talking about redirects? Normally, an article has one specific name
> so how can it be that I can access an article under two different names
> unless it is an redirect?
No, he means DB collation: the way the DB (not MediaWiki, but MySQL) compares two strings. It has to do with different charsets and different languages, e.g. whether upper and lower case are considered equal, whether diacritics such as tildes are ignored, etc. All this is configured on a per-table and per-field basis. http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html
(In reply to comment #7)
> (In reply to comment #5)
> > Are you talking about redirects? Normally, an article has one specific name
> > so how can it be that I can access an article under two different names
> > unless it is an redirect?
>
> http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html
So, according to the link above, would a [[Has property::Muffler]] annotation made under a latin1_swedish_ci collation be understood as [[Has property::Müller]] after switching to a latin1_german2_ci collation? And would that lead to both [[Has property::Muffler]] and [[Has property::Müller]] for the same article?
(In reply to comment #8)
> So, according to the link above, [[Has property::Muffler]] annotation with a
> latin1_swedish_ci collation would if being switched to a latin1_german2_ci
> collation being understood as [[Has property::Müller]]?
That 1st example is about sorting, not comparison. But yes, if MW/SMW is not doing any more checks and is relying only on the DB (which I can't tell, I haven't looked at the code that closely), depending on the collation Bar == bär == BAR. Is that really the cause of this issue? I don't know. But my wikis do have a different charset/collation configuration so... maybe.
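To illustrate what "Bar == bär == BAR" means in practice, here is a minimal Python sketch of an accent- and case-insensitive comparison. The helpers collation_key and collation_equal are hypothetical and only approximate the idea of a case-insensitive collation; they do not reproduce the rules of latin1_swedish_ci, latin1_german2_ci or any other real MySQL collation.

```python
import unicodedata


def collation_key(text):
    """Hypothetical stand-in for an accent- and case-insensitive collation.

    Decomposes characters, drops combining marks (so 'ä' becomes 'a'),
    then folds case. Real MySQL collations follow their own tables.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def collation_equal(a, b):
    """Compare two strings the way such a collation would."""
    return collation_key(a) == collation_key(b)


if __name__ == "__main__":
    # Binary comparison, as plain string code would do it.
    print("Bar" == "bär")
    # Comparison under the assumed collation.
    print(collation_equal("Bar", "bär"), collation_equal("bär", "BAR"))
```

Under this assumed key, code that compares names byte by byte and a database that compares them through the collation can disagree about whether two names are the same.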
No, I don't think so. It's the collation of the article title that causes the duplication. If you have an article called Müller and set any property [[has property::value]] on it, then open http://your.site/wiki/Muller?action=edit, and your DB collation treats u and ü as the same (that is, it will not allow creating Muller if Müller already exists), then there will be two of each property, one for Müller and one for Muller, and both of them (one red) will appear in any SMW query for those properties ({{#ask:[[has property::value]]|format=list}} will give Müller, Muller). You don't even need to change the DB collation. Similar artifacts can be observed in the MW logs: they show the name under which the page was accessed (red if different), not the name under which it is stored.
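One way to look for titles that could collide like this is to group them by a folded key and report every group with more than one spelling. This is a minimal Python sketch: fold_title and colliding_titles are hypothetical helpers, the folding only approximates an accent- and case-insensitive collation, and the title list is made up for illustration. It does not query the wiki database.

```python
import unicodedata
from collections import defaultdict


def fold_title(title):
    """Hypothetical approximation of an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def colliding_titles(titles):
    """Return sorted groups of distinct titles that share the same folded key."""
    groups = defaultdict(set)
    for title in titles:
        groups[fold_title(title)].add(title)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Illustrative titles only, taken from the Müller/Muller example above.
    for group in colliding_titles(["Müller", "Muller", "Berlin"]):
        print(group)
```

Feeding it the titles that appear in the results of an #ask query would show which entries are the same page under the assumed collation.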
Created attachment 12770 [details] Another example of duplicated properties
I'm still seeing this issue on all the current master releases of SMW. I have duplication for a good number of properties as well. http://wikiapiary.com/wiki/Special:Properties Attaching a screenshot of the duplication for Has bot segment. If I can help with debugging I would be happy to do so. (Sorry for two entries, didn't know I could put that with the image attachment.)
Just noting that this duplication does not get counted when SMWInfo is used to ask for properties. For example, on Special:Properties it shows 210 properties: http://wikiapiary.com/w/index.php?title=Special:Properties&limit=500&offset=0 SMWInfo shows 169 and 160: http://wikiapiary.com/w/api.php?action=smwinfo