Last modified: 2014-01-24 13:23:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50706, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48706 - Semantic MediaWiki showing multiple unused instances of properties
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: master
Hardware: All All
Importance: Unprioritized major with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-05-22 12:10 UTC by Vicente Aguilar
Modified: 2014-01-24 13:23 UTC (History)
CC List: 7 users

Attachments
Duplicated properties screenshot (2.34 KB, image/png)
2013-05-22 12:10 UTC, Vicente Aguilar
Another example of duplicated properties (53.33 KB, image/png)
2013-07-06 14:41 UTC, Jamie Thingelstad

Description Vicente Aguilar 2013-05-22 12:10:02 UTC
On some of our wikis, on Special:Properties some properties appear twice (see attachment), once with "0 uses" and once with "X uses" (X>0, the real usage data). When clicking on the Property:XYZ link they both go to the same article with the property definition, whether it has been defined or not (red link). Depending on which one of the two occurrences ("0 uses" or "X uses") appears first on Special:Properties, we don't get any values when #asking for this property. This happens for properties of any type.

Sometimes a full refresh (SMW_refreshData -ftpv, then another -v) fixes the issue, sometimes it doesn't.

This is difficult to reproduce: we have several wikis with the same versions (OS, MW, SMW), the same configuration and (partially) the same content, and the issue can appear on one but not the others. Content is exported/imported or synced using Special:Push, no copy-paste or manual sync.

ATM we're using MW 1.19.3, SMW 1.8.0.4 with CentOS 6.2 (PHP 5.3.3, MySQL 5.1.61).
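
The symptom described above (one property listed twice, once with "0 uses" and once with the real count) can be checked mechanically once the listing is available as data. The following is only a minimal sketch: it assumes the rows of Special:Properties have already been extracted into (label, use count) pairs by some other means, and both the function name `find_duplicate_listings` and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_duplicate_listings(rows):
    """Return {label: [use counts]} for every label listed more than once."""
    counts = defaultdict(list)
    for label, uses in rows:
        counts[label].append(uses)
    # Keep only the labels that occur on more than one row.
    return {label: uses for label, uses in counts.items() if len(uses) > 1}


# Hypothetical sample rows; the counts are made up for illustration.
rows = [
    ("Has bot segment", 0),
    ("Has bot segment", 12),
    ("Has name", 40),
]
duplicates = find_duplicate_listings(rows)
print(duplicates)
```

A label that shows up with both a zero and a non-zero count matches the pattern reported here.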
Comment 1 Vicente Aguilar 2013-05-22 12:10:38 UTC
Created attachment 12369
Duplicated properties screenshot
Comment 2 Alexander Mashin 2013-05-23 13:25:34 UTC
It happens when the edited article is accessed under two different names considered the same due to database collation and capitalisation.
Comment 3 Vicente Aguilar 2013-05-23 13:46:45 UTC
Well, that could be it because one of the things I noticed while trying to diagnose this issue and bug 48707 is that we have different charsets and collations on some of our wikis (created at different times with different MW versions, but if the default changes shouldn't it be updated when running update.php? Oh well, that's another issue anyway.)

I'll try to unify all our charset/collations when I get the time, it's something I wanted to look into anyway.

In any case, if this was really the origin of the duplicates: why does the issue sometimes not go away after an SMW_refreshData -ftpv? If I got your reasoning right, the dup gets into the DB the moment someone accesses an article with a different - but equal according to the collation - name. But right after a refresh, shouldn't everything be OK then?
Comment 4 Alexander Mashin 2013-05-23 14:07:23 UTC
> the dup gets to the DB the moment someone accesses an article with a
> different - but equal according to the collation - name.
Not just accesses but edits.

SMW_refreshData.php -ftpv is wrong. It purges type and property pages but after that you should run SMW_refreshData.php -fv to rebuild the pages themselves.
Comment 5 MWJames 2013-05-23 14:14:00 UTC
(In reply to comment #2)
> It happens when the edited article is accessed under two different names
> considered the same due to database collation and capitalisation.

Are you talking about redirects? Normally, an article has one specific name, so how can I access an article under two different names unless it is a redirect?
Comment 6 Vicente Aguilar 2013-05-23 14:22:10 UTC
(In reply to comment #4)
> Not just accesses but edits.

Well, ok, but my point remains: right after a full refresh, shouldn't the DB be clean of dupes?
 
> SMW_refreshData.php -ftpv is wrong. It purges type and property pages but after
> that you should run SMW_refreshData.php -fv to rebuild the pages themselves.

Yes, that's what we do, -ftpv and then -v.
Comment 7 Vicente Aguilar 2013-05-23 14:27:19 UTC
(In reply to comment #5)
> Are you talking about redirects? Normally, an article has one specific name
> so how can it be that I can access an article under two different names 
> unless it is a redirect?

No, he means DB collation: the way the DB (MySQL, not MediaWiki) compares two strings. It depends on the charset and language, e.g. whether upper and lower case are considered equal, whether accents are ignored, etc. All this is configured on a per-table and per-field basis.

http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html
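
To make the idea of "equal according to the collation" concrete, here is a rough Python sketch of a comparison key that ignores case and accents. It is only an approximation for illustration: real MySQL collations follow their own per-collation rules, and the helper name `collation_key` is hypothetical.

```python
import unicodedata


def collation_key(title: str) -> str:
    """Approximate an accent- and case-insensitive comparison key."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that upper and lower case compare equal.
    return stripped.casefold()


names = ["Müller", "Muller", "MULLER"]
keys = {collation_key(name) for name in names}
print(keys)
```

Under such a key, all three spellings collapse to a single value, even though the strings themselves differ byte for byte.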
Comment 8 MWJames 2013-05-23 14:41:46 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > Are you talking about redirects? Normally, an article has one specific name
> > so how can it be that I can access an article under two different names 
> > unless it is a redirect?
> 
> 
> http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

So, according to the link above, a [[Has property::Muffler]] annotation with a latin1_swedish_ci collation would, if switched to a latin1_german2_ci collation, be understood as [[Has property::Müller]]?

Which would lead to [[Has property::Muffler]] and [[Has property::Müller]] for the same article?
Comment 9 Vicente Aguilar 2013-05-23 14:55:56 UTC
(In reply to comment #8)
> So, according to the link above, a [[Has property::Muffler]] annotation with a
> latin1_swedish_ci collation would, if switched to a latin1_german2_ci
> collation, be understood as [[Has property::Müller]]?

That 1st example is about sorting, not comparison.

But yes, if MW/SMW is not doing any more checks and is relying only on the DB (which I can't tell, I haven't looked at the code that closely), depending on the collation Bar == bär == BAR.

Is that really the cause of this issue? I don't know. But my wikis do have a different charset/collation configuration so... maybe.
Comment 10 Alexander Mashin 2013-05-23 14:58:12 UTC
No, I don't think so. It's the collation of the article title that causes the duplication.

If you have an article called Müller, set any property [[has property::value]] on it, and then open http://your.site/wiki/Muller?action=edit, and your DB collation treats u and ü as the same (that is, it will not allow Muller to be created if Müller already exists), then there will be two of each property, one for Müller and one for Muller, and both pages (one red) will appear in any SMW query for those properties ({{#ask:[[has property::value]]|format=list}} will give Müller, Muller).

You don't even need to change DB collation.

Similar artifacts will be observable in MW logs: they will show the name under which the page was accessed (red if different), not the name under which it is stored.
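
The database-side half of this scenario (the DB refusing to create Muller when Müller exists, and a lookup for one spelling returning the row stored under the other) can be sketched in a few lines. Note the assumptions: SQLite is used here purely as a stand-in for MySQL, the `ai_ci` collation is a hand-written approximation rather than any real MySQL collation, and the single-column `page` table is a simplification, not the MediaWiki schema.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip accents and fold case (rough stand-in for a *_ci collation)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def ai_ci(a, b):
    """Collation callback: return -1, 0 or 1 based on the folded strings."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.create_collation("ai_ci", ai_ci)
    # Hypothetical, simplified table: titles are unique under the collation.
    conn.execute("CREATE TABLE page (title TEXT COLLATE ai_ci UNIQUE)")
    conn.execute("INSERT INTO page VALUES (?)", ("Müller",))
    try:
        conn.execute("INSERT INTO page VALUES (?)", ("Muller",))
        rejected = False
    except sqlite3.IntegrityError:
        # The collation considers the two titles equal, so UNIQUE fails.
        rejected = True
    # Looking up the unaccented spelling finds the accented row.
    row = conn.execute(
        "SELECT title FROM page WHERE title = ?", ("Muller",)
    ).fetchone()
    conn.close()
    return rejected, row[0]


rejected, stored_title = demo()
print(rejected, stored_title)
```

Any code layer that keys its own data on the exact requested string, while the database resolves both spellings to one row, could end up with two entries for what the DB regards as a single page. Whether SMW actually behaves this way is the open question in this thread.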
Comment 11 Jamie Thingelstad 2013-07-06 14:41:36 UTC
Created attachment 12770
Another example of duplicated properties

Another example of duplicated properties.
Comment 12 Jamie Thingelstad 2013-07-06 14:42:36 UTC
I'm still seeing this issue on all the current master releases of SMW. I have duplication for a good number of properties as well. 

http://wikiapiary.com/wiki/Special:Properties

Attaching a screenshot of the duplication for Has bot segment. If I can help with debugging I would be happy to do so.

(Sorry for two entries, didn't know I could put that with the image attachment.)
Comment 13 Jamie Thingelstad 2013-07-06 14:45:56 UTC
Just noting that this duplication does not get counted when SMWInfo is used to ask for properties. For example, on Special:Properties it shows 210 properties:

http://wikiapiary.com/w/index.php?title=Special:Properties&limit=500&offset=0

SMWInfo shows 169 and 160:

http://wikiapiary.com/w/api.php?action=smwinfo
