Last modified: 2014-11-17 09:41:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T54429, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52429 - Language variants currently point to the same canonical URL
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Language converter
Version: 1.23.0
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Tim Starling
Keywords: i18n
Depends on:
Blocks:
Reported: 2013-08-01 23:14 UTC by Rob Lanphier
Modified: 2014-11-17 09:41 UTC
CC List: 8 users

Attachments
Google search in Italian for [[zh:汉语]] (107.33 KB, image/png)
2014-10-17 12:11 UTC, Nemo

Description Rob Lanphier 2013-08-01 23:14:55 UTC
Language variants currently point to the same canonical URL. For example, on this page:

http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD

...there is a rel="canonical" link pointing to
http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

This rel="canonical" link asks search engines to index the Simplified Chinese page to represent the content on both pages, instead of indexing the Simplified Chinese and Traditional Chinese pages separately. Similar rel="canonical" links are found on all zh-TW pages. Google reports a similar problem on other Chinese variants (e.g. zh-SG) and on Serbian content pages.

(This may be caused by the fix to bug 48402.)
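
The behavior described above can be checked with a short script. This is only a minimal sketch: the helper names `CanonicalFinder` and `canonical_of` are hypothetical, and the sample markup is reconstructed from the two URLs quoted in this description rather than fetched from the live site.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attrs.get("href")


def canonical_of(html):
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# The variant URL that was requested (from the description).
requested = "http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD"

# Assumed markup, reconstructed from the description, not a real response.
sample_head = (
    '<head><link rel="canonical" '
    'href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" /></head>'
)

declared = canonical_of(sample_head)
# True when the variant page defers to a different URL as canonical.
points_elsewhere = declared is not None and declared != requested
print(points_elsewhere)
```

With the sample markup, the zh-tw page declares the /wiki/ URL as canonical, which is the situation this report is about.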
Comment 1 Liangent 2013-08-14 12:23:29 UTC
If I understand the semantic meaning of rel="canonical" correctly, what it does now is the expected behavior.

http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD is not "the Simplified Chinese page", but a page that is converted automatically based on the request (preferences for logged-in users and Accept-Language for anonymous users). We want these variant-neutral links to show up in Google search results, rather than links that specify a particular variant.

However, Google does not seem to respect it and indexes links to pages in every variant, so we have to work around it: https://zh.wikipedia.org/w/index.php?title=MediaWiki:Gadget-variant-link-fix.js
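
To illustrate the request-based conversion mentioned above, here is a rough sketch of choosing a variant from an Accept-Language header. It is not MediaWiki's actual implementation: the function `pick_variant`, the `SUPPORTED_VARIANTS` list, and the `"zh"` fallback are assumptions made for this example only.

```python
# Hypothetical subset of variants; the real list is defined by the wiki.
SUPPORTED_VARIANTS = ["zh-cn", "zh-tw", "zh-hk", "zh-sg"]


def pick_variant(accept_language, supported=SUPPORTED_VARIANTS, default="zh"):
    """Pick the supported variant with the highest q-value.

    Falls back to `default` (assumed here to mean "no specific variant")
    when the header names no supported variant.
    """
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # a language range without a q parameter has weight 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            # Sort by descending q, then by order of appearance.
            candidates.append((-q, position, tag))
    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
    return default


print(pick_variant("zh-TW,zh;q=0.8,en;q=0.5"))
```

Because the same /wiki/ URL can render differently for different requests under a scheme like this, a crawler sees only one of the possible renderings.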
Comment 2 fireattack 2013-08-14 13:04:57 UTC
Currently Google seems to mainly index /zh/ links instead of /wiki/'s (which is unexpected). /zh-tw/ or /zh-cn/'s are not indexed as expected though.
Comment 3 Andre Klapper 2014-03-13 13:54:44 UTC
RobLa: So should this still be high priority wrt Liangent's comment 1 here? 

If still high priority: 
Tim: Do you plan to work on this at some point?
Comment 4 Liangent 2014-03-13 16:51:01 UTC
(In reply to Andre Klapper from comment #3)
> RobLa: So should this still be high priority wrt Liangent's comment 1 here? 
> 
> If still high priority: 
> Tim: Do you plan to work on this at some point?

I guess Tim is just the default CC, but actually this issue does not seem to be Wikimedia-specific.
Comment 5 Gerrit Notification Bot 2014-08-15 08:28:39 UTC
Change 154240 had a related patch set uploaded by Tim Starling:
Don't send rel=canonical to variant-neutral page

https://gerrit.wikimedia.org/r/154240
Comment 6 Gerrit Notification Bot 2014-08-26 21:45:42 UTC
Change 154240 merged by jenkins-bot:
Don't send rel=canonical to variant-neutral page

https://gerrit.wikimedia.org/r/154240
Comment 7 Andre Klapper 2014-10-17 11:42:56 UTC
All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?
Comment 8 Nemo 2014-10-17 12:00:20 UTC
(In reply to Rob Lanphier from comment #0)
> Language variants currently point to the same canonical URL. For example, on
> this page:
> 
> http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD
> 
> ...there is a rel="canonical" pointing to
> http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

Now has:

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="alternate" hreflang="zh-TW" href="/zh-tw/%E6%B1%89%E8%AF%AD" />
<link rel="alternate" hreflang="x-default" href="/wiki/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD" />

But I'm not sure this is properly fixed in general, because this is still an issue:

(In reply to fireattack from comment #2)
> Currently Google seems to mainly index /zh/ links instead of /wiki/'s (which
> is unexpected). /zh-tw/ or /zh-cn/'s are not indexed as expected though.

The two URLs for the "zh" version do not agree on which one is canonical:

/zh/ says

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD" />

/wiki/ says

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />
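
The disagreement shown above can be detected mechanically. The following is a sketch only: `LinkCollector` and `collect` are hypothetical names, and the two HTML strings are copied from the snippets in this comment rather than fetched live.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the canonical URL and hreflang alternates of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.canonical = None
        self.alternates = {}  # lowercased hreflang -> absolute URL

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rels = (attrs.get("rel") or "").lower().split()
        url = urljoin(self.base_url, href)  # resolve relative hrefs
        if "canonical" in rels:
            self.canonical = url
        elif "alternate" in rels and attrs.get("hreflang"):
            self.alternates[attrs["hreflang"].lower()] = url


def collect(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector


ZH_URL = "http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD"
WIKI_URL = "http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD"

# Markup copied from the snippets in this comment.
pages = {
    ZH_URL: (
        '<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />'
        '<link rel="canonical" href="http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD" />'
    ),
    WIKI_URL: (
        '<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />'
        '<link rel="canonical" href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />'
    ),
}

results = {url: collect(url, html) for url, html in pages.items()}
canonicals = {url: r.canonical for url, r in results.items()}
# More than one distinct canonical means the pages disagree.
disagree = len(set(canonicals.values())) > 1
print(disagree)
```

In this sample, both pages name the same hreflang="zh" alternate, yet each declares itself canonical.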
Comment 9 Nemo 2014-10-17 12:11:04 UTC
Created attachment 16795
Google search in Italian for [[zh:汉语]]

If I search for a Latin-alphabet string from that article, Google returns 4 variants once I ask it to show duplicate pages as well. None of them is /wiki/.

Searching '"漢語,又称中文、华语" site:wikipedia.org' yielded two results including zh.wap.wikipedia.org/zh-tw/汉语 but that's another bug.
