Last modified: 2014-11-17 09:41:44 UTC
Language variants currently point to the same canonical URL. For example, this page:

http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD

...has a rel="canonical" link pointing to http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

This rel="canonical" link asks search engines to index the Simplified Chinese page as the representative of both pages, instead of indexing the Simplified Chinese and Traditional Chinese pages separately. Similar rel="canonical" links are found on all zh-TW pages. Google reports seeing a similar problem on other Chinese (e.g. zh-SG) and Serbian content pages. (This may be caused by the fix to bug 48402.)
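To check which canonical URL and hreflang alternates a given variant page actually declares, the `<link>` tags can be pulled out of its HTML. The following is a minimal sketch using Python's standard-library `html.parser`. `LinkTagCollector` and `audit` are hypothetical helper names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class LinkTagCollector(HTMLParser):
    """Collect rel="canonical" and rel="alternate" hreflang links from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}  # hreflang code -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated tokens, or have no value at all
        rel = (a.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = a.get("href")
        elif "alternate" in rel and "hreflang" in a:
            self.alternates[a["hreflang"]] = a.get("href")


def audit(html):
    """Return (canonical_href, {hreflang: href}) declared by an HTML document."""
    parser = LinkTagCollector()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.alternates
```

Self-closing tags such as `<link ... />` reach `handle_starttag` through `HTMLParser`'s default `handle_startendtag`, so no extra handler is needed.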
If I understand the semantic meaning of rel="canonical" correctly, the current output is the expected behavior. http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD is not "the Simplified Chinese page". It is a page converted automatically according to the request: user preferences for logged-in users, Accept-Language for anonymous users. We want these /wiki/ links to show up in Google search results instead of links that specify a particular variant. However, Google does not seem to respect this and indexes links to pages in every variant, so we have had to work around it: https://zh.wikipedia.org/w/index.php?title=MediaWiki:Gadget-variant-link-fix.js
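The request-dependent conversion of /wiki/ pages for anonymous readers can be pictured as Accept-Language content negotiation. The following is a simplified sketch, not MediaWiki's actual code. It ignores user preferences, prefix matching and MediaWiki's variant fallback chains. `SUPPORTED` and `pick_variant` are hypothetical names, and the variant list is illustrative.

```python
# Illustrative set of zh variants the wiki can convert to (assumption).
SUPPORTED = {"zh", "zh-hans", "zh-hant", "zh-cn", "zh-hk",
             "zh-mo", "zh-my", "zh-sg", "zh-tw"}


def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(prefs, key=lambda p: -p[1])


def pick_variant(header, default="zh"):
    """Pick the most preferred supported variant; otherwise fall back to default."""
    for tag, q in parse_accept_language(header or ""):
        if q > 0 and tag in SUPPORTED:
            return tag
    return default
```

For example, `pick_variant("zh-TW,zh;q=0.8,en;q=0.5")` returns `"zh-tw"`, while a request with no Accept-Language header gets `"zh"`. Under this sketch, a crawler that sends no Accept-Language header sees the default variant at the /wiki/ URL.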
Currently Google seems to index mainly /zh/ links rather than /wiki/ links (which is unexpected). As expected, though, /zh-tw/ and /zh-cn/ links are not indexed.
RobLa: Should this still be high priority, given Liangent's comment 1 here? If it is: Tim, do you plan to work on this at some point?
(In reply to Andre Klapper from comment #3)
> RobLa: So should this still be high priority wrt Liangent's comment 1 here?
>
> If still high priority:
> Tim: Do you plan to work on this at some point?

I guess Tim is just the default CC, but this issue does not actually seem to be Wikimedia-specific.
Change 154240 had a related patch set uploaded by Tim Starling: Don't send rel=canonical to variant-neutral page https://gerrit.wikimedia.org/r/154240
Change 154240 merged by jenkins-bot: Don't send rel=canonical to variant-neutral page https://gerrit.wikimedia.org/r/154240
All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?
(In reply to Rob Lanphier from comment #0)
> Language variants currently point to the same canonical URL. For example, on
> this page:
>
> http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD
>
> ...there is a rel="canonical" pointing to
> http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

Now has:

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="alternate" hreflang="zh-TW" href="/zh-tw/%E6%B1%89%E8%AF%AD" />
<link rel="alternate" hreflang="x-default" href="/wiki/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD" />

But I'm not sure this is properly fixed in general, because this is still an issue:

(In reply to fireattack from comment #2)
> Currently Google seems to mainly index /zh/ links instead of /wiki/'s (which
> is unexpected). /zh-tw/ or /zh-cn/'s are not indexed as expected though.

The two URLs for the "zh" version don't agree on which is canonical. /zh/ says:

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD" />

/wiki/ says:

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />
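The disagreement described above can be checked mechanically. Group URLs by the variant they render, then flag any group whose members declare more than one distinct canonical URL. The following is a minimal sketch. `conflicting_canonicals` is a hypothetical helper, and the sample data encodes the observations quoted in this comment. Treating /wiki/ as a URL for the "zh" version follows the comment above and is otherwise an assumption.

```python
from collections import defaultdict


def conflicting_canonicals(pages):
    """Report variants served at several URLs that disagree on the canonical.

    pages maps URL -> (rendered_variant, declared_canonical_url).
    Returns {variant: {url: declared_canonical}} for conflicting groups only.
    """
    by_variant = defaultdict(dict)
    for url, (variant, canonical) in pages.items():
        by_variant[variant][url] = canonical
    return {
        variant: urls
        for variant, urls in by_variant.items()
        if len(set(urls.values())) > 1  # more than one distinct canonical
    }


# Observed declarations for the article; /wiki/ is assumed to render "zh".
pages = {
    "/zh/%E6%B1%89%E8%AF%AD": ("zh", "http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD"),
    "/wiki/%E6%B1%89%E8%AF%AD": ("zh", "http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD"),
    "/zh-tw/%E6%B1%89%E8%AF%AD": ("zh-tw", "http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD"),
}
```

With this data, `conflicting_canonicals(pages)` reports only the "zh" group, containing the /zh/ and /wiki/ URLs. The self-canonical zh-tw page is not flagged.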
Created attachment 16795
Google search in Italian for [[zh:汉语]]

If I search for a Latin-alphabet string from that article, I get 4 variants from Google after asking it to show duplicate pages as well. None of them is /wiki/.

Searching '"漢語,又称中文、华语" site:wikipedia.org' yielded two results, including zh.wap.wikipedia.org/zh-tw/汉语, but that's another bug.