Last modified: 2014-06-04 20:15:35 UTC
For example:
http://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu?page=2
http://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2
http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:%D0%97%D0%B0%D0%BA%D0%BE%D0%BD_%D0%BE_%D0%B7%D0%B5%D0%BC%D0%BB%D0%B5%D1%83%D1%81%D1%82%D1%80%D0%BE%D0%B9%D1%81%D1%82%D0%B2%D0%B5_1911.djvu?page=2
http://ru.wikipedia.org/w/index.php?title=%D0%A4%D0%B0%D0%B9%D0%BB:%D0%97%D0%B0%D0%BA%D0%BE%D0%BD_%D0%BE_%D0%B7%D0%B5%D0%BC%D0%BB%D0%B5%D1%83%D1%81%D1%82%D1%80%D0%BE%D0%B9%D1%81%D1%82%D0%B2%D0%B5_1911.djvu&page=2
These pages are also indexed in Google:
http://webcache.googleusercontent.com/search?q=cache:http%3A%2F%2Fru.wikipedia.org%2Fwiki%2F%25D0%25A4%25D0%25B0%25D0%25B9%25D0%25BB%3A%25D0%2597%25D0%25B0%25D0%25BA%25D0%25BE%25D0%25BD_%25D0%25BE_%25D0%25B7%25D0%25B5%25D0%25BC%25D0%25BB%25D0%25B5%25D1%2583%25D1%2581%25D1%2582%25D1%2580%25D0%25BE%25D0%25B9%25D1%2581%25D1%2582%25D0%25B2%25D0%25B5_1911.djvu%3Fpage%3D2
I think the right way is to redirect from
http://en.wikipedia.org/wiki/File:Book.djvu?page=2
to
http://en.wikipedia.org/w/index.php?title=File:Book.djvu&page=2
Also, http://en.wikipedia.org/w/index.php?title=File:Book.djvu&page=1 is the same page as http://en.wikipedia.org/wiki/File:Book.djvu
This can be achieved with an .htaccess rule:
    RewriteCond %{QUERY_STRING} ^page\=[0-9]+$
    RewriteRule ^wiki/(File.*)$ /index.php?title=$1 [R=301,NE,L,QSA]
but it is not multilingual: the pattern only matches the English "File" namespace prefix.
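To make the behaviour of that rule easier to reason about, here is a rough Python sketch of the same matching logic. It is only an illustration, not MediaWiki or Apache code: the function name `redirect_target` and the `script_path` parameter are made up for this example, and the default of `/w/index.php` is an assumption taken from the example URLs in the report (the rule as written targets `/index.php`).

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteCond %{QUERY_STRING} ^page\=[0-9]+$
PAGE_QUERY = re.compile(r"^page=[0-9]+$")
# Mirrors: RewriteRule ^wiki/(File.*)$ (with the leading slash kept here)
FILE_PATH = re.compile(r"^/wiki/(File.*)$")


def redirect_target(url, script_path="/w/index.php"):
    """Return the URL a matching request would be redirected to, or None.

    script_path is a hypothetical parameter; its default is assumed from
    the example URLs, not taken from the rule itself.
    """
    parts = urlsplit(url)
    if not PAGE_QUERY.match(parts.query):
        return None  # query string is not exactly page=<digits>
    match = FILE_PATH.match(parts.path)
    if match is None:
        return None  # path does not start with /wiki/File
    # QSA: the original query string is appended after title=...
    return (f"{parts.scheme}://{parts.netloc}{script_path}"
            f"?title={match.group(1)}&{parts.query}")
```

Under these assumptions, the English example URL maps to its index.php form, while the Russian URL (whose path begins with the percent-encoded Файл prefix) is not matched at all, which is the limitation mentioned above.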
What's the bug? Why are multiple URLs for the same page a problem in this case? The proper fix is probably a rel=canonical link.
(In reply to Bawolff (Brian Wolff) from comment #2)
> What's the bug? Why are multiple URLs for the same page a problem in this
> case?
>
> The proper fix is probably a rel=canonical link.
rel=canonical is not a solution. Googlebot (and other search engine crawlers) will still download these pages, and it has a limit on how many pages it downloads. So if it downloads similar pages under different URLs, it will not download other IMPORTANT pages, and they will be indexed more slowly.
> rel=canonical is not a solution. Googlebot (and other search engine crawlers)
> will still download these pages, and it has a limit on how many pages it
> downloads. So if it downloads similar pages under different URLs, it will not
> download other IMPORTANT pages, and they will be indexed more slowly.
I doubt that. Do you have a reliable source to back that up?
-----
Additionally, Google won't index URLs with index.php in them whatsoever (due to our robots.txt settings).
(In reply to Bawolff (Brian Wolff) from comment #4)
> Additionally, Google won't index URLs with index.php in them whatsoever (due
> to our robots.txt settings).
There is no exclusion for index.php in the actual robots.txt file ( https://en.wikipedia.org/robots.txt ). There is a noindex meta tag on particular actions (e.g. https://en.wikipedia.org/w/index.php?title=Earth&action=edit), but view-source:https://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2 is not noindexed. Generally, though, there is a rel="canonical" link pointing to the intended page, which I think is an important factor here.
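For anyone who wants to check these two signals in a page's source, here is a minimal Python sketch that pulls the robots meta directive and the rel="canonical" link out of an HTML string. The class and function names are made up for this example, and both sample documents are hand-written placeholders (the canonical href reuses the File:Book.djvu placeholder from the report), not actual Wikipedia output.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collects a robots noindex directive and a canonical link, if present."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def indexing_hints(html):
    """Return (noindex, canonical_href) for the given HTML text."""
    parser = IndexingHints()
    parser.feed(html)
    return parser.noindex, parser.canonical


# Hypothetical sample pages, written for this sketch only.
EDIT_VIEW = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
FILE_VIEW = ('<html><head><link rel="canonical" '
             'href="http://en.wikipedia.org/wiki/File:Book.djvu"></head></html>')
```

Calling `indexing_hints` on each sample separates the two cases described above: a page carrying a noindex directive, and a page that is indexable but names a canonical URL.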
From robots.txt:
    User-agent: *
    Allow: /w/api.php?action=mobileview&
    Disallow: /w/
    [...]
/w/index.php is inside /w/, so that rule prevents index.php from being indexed. I still fail to see what the actual bug being reported is.
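The prefix argument can be checked with a small Python sketch using the standard library's `urllib.robotparser`. It only uses the three lines quoted above (the elided rest of the file is left out), and Python's parser is a stand-in here, so it may not match Googlebot's own matching in every detail. Strictly speaking it answers whether a URL may be crawled. The helper name `may_crawl` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Only the lines quoted in the comment above; the rest of the real
# robots.txt is elided there and is not reproduced here.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /w/api.php?action=mobileview&
Disallow: /w/
"""


def may_crawl(url, user_agent="*"):
    """Return True if the excerpt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_EXCERPT.splitlines())
    return parser.can_fetch(user_agent, url)


index_php_url = "http://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2"
wiki_url = "http://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu?page=2"
```

With this excerpt, `may_crawl(index_php_url)` is false because the path falls under `/w/`, while `may_crawl(wiki_url)` is true because `/wiki/` does not start with `/w/`.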
(In reply to Bawolff (Brian Wolff) from comment #6)
> I still fail to see what the actual bug being reported is.
Because it's not a bug. It's unexpected behavior.
Well, it's the behaviour I expect. It's also consistent with the rest of MediaWiki. Unless it's actually causing a problem of some sort, I don't think it should be changed.