Last modified: 2014-06-04 20:15:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68042, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 66042 - The same page with different URL (multipage djvu)
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Redirects
Version: 1.24rc
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-06-02 16:49 UTC by forwardin
Modified: 2014-06-04 20:15 UTC


Comment 1 forwardin 2014-06-02 17:06:21 UTC
This can be achieved with an .htaccess rule:

RewriteCond %{QUERY_STRING} ^page\=[0-9]+$
RewriteRule ^wiki/(File.*)$ /index.php?title=$1 [R=301,NE,L,QSA]

but it's not multilingual: the pattern only matches the English "File" namespace prefix.
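
To make the effect of the rule above easier to follow, here is a minimal Python sketch of the same matching logic. It is an illustration only, not how Apache evaluates the rule internally; the function name `redirect_target` and the German example path are assumptions made for this sketch.

```python
import re
from typing import Optional

# Mirrors: RewriteCond %{QUERY_STRING} ^page\=[0-9]+$
QUERY_RE = re.compile(r"^page=[0-9]+$")
# Mirrors: RewriteRule ^wiki/(File.*)$ ...
# (in .htaccess context the leading slash is not part of the matched path)
PATH_RE = re.compile(r"^wiki/(File.*)$")


def redirect_target(path: str, query: str) -> Optional[str]:
    """Return the redirect location the rule would produce, or None.

    Hypothetical helper: it only imitates the two patterns above.
    """
    if not QUERY_RE.match(query):
        return None
    match = PATH_RE.match(path.lstrip("/"))
    if match is None:
        return None
    # QSA appends the original query string to the new one.
    return f"/index.php?title={match.group(1)}&{query}"


english = redirect_target("/wiki/File:Ford_manual_1919.djvu", "page=2")
# A localized namespace prefix (German "Datei") is not matched by the pattern.
german = redirect_target("/wiki/Datei:Ford_manual_1919.djvu", "page=2")
```

Under these assumptions, only the English "File" prefix yields a redirect target, which is the limitation mentioned above.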
Comment 2 Bawolff (Brian Wolff) 2014-06-02 17:09:47 UTC
What's the bug? Why is having multiple pages with the same URL a problem in this case?

Proper fix is probably a rel=canonical link.
Comment 3 forwardin 2014-06-02 17:43:26 UTC
(In reply to Bawolff (Brian Wolff) from comment #2)
> What's the bug? Why is having multiple pages with the same URL a problem in
> this case?
> 
> Proper fix is probably a rel=canonical link.

rel=canonical is not a solution. Googlebot (and other search engines) will still download these pages, and it has a limit on how many pages it downloads. So if it downloads similar pages under different URLs, it will not download other IMPORTANT pages, and they will not be indexed as quickly.
Comment 4 Bawolff (Brian Wolff) 2014-06-04 19:44:46 UTC
> rel=canonical is not a solution. Googlebot (and other search engines) will
> still download these pages, and it has a limit on how many pages it
> downloads. So if it downloads similar pages under different URLs, it will not
> download other IMPORTANT pages, and they will not be indexed as quickly.

I doubt that. Do you have a reliable source to back that up?

-----

Additionally, Google won't index URLs with index.php in them whatsoever (due to our robots.txt settings).
Comment 5 Matthew Flaschen 2014-06-04 19:51:43 UTC
(In reply to Bawolff (Brian Wolff) from comment #4)
> Additionally, Google won't index URLs with index.php in them whatsoever (due
> to our robots.txt settings).

There is no exclusion for index.php in the actual robots.txt file ( https://en.wikipedia.org/robots.txt )

There is a noindex meta on particular actions (e.g. https://en.wikipedia.org/w/index.php?title=Earth&action=edit).

But view-source:https://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2 is not noindexed.

Generally, there is a rel="canonical" link pointing to the intended page, though, which I think is an important factor here.
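
As a rough illustration of checking a page for such a rel="canonical" link, here is a small Python sketch using only the standard library. The `SAMPLE_HTML` snippet is invented for this example and is not a copy of what the wiki actually serves; `CanonicalFinder` and `find_canonical` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional

# Invented sample markup, assumed for illustration only.
SAMPLE_HTML = """
<html><head>
<title>File:Ford manual 1919.djvu</title>
<link rel="canonical" href="https://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu"/>
</head><body></body></html>
"""


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


canonical_url = find_canonical(SAMPLE_HTML)
```

If every `?page=N` variant of a file page carried the same canonical URL, `find_canonical` would return the same value for each of them.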
Comment 6 Bawolff (Brian Wolff) 2014-06-04 19:54:53 UTC
From robots.txt:

User-agent: *
Allow: /w/api.php?action=mobileview&
Disallow: /w/
[...]


/w/index.php is inside /w/, so that rule prevents index.php from being indexed.
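
The prefix match described above can be checked with Python's standard `urllib.robotparser`. This is a sketch that parses only the three lines quoted in this comment (the elided `[...]` part is left out), so it says nothing about the rest of the real file, and it only shows what a standard parser would let a crawler fetch.

```python
from urllib.robotparser import RobotFileParser

# Only the lines quoted above; the remainder of the real robots.txt is omitted.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /w/api.php?action=mobileview&
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# /w/index.php falls under the "Disallow: /w/" prefix.
index_php_allowed = parser.can_fetch(
    "*",
    "https://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2",
)
# /wiki/ paths are not covered by any rule in the excerpt.
wiki_path_allowed = parser.can_fetch(
    "*",
    "https://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu",
)
```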


I still fail to see what the actual bug being reported is.
Comment 7 forwardin 2014-06-04 20:13:25 UTC
(In reply to Bawolff (Brian Wolff) from comment #6)
> I still fail to see what the actual bug being reported is.

Because it's not a bug. It's unexpected behavior.
Comment 8 Bawolff (Brian Wolff) 2014-06-04 20:15:35 UTC
Well, it's the behaviour I expect. It's also consistent with the rest of MediaWiki. Unless it's actually causing a problem of some sort, I don't think it should be changed.
