Last modified: 2014-06-04 20:15:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68042, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 66042 - The same page with different URL (multipage djvu)


Summary:	The same page with different URL (multipage djvu)

Status:	RESOLVED WONTFIX

Product:	MediaWiki
Classification:	Unclassified
Component:	Redirects
Version:	1.24rc
Hardware:	All All

Importance:	Low normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-06-02 16:49 UTC by forwardin
Modified:	2014-06-04 20:15 UTC
CC List:	3 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description forwardin 2014-06-02 16:49:40 UTC

For example:
http://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu?page=2
http://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2

http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:%D0%97%D0%B0%D0%BA%D0%BE%D0%BD_%D0%BE_%D0%B7%D0%B5%D0%BC%D0%BB%D0%B5%D1%83%D1%81%D1%82%D1%80%D0%BE%D0%B9%D1%81%D1%82%D0%B2%D0%B5_1911.djvu?page=2
http://ru.wikipedia.org/w/index.php?title=%D0%A4%D0%B0%D0%B9%D0%BB:%D0%97%D0%B0%D0%BA%D0%BE%D0%BD_%D0%BE_%D0%B7%D0%B5%D0%BC%D0%BB%D0%B5%D1%83%D1%81%D1%82%D1%80%D0%BE%D0%B9%D1%81%D1%82%D0%B2%D0%B5_1911.djvu&page=2

These pages are also indexed in Google: http://webcache.googleusercontent.com/search?q=cache:http%3A%2F%2Fru.wikipedia.org%2Fwiki%2F%25D0%25A4%25D0%25B0%25D0%25B9%25D0%25BB%3A%25D0%2597%25D0%25B0%25D0%25BA%25D0%25BE%25D0%25BD_%25D0%25BE_%25D0%25B7%25D0%25B5%25D0%25BC%25D0%25BB%25D0%25B5%25D1%2583%25D1%2581%25D1%2582%25D1%2580%25D0%25BE%25D0%25B9%25D1%2581%25D1%2582%25D0%25B2%25D0%25B5_1911.djvu%3Fpage%3D2

I think the right way is to redirect from http://en.wikipedia.org/wiki/File:Book.djvu?page=2 to http://en.wikipedia.org/w/index.php?title=File:Book.djvu&page=2

Also, http://en.wikipedia.org/w/index.php?title=File:Book.djvu&page=1 is the same page as http://en.wikipedia.org/wiki/File:Book.djvu
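
To make the duplication concrete, here is a minimal Python sketch that reduces both URL forms to one comparison key. The function name page_key is hypothetical, and treating a missing page parameter as page 1 is an assumption taken from the last example above, not from MediaWiki documentation.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def page_key(url):
    """Return (host, title, page) for either URL form used in this report."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if parts.path.startswith("/wiki/"):
        # Short form: the title is the percent-encoded rest of the path.
        title = unquote(parts.path[len("/wiki/"):])
    else:
        # Long form: /w/index.php?title=... (parse_qs already decodes values).
        title = query.get("title", [""])[0]
    # Assumption: no page parameter means the first page.
    page = int(query.get("page", ["1"])[0])
    return parts.netloc, title, page


short_form = "http://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu?page=2"
long_form = "http://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2"
print(page_key(short_form) == page_key(long_form))
```

The sketch compares only the host, title and page number; it does not normalize underscores versus spaces or any other title variations.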

Comment 1 forwardin 2014-06-02 17:06:21 UTC

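This can be achieved with an .htaccess rule:

# Apply only when the query string is exactly page=<number>
RewriteCond %{QUERY_STRING} ^page\=[0-9]+$
# Redirect /wiki/File... to index.php; QSA keeps the original page=N parameter
RewriteRule ^wiki/(File.*)$ /index.php?title=$1 [R=301,NE,L,QSA]

but it is not multilingual: the pattern only matches the English "File" prefix.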
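
The effect of the rule, and why it misses localized titles, can be illustrated with a rough Python emulation. This is only a sketch: the function rewrite is hypothetical, it assumes the per-directory path is matched without its leading slash and already percent-decoded, and it models the QSA flag as simply appending the original query string.

```python
import re

# Same patterns as in the rule above.
QUERY_RE = re.compile(r"^page\=[0-9]+$")
PATH_RE = re.compile(r"^wiki/(File.*)$")


def rewrite(path, query):
    """Return the redirect target, or None if the rule does not apply."""
    if not QUERY_RE.match(query):
        return None
    match = PATH_RE.match(path)
    if match is None:
        return None
    # QSA: the original query string is appended to the new one.
    return "/index.php?title=" + match.group(1) + "&" + query


print(rewrite("wiki/File:Ford_manual_1919.djvu", "page=2"))
# The Russian namespace prefix does not start with "File", so nothing matches.
print(rewrite("wiki/Файл:Закон_о_землеустройстве_1911.djvu", "page=2"))
```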

Comment 2 Bawolff (Brian Wolff) 2014-06-02 17:09:47 UTC

What's the bug? Why are multiple URLs for the same page a problem in this case?

Proper fix is probably a rel=canonical link.

Comment 3 forwardin 2014-06-02 17:43:26 UTC

(In reply to Bawolff (Brian Wolff) from comment #2)
> What's the bug? Why are multiple URLs for the same page a problem in this
> case?
> 
> Proper fix is probably a rel=canonical link.

rel=canonical is not a solution. Googlebot (and other search engine crawlers) will still download these pages, and it has a limit on how many pages it downloads. So if it downloads duplicate pages under different URLs, it does not download other IMPORTANT pages, and they will not be indexed as quickly.

Comment 4 Bawolff (Brian Wolff) 2014-06-04 19:44:46 UTC

> rel=canonical is not a solution. Googlebot (and other search engine
> crawlers) will still download these pages, and it has a limit on how many
> pages it downloads. So if it downloads duplicate pages under different URLs,
> it does not download other IMPORTANT pages, and they will not be indexed as
> quickly.

I doubt that. Do you have a reliable source to back that up?

-----

Additionally, Google won't index URLs with index.php in them whatsoever (due to our robots.txt settings).

Comment 5 Matthew Flaschen 2014-06-04 19:51:43 UTC

(In reply to Bawolff (Brian Wolff) from comment #4)
> Additionally, Google won't index URLs with index.php in them whatsoever (due
> to our robots.txt settings).

There is no exclusion for index.php in the actual robots.txt file (https://en.wikipedia.org/robots.txt).

There is a noindex meta on particular actions (e.g. https://en.wikipedia.org/w/index.php?title=Earth&action=edit).

But view-source:https://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2 is not noindexed.

Generally, there is a rel="canonical" link pointing to the intended page, though, which I think is an important factor here.
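
For anyone who wants to check which canonical URL a given page declares, the following Python sketch pulls the rel="canonical" link out of an HTML document. The SAMPLE markup is a hypothetical illustration, not the actual output of the File page, and the names CanonicalFinder and find_canonical are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page head, for illustration only.
SAMPLE = """<html><head>
<title>File:Ford manual 1919.djvu</title>
<link rel="canonical" href="https://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu"/>
</head><body></body></html>"""

print(find_canonical(SAMPLE))
```

Running find_canonical on the HTML fetched from each of the two URL forms would show whether they point at the same intended page.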

Comment 6 Bawolff (Brian Wolff) 2014-06-04 19:54:53 UTC

From robots.txt:

User-agent: *
Allow: /w/api.php?action=mobileview&
Disallow: /w/
[...]


/w/index.php is inside /w/, so that rule prevents index.php from being indexed.
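
The prefix match can be checked with Python's standard urllib.robotparser, as a small sketch. ROBOTS_EXCERPT contains only the three lines quoted above, not the full file, and may_crawl is a hypothetical helper name. Note that this tests whether a compliant crawler may fetch a URL; whether a blocked URL can still end up in a search index is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Only the lines quoted above; the real robots.txt is much longer.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /w/api.php?action=mobileview&
Disallow: /w/
"""


def may_crawl(url, agent="*"):
    """Return True if the excerpt allows the given agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_EXCERPT.splitlines())
    return parser.can_fetch(agent, url)


print(may_crawl("https://en.wikipedia.org/w/index.php?title=File:Ford_manual_1919.djvu&page=2"))
print(may_crawl("https://en.wikipedia.org/wiki/File:Ford_manual_1919.djvu?page=2"))
```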


I still fail to see what the actual bug being reported is.

Comment 7 forwardin 2014-06-04 20:13:25 UTC

(In reply to Bawolff (Brian Wolff) from comment #6)
> I still fail to see what the actual bug being reported is.

Because it's not a bug. It's unexpected behavior.

Comment 8 Bawolff (Brian Wolff) 2014-06-04 20:15:35 UTC

Well, it's the behaviour I expect. It's also consistent with the rest of MediaWiki. Unless it's actually causing a problem of some sort, I don't think it should be changed.
