Last modified: 2014-09-13 17:08:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T72657, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 70657 - Same page served with two different addresses, with two different rel canonical


Summary:	Same page served with two different addresses, with two different rel canonical

Status:	UNCONFIRMED

Product:	Wikimedia
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low trivial (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-09-10 14:39 UTC by Julien
Modified:	2014-09-13 17:08 UTC (History)
CC List:	2 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Julien 2014-09-10 14:39:11 UTC

I think I found two intertwined "bugs":

== Wikipedia accepts invalid URIs in HTTP requests ==

According to the URI RFCs, such as 2396 and 3986: "A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters."

I'm aware of URI variants such as IRIs, which allow non-ASCII characters, BUT the HTTP RFC specifies that HTTP accepts URIs, not IRIs. This does NOT make IRIs useless: we can still use IRIs in browsers, whose role is to convert them to valid URIs (with knowledge of the local encoding).

So the following request could be expected to fail, typically with a 400 Bad Request, instead of returning a 200 OK:
$ curl -si http://ar.wikipedia.org/wiki/حب | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/حب" />

But if Wikipedia returns a 200, I think there may be a reason, and this ticket is a good opportunity to document it.

== Due to the previous bug, Wikipedia has the same page behind two different URIs with two different rel-canonical ==

$ urlencode 'حب'
%D8%AD%D8%A8
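
The `urlencode` step above amounts to percent-encoding the UTF-8 bytes of the page title. A minimal Python sketch of the same conversion, assuming UTF-8 (the helper name `percent_encode_title` is hypothetical and not part of any Wikimedia tooling):

```python
from urllib.parse import quote


def percent_encode_title(title: str) -> str:
    """Percent-encode the UTF-8 bytes of a page title for use in a URI path."""
    # safe="" also escapes "/", i.e. the title is treated as a single
    # path segment. That is an assumption made for this illustration.
    return quote(title, safe="", encoding="utf-8")


print(percent_encode_title("حب"))
```

Each of the two Arabic letters is two bytes in UTF-8, which is why the encoded form has four escapes.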

$ curl -si http://ar.wikipedia.org/wiki/%D8%AD%D8%A8 | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/%D8%AD%D8%A8" />

And I think this one is not normal: if no 400 is returned, rel canonical should, I think, be set to the encoded (valid) form even when the invalid URI is requested.
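
To illustrate the behaviour suggested here, the sketch below derives the canonical link by decoding and then re-encoding the requested path segment, so that both request forms give the same value. This is only a sketch under assumptions: the function name `canonical_href` is hypothetical, UTF-8 is assumed, and it does not describe how MediaWiki actually builds the link.

```python
from urllib.parse import quote, unquote

# Base taken from the curl examples in this report.
BASE = "http://ar.wikipedia.org/wiki/"


def canonical_href(requested_segment: str) -> str:
    """Return one canonical URI for a title, whichever form was requested."""
    # Undo any percent-escapes first; raw non-ASCII input passes through as-is.
    decoded = unquote(requested_segment, encoding="utf-8")
    # Re-encode so the result only contains characters valid in a URI.
    return BASE + quote(decoded, safe="", encoding="utf-8")


raw_form = canonical_href("حب")
encoded_form = canonical_href("%D8%AD%D8%A8")
print(raw_form == encoded_form)
```

A title containing a literal "%" would need extra care with this decode-then-encode approach, so it is not a complete normalization routine.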
