Last modified: 2014-09-13 17:08:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T72657, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70657 - Same page served with two different addresses, with two different rel canonical
Status: UNCONFIRMED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low trivial
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-09-10 14:39 UTC by Julien
Modified: 2014-09-13 17:08 UTC (History)

Description Julien 2014-09-10 14:39:11 UTC
I think I found two intertwined "bugs":

== Wikipedia accepts invalid URIs in HTTP requests ==

According to the URI RFCs (2396, 3986): "A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters."
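
To make the "very limited set" concrete, here is a minimal sketch that checks whether a string uses only the characters RFC 3986 allows (unreserved, reserved, and "%"). The function name `uses_only_uri_characters` is hypothetical, and the check covers the character set only, not the full URI grammar.

```python
import string

# Characters RFC 3986 permits in a URI: unreserved, gen-delims,
# sub-delims, plus "%" as the percent-encoding introducer.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
URI_CHARS = set(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def uses_only_uri_characters(candidate: str) -> bool:
    """Return True if every character is in the RFC 3986 character set.

    Hypothetical helper: it checks the character set only and does not
    validate the overall URI syntax.
    """
    return all(ch in URI_CHARS for ch in candidate)


raw = "http://ar.wikipedia.org/wiki/حب"
encoded = "http://ar.wikipedia.org/wiki/%D8%AD%D8%A8"
print(uses_only_uri_characters(raw))      # the raw Arabic letters are outside the set
print(uses_only_uri_characters(encoded))  # the percent-encoded form stays inside it
```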

I'm aware of URI variants such as IRIs, which allow non-ASCII characters, BUT the HTTP RFC specifies that HTTP accepts URIs, not IRIs. This does NOT make IRIs useless: we can still use them in browsers, whose role is to convert them to valid URIs (with knowledge of the local encoding).

So this request could fail, typically with a 400 Bad Request, instead of returning a 200 OK:
$ curl -si http://ar.wikipedia.org/wiki/حب | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/حب" />

But I think that if Wikipedia returns a 200, there may be a reason, and this ticket is a good opportunity to document it.

== Due to the previous bug, Wikipedia has the same page behind two different URIs with two different rel-canonical values ==

$ urlencode 'حب'
%D8%AD%D8%A8
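
The `urlencode` command above may not be installed everywhere. As a rough equivalent, and assuming the title is encoded as UTF-8, the same percent-encoding can be reproduced with Python's standard library:

```python
from urllib.parse import quote, unquote

title = "حب"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(title)
print(encoded)

# unquote() reverses the operation, again assuming UTF-8.
print(unquote(encoded) == title)
```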

$ curl -si http://ar.wikipedia.org/wiki/%D8%AD%D8%A8 | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/%D8%AD%D8%A8" />

And I think this one is not normal: if no 400 is returned, the rel canonical for the invalid URI should be set to the encoded (valid) form.
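
As an illustration of the behaviour I would expect, here is a minimal sketch that maps both request forms to the same encoded canonical URL. The function name `canonical_href` is hypothetical and this is not how MediaWiki builds the tag; it only assumes UTF-8 and that any "%" already in the path starts a valid percent-encoded sequence.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def canonical_href(url: str) -> str:
    """Return the URL with its path percent-encoded (hypothetical helper).

    "/" and "%" are kept as-is so that an already-encoded path is not
    encoded a second time. A literal "%" in a raw title would therefore
    be ambiguous; this sketch does not handle that case.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


raw = "http://ar.wikipedia.org/wiki/حب"
encoded = "http://ar.wikipedia.org/wiki/%D8%AD%D8%A8"

# Both request forms should yield one and the same canonical URL.
print(canonical_href(raw))
print(canonical_href(raw) == canonical_href(encoded))
```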
