Last modified: 2013-03-21 15:13:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T43847, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41847 - Wikidata should use wikidata.org as a domain (not www.wikidata.org)
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: wmf-deployment
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 44097
Blocks: 44019 44098
Reported: 2012-11-07 13:55 UTC by MZMcBride
Modified: 2013-03-21 15:13 UTC (History)

Description MZMcBride 2012-11-07 13:55:49 UTC
There's a bit of funkiness with the wikidata.org --> www.wikidata.org redirect. For me, <http://wikidata.org/wiki/Hello> and <http://www.wikidata.org/wiki/Hello> both resolve ("HTTP/1.0 200 OK") without redirecting. This is the wrong behavior. There should only be one canonical form of the URL, similar to how mediawiki.org behaves. Everything should either redirect to www or everything should redirect to the un-prefixed form. A mixture is bad.
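The "one canonical form" rule can be written down as a small check. This is only an illustrative sketch, not something from the deployed configuration: the helper name `is_canonical_response`, its parameters, and the list of redirect codes are assumptions, and it expects the status code and Location header to have been collected already (for example with `curl -Is`).

```python
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (assumption).
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_canonical_response(requested_url, status, location, canonical_host):
    """Return True if a response is consistent with a single canonical host.

    requested_url  -- the URL that was requested
    status         -- numeric HTTP status code of the response
    location       -- value of the Location header, or None if absent
    canonical_host -- the one host every URL should end up on
    """
    requested_host = urlsplit(requested_url).hostname
    target_host = None
    if location is not None:
        # Resolve a relative Location against the requested URL first.
        target_host = urlsplit(urljoin(requested_url, location)).hostname
    if requested_host == canonical_host:
        # On the canonical host, any redirect must stay on that host.
        return target_host in (None, canonical_host)
    # On any other host, only a redirect to the canonical host is acceptable.
    return status in REDIRECT_CODES and target_host == canonical_host
```

With www.wikidata.org assumed as the canonical host, a "200 OK" from http://wikidata.org/wiki/Hello fails this check, while the same response from http://www.wikidata.org/wiki/Hello passes.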
Comment 1 Daniel Kinzler 2012-11-07 21:25:06 UTC
We really want wikidata.org without a prefix, but apparently several bits of ops infrastructure assume that there is always a subdomain. Hence the confusion. But I agree that it needs to be sorted out.
Comment 2 Sam Reed (reedy) 2012-11-07 21:25:48 UTC
I think it used to work right when it first went live...
Comment 3 MZMcBride 2012-11-08 05:17:18 UTC
Yeah, the ops folks seemed to follow www.mediawiki.org as an example when they probably should have been following wikimediafoundation.org as an example. I'm not sure it was originally clear that the Wikidata folks wanted the non-www form.
Comment 4 MZMcBride 2012-12-04 17:23:25 UTC
There really ought to be only one canonical form of the URL/domain.

What's needed to either use www.wikidata.org or wikidata.org as the canonical form? Are there specific sub-tasks required? Do RT tickets need to be filed? How can we move this bug forward?
Comment 5 Sam Reed (reedy) 2012-12-04 19:50:17 UTC
You've got Gerrit access; make the changes in the operations/apache-config.git repo ;)
Comment 6 Daniel Zahn 2013-01-17 21:50:19 UTC
I merged the MW and the Apache changes and deployed them. The www is now gone:

https://gerrit.wikimedia.org/r/#/c/44406/
https://gerrit.wikimedia.org/r/#/c/44407/

21:41 logmsgbot: dzahn gracefulled all apaches
21:41 mutante: dropping www from wikidata in mw and apache configs as requested
21:39 logmsgbot: dzahn synchronized ./wmf-config/CommonSettings.php
21:39 logmsgbot: dzahn synchronized ./wmf-config/InitialiseSettings.php
Comment 7 MZMcBride 2013-01-17 23:27:18 UTC
Sorry, this appears to still not be fixed.

---
$ curl -Is "http://www.wikidata.org/" | grep Location
Location: http://wikidata.org/wiki/Wikidata:Main_Page
---

^ This is correct.

---
$ curl -Is "http://www.wikidata.org/wiki/Hello" | head -1
HTTP/1.0 200 OK
---

^ This is incorrect.
Comment 8 Daniel Zahn 2013-01-17 23:47:12 UTC
Does this mean you want an additional redirect from www. to "without www"? Looking...
Comment 9 Daniel Zahn 2013-01-18 02:12:49 UTC
Sorry, this whole attempt to drop "www" had to be reverted and caused some issues. We can't _not_ have www because of bits.
Comment 10 Daniel Zahn 2013-01-18 02:14:29 UTC
Ifba23284 | revert the whole www dropping for wikidata, cant use it due to bits (MERGED) | Dzahn | operations/mediawiki-config | master | 4:39 PM
Ie68957b9 | revert the whole www dropping for wikidata, cant have it due to the way bits wor (MERGED) | Dzahn | operations/apache-config | master | 4:38 PM
I4035c527 | fix wikidata redirect, for real, sry (MERGED) | Dzahn | operations/apache-config | master | 4:19 PM
I16e20aa9 | fix wikidata redirect (MERGED) | Dzahn | operations/apache-config | master | 4:13 PM
Comment 11 MZMcBride 2013-01-18 02:17:54 UTC
(In reply to comment #9)
> We can't _not_ have www because of bits.

Can you explain further? Plenty of wikis don't use "www" (commons.wikimedia.org, wikimediafoundation.org, etc.). I don't understand the issue.
Comment 12 Sam Reed (reedy) 2013-01-18 02:20:38 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > We can't _not_ have www because of bits.
> 
> Can you explain further? Plenty of wikis don't use "www"
> (commons.wikimedia.org, wikimediafoundation.org, etc.). I don't understand
> the
> issue.

<mutante> we cant NOT have www due to the way bits works
<Reedy_> why? :/
<mutante> due to the way bits works and geolocation
<mutante> it needs a CNAME .. and NOT an A record
<mutante> but wikidata.org is an A record
<mutante> wikimediafoundation.org only works because it is not balanced between data centers at all
Comment 13 MZMcBride 2013-01-18 03:44:09 UTC
Something funky is definitely going on here, but I don't think bits is to blame.

Right now, I'm focused on a tangential issue: when you request a non-existent page, there's a 200 response code from the Wikidata domains, not a 404 response code as there should be.

Compare:

$ curl -Is 'http://en.wikipedia.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 404 Not Found

$ curl -Is 'http://zh.wikisource.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 404 Not Found

$ curl -Is 'http://www.wikidata.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 200 OK

$ curl -Is 'http://wikidata.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 200 OK

We can see that en.wikipedia.org and zh.wikisource.org properly return a 404, but www.wikidata.org and wikidata.org return a 200. This is wrong.

I'd generally say that these response codes are the subject of a separate bug, but I'm betting that solving this issue (this symptom) will lead to the resolution of this bug. If someone can prove otherwise, feel free to split the bug. :-)
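The comparison above boils down to reading the status code from the first header line and flagging a 2xx answer for a page that should not exist. A minimal sketch, assuming the status line has already been captured with `curl -Is ... | head -1`; the function names `parse_status_line` and `is_soft_404` are made up for illustration.

```python
def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 404 Not Found' into (code, reason)."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("not an HTTP status line: %r" % line)
    reason = parts[2] if len(parts) == 3 else ""
    return int(parts[1]), reason


def is_soft_404(status_line):
    """True if a request for a page known to be missing got a 2xx status."""
    code, _reason = parse_status_line(status_line)
    return 200 <= code < 300
```

Applied to the output above, the en.wikipedia.org and zh.wikisource.org lines are not flagged, while both Wikidata lines are.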
Comment 14 Daniel Kinzler 2013-01-18 10:42:18 UTC
(In reply to comment #9)
> Sorry, this whole attempt to drop "www" had to be reverted and caused some
> issues. We can't _not_ have www because of bits.

For the record, the revert left broken stuff stuck in the European caches, see bug 44094.
Comment 15 Daniel Kinzler 2013-01-18 10:42:56 UTC
> <mutante> due to the way bits works and geolocation
> <mutante> it needs a CNAME .. and NOT an A record
> <mutante> but wikidata.org is an A record

Filed this as bug 44097.
Comment 16 MZMcBride 2013-01-18 14:53:26 UTC
Daniel K.: Can you please investigate the issue described in comment 13? Between cache pollution and incorrect response codes, it's become nearly impossible to test/debug this bug. I think it's important to get consistent response codes before this bug can move forward.
Comment 17 Daniel Kinzler 2013-01-18 15:08:21 UTC
(In reply to comment #16)
> Daniel K.: Can you please investigate the issue described in comment 13?

Filed as bug 44108.
Comment 18 Hazard-SJ 2013-01-20 20:52:23 UTC
I remember reporting a PyWikipedia bug about page.exists() giving one single result (don't remember offhand if it's True or False all the time), so they might be somehow linked. That hasn't been resolved either.
Comment 19 MZMcBride 2013-01-22 02:06:54 UTC
(In reply to comment #18)
> I remember reporting a PyWikipedia bug about page.exists() giving one single
> result (don't remember offhand if it's True or False all the time), so they
> might be somehow linked. That hasn't been resolved either.

Was this comment posted to the appropriate bug? I'm having difficulty understanding what you're saying in context. What pywikipedia bug?
Comment 20 Hazard-SJ 2013-01-26 06:13:02 UTC
http://sourceforge.net/tracker/?func=detail&atid=603138&aid=3594132&group_id=93107 but I'm not sure if they are related or not.
Comment 21 jeblad 2013-01-27 12:58:26 UTC
The Special:UserLogin page is now returning the user to a URL without subdomain and without namespace. This will more often than not fail to be a valid page.
Comment 22 Krinkle 2013-02-03 04:18:50 UTC
http://wikidata.org/wiki is a redirect loop, is a redirect loop.
Comment 23 MZMcBride 2013-02-03 04:32:50 UTC
(In reply to comment #22)
> http://wikidata.org/wiki is a redirect loop, is a redirect loop.

This is filed as bug 44612.
Comment 24 MZMcBride 2013-02-24 04:58:13 UTC
Okay, it now appears that the canonical form is www.wikidata.org:

---
# HTTP, NO PREFIX
$ curl -Is "http://wikidata.org/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page

# HTTPS, NO PREFIX
$ curl -Is "https://wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page

# HTTP, PREFIX
$ curl -Is "http://www.wikidata.org/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://www.wikidata.org/wiki" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://www.wikidata.org/wiki/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page

# HTTPS, PREFIX
$ curl -Is "https://www.wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://www.wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://www.wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
---

Can this bug be marked resolved/fixed, then?
Comment 25 MZMcBride 2013-02-24 05:05:28 UTC
(In reply to comment #24)
> ---
> # HTTP, NO PREFIX
> $ curl -Is "http://wikidata.org/" | grep Location
> Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "http://wikidata.org/wiki" | grep Location
> Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "http://wikidata.org/wiki/" | grep Location
> Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
> 
> # HTTPS, NO PREFIX
> $ curl -Is "https://wikidata.org/" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "https://wikidata.org/wiki" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "https://wikidata.org/wiki/" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
> 
> [...]

Hmmm, no, missing one test:

---
# HTTPS, PREFIX, MAIN PAGE
$ curl -Is "https://www.wikidata.org/wiki/Wikidata:Main_Page" | grep Location
---

^ This is the correct behavior. No Location header, as the canonical form is www currently.

---
# HTTPS, NO PREFIX, MAIN PAGE
$ curl -Is "https://wikidata.org/wiki/Wikidata:Main_Page" | grep Location
---

^ This is wrong. This page currently loads without redirecting using a Location header. Compare:

---
> # HTTPS, NO PREFIX
> $ curl -Is "https://wikidata.org/" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "https://wikidata.org/wiki" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
> $ curl -Is "https://wikidata.org/wiki/" | grep Location
> Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
---

Three forms go from no-www to www. But one form (/wiki/Wikidata:Main_Page) stays at no-www. Hmm. :-/
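To avoid missing a combination again, the full test matrix can be generated instead of typed by hand. This is a sketch under the assumption that www.wikidata.org is the canonical host and that every listed path should end up on the main page; the constant and function names are hypothetical, and the actual requests would still be made with curl.

```python
from itertools import product

SCHEMES = ("http", "https")
HOSTS = ("wikidata.org", "www.wikidata.org")
PATHS = ("/", "/wiki", "/wiki/", "/wiki/Wikidata:Main_Page")

CANONICAL_HOST = "www.wikidata.org"  # assumption: www is the canonical form
MAIN_PAGE = "/wiki/Wikidata:Main_Page"


def expected_location(scheme, host, path):
    """Expected Location header, or None when no redirect is expected."""
    if host == CANONICAL_HOST and path == MAIN_PAGE:
        return None  # already the canonical URL, so no redirect
    return "%s://%s%s" % (scheme, CANONICAL_HOST, MAIN_PAGE)


def build_matrix():
    """Return (url, expected Location) pairs for every combination."""
    return [
        ("%s://%s%s" % (scheme, host, path), expected_location(scheme, host, path))
        for scheme, host, path in product(SCHEMES, HOSTS, PATHS)
    ]
```

This yields 16 cases, including the https://wikidata.org/wiki/Wikidata:Main_Page case that currently does not redirect.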
Comment 26 Lydia Pintscher 2013-02-27 10:34:37 UTC
Can you please check again? It seems to be working fine here now.
Comment 27 Lydia Pintscher 2013-02-27 12:56:07 UTC
Ok I take that back. There's still one problem: When using :d in the wikitext you get links like en.wikidata.org or fi.wikidata.org and these do not redirect to www.wikidata.org.
Comment 28 MZMcBride 2013-02-27 15:52:24 UTC
(In reply to comment #27)
> Ok I take that back. There's still one problem: When using :d in the wikitext
> you get links like en.wikidata.org or fi.wikidata.org and these do not
> redirect to www.wikidata.org.

Hmmm, yeah, I see what you mean.

* https://en.wikidata.org/wiki/Wikidata:Main_Page
* https://wikidata.org/wiki/Wikidata:Main_Page

Both of these should redirect to <https://www.wikidata.org/wiki/Wikidata:Main_Page>, as I understand it. Currently neither does.
Comment 29 Daniel Kinzler 2013-02-27 17:58:06 UTC
(In reply to comment #28)
> * https://en.wikidata.org/wiki/Wikidata:Main_Page
> * https://wikidata.org/wiki/Wikidata:Main_Page
> 
> Both of these should redirect to
> <https://www.wikidata.org/wiki/Wikidata:Main_Page>, as I understand it.
> Currently neither do.

For now, redirecting to the main page will do, but we really want something more elaborate:

http://en.wikidata.org/wiki/Foo should redirect to http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Foo.

That is:

(\w+).wikidata.org/wiki/(.*) should redirect to %PROTOCOL://www.wikidata.org/wiki/Special:ItemByTitle/$1wiki/$2

But perhaps that should be a separate request. I think just redirecting to the main page is fine for now. Just wanted to mention it, in case it has any impact on how this gets implemented.
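The mapping above can be tried out with a short sketch. It is not the Apache rule itself: the function name `item_by_title_url` is made up, and the sketch assumes two refinements to the pattern as written, namely that the dots in the host name are escaped and that "www" is excluded so the canonical host does not redirect to itself.

```python
import re

# Assumption: literal dots are escaped, unlike in the pattern quoted above.
LANG_HOST_RE = re.compile(
    r"^(?P<scheme>https?)://(?P<lang>\w+)\.wikidata\.org/wiki/(?P<title>.*)$"
)


def item_by_title_url(url):
    """Map <lang>.wikidata.org/wiki/<title> to Special:ItemByTitle, else None."""
    match = LANG_HOST_RE.match(url)
    if match is None or match.group("lang") == "www":
        # Not a language subdomain (or already the www host): no rewrite.
        return None
    return "%s://www.wikidata.org/wiki/Special:ItemByTitle/%swiki/%s" % (
        match.group("scheme"),
        match.group("lang"),
        match.group("title"),
    )
```

For example, http://en.wikidata.org/wiki/Foo maps to http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Foo, while URLs on www.wikidata.org or the bare wikidata.org are left alone.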
Comment 30 Ori Livneh 2013-03-21 08:04:11 UTC
Attempting to remove a link on http://wikidata.org/wiki/Q169964 triggers an XHR to http://www.wikidata.org/w/api.php, which fails in Chromium 27 with:

XMLHttpRequest cannot load http://www.wikidata.org/w/api.php. Origin http://wikidata.org is not allowed by Access-Control-Allow-Origin.
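The failure follows from the browser treating wikidata.org and www.wikidata.org as different origins, since an origin is the combination of scheme, host, and port. A minimal sketch of that comparison, with hypothetical helper names and default ports filled in as an assumption:

```python
from urllib.parse import urlsplit

# Default ports used when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_cross_origin(page_url, request_url):
    """True if a request from page_url to request_url crosses origins."""
    return origin(page_url) != origin(request_url)
```

A page loaded from http://wikidata.org/wiki/Q169964 calling http://www.wikidata.org/w/api.php is cross-origin under this check; the same page loaded from www.wikidata.org is not.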
Comment 31 Daniel Kinzler 2013-03-21 08:50:42 UTC
(In reply to comment #30)
> Attempting to remove a link on http://wikidata.org/wiki/Q169964 triggers

Actually, it should not be possible to even load http://wikidata.org/wiki/Q169964.

Any request to wikidata.org should immediately be redirected to www.wikidata.org.
Comment 32 Aude 2013-03-21 08:53:21 UTC
We've decided to do it the other way around and always redirect to www.wikidata.org.

See https://bugzilla.wikimedia.org/45005
Comment 33 Lydia Pintscher 2013-03-21 08:54:24 UTC
We still need to solve the issues in the previous two comments though.
Comment 34 Aude 2013-03-21 12:16:47 UTC
This bug needs to be marked as resolved/wontfix.

This bug completely contradicts https://bugzilla.wikimedia.org/45005, which is what we decided to do.
Comment 35 Lydia Pintscher 2013-03-21 15:13:32 UTC
Closing after discussion with Denny.
