Last modified: 2013-09-18 15:40:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T54325, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52325 - Be more strict when validating URLs
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Wikidata bugs
Reported: 2013-07-31 14:25 UTC by Daniel Kinzler
Modified: 2013-09-18 15:40 UTC (History)
CC List: 6 users


Description Daniel Kinzler 2013-07-31 14:25:43 UTC
Validating URL values currently allows invalid URLs to pass, e.g.:

* "http://" is allowed
* "http://yadda.yadda/foo bar" is allowed

We should check at least that:

* there is no white space in the URL
* for http/https, that the ":" is followed by "//" and a non-empty host name.
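
The two checks listed above can be sketched as follows. This is a minimal illustration in Python, not the code from the actual patch; the function name `is_acceptable_url` is hypothetical, and the choice to let non-http schemes through after the whitespace check is an assumption.

```python
import re
from urllib.parse import urlsplit


def is_acceptable_url(url: str) -> bool:
    """Hypothetical helper applying the two minimal checks from this report."""
    # Check 1: no white space anywhere in the URL.
    if re.search(r"\s", url):
        return False

    # A URL needs a non-empty scheme followed by ":".
    scheme, sep, rest = url.partition(":")
    if not sep or not scheme:
        return False

    # Check 2: for http/https, ":" must be followed by "//" and a host name.
    if scheme.lower() in ("http", "https"):
        if not rest.startswith("//"):
            return False
        try:
            return bool(urlsplit(url).hostname)
        except ValueError:
            # urlsplit raises ValueError on some malformed authorities.
            return False

    # Assumption: other schemes only get the whitespace check here.
    return True
```

With this sketch, both examples from the report ("http://" and "http://yadda.yadda/foo bar") are rejected, while "http://yadda.yadda/foo%20bar" passes.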
Comment 1 Daniel Kinzler 2013-07-31 14:43:59 UTC
Quick summary of a discussion with Denny:

* definitely more checks for http/https
* maybe have a setting for allowed protocols, separate from the protocols supported in wikitext.

Implementation idea: create a UrlValidator class that dispatches to a validator for each known protocol (and optionally to a special default validator for unknown protocols).
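
A rough sketch of that dispatch idea, in Python for illustration only. The class name `UrlValidator` comes from the comment above; everything else (`validate_http`, `validate_generic`, the constructor arguments) is a hypothetical choice and does not describe the code that was eventually merged.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit

SchemeValidator = Callable[[str], bool]


def validate_http(url: str) -> bool:
    """Hypothetical http/https check: no whitespace, '//' and a host."""
    if any(ch.isspace() for ch in url):
        return False
    _, _, rest = url.partition(":")
    if not rest.startswith("//"):
        return False
    try:
        return bool(urlsplit(url).hostname)
    except ValueError:
        return False


def validate_generic(url: str) -> bool:
    """Hypothetical default check: no whitespace, something after ':'."""
    _, _, rest = url.partition(":")
    return bool(rest) and not any(ch.isspace() for ch in url)


class UrlValidator:
    """Dispatches to one validator per known scheme.

    The keys of `validators` double as the list of allowed protocols;
    `default`, if given, handles schemes that have no entry.
    """

    def __init__(
        self,
        validators: Dict[str, SchemeValidator],
        default: Optional[SchemeValidator] = None,
    ) -> None:
        self._validators = {s.lower(): v for s, v in validators.items()}
        self._default = default

    def validate(self, url: str) -> bool:
        scheme, sep, _ = url.partition(":")
        if not sep or not scheme:
            return False
        validator = self._validators.get(scheme.lower(), self._default)
        if validator is None:
            return False  # unknown scheme and no default validator
        return validator(url)
```

Constructing it as `UrlValidator({"http": validate_http, "https": validate_http})` gives a strict validator that rejects every other scheme; passing `default=validate_generic` makes unknown schemes go through the lenient fallback instead.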
Comment 2 denny vrandecic 2013-07-31 14:58:55 UTC
Probably having a separate setting sounds better. If we have both, we would take the intersection of both, I guess, which would be extremely confusing in some cases. So we should have them as two different settings.

And I suggest starting with a small list, just http(s) for now, but that's a deployment question :)
Comment 3 Derk-Jan Hartman 2013-07-31 18:55:43 UTC
"there is no white space in the URL"

You mean you want to force people to paste encoded URLs into the field? (Note that I believe browsers these days put an unencoded URL on the clipboard, for readability.)
Comment 4 Gerrit Notification Bot 2013-08-01 20:10:22 UTC
Change 77183 had a related patch set uploaded by Daniel Kinzler:
(bug 52325) validators for url schemes.

https://gerrit.wikimedia.org/r/77183
Comment 5 Daniel Kinzler 2013-08-02 12:00:51 UTC
(In reply to comment #3)
> "there is no white space in the URL"

Note that in wikitext, this is also true. It's actually the assumption that led to the syntax for external links as we use it now.
 
> You mean, you want to enforce people to paste encoded urls into the field

I would like them to post a *valid* URL (or perhaps IRI), yes.

> (note
> that I believe browsers these days feed an unencoded url to the copy past
> board
> for readability of urls).

I can't even get Firefox to show a URL with a space in it; it always gets converted to '+' right away. And Firefox will *show* https://ru.wikipedia.org/wiki/Вашингтон,_Джордж, but if you copy&paste it, you get https://ru.wikipedia.org/wiki/%D0%92%D0%B0%D1%88%D0%B8%D0%BD%D0%B3%D1%82%D0%BE%D0%BD,_%D0%94%D0%B6%D0%BE%D1%80%D0%B4%D0%B6. So your assumption is wrong, at least for Firefox.

Now, I'd accept full Unicode IRIs, but no spaces. Not sure yet if we should convert to true URL syntax internally, with encoded non-ASCII characters. For now, we'll just save the URL as it comes in if it's valid.
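
For reference, the encoded form quoted above is plain percent-encoding of the UTF-8 bytes in the path. A minimal Python sketch of such a conversion follows; the helper name `iri_to_url` is hypothetical, only the path is encoded, and internationalized host names are deliberately left out of scope.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_url(iri: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII characters in the path."""
    parts = urlsplit(iri)
    # Keep "/" and "," literal, and keep "%" so that already-encoded
    # input is not encoded a second time.
    path = quote(parts.path, safe="/,%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(iri_to_url("https://ru.wikipedia.org/wiki/Вашингтон,_Джордж"))
```

Applied to the Russian Wikipedia IRI from this comment, the sketch produces the same encoded URL that Firefox puts on the clipboard.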
Comment 6 Gerrit Notification Bot 2013-08-29 09:25:21 UTC
Change 77183 merged by jenkins-bot:
(bug 52325) validators for url schemes.

https://gerrit.wikimedia.org/r/77183
