Last modified: 2014-07-23 10:46:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T70372, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 68372 - Special characters in target of mmv url are hex encoded in url bar


Summary:	Special characters in target of mmv url are hex encoded in url bar

Status:	RESOLVED WONTFIX

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	MultimediaViewer (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-07-22 11:21 UTC by Gerard Meijssen
Modified:	2014-07-23 10:46 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
expected URL (8.65 KB, image/png) 2014-07-22 23:15 UTC, Gerard Meijssen	Details
The URL with a malformed string (10.62 KB, image/png) 2014-07-22 23:18 UTC, Gerard Meijssen	Details

Description Gerard Meijssen 2014-07-22 11:21:21 UTC

The MediaViewer gives us https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg while Commons has it as https://commons.wikimedia.org/wiki/File:MilutinDostani%C4%87.jpg

Thanks,
    GerardM

Comment 1 Tisza Gergő 2014-07-22 18:16:51 UTC

Can you clarify what result you would expect? The MediaViewer URL hash contains the exact same string as the file page URL, so I'm not sure how it is not "what we usually do".

Comment 2 Gerard Meijssen 2014-07-22 19:37:37 UTC

Try the URL; you will find that what Bugzilla also shows does NOT show like this from the Commons URL; it has a special c..
Thanks,
    GerardM

Comment 3 Tisza Gergő 2014-07-22 19:45:35 UTC

I'm afraid I have no idea what you are saying. Can you describe how expected and actual behavior differs? See [[mw:How to report a bug]].

Comment 4 Andre Klapper 2014-07-22 20:01:54 UTC

No clue either. Please provide clear exact output, screenshots, browser information. For future reference, https://bugzilla.wikimedia.org/enter_bug.cgi?format=guided might also help providing more useful hints.

Comment 5 Bawolff (Brian Wolff) 2014-07-22 20:23:34 UTC

per discussion on irc:

*Go to https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg

Expected behaviour:
* Url bar in browser replaces hex encoding with the characters in question: for a url like: https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg

Actual behaviour

Behaviour varies with browser:

* [on chrome] This only happens in path part of url, not fragment, giving a url like https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg
* [on firefox 3.5] ć is in both places like expected.

Comment 6 Mark Holmquist 2014-07-22 20:25:34 UTC

Is this a Chromium bug, then?

Comment 7 Bawolff (Brian Wolff) 2014-07-22 20:29:05 UTC

> 
> * [on chrome] This only happens in path part of url, not fragment, giving a
> url like
> https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg
> * [on firefox 3.5] ć is in both places like expected.

To be clear, chrome simply does not adjust the fragment part of the url but leaves it as is. If you type in https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg it will display as https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg. If you type in https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg it will display as https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg . Firefox (3.5) will always convert the url to https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg
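
The two display behaviours described above can be sketched in Python. This is only an illustration of the difference, not how either browser is implemented: the function names are hypothetical, and the sketch assumes the display step amounts to UTF-8 percent-decoding of either the whole URL or the path alone.

```python
from urllib.parse import unquote, urlsplit

# URL as generated by MediaViewer (taken from the report above).
ENCODED = (
    "https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87"
    "#mediaviewer/File:MilutinDostani%C4%87.jpg"
)


def display_decoding_everything(url: str) -> str:
    """Hypothetical model of the Firefox 3.5 behaviour: decode path and fragment."""
    return unquote(url)


def display_decoding_path_only(url: str) -> str:
    """Hypothetical model of the Chrome behaviour: decode the path, keep the fragment."""
    parts = urlsplit(url)
    # Only the path is percent-decoded; the fragment is left exactly as typed.
    return parts._replace(path=unquote(parts.path)).geturl()


print(display_decoding_everything(ENCODED))
print(display_decoding_path_only(ENCODED))
```

Both outputs refer to the same resource; only the part after the '#' is displayed differently.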

Comment 8 Tisza Gergő 2014-07-22 20:56:15 UTC

Thanks for the explanation! This should be a bug/feature request for non-Firefox browsers. Closing as worksforme since what MediaViewer is doing is, as far as I can see, the correct way of representing non-ASCII characters in a URI. See my mail some time ago about standards and other considerations: http://lists.wikimedia.org/pipermail/wikitech-l/2014-April/076069.html

I reported this for Chromium at the time: https://code.google.com/p/chromium/issues/detail?id=367505
Tried to report for Safari/IE as well but lost motivation before getting halfway through the crap that's needed to report bugs for those browsers. (Opera partially got this right at the time; since then they switched engines, so that might have changed.) If someone is more persistent or already has the right kind of account, more upstream bug reports would be helpful.

Comment 9 Gerard Meijssen 2014-07-22 23:15:34 UTC

Created attachment 16007 [details]
expected URL

Comment 10 Gerard Meijssen 2014-07-22 23:18:20 UTC

Created attachment 16008 [details]
The URL with a malformed string

This is not human readable as you would expect it.

Comment 11 Tisza Gergő 2014-07-22 23:38:36 UTC

Again, you should report this to your browser vendor. The URL follows the standard method of encoding non-ASCII bytes, and including them unencoded would result in more serious issues, as I outlined in the mail.

Comment 12 Gerard Meijssen 2014-07-22 23:43:15 UTC

Why then is there a difference between how Commons does things and how the Multi Media Viewer does it... Consistent behaviour may be expected and has nothing to do with browser "vendors".
Thanks,
     GerardM

Comment 13 Tisza Gergő 2014-07-22 23:45:36 UTC

Because some browsers treat percent-encoded characters differently in the path/query part and the fragment part of the URL. As can be seen from the URLs you posted in comment 0, the actual representation is consistent.

Comment 14 Gerard Meijssen 2014-07-22 23:47:37 UTC

As can be seen by the screenshots (from the same browser & the same session) this is not the case.
Thanks,
    GerardM

Comment 15 Bawolff (Brian Wolff) 2014-07-22 23:51:34 UTC

(In reply to Gerard Meijssen from comment #14)
> As can be seen by the screenshots (from the same browser & the same session)
> this is not the case.
> Thanks,
>     GerardM

That doesn't make sense. Tgr said that some browsers (chrome) treat the part of the url after the '#' differently from the part before the '#', and that you should complain to your browser maker. Your screenshots seem to agree with what tgr said.

Comment 16 Bawolff (Brian Wolff) 2014-07-22 23:54:18 UTC

Could media viewer just not urlencode the fragment? I suspect that's allowed in html5, and chrome seems to handle it fine.

Comment 17 Tisza Gergő 2014-07-23 00:04:46 UTC

Could, but should not, IMO. I'll quote the relevant part from the mail linked in comment 8:

1. Just put the file name as-is (with spaces replaced by underscores) in
   the URL fragment part.
   Pro: readable file names in URLs, easy to generate.
   Con: technically not a valid URI. [2] (It would be a valid IRI,
        probably, but browser support for that is not so great, so non-ASCII
        bytes might get encoded in unexpected ways.) Creates nasty usability 
        and security issues (injection vulnerabilities, RTL characters, 
        characters which break autolinking). Would make it very hard to
        introduce more complex URL formats later, as file names can contain 
        pretty much any character.

2. Use percent encoding (with underscores for spaces).
   Pro: this is the standard way of encoding fragments. [2][3] Always
        results in a valid URI. Readable file names in Firefox. Easy to 
        generate on-wiki (e.g. with {{urlencode}})
   Con: Non-Latin filenames look horrible in any browser that's not Firefox.


[2] http://tools.ietf.org/html/rfc3986#section-3.5
[3] https://tools.ietf.org/html/rfc3987#section-3.1
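
Option 2 above can be sketched roughly as follows. This is a minimal illustration, not MediaViewer's actual code: the helper name `mmv_fragment` is hypothetical, the `mediaviewer/File:` prefix is copied from the URLs in this report, and encoding every reserved character (`safe=""`) is an assumption.

```python
from urllib.parse import quote, unquote


def mmv_fragment(file_name: str) -> str:
    """Hypothetical helper: build a percent-encoded fragment for a file name."""
    # Spaces become underscores, as described in option 2.
    title = file_name.replace(" ", "_")
    # Assumption: encode everything except unreserved characters, as UTF-8 bytes.
    return "mediaviewer/File:" + quote(title, safe="")


print(mmv_fragment("MilutinDostanić.jpg"))
print(mmv_fragment('A "b" (c).png'))
```

Decoding with `unquote` restores the underscored title, so the encoded form loses no information; it is only less readable in browsers that do not decode the fragment for display.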

Comment 18 Bawolff (Brian Wolff) 2014-07-23 02:48:20 UTC

> 
> 1. Just put the file name as-is (with spaces replaced by underscores) in
>    the URL fragment part.
>    Pro: readable file names in URLs, easy to generate.
>    Con: technically not a valid URI. [2] (It would be a valid IRI,
>         probably, but browser support for that is not so great, so non-ASCII
>         bytes might get encoded in unexpected ways.)

Yeah, that sounds like the making of some not so fun bugs.

>Creates nasty usability 
>         and security issues (injection vulnerabilities, RTL characters, 
>         characters which break autolinking).

What sort of injection vulnerabilities do you mean? (< and > are disallowed in titles. Things should be escaped before injecting into html anyways.) I doubt RTL characters would cause major problems. The annoying characters (bidi override, rtl mark, etc) are banned from file names anyways.

> Would make it very hard to
>         introduce more complex URL formats later, as file names can contain 
>         pretty much any character.
> 

true.

Comment 19 Tisza Gergő 2014-07-23 02:59:20 UTC

(In reply to Bawolff (Brian Wolff) from comment #18)
> What sort of injection vulnerabilities do you mean ( < and > are disallowed
> in titles. Things should be escaped before injecting into html anyways).

Quotes are allowed and can be used to break out from HTML attributes. The goal of having a custom URL in the first place is that people can copy-paste it, so escaping would be up to the reuser. People don't escape URLs they paste into blog posts.
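
A small sketch of the quote problem, under stated assumptions: the file name, the page name `Example` and both helper functions are hypothetical, and `href_value` only models the fact that a double-quoted HTML attribute value ends at the next double quote.

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org/wiki/Example#mediaviewer/File:"
TITLE = 'Say "hi".jpg'.replace(" ", "_")  # hypothetical file name containing quotes

RAW_URL = BASE + TITLE                      # option 1: file name as-is
ENCODED_URL = BASE + quote(TITLE, safe="")  # option 2: percent-encoded


def naive_link(url: str) -> str:
    """Models a reuser pasting a URL into HTML without escaping it."""
    return '<a href="' + url + '">image</a>'


def href_value(html: str) -> str:
    """Return the text between href=" and the next double quote."""
    start = html.index('href="') + len('href="')
    return html[start:html.index('"', start)]


print(href_value(naive_link(RAW_URL)))      # cut off at the first quote in the name
print(href_value(naive_link(ENCODED_URL)))  # survives intact
```

With the raw URL, everything after the first quote in the file name ends up outside the attribute value, which is where the injection risk comes from.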

> I doubt RTL characters would cause major problems. The annoying characters
> (bidi override, rtl mark, etc) are banned from file names anyways.

Here is an example: https://he.wikipedia.org/wiki/קובץ:תוכנית הפדרציה.png
Press "reply" and try to interact with it in the edit box (like deleting some character, adding ASCII characters). Not a major problem but an annoyance.

Plus, tofu in the editbox for more exotic scripts.

Autolinking is a bigger concern though. MediaWiki (and Gmail, Facebook, pretty much anything else) tends to end links at characters like ")", which are pretty frequent in file names.
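
To illustrate the autolinking concern, here is a deliberately simplified autolinker. It is an assumption for the sake of the example, not the actual rule used by MediaWiki, Gmail or Facebook: it treats whitespace, angle brackets, double quotes and ")" as the end of a link. The file name and page name are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import quote

# Simplified, hypothetical autolinker: a link ends at whitespace, <, >, " or ).
URL_RE = re.compile(r'https?://[^\s<>")]+')

BASE = "https://en.wikipedia.org/wiki/Example#mediaviewer/File:"
TITLE = "Map_(1900).jpg"  # hypothetical file name with parentheses

RAW_URL = BASE + TITLE
ENCODED_URL = BASE + quote(TITLE, safe="")


def autolink_target(text: str) -> Optional[str]:
    """Return the part of the text this simplified autolinker would link."""
    match = URL_RE.search(text)
    return match.group(0) if match else None


print(autolink_target("See " + RAW_URL + " for details"))      # truncated at ")"
print(autolink_target("See " + ENCODED_URL + " for details"))  # complete URL
```

The percent-encoded form contains no ")" at all, so the whole URL is linked.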

Comment 20 Andre Klapper 2014-07-23 09:02:51 UTC

I am closing this report again - see comment 8 and comment 17 for reasons to not change current behavior.
