Last modified: 2014-07-23 10:46:26 UTC
MediaViewer gives us https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg, while Commons has it as https://commons.wikimedia.org/wiki/File:MilutinDostani%C4%87.jpg Thanks, GerardM
Can you clarify what result you would expect? The MediaViewer URL hash contains exactly the same string as the file page URL, so I'm not sure how it is not "what we usually do".
Try the URL: you will find that what Bugzilla also shows does NOT look like what you get from the Commons URL; it has a special c.. Thanks, GerardM
I'm afraid I have no idea what you are saying. Can you describe how the expected and actual behavior differ? See [[mw:How to report a bug]].
No clue here either. Please provide the exact output, screenshots, and browser information. For future reference, https://bugzilla.wikimedia.org/enter_bug.cgi?format=guided might also help with providing more useful details.
Per discussion on IRC:
* Go to https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg

Expected behaviour:
* The URL bar in the browser replaces the hex (percent) encoding with the characters in question, giving a URL like https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg

Actual behaviour varies with browser:
* [on Chrome] This only happens in the path part of the URL, not in the fragment, giving a URL like https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg
* [on Firefox 3.5] ć appears in both places, as expected.
Is this a Chromium bug, then?
> * [on chrome] This only happens in path part of url, not fragment, giving a
> url like
> https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:
> MilutinDostani%C4%87.jpg
> * [on firefox 3.5] ć is in both places like expected.

To be clear, Chrome simply does not adjust the fragment part of the URL but leaves it as is:
* If you type in https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87#mediaviewer/File:MilutinDostani%C4%87.jpg it will display as https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg
* If you type in https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg it will display as https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg

Firefox (3.5) will always convert the URL to https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg
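To make the difference concrete, here is a simplified Python model of the two display behaviours described above. It is only a sketch: real browsers keep some characters percent-encoded in the address bar for safety reasons, and the function name `display_url` and its `decode_fragment` flag are hypothetical.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def display_url(url: str, decode_fragment: bool) -> str:
    """Simplified model of how a URL bar might prettify a URL.

    The path is always percent-decoded (both browsers in this report do that).
    The fragment is decoded only when decode_fragment is True (Firefox-like)
    and is left exactly as typed otherwise (Chrome-like).
    """
    parts = urlsplit(url)
    path = unquote(parts.path)
    fragment = unquote(parts.fragment) if decode_fragment else parts.fragment
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, fragment))


url = ("https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87"
       "#mediaviewer/File:MilutinDostani%C4%87.jpg")

print(display_url(url, decode_fragment=False))
# https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostani%C4%87.jpg
print(display_url(url, decode_fragment=True))
# https://en.wikipedia.org/wiki/Milutin_Dostanić#mediaviewer/File:MilutinDostanić.jpg
```

Both calls start from the same underlying URL; only the presentation of the fragment differs, which is the behaviour being reported.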
Thanks for the explanation! This should be a bug/feature request for non-Firefox browsers. Closing as WORKSFORME since what MediaViewer is doing is, as far as I can see, the correct way of representing non-ASCII characters in a URI. See my mail from some time ago about standards and other considerations: http://lists.wikimedia.org/pipermail/wikitech-l/2014-April/076069.html

I reported this for Chromium at the time: https://code.google.com/p/chromium/issues/detail?id=367505

I tried to report it for Safari/IE as well but lost motivation before getting halfway through the crap that's needed to report bugs for those browsers. (Opera partially got this right at the time; since then they have switched engines, so that might have changed.) If someone is more persistent or already has the right kind of account, more upstream bug reports would be helpful.
Created attachment 16007: expected URL
Created attachment 16008: The URL with a malformed string. This is not human-readable, as you would expect it to be.
Again, you should report this to your browser vendor. The URL follows the standard method of encoding non-ASCII bytes, and including them unencoded would result in more serious issues, as I outlined in the mail.
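The standard method referred to here is the one described in RFC 3986 and in RFC 3987's IRI-to-URI mapping: encode the text as UTF-8, then replace every byte outside an allowed set with `%` followed by two hex digits. Below is a minimal hand-written Python sketch; the `percent_encode` helper and its exact set of characters left unencoded are illustrative assumptions, not MediaViewer's code.

```python
import string

# RFC 3986 "unreserved" characters, plus ":" which is also legal in a fragment.
# Which extra characters to leave unencoded is an assumption of this sketch.
SAFE = set(string.ascii_letters + string.digits + "-._~:")


def percent_encode(text: str) -> str:
    """Percent-encode text byte by byte after converting it to UTF-8."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            # ASCII byte that may appear literally in the URI.
            out.append(ch)
        else:
            # Any other byte (including every byte of a non-ASCII character)
            # becomes %HH with uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print("ć".encode("utf-8"))                        # b'\xc4\x87'
print(percent_encode("File:MilutinDostanić.jpg"))  # File:MilutinDostani%C4%87.jpg
```

The two bytes of `ć` in UTF-8 (`C4 87`) are exactly what appears as `%C4%87` in both the Commons and the MediaViewer URLs.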
Why, then, is there a difference between how Commons does things and how the Multi Media Viewer does it? Consistent behaviour may be expected, and this has nothing to do with browser "vendors". Thanks, GerardM
Because some browsers treat percent-encoded characters differently in the path/query part and the fragment part of the URL. As can be seen from the URLs you posted in comment 0, the actual representation is consistent.
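One way to check the consistency claim is to pull the file title out of both URLs from comment 0 and compare the raw, still percent-encoded strings. The helper names below are hypothetical, and the `/wiki/` and `mediaviewer/` prefixes are taken from the two example URLs rather than from any specification.

```python
from urllib.parse import urlsplit

COMMONS = "https://commons.wikimedia.org/wiki/File:MilutinDostani%C4%87.jpg"
MEDIAVIEWER = ("https://en.wikipedia.org/wiki/Milutin_Dostani%C4%87"
               "#mediaviewer/File:MilutinDostani%C4%87.jpg")


def title_from_path(url: str) -> str:
    """File title as it appears in the path of a file description page URL."""
    path = urlsplit(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"unexpected path: {path}")
    return path[len(prefix):]


def title_from_fragment(url: str) -> str:
    """File title as it appears in a MediaViewer-style URL fragment."""
    fragment = urlsplit(url).fragment
    prefix = "mediaviewer/"
    if not fragment.startswith(prefix):
        raise ValueError(f"unexpected fragment: {fragment}")
    return fragment[len(prefix):]


print(title_from_path(COMMONS))          # File:MilutinDostani%C4%87.jpg
print(title_from_fragment(MEDIAVIEWER))  # File:MilutinDostani%C4%87.jpg
print(title_from_path(COMMONS) == title_from_fragment(MEDIAVIEWER))  # True
```

Since the two strings are identical, any visible difference in the address bar comes from how the browser chooses to display the path versus the fragment.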
As can be seen by the screenshots (from the same browser & the same session) this is not the case. Thanks, GerardM
(In reply to Gerard Meijssen from comment #14)
> As can be seen by the screenshots (from the same browser & the same session)
> this is not the case.
> Thanks,
> GerardM

That doesn't make sense. Tgr said that some browsers (Chrome) treat the part of the URL after the '#' differently from the part before the '#', and that you should complain to your browser maker. Your screenshots seem to agree with what Tgr said.
Could MediaViewer just not urlencode the fragment? I suspect that's allowed in HTML5, and Chrome seems to handle it fine.
Could, but should not, IMO. I'll quote the relevant part from the mail linked in comment 8:

1. Just put the file name as-is (with spaces replaced by underscores) in the URL fragment part.
   Pro: readable file names in URLs, easy to generate.
   Con: technically not a valid URI. [2] (It would be a valid IRI, probably, but browser support for that is not so great, so non-ASCII bytes might get encoded in unexpected ways.) Creates nasty usability and security issues (injection vulnerabilities, RTL characters, characters which break autolinking). Would make it very hard to introduce more complex URL formats later, as file names can contain pretty much any character.
2. Use percent encoding (with underscores for spaces).
   Pro: this is the standard way of encoding fragments. [2][3] Always results in a valid URI. Readable file names in Firefox. Easy to generate on-wiki (e.g. with {{urlencode}}).
   Con: non-Latin file names look horrible in any browser that's not Firefox.

[2] http://tools.ietf.org/html/rfc3986#section-3.5
[3] https://tools.ietf.org/html/rfc3987#section-3.1
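The two options can be compared with a short Python sketch: build the fragment both ways and test each result against the RFC 3986 grammar for a fragment, `*( pchar / "/" / "?" )`. The function names are hypothetical, and using `urllib.parse.quote` with `safe=":"` is an assumption standing in for whatever encoder MediaViewer actually uses.

```python
import re
from urllib.parse import quote

# RFC 3986 section 3.5: fragment = *( pchar / "/" / "?" ), where pchar is
# unreserved / pct-encoded / sub-delims / ":" / "@".
FRAGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def fragment_option1(file_name: str) -> str:
    """Option 1: the file name as-is, with spaces replaced by underscores."""
    return "mediaviewer/" + file_name.replace(" ", "_")


def fragment_option2(file_name: str) -> str:
    """Option 2: underscores for spaces, then percent-encoding (sketch)."""
    return "mediaviewer/" + quote(file_name.replace(" ", "_"), safe=":")


def is_valid_fragment(fragment: str) -> bool:
    """True if the string matches the RFC 3986 fragment production."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


name = "File:MilutinDostanić.jpg"
for build in (fragment_option1, fragment_option2):
    frag = build(name)
    print(f"{build.__name__}: {frag} valid={is_valid_fragment(frag)}")
# fragment_option1: mediaviewer/File:MilutinDostanić.jpg valid=False
# fragment_option2: mediaviewer/File:MilutinDostani%C4%87.jpg valid=True
```

Option 1 fails only because of the raw `ć`; option 2 can never produce anything outside the allowed set, which is the "always results in a valid URI" point in the mail.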
> 1. Just put the file name as-is (with spaces replaced by underscores) in
> the URL fragment part.
> Pro: readable file names in URLs, easy to generate.
> Con: technically not a valid URI. [2] (It would be a valid IRI,
> probably, but browser support for that is not so great, so non-ASCII
> bytes might get encoded in unexpected ways.)

Yeah, that sounds like the makings of some not-so-fun bugs.

> Creates nasty usability
> and security issues (injection vulnerabilities, RTL characters,
> characters which break autolinking).

What sort of injection vulnerabilities do you mean? (< and > are disallowed in titles, and things should be escaped before being injected into HTML anyway.) I doubt RTL characters would cause major problems. The annoying characters (bidi override, RTL mark, etc.) are banned from file names anyway.

> Would make it very hard to
> introduce more complex URL formats later, as file names can contain
> pretty much any character.

True.
(In reply to Bawolff (Brian Wolff) from comment #18)
> What sort of injection vulnerabilities do you mean ( < and > are disallowed
> in titles. Things should be escaped before injecting into html anyways).

Quotes are allowed and can be used to break out of HTML attributes. The goal of having a custom URL in the first place is that people can copy-paste it, so escaping would be up to the reuser, and people don't escape URLs they paste into blog posts.

> I doubt RTL characters would cause major problems. The annoying characters
> (bidi override, rtl mark, etc) are banned from file names anyways.

Here is an example: https://he.wikipedia.org/wiki/קובץ:תוכנית הפדרציה.png
Press "reply" and try to interact with it in the edit box (e.g. deleting a character or adding ASCII characters). Not a major problem, but an annoyance. Plus, there is tofu in the edit box for more exotic scripts.

Autolinking is a bigger concern, though. MediaWiki (and Gmail, Facebook, pretty much anything else) tends to end links at characters like ")", which are pretty frequent in file names.
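The quote problem can be sketched in Python. A hypothetical reuser pastes the URL into an `href` attribute without escaping; a double-quoted HTML attribute value ends at the next `"`, so a raw quote in the file name ends the attribute early, while the percent-encoded form (`%22`) does not. The file name, the base URL, the `paste_as_link` and `href_as_parsed` helpers, and the use of `urllib.parse.quote` as the encoder are all illustrative assumptions.

```python
from urllib.parse import quote

# Hypothetical page and file name; quotes are allowed in file titles.
BASE = "https://commons.wikimedia.org/wiki/Example#mediaviewer/"
NAME = 'File:A"onmouseover="alert(1).jpg'


def paste_as_link(url: str) -> str:
    """What a reuser might type into a blog post: no escaping at all."""
    return f'<a href="{url}">image</a>'


def href_as_parsed(html: str) -> str:
    """A double-quoted attribute value ends at the next '"' character."""
    start = html.index('href="') + len('href="')
    return html[start:html.index('"', start)]


raw_url = BASE + NAME                          # option 1: name as-is
encoded_url = BASE + quote(NAME, safe=":")     # option 2: percent-encoded

print(href_as_parsed(paste_as_link(raw_url)))
# https://commons.wikimedia.org/wiki/Example#mediaviewer/File:A
print(href_as_parsed(paste_as_link(encoded_url)) == encoded_url)  # True
```

With the raw name, everything after `File:A` spills out of the attribute and is interpreted as further markup. HTML-escaping the URL would also fix this, but as noted above, people pasting links cannot be relied on to do that.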
I am closing this report again - see comment 8 and comment 17 for reasons to not change current behavior.