Last modified: 2012-08-23 16:39:26 UTC
We really shouldn't suggest illegal filenames. For example, on upload by url, putting http://upload.wikimedia.org/wikipedia/commons/e/e0/Petrorhagia_prolifera_%281%29.JPG into the url box, our js determines that Petrorhagia_prolifera_%281%29.JPG is a good destination name (which is illegal because it conflicts with percent encodings). with url's, %-encoding issues will probably be somewhat common, but could also happen for normal upload from local file. This is a fairly low priority thing imo.
Rephrasing title to be a bug.
Created attachment 9409 [details] patch that removes url encoding in file name Hello, I have made a patch to the javascript destination file name, to call the javascript unescape function, that will remove the URL encoding where it exists.
(In reply to comment #2) > Created attachment 9409 [details] > patch that removes url encoding in file name > > Hello, > I have made a patch to the javascript destination file name, to call the > javascript unescape function, that will remove the URL encoding where it > exists. Thank you for patch :). However, I think a more complicated function is needed. unescape doesn't handle unicode properly (unescape("%C4%A3") => Ä£ where it should be ģ) and the set of illegal characters in a file name don't entirely match up with the set of illegal chars in a url (Things like ][{}|#<> can't appear in titles). Just getting rid of the percent-encoding issue would probably be a major improvement (decodeURIComponent works much better than unescape for that), but there might already be a function available for title validation in our js stuff (I'm not sure, not all that familar with the js stuff)
(In reply to comment #3) > (In reply to comment #2) > > Created attachment 9409 [details] > > patch that removes url encoding in file name > > > > Hello, > > I have made a patch to the javascript destination file name, to call the > > javascript unescape function, that will remove the URL encoding where it > > exists. > > Thank you for patch :). However, I think a more complicated function is needed. > unescape doesn't handle unicode properly (unescape("%C4%A3") => Ä£ where it > should be ģ) and the set of illegal characters in a file name don't entirely > match up with the set of illegal chars in a url (Things like ][{}|#<> can't > appear in titles). > > Just getting rid of the percent-encoding issue would probably be a major > improvement (decodeURIComponent works much better than unescape for that), but > there might already be a function available for title validation in our js > stuff (I'm not sure, not all that familar with the js stuff) Hello, Does decodeURIComponent handle unicode correctly? I will modify my patch to use this.
Created attachment 9413 [details] second version of patch using decodeURIcomponent I have modified my previous patch to use the decodeURIComponent function rather than un escape.
>Hello, >Does decodeURIComponent handle unicode correctly? I will modify my patch to use >this. Yes, it does. Perhaps something like decodeURIComponent(theSuggestedName).replace( /[\]\[\{\}\|#<>:]/g, '' ); (I suppose colon can be valid in certain configs, but they seem few - mostly that file namespace repo extension) (although, you do have to check for exceptions with decodeURIComponent, in case of invalid utf-8 - aka decodeURIComponent("%C4") will throw an exception).
I don't think we should do any modification to the filename such as replace, other than to decode the URI component/%-encoding.
Created attachment 9423 [details] patch that removes url encoding in file name Hello, I have made a new version of the patch for this bug. We decodeURIComponent to decode any URI coding in the destination file name. Where an exception occurs as a result invalid utf-8 as described in example in the bug report, we use the destination file name, without the uri coding removed, if there is any. No other changes are made to the destination file name, such are removing any other characters and replacing them.
Karun, I'm sorry for the slow response to your patch. https://www.mediawiki.org/wiki/Git/Tutorial shows you how you can directly get your patch into our source control system to get faster response and review. Thank you again for your contribution!
(In reply to comment #9) > Karun, I'm sorry for the slow response to your patch. > > https://www.mediawiki.org/wiki/Git/Tutorial shows you how you can directly get > your patch into our source control system to get faster response and review. > Thank you again for your contribution! Thanks, I will have a look at it. Previously it took too long with many bugs before they were reviewed, so I gave up contributing to the project.
Karun, thank you for committing the patch as https://gerrit.wikimedia.org/r/#/c/20532/ . And you have my regrets and apologies regarding those delays. The Wikimedia engineering community is trying to get better and faster at triaging bugs and reviewing and merging patches and I hope to see your contributions again. :)
> > Thanks, I will have a look at it. Previously it took too long with many bugs > before they were reviewed, so I gave up contributing to the project. And I would like to add my apology as well, since I was actively commenting on the bug, that's really my fault.
(In reply to comment #12) > > > > Thanks, I will have a look at it. Previously it took too long with many bugs > > before they were reviewed, so I gave up contributing to the project. > > And I would like to add my apology as well, since I was actively commenting on > the bug, that's really my fault. Ive submitted the patch to Gerrit. It looks like Gerrit is going to help with having patches reviewed quicker.
Comment on attachment 9423 [details] patch that removes url encoding in file name Marking patch obsolete as it is in Gerrit now (I3a48f8a8).