Last modified: 2012-08-23 16:39:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32390, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30390 - Suggested file name on Special:Upload should not contain illegal characters
Suggested file name on Special:Upload should not contain illegal characters
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Uploading (Other open bugs)
1.20.x
All All
: Low enhancement (vote)
: 1.20.0 release
Assigned To: Krinkle
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-16 05:36 UTC by Bawolff (Brian Wolff)
Modified: 2012-08-23 16:39 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
patch that removes url encoding in file name (502 bytes, patch)
2011-11-10 12:47 UTC, Karun
Details
second version of patch using decodeURIcomponent (561 bytes, patch)
2011-11-10 18:02 UTC, Karun
Details
patch that removes url encoding in file name (736 bytes, patch)
2011-11-11 21:38 UTC, Karun
Details

Description Bawolff (Brian Wolff) 2011-08-16 05:36:35 UTC
We really shouldn't suggest illegal filenames.

For example, on upload by url, putting http://upload.wikimedia.org/wikipedia/commons/e/e0/Petrorhagia_prolifera_%281%29.JPG into the url box, our js determines that Petrorhagia_prolifera_%281%29.JPG is a good destination name (which is illegal because it conflicts with percent encodings). with url's, %-encoding issues will probably be somewhat common, but could also happen for normal upload from local file.

This is a fairly low priority thing imo.
Comment 1 Krinkle 2011-08-24 23:26:11 UTC
Rephrasing title to be a bug.
Comment 2 Karun 2011-11-10 12:47:55 UTC
Created attachment 9409 [details]
patch that removes url encoding in file name

Hello,
I have made a patch to the javascript destination file name, to call the javascript unescape function, that will remove the URL encoding where it exists.
Comment 3 Bawolff (Brian Wolff) 2011-11-10 17:52:05 UTC
(In reply to comment #2)
> Created attachment 9409 [details]
> patch that removes url encoding in file name
> 
> Hello,
> I have made a patch to the javascript destination file name, to call the
> javascript unescape function, that will remove the URL encoding where it
> exists.

Thank you for patch :). However, I think a more complicated function is needed. unescape doesn't handle unicode properly (unescape("%C4%A3") => Ä£ where it should be ģ) and the set of illegal characters in a file name don't entirely match up with the set of illegal chars in a url (Things like ][{}|#<> can't appear in titles).

Just getting rid of the percent-encoding issue would probably be a major improvement (decodeURIComponent works much better than unescape for that), but there might already be a function available for title validation in our js stuff (I'm not sure, not all that familar with the js stuff)
Comment 4 Karun 2011-11-10 17:55:26 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Created attachment 9409 [details]
> > patch that removes url encoding in file name
> > 
> > Hello,
> > I have made a patch to the javascript destination file name, to call the
> > javascript unescape function, that will remove the URL encoding where it
> > exists.
> 
> Thank you for patch :). However, I think a more complicated function is needed.
> unescape doesn't handle unicode properly (unescape("%C4%A3") => ģ where it
> should be ģ) and the set of illegal characters in a file name don't entirely
> match up with the set of illegal chars in a url (Things like ][{}|#<> can't
> appear in titles).
> 
> Just getting rid of the percent-encoding issue would probably be a major
> improvement (decodeURIComponent works much better than unescape for that), but
> there might already be a function available for title validation in our js
> stuff (I'm not sure, not all that familar with the js stuff)

Hello,
Does decodeURIComponent handle unicode correctly? I will modify my patch to use this.
Comment 5 Karun 2011-11-10 18:02:24 UTC
Created attachment 9413 [details]
second version of patch using decodeURIcomponent

I have modified my previous patch to use the decodeURIComponent function rather than un escape.
Comment 6 Bawolff (Brian Wolff) 2011-11-10 18:06:51 UTC
>Hello,
>Does decodeURIComponent handle unicode correctly? I will modify my patch to use
>this.

Yes, it does. Perhaps something like decodeURIComponent(theSuggestedName).replace( /[\]\[\{\}\|#<>:]/g, '' );

(I suppose colon can be valid in certain configs, but they seem few - mostly that file namespace repo extension)

(although, you do have to check for exceptions with decodeURIComponent, in case of invalid utf-8 - aka decodeURIComponent("%C4") will throw an exception).
Comment 7 Karun 2011-11-11 21:30:46 UTC
I don't think we should do any modification to the filename such as replace, other than to decode the URI component/%-encoding.
Comment 8 Karun 2011-11-11 21:38:49 UTC
Created attachment 9423 [details]
patch that removes url encoding in file name

Hello,
I have made a new version of the patch for this bug.
We decodeURIComponent to decode any URI coding in the destination file name.
Where an exception occurs as a result invalid utf-8 as described in example in the bug report, we use the destination file name, without the uri coding removed, if there is any.

No other changes are made to the destination file name, such are removing any other characters and replacing them.
Comment 9 Sumana Harihareswara 2012-08-17 11:14:06 UTC
Karun, I'm sorry for the slow response to your patch.

https://www.mediawiki.org/wiki/Git/Tutorial shows you how you can directly get your patch into our source control system to get faster response and review.  Thank you again for your contribution!
Comment 10 Karun 2012-08-17 21:58:02 UTC
(In reply to comment #9)
> Karun, I'm sorry for the slow response to your patch.
> 
> https://www.mediawiki.org/wiki/Git/Tutorial shows you how you can directly get
> your patch into our source control system to get faster response and review. 
> Thank you again for your contribution!

Thanks, I will have a look at it. Previously it took too long with many bugs before they were reviewed, so I gave up contributing to the project.
Comment 11 Sumana Harihareswara 2012-08-20 15:26:55 UTC
Karun, thank you for committing the patch as https://gerrit.wikimedia.org/r/#/c/20532/ .  And you have my regrets and apologies regarding those delays.  The Wikimedia engineering community is trying to get better and faster at triaging bugs and reviewing and merging patches and I hope to see your contributions again. :)
Comment 12 Bawolff (Brian Wolff) 2012-08-20 16:19:28 UTC
> 
> Thanks, I will have a look at it. Previously it took too long with many bugs
> before they were reviewed, so I gave up contributing to the project.

And I would like to add my apology as well, since I was actively commenting on the bug, that's really my fault.
Comment 13 Karun 2012-08-20 22:34:01 UTC
(In reply to comment #12)
> > 
> > Thanks, I will have a look at it. Previously it took too long with many bugs
> > before they were reviewed, so I gave up contributing to the project.
> 
> And I would like to add my apology as well, since I was actively commenting on
> the bug, that's really my fault.

Ive submitted the patch to Gerrit. It looks like Gerrit is going to help with having patches reviewed quicker.
Comment 14 Krinkle 2012-08-23 16:26:06 UTC
Comment on attachment 9423 [details]
patch that removes url encoding in file name

Marking patch obsolete as it is in Gerrit now (I3a48f8a8).

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links