Last modified: 2012-11-08 19:29:37 UTC
If you use a "/" in the filename for an API upload, such as:

Content-Disposition: form-data; name="filename"
Content-Type: text/plain

Gerald Ford Papers- Final Issues for Decision, Army Corps of Engineers- 12/4/74 - HEW and Labor(Gerald Ford Library)(1554461).pdf

the response cuts off everything up to and including the final slash:

<api>
  <upload result="Success" filename="74_-_HEW_and_Labor(Gerald_Ford_Library)(1554461).pdf">
    <warnings badfilename="74_-_HEW_and_Labor(Gerald_Ford_Library)(1554461).pdf" exists="74_-_HEW_and_Labor(Gerald_Ford_Library)(1554461).pdf" />

The full filename isn't even given back in badfilename.
Does this happen on Wikipedia or on your own MediaWiki installation? If the latter, which version?
Smallman: I also wonder about the exact steps to reproduce this ("api upload" is a bit vague as there are several ways to upload data), and which filesystem you use locally to upload from.
It's especially helpful for us to know which version of MediaWiki is affected by this problem (see the Special:Version page, e.g. https://www.mediawiki.org/wiki/Special:Version ).
I should have been more specific. It's for my Wikimedia Commons upload bot: https://commons.wikimedia.org/wiki/Special:Contributions/Smallbot
Smallman: It's still not clear to me exactly how to reproduce this. More detail would be very welcome!
Created attachment 11206
Here's the request/response recorded by Fiddler. The binary portion is mangled (3/4 cut out) to make the upload fit in 2 MB, but the binary portion doesn't really matter for this bug.
To reproduce, try uploading with the filename "Gerald Ford Papers- Final Issues for Decision, Army Corps of Engineers- 12/4/74(Gerald Ford Library)(1554461).pdf" to the Commons API as a multipart POST (not chunked) with ignorewarnings set. The file will come out as "File:74(Gerald Ford Library)(1554461).pdf". Let me know if you need more details.
Without ignorewarnings set, you will get a badfilename warning, so WORKSFORME. The GUI also hints at the name change if you leave the ignore-warnings checkbox unchecked.
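For a bot, the practical consequence is to leave ignorewarnings unset and check the response for the warning before going ahead. A minimal Python sketch, assuming the XML response shape shown in this report (an upload element containing warnings badfilename="..."); the function name renamed_to and the trimmed sample response are illustrative only:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def renamed_to(response_xml: str) -> Optional[str]:
    """Return the name from a badfilename warning, or None if there is none.

    Works whether the root element is <api> or <upload>, because
    Element.iter() also visits the root element itself.
    """
    root = ET.fromstring(response_xml)
    warnings = next(root.iter("warnings"), None)
    if warnings is None:
        return None
    return warnings.get("badfilename")


# Trimmed response in the shape reported above (ignorewarnings unset).
sample = (
    '<api><upload result="Warning">'
    '<warnings badfilename="74(Gerald_Ford_Library)(1554461).png" />'
    '</upload></api>'
)
new_name = renamed_to(sample)
if new_name is not None:
    print(f"Server would rename the file to {new_name!r}; not uploading.")
```

A bot can then skip or rename the file itself instead of letting the server silently truncate the title.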
The content of attachment 11206 has been deleted by Chad H. <innocentkiller@gmail.com>, who provided the following reason: Contains private info (e.g. session cookies). The token used to delete this attachment was generated at 2012-10-19 17:08:59 UTC.
Created attachment 11208
Simple HTML form to upload a file through the API (targets Wikimedia Commons) to demonstrate the bug

I've created a simple form that targets the Wikimedia Commons API and provides all the fields required to upload a file for testing this bug.

STEPS TO REPRODUCE:
Each parameter is an input. You need to fill in the token (the HTML includes a handy link to get one) and choose the file you want to upload. I've prefilled the target title with the same title provided by Smallman, but with a PNG extension, so choose any PNG file to upload (it's easier to create a PNG file than a PDF). I've included the "stash" parameter to prevent the file from actually being uploaded to Commons. That's sufficient to see the results, so please leave this parameter checked ;)

ACTUAL RESULTS:
This is the response with ignorewarnings unchecked:

<upload result="Warning" filekey="10xx8haqtku8.von8i.46251.png" sessionkey="10xx8haqtku8.von8i.46251.png">
  <warnings badfilename="74(Gerald_Ford_Library)(1554461).png" />
</upload>

This is the response with ignorewarnings checked:

<upload result="Success" filekey="10xx8s87hzog.f41ttt.46251.png" sessionkey="10xx8s87hzog.f41ttt.46251.png">
  <warnings badfilename="74(Gerald_Ford_Library)(1554461).png" />
  <imageinfo timestamp="2012-10-19T18:01:14Z" user="" userid="" anon="" size="80" width="11" height="11" parsedcomment="" comment="" url="http://commons.wikimedia.org/wiki/Special:UploadStash/file/10xx8s87hzog.f41ttt.46251.png" descriptionurl="http://commons.wikimedia.org/wiki/Special:UploadStash/file/10xx8s87hzog.f41ttt.46251.png" sha1="23d8e32905b6d3f4a2b89124c60db0c4bf64ac4d" mime="image/png" mediatype="UNKNOWN" bitdepth="0">
    <metadata>
      <metadata name="frameCount" value="0" />
      <metadata name="loopCount" value="1" />
      <metadata name="duration" value="0" />
      <metadata name="bitDepth" value="8" />
      <metadata name="colorType" value="truecolour" />
      <metadata name="metadata">
        <value>
          <metadata name="_MW_PNG_VERSION" value="1" />
        </value>
      </metadata>
    </metadata>
  </imageinfo>
</upload>

As you can see, <warnings badfilename="74(Gerald_Ford_Library)(1554461).png" /> doesn't match the original filename, which should be "Gerald Ford Papers- Final Issues for Decision, Army Corps of Engineers- 12/4/74(Gerald Ford Library)(1554461).png", perhaps with the slashes converted to underscores or similar. Instead, the filename gets truncated.
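For bot authors, the same request can be assembled without the HTML form. The following is a minimal standard-library Python sketch that only builds the multipart/form-data body and sends nothing. The helper name build_upload_body and the local file name "upload.png" are hypothetical; action, format, filename, token, ignorewarnings, stash and file are the action=upload parameters discussed in this report.

```python
import uuid

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_upload_body(filename, token, file_bytes,
                      ignore_warnings=False, stash=True):
    """Assemble a multipart/form-data body for an action=upload request.

    Returns (content_type, body). Nothing is sent over the network.
    """
    boundary = uuid.uuid4().hex
    fields = {
        "action": "upload",
        "format": "xml",
        "filename": filename,  # target title; may contain "/"
        "token": token,
    }
    if ignore_warnings:
        fields["ignorewarnings"] = "1"
    if stash:
        fields["stash"] = "1"  # keep the file in the upload stash only

    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The local file name sent here is independent of the "filename" field.
    file_header = (f"--{boundary}\r\n"
                   'Content-Disposition: form-data; name="file"; '
                   'filename="upload.png"\r\n'
                   "Content-Type: image/png\r\n\r\n")
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The resulting body would be POSTed to API_URL with the returned Content-Type header, from a logged-in session, since the token belongs to that session.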
Jesús Martínez Novo: wow! Thank you for that very complete set of steps to reproduce, including the form! I'm asking a few more people to take a look at this. How often does a media file have a slash in its filename, though? Is this a rare edge case? If slashes are permitted, can we check whether any files on Commons already have them in their names?
This isn't really an API bug, as it happens via Special:Upload as well. The root cause is that wfStripIllegalFilenameChars explicitly strips anything up to and including the last "/" character. But I'm not entirely sure this is a bug at all, rather than a case of "slashes are not allowed in filenames". If anything I suppose it could be changed to treat slashes as it does other characters (which would result in a filename of "Gerald Ford Papers- Final Issues for Decision, Army Corps of Engineers- 12-4-74(Gerald Ford Library)(1554461).png" for the example here), but before changing behavior that goes back in one form or another to at least *2003*[1] I'd want to get input from people more familiar with the file handling code. [1] https://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/specials/SpecialUpload.php?revision=1284&view=markup#l52
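To make the difference between the two behaviors concrete, here is a rough Python approximation. It is not MediaWiki's PHP code: the ILLEGAL character set is a hypothetical, much-simplified stand-in for MediaWiki's real rules, and the basename step splits on both "/" and "\" to match the full-path case discussed in this thread.

```python
import re

# Hypothetical, much-simplified stand-in for the characters MediaWiki
# rejects in file names; the real set is larger.
ILLEGAL = re.compile(r"[:/\\]")
SEPARATORS = re.compile(r"[/\\]")


def strip_current(name: str) -> str:
    """Approximate current behavior: keep only the text after the last
    "/" or "\\", then replace any remaining illegal characters with "-"."""
    base = SEPARATORS.split(name)[-1]
    return ILLEGAL.sub("-", base)


def strip_proposed(name: str) -> str:
    """Alternative floated above: treat "/" like any other illegal
    character and replace it with "-" instead of truncating."""
    return ILLEGAL.sub("-", name)


title = ("Gerald Ford Papers- Final Issues for Decision, Army Corps of "
         "Engineers- 12/4/74(Gerald Ford Library)(1554461).png")
print(strip_current(title))   # 74(Gerald Ford Library)(1554461).png
print(strip_proposed(title))  # Gerald Ford Papers- ... 12-4-74(Gerald Ford Library)(1554461).png
```

The trade-off shows up with a full Windows path: strip_current("C:\\Documents\\Picture.jpg") gives "Picture.jpg", while strip_proposed gives "C--Documents-Picture.jpg".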
The idea behind the current behavior is that if people submit the full path as the file name, e.g. C:\Documents\Picture.jpg, MediaWiki correctly picks "Picture.jpg" as the file name. As this bug shows, there are also cases where the (back)slashes should instead be replaced by other characters. I'm not sure which case is more prevalent, but I would be in favor of keeping the current behavior (i.e., closing as WONTFIX).
@Bryan, if people submit the full path, then you could check whether a drive letter is included. If it's not, then replace the slashes with dashes. Anyhow, the file path shouldn't be submitted, only the filename. At the very minimum, the proper filename should be returned in badfilename.
(In reply to comment #14)
> @Bryan, if people submit the full path, then you could check if a drive letter is included.

That's very Windows-centric. Users of other operating systems, such as OS X, don't have drive letters but might still submit the full path.

> Anyhow, the filepath shouldn't be submitted...only the filename.

Obviously. But users do strange things sometimes.

> At the very minimum, the proper filename should be returned for badfilename.

Define "proper". You know the name you passed in, and it returns to you the name it would use. The former is not proper or it wouldn't be a problem, while you seem to consider the latter improper too.
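The drive-letter heuristic proposed above, and the objection to it, can be seen in a small Python sketch. strip_with_drive_heuristic is a hypothetical function, and the OS X-style path is an invented example:

```python
import re

DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")  # e.g. "C:\" or "C:/"
SEPARATORS = re.compile(r"[\\/]")


def strip_with_drive_heuristic(name: str) -> str:
    """Hypothetical heuristic: if the name starts with a Windows drive
    letter, keep only the final path component; otherwise replace every
    slash or backslash with "-"."""
    if DRIVE_PREFIX.match(name):
        return SEPARATORS.split(name)[-1]
    return SEPARATORS.sub("-", name)


print(strip_with_drive_heuristic("C:\\Documents\\Picture.jpg"))
# Picture.jpg
print(strip_with_drive_heuristic("/Users/alice/Pictures/Picture.jpg"))
# -Users-alice-Pictures-Picture.jpg  (a full path, but no drive letter)
```

The Windows path is handled as intended, but the path without a drive letter slips through and becomes a mangled title, which is exactly the objection raised.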
@Brad Jorsch You're right that many operating systems (including many Linux distros) don't use drive letters. Also, you should never underestimate the ingenuity of users. As such, you can close this as WONTFIX. I'll add a section about this bug at http://www.mediawiki.org/wiki/API:Upload.
Ok, closing as WONTFIX.