Last modified: 2014-10-11 13:40:03 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T40432, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 38432 - Uploading MS Word files doesn't work ("File extension does not match the detected MIME type of the file")
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.19.1
Hardware: All All
Importance: Low normal with 2 votes
Target Milestone: ---
Assigned To: Mark A. Hershberger
Duplicates: 34797
Depends on:
Blocks:
Reported: 2012-07-17 01:29 UTC by Sam Wilson
Modified: 2014-10-11 13:40 UTC


Attachments
A sample MS Word document that is giving the described error on upload. (21.50 KB, application/msword)
2012-07-18 03:09 UTC, Sam Wilson
Patch to bypass xml document parser (734 bytes, patch)
2012-08-05 14:46 UTC, Derk-Jan Hartman
An example java applet that would get through with this patch (712 bytes, application/x-forcedownload)
2013-04-28 00:45 UTC, Bawolff (Brian Wolff)

Description Sam Wilson 2012-07-17 01:29:11 UTC
Uploading a .doc file to MW 1.19.1 (which, incidentally, isn't listed in the versions box here), I get the following error:

    File extension ".doc" does not match the detected MIME type of the file (application/zip).

I tried removing "doc" from application/msword in /includes/mime.types and then also adding it to application/zip (recommended at [1]).  The former edit did nothing; the latter resulted in:

    The file is a corrupt or otherwise unreadable ZIP file. It cannot be properly checked for security.

I tried keeping the above modifications to /includes/mime.types and adding $wgAllowJavaUploads = true; to LocalSettings.php, as recommended at [2], and uploading of these files works correctly.

However, this seems pretty hacky.  Surely allowing .doc files in $wgFileExtensions should be enough, and they shouldn't be treated as zip files or jars or anything?

--

[1] http://www.mediawiki.org/wiki/Manual_talk:Mime_type_detection#Fix_for_Uploading_MS_Word_2007_.28and_greater.29_Files
[2] http://www.mediawiki.org/wiki/Thread:Talk:MediaWiki_1.18/file_upload_error/reply
Comment 1 Derk-Jan Hartman 2012-07-17 09:01:48 UTC
The problem is that modern .doc files ARE actually zip files. Can you attach a specific file that shows this behavior and describe the environment you are running MediaWiki on (OS, PHP version, etc.)?

It's probably one of the external tools that is incorrectly identifying this as a zip file, so you will need to tweak something in the environment if you want MediaWiki to be able to properly identify the file.
Comment 2 Sam Wilson 2012-07-18 03:08:41 UTC
(In reply to comment #1)
> The problem is that modern .doc files ARE actually zip files. Can you attach a
> specific file that has this behavior and detail the environment that you are
> running Mediawiki on  (OS, PHP version etc ?)

Attached is a file that gives the "The file is a corrupt or otherwise unreadable ZIP file. It cannot be properly checked for security" error under the following configuration: 'doc' is listed as an extension for both application/msword and application/zip and $wgAllowJavaUploads is false (well, not set in LocalSettings.php, to be precise).

Environment is: Windows Server 2008, 64-bit; PHP 5.3.10; MediaWiki 1.19.1.

> It's probably one of the external tools that is incorrectly identifying this as
> a zip file, so you will need to tweak something in the environment if you want
> Mediawiki to be able to properly identify the file.

Let me know if I can provide any more information.  And thank you for your help!
Comment 3 Sam Wilson 2012-07-18 03:09:38 UTC
Created attachment 10854 [details]
A sample MS Word document that is giving the described error on upload.
Comment 4 Sumana Harihareswara 2012-07-27 13:57:53 UTC
Does this also happen on the currently running version of MW on WMF sites?  And does it happen on an install of master?
Comment 5 Sam Wilson 2012-07-30 00:01:34 UTC
None of the WMF sites permit the upload of MS Word files (so far as I know).

I'm installing from Git now, to let you know if it works on master.
Comment 6 Sam Wilson 2012-07-30 00:44:06 UTC
Okay, so uploading the above file to MW 1.20alpha (7ab935b) on PHP 5.3.8 and Apache still gives:

    The file is a corrupt or otherwise unreadable ZIP file. It cannot be
    properly checked for security.
Comment 7 Derk-Jan Hartman 2012-08-05 14:04:00 UTC
$ unzip test.doc

Archive:  test.doc
warning [test.doc]:  6034 extra bytes at beginning or within zipfile
  (attempting to process anyway)

I'm guessing it's these extra bytes that are confusing our parser.
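
The count in the unzip warning can be reproduced from the ZIP end-of-central-directory (EOCD) record. Below is a minimal Python sketch, assuming a single-disk archive without ZIP64 extensions. The helper names `leading_extra_bytes` and `make_zip` are made up for illustration and are not MediaWiki code.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_FMT = "<4sHHHHIIH"   # fixed 22-byte part of the EOCD record


def leading_extra_bytes(data: bytes) -> int:
    """Return how many bytes sit in front of the archive's nominal start.

    Hypothetical helper; assumes a single-disk, non-ZIP64 archive.
    """
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + struct.calcsize(EOCD_FMT) > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    cd_size, cd_offset = fields[5], fields[6]
    # In a clean archive the central directory ends exactly where the EOCD
    # record begins, so any difference is data that the stored offsets ignore.
    return pos - (cd_offset + cd_size)


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a small in-memory ZIP archive for experimenting."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()
```

Calling `leading_extra_bytes` on an archive produced by `make_zip` with some bytes glued on the front returns the length of that prefix, which is the kind of figure unzip reports above.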
Comment 8 Derk-Jan Hartman 2012-08-05 14:12:11 UTC
The file starts with d0 cf 11 e0 a1 b1 1a e1, which is the header for the old .doc 2003 file type. This file was probably saved in compatibility mode as a 2003 .doc file with an internal 2010 .docx file. Apparently our code finds the .zip header before it finds the .doc header.
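
One quick way to see both signatures in a single file is sketched below in Python. It assumes only the eight header bytes quoted in this comment and the standard ZIP local-file-header signature `PK\x03\x04`. The function name `sniff` is hypothetical, and this is not how MediaWiki's MIME detection is implemented.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # header bytes quoted in this comment
ZIP_LOCAL_MAGIC = b"PK\x03\x04"                # ZIP local file header signature


def sniff(data: bytes) -> dict:
    """Report which container signatures appear and where (illustrative only)."""
    return {
        # True when the file begins with the old .doc header.
        "starts_with_ole": data.startswith(OLE_MAGIC),
        # Offset of the first ZIP local header, or -1 if there is none.
        "first_zip_header": data.find(ZIP_LOCAL_MAGIC),
    }
```

A file that reports `starts_with_ole` as true together with a positive `first_zip_header` offset matches the "old .doc wrapper around ZIP data" layout described here.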
Comment 9 Derk-Jan Hartman 2012-08-05 14:24:08 UTC
For future reference, here is information on .doc files, including per-version signature regexps:

http://beta.domd.info/category/mime-types/applicationmsword
Comment 10 Derk-Jan Hartman 2012-08-05 14:46:59 UTC
Created attachment 10936 [details]
Patch to bypass xml document parser

A patch that bypasses the ZIP format detector when the file starts with an Office Compound Document Format header.

This isn't a working patch yet, because the upload subsequently fails in the Java detector with:

    ZipDirectoryReader: Fatal error: trailing bytes after the end of the file comment

I'm not sure whether this error needs to be fatal; that will have to be checked with Tim Starling.
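
For context, a strict reader can reject any archive whose end-of-central-directory record and comment do not run exactly to the end of the file. The Python sketch below illustrates that kind of check under that assumption. It is not the ZipDirectoryReader code, and `trailing_bytes` is a made-up name.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_LEN = 22             # size of the fixed part of the record


def trailing_bytes(data: bytes) -> int:
    """Count the bytes that follow the EOCD record and its comment."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_LEN > len(data):
        raise ValueError("no end-of-central-directory record found")
    # The comment length is the last 2-byte field of the fixed record.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    return len(data) - (pos + EOCD_LEN + comment_len)
```

A strict reader would treat any non-zero result as an error; a lenient one could ignore it. Whether strictness is needed here is the open question raised above.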
Comment 11 Andre Klapper 2012-08-17 11:41:23 UTC
Thanks for the patch, Derk-Jan.

You can use Developer access
    https://www.mediawiki.org/wiki/Developer_access
to submit this as a Git branch directly into Gerrit:
    https://www.mediawiki.org/wiki/Git/Tutorial
Putting your branch in Git makes it easier for us to review it quickly.
Thanks again for your contribution!
Comment 12 Derk-Jan Hartman 2012-08-27 14:43:26 UTC
*** Bug 34797 has been marked as a duplicate of this bug. ***
Comment 13 Chris Steipp 2013-01-09 16:20:21 UTC
Has this patch been submitted to Gerrit yet? It looks about right, so I don't think there should be a problem getting it merged. Feel free to add me as a reviewer.
Comment 14 Mark A. Hershberger 2013-01-17 02:06:40 UTC
Also http://beta.domd.info/pronom/fmt/40
Comment 15 Mark A. Hershberger 2013-01-17 05:08:45 UTC
I've got a small patch that you can look at: gerrit change If30b53dd

This WFM, but needs work.
Comment 16 Bawolff (Brian Wolff) 2013-04-28 00:45:26 UTC
Created attachment 12192 [details]
An example java applet that would get through with this patch

These tests are important to prevent uploading of Java applets. It's fairly easy to make a Java applet with the MS Word header; I've attached a "hello world" example. If you have a jar file handy, just prepend "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" to the beginning and run zip -A foo.jar (using the standard zip utility that is usually available on Linux computers). If you really wanted to, you could probably even make a Java applet that is a valid MS Word doc.

-----
To fix this we would probably need to validate both parts of the file independently (?). This is very similar to the issue with mixed PDF and ODF files.
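
The concern can be illustrated with a short Python sketch that checks both parts of a file: the leading MS Word header and whatever ZIP archive the rest of the bytes form. This is a simplified heuristic under stated assumptions, not MediaWiki's detection logic. The name `looks_like_disguised_jar` is hypothetical, and treating any `.class` entry as a sign of Java content is an assumption made for the example.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # header bytes quoted in this bug


def looks_like_disguised_jar(data: bytes) -> bool:
    """True if data has the MS Word header yet also parses as a ZIP with .class files."""
    if not data.startswith(OLE_MAGIC):
        return False
    try:
        # Python's zipfile tolerates data in front of the archive,
        # so the prepended header does not hide the entries.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(name.endswith(".class") for name in zf.namelist())
    except zipfile.BadZipFile:
        # Header present but no readable archive behind it.
        return False
```

A check that stops at the header alone would accept such a file, which is why both parts need to be inspected.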
Comment 17 Gerrit Notification Bot 2013-06-16 20:35:34 UTC
https://gerrit.wikimedia.org/r/44379 (Gerrit Change If30b53dd9c05d92e64b893471b881ee34590ee5d) | change ABANDONED [by TheDJ]
Comment 18 John Mark Vandenberg 2014-10-11 13:40:03 UTC
Patch abandoned over a year ago.

FWIW, bug 31930 and bug 54105 suggest there are other false positives in the 'Prevent Java' detection algorithm.
