Last modified: 2012-03-01 10:50:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T26073, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 24073 - cannot upload ms word 2007 files
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.15.x
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
 
Reported: 2010-06-22 15:54 UTC by Michael Gsandtner
Modified: 2012-03-01 10:50 UTC (History)


Attachments
word 2007 testdocument (443.00 KB, application/octet-stream)
2010-06-23 06:13 UTC, Michael Gsandtner
Actual docx file (357.68 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2010-06-23 15:21 UTC, OverlordQ
gifar cleanup (6.52 KB, patch)
2010-06-30 19:33 UTC, Derk-Jan Hartman

Description Michael Gsandtner 2010-06-22 15:54:19 UTC
Trying to upload a .doc file generated with Microsoft Word 2007 results in:
"The file is corrupt or has an incorrect extension"

The Logfile:
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpq31oe6
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpq31oe6: application/zip


mime: <application/zip> extension: <doc>

UploadForm::verifyExtension: mime type application/zip mismatches file extension doc, rejecting file

This seems to be known, as http://www.mediawiki.org/wiki/Manual:$wgMimeDetectorCommand states "For example, 1.15.3 may misdetect .doc-files from MS Word 2007 as ZIP files", but I cannot find a corresponding bug.

23688, 23642, 18684 do not solve the problem.
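
The rejection in the log comes from comparing the detected MIME type with the types expected for the file extension. A minimal Python sketch of that kind of check follows. It is purely illustrative: the table `EXPECTED_MIME` and the function `extension_matches_mime` are hypothetical names made up for this example, not MediaWiki's actual code or data.

```python
# Hypothetical table mapping a file extension to the MIME types accepted
# for it. The entries are illustrative and not taken from MediaWiki.
EXPECTED_MIME = {
    "doc": {"application/msword"},
    "docx": {
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    },
    "zip": {"application/zip"},
}


def extension_matches_mime(extension: str, detected_mime: str) -> bool:
    """Return True if the detected MIME type is acceptable for the extension."""
    allowed = EXPECTED_MIME.get(extension.lower())
    if allowed is None:
        # Unknown extension: this sketch simply rejects it.
        return False
    return detected_mime in allowed


# The situation from the log: the detector says application/zip,
# but the file extension is doc.
print(extension_matches_mime("doc", "application/zip"))
print(extension_matches_mime("doc", "application/msword"))
```

With a check of this shape, a file detected as application/zip fails for both the doc and the docx extension, which matches the two rejections reported in this bug.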
Comment 1 Bryan Tong Minh 2010-06-22 16:59:01 UTC
Can you try the current 1.17alpha SVN version?

cc. TheDJ
Comment 2 Michael Gsandtner 2010-06-22 18:10:47 UTC
Behaviour does not change with 1.17alpha:

FileCache negative MISS for Testbericht_V02.doc
File::getPropsFromPath: Getting file info for /tmp/phpW9YCXV
MimeMagic::__construct: loading mime types from /magwien/var/gondor-phpserver/html/wiki-ma48/includes/mime.types
MimeMagic::__construct: loading mime info from /magwien/var/gondor-phpserver/html/wiki-ma48/includes/mime.info
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpW9YCXV
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpW9YCXV: application/zip
MediaHandler::getHandler: no handler found for application/zip.
File::getPropsFromPath: /tmp/phpW9YCXV loaded, 453632 bytes, application/zip.
MacBinary::loadHeader: header bytes 0 and 74 not null
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpW9YCXV
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpW9YCXV: application/zip


mime: <application/zip> extension: <doc>

UploadForm::verifyExtension: mime type application/zip mismatches file extension doc, rejecting file
Comment 3 Derk-Jan Hartman 2010-06-22 18:26:49 UTC
The extension for MS Office 2007 OpenXML documents is .docx, not .doc.

For this to work:
* rename the file to its proper file extension
* you have to have a 1.17 checkout
* override $wgMimeTypeBlacklist so that application/x-opc+zip is not in the list
* add .docx to the list of allowed file extensions, $wgFileExtensions

Although I have to say that I'm expecting to see "detected an Open Packaging Conventions archive:" for these types of files in debug.
Comment 4 Michael Gsandtner 2010-06-23 06:13:51 UTC
Created attachment 7497 [details]
word 2007 testdocument
Comment 5 Michael Gsandtner 2010-06-23 06:15:06 UTC
I did it exactly as you described; here is the debug output:

File::getPropsFromPath: Getting file info for /tmp/phplRPqef
MimeMagic::__construct: loading mime types from /magwien/var/gondor-phpserver/html/mwiki/includes/mime.types
MimeMagic::__construct: loading mime info from /magwien/var/gondor-phpserver/html/mwiki/includes/mime.info
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phplRPqef
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phplRPqef: application/zip
MediaHandler::getHandler: no handler found for application/zip.
File::getPropsFromPath: /tmp/phplRPqef loaded, 453632 bytes, application/zip.
MacBinary::loadHeader: header bytes 0 and 74 not null
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phplRPqef
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phplRPqef: application/zip


mime: <application/zip> extension: <docx>

UploadBase::verifyExtension: mime type application/zip mismatches file extension docx, rejecting file

Perhaps you can take a look at the attached test document; maybe it is not in the format you expect.
Comment 6 OverlordQ 2010-06-23 06:23:51 UTC
(In reply to comment #3)
> Although I have to say, that i'm expecting to see "detected an Open Packaging
> Conventions archive:" for these types of files in debug.

That'd be kinda hard to do since it's just a zip file, it'd have to look inside the file to determine if it's just a zip or if it's a 'special' zip. That just opens a whole 'nother can of worms.
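
As a rough illustration of what "looking inside" would involve: an Open Packaging Conventions package is a ZIP archive that contains a part named `[Content_Types].xml`, so one possible test is to list the archive members and look for that name. The Python sketch below uses only the standard `zipfile` module. The function `looks_like_opc_package` and the helper `_make_zip` are hypothetical names for this example, and this is not how MediaWiki's MimeMagic does its detection.

```python
import io
import zipfile


def looks_like_opc_package(data: bytes) -> bool:
    """Return True if data is a ZIP archive containing [Content_Types].xml."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        # Not a readable ZIP archive at all.
        return False


def _make_zip(names):
    """Build a small in-memory ZIP archive with the given member names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "<x/>")
    return buffer.getvalue()


plain_zip = _make_zip(["readme.txt"])
opc_like = _make_zip(["[Content_Types].xml", "word/document.xml"])

print(looks_like_opc_package(plain_zip))
print(looks_like_opc_package(opc_like))
print(looks_like_opc_package(b"not a zip file"))
```

A check like this has to parse the archive's central directory rather than just read the first few bytes, which is the extra cost this comment is pointing at.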
Comment 7 Derk-Jan Hartman 2010-06-23 12:08:36 UTC
I'm looking at this file, but it doesn't seem to be an OpenXML file to me. It will take some time to figure out what is going on (a zipped .doc, perhaps?).
Comment 8 OverlordQ 2010-06-23 15:21:26 UTC
Created attachment 7500 [details]
Actual docx file

Testbericht_V02.docx: Microsoft Office Document

If you rename it to .doc it opens fine in word so I'm thinking it's a normal Word Document, resaved as Word Document in Word 2007 and now it identifies as

Testbericht_V02.docx: Zip archive data, at least v2.0 to extract
Comment 9 Platonides 2010-06-23 15:25:13 UTC
I think that when saving in the old format, Word 2007 creates a kind of mixed format, by appending a zip structure to the .doc format.
warning [Testbericht_V02.docx]:  430308 extra bytes at beginning or within zipfile

Also see bug 23642 comment 5.
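
The "extra bytes at beginning" figure can be reproduced in principle by asking where the first ZIP entry starts inside the file. The following Python sketch, which is an illustration and not the tool used above, relies on the standard `zipfile` module tolerating data prepended to an archive. The function name `zip_prefix_length` and the fabricated sample bytes are assumptions for this example.

```python
import io
import zipfile


def zip_prefix_length(data: bytes) -> int:
    """Return how many bytes precede the first ZIP entry in data."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # header_offset is the position of each entry's local header
        # within the whole file, including any prepended data.
        offsets = [info.header_offset for info in archive.infolist()]
    return min(offsets) if offsets else 0


# Build a small ZIP archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("theme/theme1.xml", "<theme/>")
zip_bytes = zip_buffer.getvalue()

# Simulate the mixed format: non-ZIP data followed by a ZIP trailer.
# The prefix is placeholder bytes, not a real .doc file.
prefix = bytes.fromhex("D0CF11E0A1B11AE1") + bytes(1000)
combined = prefix + zip_bytes

print(zip_prefix_length(zip_bytes))
print(zip_prefix_length(combined))
```

For the simulated file, the reported prefix length equals the size of the data placed in front of the archive, so `combined[:zip_prefix_length(combined)]` would be the non-ZIP part.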
Comment 10 Derk-Jan Hartman 2010-06-23 20:32:17 UTC
Platonides is right. Basically, 2007 saves a .doc file, but appends a .zip with OPC index to it.

I'll add a check for this, by scanning for the magic bytes of older MS Office documents in some way.

http://www.garykessler.net/library/file_sigs.html
MSOffice header: D0 CF 11 E0 A1 B1 1A E1

Office subheaders at bytepos 512

EC A5 C1 00	 	[512 byte offset]
DOC	 	Word document subheader (MS Office)

FD FF FF FF nn 00 00 00	 	[512 byte offset]
PPT	 	PowerPoint presentation subheader (MS Office)
(where nn has been seen with values 0x0E, 0x1C, and 0x43)

FD FF FF FF nn 00	 	[512 byte offset] or
FD FF FF FF nn 02	 	[512 byte offset]
XLS	 	Excel spreadsheet subheader (MS Office)
(where nn = 0x10, 0x1F, 0x22, 0x23, 0x28, or 0x29)
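
A minimal Python sketch of such a check for the Word case, using only the byte values quoted above (the 8-byte MS Office header at the start of the file and the Word subheader at byte offset 512). The function name `looks_like_legacy_word` and the constants are hypothetical, the sample data is fabricated, and this is not the code that was eventually committed.

```python
# Signatures as listed above.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # MS Office header
WORD_SUBHEADER = bytes.fromhex("ECA5C100")      # Word document subheader
SUBHEADER_OFFSET = 512                          # byte offset of the subheader


def looks_like_legacy_word(data: bytes) -> bool:
    """Return True if data carries the MS Office header and Word subheader."""
    if not data.startswith(OLE2_MAGIC):
        return False
    end = SUBHEADER_OFFSET + len(WORD_SUBHEADER)
    return data[SUBHEADER_OFFSET:end] == WORD_SUBHEADER


# Fabricated sample: header, zero padding up to offset 512, then subheader.
padding = bytes(SUBHEADER_OFFSET - len(OLE2_MAGIC))
sample_doc = OLE2_MAGIC + padding + WORD_SUBHEADER

# Fabricated sample that starts with a ZIP local file header instead.
sample_zip = b"PK\x03\x04" + bytes(600)

print(looks_like_legacy_word(sample_doc))
print(looks_like_legacy_word(sample_zip))
```

Because the check only reads the beginning of the file, a ZIP trailer appended at the end does not affect the result.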
Comment 11 p858snake 2010-06-24 01:03:31 UTC
Should we really be doing this? We don't allow OpenOffice files, which are also zips, because of security vulnerabilities, so it would be a bit weird if we preferred Word over OO.
Comment 12 Bryan Tong Minh 2010-06-24 08:50:17 UTC
(In reply to comment #11)
> Should we really be doing this? we don't allow openoffice files which are also
> zips because of security vulnerabilities which would be a bit weird if we
> preferred Word over OO.

Users who wish to enable OpenXML files should be able to do so, just like with OpenOffice now.
Comment 13 Derk-Jan Hartman 2010-06-24 16:23:07 UTC
I got this working, but it is starting to become a bit of a mess. I'm considering introducing a new configuration variable to allow/disallow all zip types, because I already have:

ODF, OpenXML, and MS Office + OPC zip trailer, and setting all that up will become more difficult with each additional zip type. With a separate option, we could just remove the zip and the fake OPC mime from the mime blacklist, and a separate config option would make documenting and explaining the risks of zip-based file formats on open websites a lot easier, I think.
Comment 14 Platonides 2010-06-25 18:39:59 UTC
$wgAllowZipFilesWhichCouldCompromiseMyUsers ?

I'd like to have Special:Upload ask to remove the (apparently useless) zip trailer.
Comment 15 p858snake 2010-06-26 01:00:23 UTC
(In reply to comment #14)
> $wgAllowZipFilesWhichCouldCompromiseMyUsers ?
> 
> I'd like to have Special:Upload ask to remove the (apparently useless) zip
> trailer.
Would that not damage the files if people wanted to download and reopen them? Some systems are very pedantic about the formatting of their files.
Comment 16 Michael Gsandtner 2010-06-26 06:25:04 UTC
Microsoft seems to create different .doc formats (2003, 2003 from 2007). Shouldn't this simply be seen as a Microsoft bug, and no longer be a MediaWiki issue?
Comment 17 Derk-Jan Hartman 2010-06-26 12:18:45 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > $wgAllowZipFilesWhichCouldCompromiseMyUsers ?
> > 
> > I'd like to have Special:Upload ask to remove the (apparently useless) zip
> > trailer.
> Would that not damage the files if people wanted to download and reopen them,
> some systems are very pedantic about the formatting of their files?

If I understand correctly, the OPC trailer stores information that cannot be saved in the 2003 format. So it is a method of creating a 2003-compatible file that still has all the 2007 and later features of the original file when opened in 2007 or later. Actually kinda handy, I have to say.

But yes, the idea would be $wgAllowUploadsOfZipFilesBecauseItrustMyUploaders or something.
Comment 18 Platonides 2010-06-26 13:50:10 UTC
> (In reply to comment #14)
> Would that not damage the files if people wanted to download and reopen them,
> some systems are very pedantic about the formatting of their files?

Newer versions of Word still need to open pre-2007 files, which don't have the trailer, so there are no backwards compatibility issues there.
The provided trailer contains a "font Theme". That won't be a fundamental feature in most cases, but some users might need it.

Note that while I support file stripping in certain cases, it should always happen with the uploader's consent.
Comment 19 Bryan Tong Minh 2010-06-26 14:00:19 UTC
(In reply to comment #18)
> Note that while I support file stripping in certain cases, it should always
> happen with the uploader consent.
And the user should have the possibility to upload the unstripped file (if allowed by the site administrator).

A generic upload post-processing API would be nice; other things like image rotation from EXIF info fall in that category as well.
Comment 20 Derk-Jan Hartman 2010-06-30 19:33:13 UTC
Created attachment 7534 [details]
gifar cleanup

A patch of what I am proposing:

1: Move zip and virus checks before mime checks
2: ZIP gifar check is now separate from mime checks
3: Added $wgAllowGIFARVulnerableFiles global variable
4: Add zip mime detection support for openxml trailers on 2003 Office files.

This will allow people to choose to allow zip file uploads when they want. They would still need to whitelist file types, and in the case of actual zip files, they have to change the mime blacklist. But when setting $wgAllowGIFARVulnerableFiles=true and adding .doc .docx .odt to their whitelist, they will be able to upload such files nonetheless (and actual GIFAR files).

We could consider expanding on this to add a "best-effort" mode to detectGIFAR(), where it will only allow opendocument/openxml files, and disallow the rest, though that is somewhat of a fake security model in my opinion.
Comment 21 Derk-Jan Hartman 2010-07-02 12:12:31 UTC
Went with the original solution after all.

Fixed in r68873
Comment 22 Derk-Jan Hartman 2010-07-06 10:47:35 UTC
Forgot to close the ticket.
Comment 23 Antoine "hashar" Musso (WMF) 2012-03-01 10:50:34 UTC
Can people there have a look at Bug 34797 - Cannot upload Office 97-2003 DOC and XLS files

Seems a related issue :-)  Thanks!
