Last modified: 2013-10-08 23:55:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T57363, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 55363 - Writer function inefficient mimetype dirent entry rewriting
Status: ASSIGNED
Product: openZIM
Classification: Unclassified
Component: zimlib
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Kelson [Emmanuel Engelhart]
Depends on:
Blocks:
Reported: 2013-10-06 10:27 UTC by Kelson [Emmanuel Engelhart]
Modified: 2013-10-08 23:55 UTC (History)

Description Kelson [Emmanuel Engelhart] 2013-10-06 10:27:59 UTC
During the development of zimdiff/zimpatch we had the problem that two ZIM files were almost equal, except that the mimetypes were not sorted in the same way, so all dirent entry mimetype values were different.

This was an issue because the files produced by zimpatch were not equal to the original files. To avoid this, zimlib currently forces the order of the mimetypes in the header list: they are sorted alphabetically.

Unfortunately, I see two problems with this:
* This changes the specification of the format (we have not yet changed anything in the specification)
* The mimetypes are sorted after all articles are inserted, which requires rewriting all the dirent entries. This is neither efficient nor elegant.
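
To make the second point concrete, here is a minimal sketch of what sorting after insertion implies. It is not zimlib's actual code: `Dirent`, `mimeIndex`, and `sortMimeTypesAndRemap` are hypothetical names, and the sketch assumes each mime type appears only once in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a directory entry: only the
// index into the mime-type list matters for this illustration.
struct Dirent {
    std::string url;
    std::uint16_t mimeIndex;
};

// Sort the mime-type list alphabetically, then rewrite every dirent so
// that it points at the new position of its mime type.
// Assumes the mime types in the list are unique.
void sortMimeTypesAndRemap(std::vector<std::string>& mimeTypes,
                           std::vector<Dirent>& dirents)
{
    const std::vector<std::string> insertionOrder = mimeTypes;
    std::sort(mimeTypes.begin(), mimeTypes.end());

    // remap[oldIndex] == newIndex
    std::vector<std::uint16_t> remap(insertionOrder.size());
    for (std::size_t oldIdx = 0; oldIdx < insertionOrder.size(); ++oldIdx) {
        auto pos = std::lower_bound(mimeTypes.begin(), mimeTypes.end(),
                                    insertionOrder[oldIdx]);
        remap[oldIdx] = static_cast<std::uint16_t>(pos - mimeTypes.begin());
    }

    // The extra pass over all dirents that this report objects to.
    for (Dirent& d : dirents) {
        assert(d.mimeIndex < remap.size());
        d.mimeIndex = remap[d.mimeIndex];
    }
}
```

With the insertion order "text/html", "image/png", "text/css", every dirent in this sketch ends up with a different index after sorting, which is why the whole dirent list has to be touched.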

I think an alternative approach would be to allow the mime-type list to be forced before the articles are inserted. This would bypass the dynamic creation of the mime-type list and consequently avoid the two problems listed above.
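
A rough sketch of that alternative, under stated assumptions: `FixedMimeTypeList` and `indexOf` are hypothetical names, not an existing zimlib interface, and the 16-bit index limit is an assumption of this sketch. The caller supplies the list once, in whatever order it wants (for zimpatch, presumably the order found in the original file), and each index is final as soon as it is handed out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of zimlib: a mime-type list that is
// fixed up front, so indexes never change while articles are added.
class FixedMimeTypeList {
public:
    explicit FixedMimeTypeList(std::vector<std::string> mimeTypes)
        : mimeTypes_(std::move(mimeTypes))
    {
        // Assumption of this sketch: indexes are stored as 16-bit values.
        assert(mimeTypes_.size() < 0xFFFF);
    }

    // Return the index of a mime type in the caller-supplied order.
    // Throws if the type was not part of the forced list.
    std::uint16_t indexOf(const std::string& mimeType) const
    {
        auto pos = std::find(mimeTypes_.begin(), mimeTypes_.end(), mimeType);
        if (pos == mimeTypes_.end())
            throw std::out_of_range("mime type not in forced list: " + mimeType);
        return static_cast<std::uint16_t>(pos - mimeTypes_.begin());
    }

    const std::vector<std::string>& list() const { return mimeTypes_; }

private:
    std::vector<std::string> mimeTypes_;
};
```

Because the order is taken as given rather than sorted, this variant needs neither a rewrite pass over the dirents nor a sorting rule in the format. The linear lookup is a deliberate simplification on the assumption that mime-type lists are short.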
Comment 1 Tommi Mäkitalo 2013-10-06 17:19:46 UTC
1. Sorting the mime types does not break the current specification. The specification simply does not require the mime types to be sorted. If we change the specification to require sorted mime types, previously created zim files will violate it. This is not really a big problem, since even files that break the sorting rule remain readable with the new zimlib.

2. Sorting is done after collecting the directory entries. The directory entries are held in memory anyway, so I do not expect sorting to slow down the generation of zim files. Compressing the data is far more expensive than sorting the directory entries. I see no need to change anything here.

Note that if we force the generator to deliver the mime types prior to directory entries, we break the interface of the generator and make it more difficult to implement the generator interface. Delivering the mime types may be even more expensive in the generator than in the zimcreator.
Comment 2 Kelson [Emmanuel Engelhart] 2013-10-06 18:01:12 UTC
I am reassigning the bug to myself. It is not clear here how we want to proceed.
