Last modified: 2013-10-06 10:29:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54324, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52324 - zimwriter should sort the mime types it gets before writing to file.
zimwriter should sort the mime types it gets before writing to file.
Status: RESOLVED FIXED
Product: openZIM
Classification: Unclassified
zimwriter (Other open bugs)
unspecified
All All
: High blocker
: ---
Assigned To: Tommi Mäkitalo
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-31 14:16 UTC by Kiran Mathew Koshy
Modified: 2013-10-06 10:29 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Kiran Mathew Koshy 2013-07-31 14:16:58 UTC
Currently, zimwriter simply makes a list of mime types and assigns the mime codes to each. It would be better if the mime codes are sorted before they are assigned the codes, and written to the file. This would be ensure uniformity. 

For example, in zimpatch, the output file( obtained from the patch file and start_file) has different library mime types, since the articles were fed to zimwriter in a different order.
Comment 1 Kiran Mathew Koshy 2013-08-13 22:02:15 UTC
Issue has been resolved.

However, since this will be the new standard, it will be required to regenerate older zim files in order to obtain checksum match during zimdiff/zimpatch.

https://gerrit.wikimedia.org/r/#/c/79021/
Comment 2 Andre Klapper 2013-08-14 09:08:15 UTC
(In reply to comment #1)
> Issue has been resolved.

If the issue has really been resolved (how?) and a potential fix has been *merged* into the codebase, feel free to set RESOLVED status here. :)
Comment 3 Kiran Mathew Koshy 2013-08-14 10:56:58 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Issue has been resolved.
> 
> If the issue has really been resolved (how?) 

The issue has been resolved. Well, at least it worked on my system. If you found any errors, please comment.

The objective was to ensure that zimlib sorts the MIME types before writing them to the file. According to the existing zimlib code, the articles obtained using getNextArticle() function are collected and stored in dirents, in the order in which they are obtained. The problem was that the list of MIME Types is created in the order in which the articles are sent to zimlib, and the LibraryMimeType code is added to each dirent immediately.

The patch I wrote creates a new sorted list of MIME Types just before writing to file, and creates a mapping between the old and new LibraryMimeType. All the dirents are visited again and their LIbrarYMimeTypes are updated using the mapping.

> and a potential fix has been
> *merged* into the codebase, 

Well, it hasn't been merged into the codebase. I sent it for review, but it hasn't been approved yet. Tommi manages the commits, and he is on vacation.

> feel free to set RESOLVED status here. :)
I will, as soon as it is approved.
Comment 4 Andre Klapper 2013-08-14 12:34:38 UTC
I see, different interpretations of words. In Bugzilla, "resolved" does not mean "I wrote a patch and it works for me", but "patch has been reviewed and merged". Hence my confusion.
Comment 5 Kelson [Emmanuel Engelhart] 2013-08-14 19:52:22 UTC
#Tommi
May you please check that everything is OK with Kiran's fix before closing it?
Comment 6 Kelson [Emmanuel Engelhart] 2013-09-03 18:21:49 UTC
Code to review is here:
https://gerrit.wikimedia.org/r/#/c/79021/
Comment 7 Kelson [Emmanuel Engelhart] 2013-09-23 18:54:52 UTC
This code was reviewed and merged. But IMO the solution is buggy because new generated files have a wrong checksum.

I can not explain why, but it seems to me that these files are bigger than they should (because header checksum position is not equal anymore to filesize-16).

In addition, I ask myself if the method of sorting the mimetypes is the good one. This forces to loop through all dirents during the file creation (to change the mimetype id), something which is not very elegant.

Maybe a better approach would be to allow to force a certain mimetype header at the beginning of the file creation process. In that case, the list of mime-types would not be created dynamically during the article insertion and we would have fixed the problem we have with zimdiff/zimpatch.

In any case, I think we should rollback or fix this, because the zimlib is currently somehow "broken".
Comment 8 Kiran Mathew Koshy 2013-10-02 11:04:05 UTC
The problem seems to be more severe- The test program createzim-t in test folder in zimwriter now gives a segmentation fault during write process.
Comment 9 Tommi Mäkitalo 2013-10-02 12:20:36 UTC
It is not that severe then any more. If it is easily reproducible the bug is much easier to find.

The whole problem was, that the remapping tried to fetch a mapped mime type for directory entries, which do not have mime types. There are special mime types "redirectMimeType", "linktargetMimeType" and "deletedMimeType", which have the values 0xfffd-0xffff. They must not be used when accessing the mapping vector.
Comment 10 Kelson [Emmanuel Engelhart] 2013-10-06 10:29:05 UTC
Nice! I confirm the problem is fixed. I have open a new feature request to try to improve/fix this perfectible approach regarding the mime-types:
https://bugzilla.wikimedia.org/show_bug.cgi?id=55363

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links