Last modified: 2012-10-14 22:07:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T34130, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 32130 - Unusable md5sums for the latest dumps
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Ariel T. Glenn
Depends on:
Blocks:
Reported: 2011-11-01 21:32 UTC by Strainu
Modified: 2012-10-14 22:07 UTC
CC List: 3 users

Description Strainu 2011-11-01 21:32:20 UTC
The md5sums files in the /latest/ folders contain file names in the format LANG-DATE-XXX, but the files in those folders are named LANG-latest-XXX. This means that one cannot run md5sum -c FILE on the md5sums file, because none of the listed names exist in the folder.

This is true for all the languages and projects.
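
As an illustration of the mismatch, here is a minimal client-side workaround sketch in Python, not a fix for the dumps themselves. It maps each LANG-DATE-XXX name from the md5sums file to the LANG-latest-XXX name and checks whatever local files exist. The eight-digit date (YYYYMMDD), the sample file names, and the helper names to_latest_name, parse_md5sums, md5_of and verify_latest are assumptions made for the example.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the DATE part of LANG-DATE-XXX is eight digits (YYYYMMDD).
DATED_NAME = re.compile(r"^(?P<lang>.+?)-(?P<date>\d{8})-(?P<rest>.+)$")


def to_latest_name(dated_name):
    """Map LANG-DATE-XXX to LANG-latest-XXX; leave other names unchanged."""
    m = DATED_NAME.match(dated_name)
    if m is None:
        return dated_name
    return f"{m['lang']}-latest-{m['rest']}"


def parse_md5sums(text):
    """Parse md5sum-style lines ("<digest>  <name>") into {name: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading "*" on the name.
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_latest(md5sums_text, folder):
    """Check local LANG-latest-XXX files against dated md5sums entries.

    Returns {local file name: True/False}; files not present are skipped.
    """
    results = {}
    for dated, digest in parse_md5sums(md5sums_text).items():
        local = Path(folder) / to_latest_name(dated)
        if local.exists():
            results[local.name] = md5_of(local) == digest
    return results
```

A result of False can also mean that the local file comes from a different dump run than the md5sums file, so it does not by itself prove corruption.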
Comment 1 Platonides 2011-11-01 22:01:00 UTC
I'm not sure that's a bad idea. If the links in the latest folder redirected to the long (dated) form, we wouldn't end up with files that carry the "latest" name and whose timestamp you don't know.
Comment 2 Strainu 2011-11-01 22:08:28 UTC
The "latest" keyword allows for simple downloading scripts. For my usage, I don't care one bit about the actual date of the dump. I just want to download it, extract it and apply it. If the archive is actually the latest, I'm OK with that.

However, the current bug is not about the dump folder structure, but about the mismatch between the file names in the folder and the file names in the md5sums file. This needs to be fixed one way or the other in order for the md5sums to be usable.
Comment 3 Zach 2012-10-12 22:04:57 UTC
(In reply to comment #2)
> However, the current bug is not about the dump folder structure, but about the
> mismatch between the file names from the folder and the file names from the
> md5sums file. This needs to be fixed one way or the other, in order for the
> md5sums to be usable.

There is another layer to the mismatch beyond the file naming pattern. If all of the LANG-latest-XXX files were renamed with the LANG-DATE-XXX format of the files they point to, then at least at certain times (early in the month?) we would see a mixture of LANG-DATE_IN_THIS_MONTH-XXX dumps and LANG-DATE_IN_LAST_MONTH-XXX dumps in /latest. However, until all dumps from this month have been completed, LANG-latest-md5sums.txt will point to LANG-DATE_IN_LAST_MONTH-md5sums.txt which will only contain the hashes from last month's dumps.

In other words, the xml dumps in /latest "roll over"/update *individually* as soon as they are completed, but the md5sums file only updates when all dumps from that month have been completed.
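
To make the roll-over problem concrete, here is a small Python sketch that compares the dated files the /latest links currently point to with the names listed in the md5sums file, and reports the ones whose dates disagree. How the link targets are obtained is left out; the eight-digit date format, the sample names and dates in the usage note, and the helpers split_dated and rollover_mismatches are assumptions for illustration only.

```python
import re

# Assumption: the DATE part of LANG-DATE-XXX is eight digits (YYYYMMDD).
DATED_NAME = re.compile(r"^(?P<lang>.+?)-(?P<date>\d{8})-(?P<rest>.+)$")


def split_dated(name):
    """Split LANG-DATE-XXX into (lang, date, rest), or return None."""
    m = DATED_NAME.match(name)
    return (m["lang"], m["date"], m["rest"]) if m else None


def rollover_mismatches(latest_targets, md5sums_names):
    """List dumps whose /latest target is newer or older than the md5sums entry.

    latest_targets: dated names that the LANG-latest-XXX files point to.
    md5sums_names:  dated names listed in the md5sums file.
    Returns (rest, target_date, md5sums_date_or_None) tuples.
    """
    md5_dates = {}
    for name in md5sums_names:
        parts = split_dated(name)
        if parts:
            md5_dates[parts[2]] = parts[1]

    mismatches = []
    for target in latest_targets:
        parts = split_dated(target)
        if parts is None:
            continue
        _, date, rest = parts
        if md5_dates.get(rest) != date:
            mismatches.append((rest, date, md5_dates.get(rest)))
    return mismatches
```

With hypothetical targets rowiki-20121001-pages-articles.xml.bz2 and rowiki-20120901-pages-meta-history.xml.bz2, and an md5sums file that still lists only 20120901 names, the function reports pages-articles.xml.bz2 as the one dump whose hash cannot be checked yet.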
