Last modified: 2011-02-06 15:35:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T16631, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 14631 - Include uncompressed sizes in dump file RSS info
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://download.wikipedia.org/enwikip...
Depends on:
Blocks:
Reported: 2008-06-24 17:15 UTC by Andrew Dunbar
Modified: 2011-02-06 15:35 UTC (History)
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Andrew Dunbar 2008-06-24 17:15:50 UTC
It would be handy if the RSS feeds included the full uncompressed size, in bytes, of each compressed file.

Bzip2 archives in particular provide no interface for revealing a file's uncompressed size.

I'm working on a Firefox extension which could use this information, unavailable elsewhere, to provide a progress bar when decompressing large files.

The current RSS feeds are very minimal, and there is plenty of space to include the extra information, which should be trivially available to the scripts which create the dump download areas.
Comment 1 Brion Vibber 2008-06-24 18:26:30 UTC
Note that uncompressed size is not currently available. It should be possible to make a little wrapper tool to pipe the data through before compression, which would count the bytes and save the total; that could then be pulled into the report & RSS outputs.
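
A minimal sketch of such a byte-counting wrapper, in Python. This is an illustration only, not the tool used by the dump scripts: the function name count_and_copy and the chunk size are assumptions made for the example. It copies a binary stream through unchanged and returns the number of bytes seen, which is the figure that would be saved for the report.

```python
import io
from typing import BinaryIO


def count_and_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst unchanged and return the number of bytes copied.

    Hypothetical helper: in a real pipeline src would be the dump
    producer's output and dst the compressor's input.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Small in-memory demonstration with 688 bytes of dummy data.
demo_in = io.BytesIO(b"x" * 688)
demo_out = io.BytesIO()
copied = count_and_copy(demo_in, demo_out)
```

In a shell pipeline the same function could be called with sys.stdin.buffer and sys.stdout.buffer, placed between the dump producer and the compressor, with the returned total written to a side file for the report and RSS generation to pick up.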
Comment 2 Melancholie 2008-06-25 03:59:28 UTC
Sorry for that silly question, but where can I find this RSS feed?
There is no feed <link>ed at download.wikimedia.org.
Comment 3 Andrew Dunbar 2008-06-25 17:22:46 UTC
There is one feed per file per project. Oddly, they are in a place which doesn't seem to have any links to the outside world. I had to ask people on the dev IRC channel to find out about it:

http://download.wikipedia.org/enwikipedia/latest/

Comment 4 Andrew Dunbar 2008-06-28 13:55:36 UTC
For bzip2 files at least, the uncompressed file size is available without the wrapper tool Brion suggests. Simply passing the -v switch will print the details to stderr. I don't yet grok the code in backup/worker.py, but it should be easy to parse the verbose output. Example follows:

  (stdin):  1.512:1,  5.291 bits/byte, 33.87% saved, 688 in, 455 out.
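
A rough sketch of how that line might be parsed, in Python. It assumes the verbose output always ends with "N in, M out." as in the sample above; other bzip2 versions may format it differently. The function name parse_bzip2_verbose is hypothetical and is not taken from worker.py.

```python
import re
from typing import Optional, Tuple

# Matches the trailing "688 in, 455 out." part of the verbose line.
_BZIP2_STATS = re.compile(r"(\d+) in, (\d+) out\.")


def parse_bzip2_verbose(line: str) -> Optional[Tuple[int, int]]:
    """Return (uncompressed_bytes, compressed_bytes), or None if no match."""
    match = _BZIP2_STATS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "  (stdin):  1.512:1,  5.291 bits/byte, 33.87% saved, 688 in, 455 out."
sizes = parse_bzip2_verbose(sample)
```

For the sample line, the first number (688) is the uncompressed size that the RSS feed would need to carry.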
Comment 5 Andrew Dunbar 2009-05-16 11:22:31 UTC
See also bug 6064
