Last modified: 2013-08-22 22:07:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T55033, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53033 - Compress / git gc pywikibot repositories
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Git/Gerrit
Version: wmf-deployment
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-08-19 13:09 UTC by Merlijn van Deen (test)
Modified: 2013-08-22 22:07 UTC
CC List: 5 users


Description Merlijn van Deen (test) 2013-08-19 13:09:58 UTC
Currently, the pywikibot/compat repository takes 200M to clone, but after git gc --aggressive, only 14M remains. It would be great if this could be done server-side, so people only have to clone 14M.

For pywikibot/core (40M->14M) and pywikibot/i18n (5M->0.8M) this is also true, but the problem is less severe.
Comment 1 Chad H. 2013-08-19 13:23:27 UTC
jgit gc runs weekly on all repositories already :\
Comment 2 Merlijn van Deen (test) 2013-08-19 13:49:59 UTC
A normal gc run might not be enough:



$ git clone https://git.wikimedia.org/git/pywikibot/compat.git pwb-compat
Cloning into 'pwb-compat'...
remote: Counting objects: 437, done
remote: Finding sources: 100% (136/136)
remote: Getting sizes: 100% (102/102)
remote: Compressing objects: 100% (4472758/4472758)
remote: Total 37453 (delta 11), reused 37317 (delta 0)
Receiving objects: 100% (37453/37453), 164.74 MiB | 14.07 MiB/s, done.
Resolving deltas: 100% (24602/24602), done.
$ cd pwb-compat/
$ du -hs .
171M    .
$ git gc
Counting objects: 37453, done.
Compressing objects: 100% (12606/12606), done.
Writing objects: 100% (37453/37453), done.
Total 37453 (delta 24602), reused 37453 (delta 24602)
Checking connectivity: 37453, done.
$ du -hs .
171M    .
$ git gc --aggressive
Counting objects: 37453, done.
Compressing objects: 100% (37208/37208), done.
Writing objects: 100% (37453/37453), done.
Total 37453 (delta 26626), reused 10434 (delta 0)
$ du -hs .
14M     .


According to the docs, git gc --aggressive uses a more aggressive repack command:

"The optional configuration variable gc.aggressiveWindow controls how much time is spent optimizing the delta compression of the objects in the repository when the --aggressive option is specified. The larger the value, the more time is spent optimizing the delta compression. See the documentation for the --window option in git-repack(1) for more details. This defaults to 250."


I couldn't find anything on jgit gc/repack, except for the following, which is something that probably should be considered before running a repack outside of jgit:

https://code.google.com/p/gerrit/issues/detail?id=81
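
For illustration only, here is a minimal sketch of how the gc.aggressiveWindow setting quoted above could be set explicitly and its effect on pack size measured. The throwaway repository, the file name data.txt, the commit count, and the use of git count-objects for measurement are all assumptions made for the example; none of them come from the original report. The value 250 is simply the default mentioned in the quoted documentation.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit several revisions of one file so there is something to delta-compress.
i=1
while [ "$i" -le 20 ]; do
    awk 'BEGIN { for (n = 1; n <= 2000; n++) print n }' > data.txt
    echo "revision $i" >> data.txt
    git add data.txt
    git commit -q -m "revision $i"
    i=$((i + 1))
done

# Pack size after a normal gc (count-objects -v reports size-pack in KiB).
git gc -q
before=$(git count-objects -v | awk '/^size-pack:/ {print $2}')

# Set the delta search window explicitly, then run the aggressive gc.
git config gc.aggressiveWindow 250
git gc -q --aggressive
after=$(git count-objects -v | awk '/^size-pack:/ {print $2}')

window=$(git config --get gc.aggressiveWindow)
echo "window=$window size-pack before=${before}KiB after=${after}KiB"
```

On a tiny synthetic repository like this one the difference may be negligible; the large gain reported above depends on the history of the real repository.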
Comment 3 Chad H. 2013-08-22 17:35:11 UTC
Jgit gc is nice because it generates bitmaps that make clones and large fetches wayyyyy faster.

Sometimes you need an aggressive repack though, and jgit doesn't do that within gerrit.

I did it, it's now at 14M on disk.
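
The thread does not record which commands were actually run on the server. As a hedged sketch only, this is one way an aggressive repack of a bare repository could be done with stock git. The paths, the stand-in repository, and the --window and --depth values are illustrative assumptions. Whether it is safe to do this outside jgit while Gerrit is serving the repository is the concern raised in Comment 2 and is not addressed here.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for a server-side bare repository.
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name "Example"
git config user.email "example@example.invalid"
echo "hello" > README
git add README
git commit -q -m "initial commit"
git clone -q --bare "$work/src" "$work/example.git"

cd "$work/example.git"

# -a: put all reachable objects into a single pack
# -d: delete packs and loose objects made redundant by the new pack
# -f: recompute deltas from scratch instead of reusing existing ones
# --window/--depth: illustrative values, not taken from the thread
git repack -q -a -d -f --window=250 --depth=50

# Sanity checks: exactly one pack remains and the repository verifies.
packs=$(ls objects/pack/*.pack | wc -l | tr -d ' ')
git fsck >/dev/null 2>&1
echo "packs=$packs"
```

Running git fsck afterwards is a cheap check that the repack left the object store intact.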
Comment 4 Merlijn van Deen (test) 2013-08-22 22:07:57 UTC
You are my hero. Now a clone is just 8 MB! :-)
