Last modified: 2013-08-22 22:07:57 UTC
Currently, the pywikibot/compat repository takes 200M to clone, but after git gc --aggressive, only 14M remains. It would be great if this could be done server-side, so people only have to clone 14M. The same is true for pywikibot/core (40M->14M) and pywikibot/i18n (5M->0.8M), but the problem is less severe there.
jgit gc runs weekly on all repositories already :\
A normal gc run might not be enough:

$ git clone https://git.wikimedia.org/git/pywikibot/compat.git pwb-compat
Cloning into 'pwb-compat'...
remote: Counting objects: 437, done
remote: Finding sources: 100% (136/136)
remote: Getting sizes: 100% (102/102)
remote: Compressing objects: 100% (4472758/4472758)
remote: Total 37453 (delta 11), reused 37317 (delta 0)
Receiving objects: 100% (37453/37453), 164.74 MiB | 14.07 MiB/s, done.
Resolving deltas: 100% (24602/24602), done.
$ cd pwb-compat/
$ du -hs .
171M .
$ git gc
Counting objects: 37453, done.
Compressing objects: 100% (12606/12606), done.
Writing objects: 100% (37453/37453), done.
Total 37453 (delta 24602), reused 37453 (delta 24602)
Checking connectivity: 37453, done.
$ du -hs .
171M .
$ git gc --aggressive
Counting objects: 37453, done.
Compressing objects: 100% (37208/37208), done.
Writing objects: 100% (37453/37453), done.
Total 37453 (delta 26626), reused 10434 (delta 0)
$ du -hs .
14M .

According to the docs, it uses a more aggressive repack command:

"The optional configuration variable gc.aggressiveWindow controls how much time is spent optimizing the delta compression of the objects in the repository when the --aggressive option is specified. The larger the value, the more time is spent optimizing the delta compression. See the documentation for the --window option in git-repack(1) for more details. This defaults to 250."

I couldn't find anything on jgit gc/repack, except for the following, which probably should be considered before running a repack outside of jgit:
https://code.google.com/p/gerrit/issues/detail?id=81
Jgit gc is nice because it generates bitmaps that make clones and large fetches wayyyyy faster. Sometimes you need an aggressive repack though, and jgit doesn't do that within gerrit. I did it, it's now at 14M on disk.
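How the repack was actually run on the server is not stated in this thread. As a rough sketch only, an aggressive repack of a bare repository with plain git might look like the following. The helper names and the REPO path are hypothetical, and the --window/--depth values are assumptions chosen to match the gc.aggressiveWindow default quoted above.

```shell
#!/bin/sh
# Sketch only. REPO (used in the usage notes below) is a hypothetical path
# to a bare repository on the server; nothing here is taken from the real setup.

# Print the on-disk size of a directory in KiB.
repo_size_kb() {
    du -sk "$1" | cut -f1
}

# Repack all objects into a single pack (-a -d) and recompute deltas
# from scratch (-f) instead of reusing existing ones.
# window/depth of 250 are assumed values, not confirmed settings.
aggressive_repack() {
    git --git-dir="$1" repack -a -d -f --window=250 --depth=250
}

# Usage (not executed here):
#   before=$(repo_size_kb "$REPO")
#   aggressive_repack "$REPO"
#   after=$(repo_size_kb "$REPO")
#   echo "${before}K -> ${after}K"
```

Whether a repack like this is safe to run next to Gerrit's own jgit gc should be checked first; see the Gerrit issue linked above.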
You are my hero. Now a clone is just 8 MB! :-)