Thread: Turn on git pack bitmaps?
Hi,

Right now cloning/refreshing a repository often spends a fair amount of
its time in the 'Counting objects' stage. Since git 2.0 git has a
'bitmap index' feature for packs. I wonder if that could be enabled on
gitmaster/git.pg.o? Saves both time and CPU.

# enable it everywhere
git config --global pack.writebitmaps on
git config --global pack.writeBitmapHashCache on
git config --global pack.threads 0 # make gc/repack faster, autodetect cpus

# and write it by repacking
git gc --aggressive --no-prune

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
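
A minimal sketch, not part of the original message: one way to confirm that the repack above actually wrote a bitmap is to look for a pack-<hash>.bitmap file beside the pack and its index. The function name has_bitmap and the example path are hypothetical.

```shell
#!/bin/sh
# Succeed if the git directory "$1" holds at least one pack bitmap.
# git stores the bitmap as pack-<hash>.bitmap next to the .pack and .idx files.
has_bitmap() {
    for f in "$1"/objects/pack/pack-*.bitmap; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example call; /srv/git/postgresql.git is an assumed path, not from the thread.
# has_bitmap /srv/git/postgresql.git && echo "bitmap present"
```

For a non-bare clone the directory to pass would be the .git directory rather than the working tree.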
On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> Right now cloning/refreshing a repository often spends a fair amount of
> its time in the 'Counting objects' stage. Since git 2.0 git has a
> 'bitmap index' feature for packs. I wonder if that could be enabled on
> gitmaster/git.pg.o? Saves both time and CPU.
We do pretty strongly want to stick to the packaged versions of git (and other software) to keep our work at reasonable levels. And right now, only 1.7 is packaged for Debian Wheezy, which is what we run. 2.0 isn't even available in backports. However, the packaged version will become 2.1 once we can migrate to Jessie, which has not been released yet.
So unless it's something really critical, which I don't think it is (though it does sound like a nice convenience), I would suggest we wait until we go to Jessie, and then enable it.
> # enable it everywhere
> git config --global pack.writebitmaps on
> git config --global pack.writeBitmapHashCache on
> git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
>
> # and write it by repacking
> git gc --aggressive --no-prune
Is that something that would have to be done in a cronjob repeatedly, or are you saying you have to run that once and then it'll be enabled for the future?
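
A small sketch of the version constraint discussed in this message: the thread says bitmaps need git 2.0, while the packaged git is 1.7. The check below only compares the major version; the function name supports_bitmaps is hypothetical.

```shell
#!/bin/sh
# Succeed if a "git version X.Y.Z" string names git 2.0 or newer,
# the release the thread gives as the first with pack bitmaps.
supports_bitmaps() {
    version=${1#git version }   # strip the "git version " prefix
    major=${version%%.*}        # keep the text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as unsupported
    esac
    [ "$major" -ge 2 ]
}

if supports_bitmaps "$(git --version 2>/dev/null)"; then
    echo "installed git is new enough for pack bitmaps"
else
    echo "installed git is too old for pack bitmaps, or git is missing"
fi
```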
On 2015-02-19 16:50:56 +0100, Magnus Hagander wrote:
> On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>
> > Hi,
> >
> > Right now cloning/refreshing a repository often spends a fair amount of
> > its time in the 'Counting objects' stage. Since git 2.0 git has a
> > 'bitmap index' feature for packs. I wonder if that could be enabled on
> > gitmaster/git.pg.o? Saves both time and CPU.
>
> We do pretty strongly want to stick to the packaged versions of git (and
> other software) to keep our work at reasonable levels. And right now, only
> 1.7 is packaged for Debian Wheezy which is what we run. 2.0 isn't even
> available in backports. However, it will become 2.1 once we can migrate to
> Jessie, but it's not released yet.

Hm. I somehow thought 2.1 were in bpo ;)

> So unless it's something really critical, which I don't think it is (though
> it does sound like a nice convenience), I would suggest we wait until we go
> to Jessie, and then enable it.

It sure isn't critical, just saves some time/cpu (server side) every now
and then.

> > # enable it everywhere
> > git config --global pack.writebitmaps on
> > git config --global pack.writeBitmapHashCache on
> > git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
> >
> > # and write it by repacking
> > git gc --aggressive --no-prune
>
> Is that something that would have to be done in a cronjob repeatedly, or
> are you saying you have to run that once and then it'll be enabled for the
> future?

It'll stay enabled/in effect for the future. But it'd be a good idea to
regularly schedule a repack independent of bitmaps - or enable automatic
gc's on push (not sure which version made that available). Right now
git.pg.o's postgresql.git is ~40MB bigger than my local one...

Greetings,

Andres Freund
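
A rough sketch of the regularly scheduled repack suggested above, with a size report so a difference like the ~40MB one can be seen. This is an assumption about how such a job might look, not what runs on git.pg.o; the function names are hypothetical.

```shell
#!/bin/sh
# Extract the packed size in KiB from "git count-objects -v" output on stdin.
pack_kib() {
    awk '$1 == "size-pack:" { print $2 }'
}

# Fully repack the repository at git dir "$1" and report the size change.
# "repack -a -d" puts all objects into one pack and drops the redundant packs.
repack_and_report() {
    before=$(git --git-dir="$1" count-objects -v | pack_kib)
    git --git-dir="$1" repack -a -d -q || return 1
    after=$(git --git-dir="$1" count-objects -v | pack_kib)
    echo "$1: ${before} KiB -> ${after} KiB packed"
}

# Example call, e.g. from a weekly cron job; the path is an assumed one.
# repack_and_report /srv/git/postgresql.git
```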
On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-19 16:50:56 +0100, Magnus Hagander wrote:
> > On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Right now cloning/refreshing a repository often spends a fair amount of
> > > its time in the 'Counting objects' stage. Since git 2.0 git has a
> > > 'bitmap index' feature for packs. I wonder if that could be enabled on
> > > gitmaster/git.pg.o? Saves both time and CPU.
> >
> > We do pretty strongly want to stick to the packaged versions of git (and
> > other software) to keep our work at reasonable levels. And right now, only
> > 1.7 is packaged for Debian Wheezy which is what we run. 2.0 isn't even
> > available in backports. However, it will become 2.1 once we can migrate to
> > Jessie, but it's not released yet.
>
> Hm. I somehow thought 2.1 were in bpo ;)
>
> > So unless it's something really critical, which I don't think it is (though
> > it does sound like a nice convenience), I would suggest we wait until we go
> > to Jessie, and then enable it.
>
> It sure isn't critical, just saves some time/cpu (server side) every now
> and then.

Ok - then let's postpone it until we go to Jessie.

> > > # enable it everywhere
> > > git config --global pack.writebitmaps on
> > > git config --global pack.writeBitmapHashCache on
> > > git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
> > >
> > > # and write it by repacking
> > > git gc --aggressive --no-prune
> >
> > Is that something that would have to be done in a cronjob repeatedly, or
> > are you saying you have to run that once and then it'll be enabled for the
> > future?
>
> It'll stay enabled/in effect for the future. But it'd be a good idea to
> regularly schedule a repack independent of bitmaps - or enable automatic
> gc's on push (not sure which version made that available). Right now
> git.pg.o's postgresql.git is ~40MB bigger than my local one...

Well, it does them now and then, IIRC? We haven't set up a cronjob for it,
but I'm fairly certain I've gotten stuck with it doing GC during a commit
sometime...
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> It'll stay enabled/in effect for the future. But it'd be a good idea to
>> regularly schedule a repack independent of bitmaps - or enable automatic
>> gc's on push (not sure which version made that available). Right now
>> git.pg.o's postgresql.git is ~40MB bigger than my local one...

> Well, it does them now and then, IIRC? We haven't set up a cronjob for it,
> but I'm fairly certain I've gotten stuck with it doing GC during a commit
> sometime...

Yeah, I got one just a couple of minutes ago.

[postgres@sss1 pgsql]$ git push
Counting objects: 74, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (24/24), done.
Writing objects: 100% (24/24), 6.61 KiB | 0 bytes/s, done.
Total 24 (delta 20), reused 0 (delta 0)
Auto packing the repository for optimum performance.
To ssh://git@gitmaster.postgresql.org/postgresql.git
   c86f8f3..83c3115  REL9_2_STABLE -> REL9_2_STABLE
   a196e67..f389b6e  REL9_3_STABLE -> REL9_3_STABLE
   66463a3..9c15a77  REL9_4_STABLE -> REL9_4_STABLE
   64235fe..b26e208  master -> master

There was a noticeable delay after the "auto packing" message. I don't
mind this happening once in awhile, but I don't see a need to do it
on every commit; please let's not have auto-gc-on-commit.

No opinion about whether a cron job would be worth the trouble.

			regards, tom lane
On 2015-02-21 13:34:06 -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >> It'll stay enabled/in effect for the future. But it'd be a good idea to
> >> regularly schedule a repack independent of bitmaps - or enable automatic
> >> gc's on push (not sure which version made that available). Right now
> >> git.pg.o's postgresql.git is ~40MB bigger than my local one...
>
> > Well, it does them now and then, IIRC? We haven't set up a cronjob for it,
> > but I'm fairly certain I've gotten stuck with it doing GC during a commit
> > sometime...

Those don't have the same effect as a full repack though, as they only
pack recently created objects into a new pack file and occasionally
collapse pack files. Doing a full repack, with more aggressive settings,
can considerably shrink the repository.

> Yeah, I got one just a couple of minutes ago.
>
> There was a noticeable delay after the "auto packing" message.

That's gone in a newer git btw, it moves that automatically into the
background.

> mind this happening once in awhile, but I don't see a need to do it
> on every commit; please let's not have auto-gc-on-commit.

Yea, doing it on every commit wouldn't make any sense.

Greetings,

Andres Freund
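
For reference, a hedged sketch of how the auto-gc behaviour discussed in this exchange could be inspected. It only reads settings: gc.auto and gc.autoPackLimit are the thresholds that decide when "git gc --auto" does any work, and gc.autoDetach controls the background behaviour in newer git. The function name show_gc_settings and the example path are hypothetical.

```shell
#!/bin/sh
# Print the auto-gc related settings of the repository at git dir "$1".
# A key that is not set anywhere falls back to git's built-in default.
show_gc_settings() {
    for key in gc.auto gc.autoPackLimit gc.autoDetach; do
        value=$(git --git-dir="$1" config --get "$key") \
            || value="(unset, git default applies)"
        echo "$key = $value"
    done
}

# Example call; the path is an assumed one, not taken from the thread.
# show_gc_settings /srv/git/postgresql.git
```

On a git too old to know gc.autoDetach, the key would simply show as unset.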