Thread: Turn on git pack bitmaps?

Turn on git pack bitmaps?

From
Andres Freund
Date:
Hi,

Right now cloning/refreshing a repository often spends a fair amount of
its time in the 'Counting objects' stage. Since git 2.0 git has a
'bitmap index' feature for packs. I wonder if that could be enabled on
gitmaster/git.pg.o? Saves both time and CPU.

# enable it everywhere
git config --global pack.writebitmaps on
git config --global pack.writeBitmapHashCache on
git config --global pack.threads 0 # make gc/repack faster, autodetect cpus

# and write it by repacking
git gc --aggressive --no-prune

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
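
For reference, a minimal sketch of applying the settings proposed above to a single repository instead of with --global, and then checking that a bitmap was actually written. The function names and the path in the usage comment are hypothetical. It assumes git 2.0 or later (with `git -C` support) and relies on the bitmap index being stored as a `.bitmap` file next to the pack in objects/pack. Newer git releases document the first setting as repack.writeBitmaps and, as far as I know, keep pack.writeBitmaps as a deprecated synonym.

```shell
#!/bin/sh
# Sketch only: per-repository variant of the commands from this thread.

# Enable bitmap writing for one repository and rewrite its packs.
# $1 = path to the repository (hypothetical, e.g. a bare repo on the server).
enable_bitmaps() {
    git -C "$1" config pack.writebitmaps on &&
    git -C "$1" config pack.writeBitmapHashCache on &&
    git -C "$1" gc --aggressive --no-prune
}

# Succeeds if the given git directory contains at least one bitmap index.
# $1 = git directory (the repo itself if bare, otherwise its .git directory).
has_bitmap() {
    for f in "$1"/objects/pack/*.bitmap; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage (path is an assumption):
#   enable_bitmaps /srv/git/postgresql.git && has_bitmap /srv/git/postgresql.git
```

Keeping the settings in the repository's own config avoids changing behaviour for every other repository owned by the same user.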



Re: Turn on git pack bitmaps?

From
Magnus Hagander
Date:
On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> Right now cloning/refreshing a repository often spends a fair amount of
> its time in the 'Counting objects' stage. Since git 2.0 git has a
> 'bitmap index' feature for packs. I wonder if that could be enabled on
> gitmaster/git.pg.o? Saves both time and CPU.

We do pretty strongly want to stick to the packaged versions of git (and other software) to keep our work at reasonable levels. And right now, only 1.7 is packaged for Debian Wheezy, which is what we run. 2.0 isn't even available in backports. However, it will become 2.1 once we can migrate to Jessie, but it's not released yet.

So unless it's something really critical, which I don't think it is (though it does sound like a nice convenience), I would suggest we wait until we go to Jessie, and then enable it.


> # enable it everywhere
> git config --global pack.writebitmaps on
> git config --global pack.writeBitmapHashCache on
> git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
>
> # and write it by repacking
> git gc --aggressive --no-prune


Is that something that would have to be done in a cronjob repeatedly, or are you saying you have to run that once and then it'll be enabled for the future? 


--
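
Since the feature depends on the installed git version, a check like the following could guard the configuration step. This is a rough sketch: the function name is made up, and it only looks at the major version number, relying on the statement in this thread that bitmaps arrived in git 2.0.

```shell
#!/bin/sh
# Succeeds if the given git version string (e.g. "2.1.4") is 2.0 or newer.
# Returns 2 if the string does not start with a number.
git_supports_bitmaps() {
    major=${1%%.*}                 # text before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric: cannot tell
    esac
    [ "$major" -ge 2 ]
}

# Usage (assumes `git --version` prints "git version X.Y.Z"):
#   git_supports_bitmaps "$(git --version | awk '{print $3}')" \
#       && echo "bitmap index supported"
```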

Re: Turn on git pack bitmaps?

From
Andres Freund
Date:
On 2015-02-19 16:50:56 +0100, Magnus Hagander wrote:
> On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > Hi,
> >
> > Right now cloning/refreshing a repository often spends a fair amount of
> > its time in the 'Counting objects' stage. Since git 2.0 git has a
> > 'bitmap index' feature for packs. I wonder if that could be enabled on
> > gitmaster/git.pg.o? Saves both time and CPU.

> We do pretty strongly want to stick to the packaged versions of git (and
> other software) to keep our work at reasonable levels. And right now, only
> 1.7 is packaged for Debian Wheezy which is what we run. 2.0 isn't even
> available in backports. However, it will become 2.1 once we can migrate to
> Jessie, but it's not released yet.

Hm. I somehow thought 2.1 was in bpo ;)

> So unless it's something really critical, which I don't think it is (though
> it does sound like a nice convenience), I would suggest we wait until we go
> to Jessie, and then enable it.

It sure isn't critical, just saves some time/cpu (server side) every now
and then.

> > # enable it everywhere
> > git config --global pack.writebitmaps on
> > git config --global pack.writeBitmapHashCache on
> > git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
> >
> > # and write it by repacking
> > git gc --aggressive --no-prune
> >
> >
> Is that something that would have to be done in a cronjob repeatedly, or
> are you saying you have to run that once and then it'll be enabled for the
> future?

It'll stay enabled/in effect for the future. But it'd be a good idea to
regularly schedule a repack independent of bitmaps - or enable automatic
gc's on push (not sure which version made that available). Right now
git.pg.o's postgresql.git is ~40MB bigger than my local one...

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
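
A scheduled full repack, as suggested above, could look roughly like the following. This is a sketch under assumptions: the function name, the repository path and the idea of running it from cron are illustrative, and the gc options are simply the ones proposed earlier in this thread. The GIT variable exists only so the commands can be previewed without running them.

```shell
#!/bin/sh
# Hypothetical maintenance script, e.g. to be run weekly from cron.
# Set GIT=echo to print the commands instead of executing them.
GIT=${GIT:-git}

# Run a full, aggressive repack on every repository given as an argument.
full_repack() {
    for repo in "$@"; do
        # --aggressive recomputes deltas; --no-prune leaves loose unreachable
        # objects alone; --quiet suppresses progress output for cron.
        "$GIT" -C "$repo" gc --aggressive --no-prune --quiet || return 1
    done
}

# Usage (path is an assumption):
#   full_repack /srv/git/postgresql.git
```

Running this outside of pushes keeps the expensive repack away from committers, which matches the concern raised later in the thread about delays during a push.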



Re: Turn on git pack bitmaps?

From
Magnus Hagander
Date:
On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-19 16:50:56 +0100, Magnus Hagander wrote:
> > On Thu, Feb 19, 2015 at 4:41 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Right now cloning/refreshing a repository often spends a fair amount of
> > > its time in the 'Counting objects' stage. Since git 2.0 git has a
> > > 'bitmap index' feature for packs. I wonder if that could be enabled on
> > > gitmaster/git.pg.o? Saves both time and CPU.
>
> > We do pretty strongly want to stick to the packaged versions of git (and
> > other software) to keep our work at reasonable levels. And right now, only
> > 1.7 is packaged for Debian Wheezy which is what we run. 2.0 isn't even
> > available in backports. However, it will become 2.1 once we can migrate to
> > Jessie, but it's not released yet.
>
> Hm. I somehow thought 2.1 was in bpo ;)
>
> > So unless it's something really critical, which I don't think it is (though
> > it does sound like a nice convenience), I would suggest we wait until we go
> > to Jessie, and then enable it.
>
> It sure isn't critical, just saves some time/cpu (server side) every now
> and then.

Ok - then let's postpone it until we go to Jessie.


> > > # enable it everywhere
> > > git config --global pack.writebitmaps on
> > > git config --global pack.writeBitmapHashCache on
> > > git config --global pack.threads 0 # make gc/repack faster, autodetect cpus
> > >
> > > # and write it by repacking
> > > git gc --aggressive --no-prune
> > >
> > >
> > Is that something that would have to be done in a cronjob repeatedly, or
> > are you saying you have to run that once and then it'll be enabled for the
> > future?
>
> It'll stay enabled/in effect for the future. But it'd be a good idea to
> regularly schedule a repack independent of bitmaps - or enable automatic
> gc's on push (not sure which version made that available). Right now
> git.pg.o's postgresql.git is ~40MB bigger than my local one...

Well, it does them now and then, IIRC? We haven't set up a cronjob for it, but I'm fairly certain I've gotten stuck with it doing GC during a commit sometime...

--

Re: Turn on git pack bitmaps?

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> It'll stay enabled/in effect for the future. But it'd be a good idea to
>> regularly schedule a repack independent of bitmaps - or enable automatic
>> gc's on push (not sure which version made that available). Right now
>> git.pg.o's postgresql.git is ~40MB bigger than my local one...

> Well, it does them now and then, IIRC? We haven't set up a cronjob for it,
> but I'm fairly certain I've gotten stuck with it doing GC during a commit
> sometime...

Yeah, I got one just a couple of minutes ago.

[postgres@sss1 pgsql]$ git push   
Counting objects: 74, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (24/24), done.
Writing objects: 100% (24/24), 6.61 KiB | 0 bytes/s, done.
Total 24 (delta 20), reused 0 (delta 0)
Auto packing the repository for optimum performance.
To ssh://git@gitmaster.postgresql.org/postgresql.git
   c86f8f3..83c3115  REL9_2_STABLE -> REL9_2_STABLE
   a196e67..f389b6e  REL9_3_STABLE -> REL9_3_STABLE
   66463a3..9c15a77  REL9_4_STABLE -> REL9_4_STABLE
   64235fe..b26e208  master -> master

There was a noticeable delay after the "auto packing" message.  I don't
mind this happening once in a while, but I don't see a need to do it
on every commit; please let's not have auto-gc-on-commit.  No opinion
about whether a cron job would be worth the trouble.

        regards, tom lane



Re: Turn on git pack bitmaps?

From
Andres Freund
Date:
On 2015-02-21 13:34:06 -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Feb 19, 2015 at 5:01 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >> It'll stay enabled/in effect for the future. But it'd be a good idea to
> >> regularly schedule a repack independent of bitmaps - or enable automatic
> >> gc's on push (not sure which version made that available). Right now
> >> git.pg.o's postgresql.git is ~40MB bigger than my local one...
> 
> > Well, it does them now and then, IIRC? We haven't set up a cronjob for it,
> > but I'm fairly certain I've gotten stuck with it doing GC during a commit
> > sometime...

Those don't have the same effect as a full repack though, as they only
pack recently created objects into a new pack file and occasionally
collapse pack files. Doing a full repack, with more aggressive settings,
can considerably shrink the repository.

> Yeah, I got one just a couple of minutes ago.
> 
> There was a noticeable delay after the "auto packing" message.

That's gone in a newer git btw, it moves that automatically into the
background.

> I don't mind this happening once in a while, but I don't see a need to do it
> on every commit; please let's not have auto-gc-on-commit.

Yea, doing it on every commit wouldn't make any sense.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
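
To see how close a repository is to triggering the automatic packing discussed above, one could compare the output of `git count-objects -v` against the gc thresholds. The sketch below is an approximation, not git's own logic: the function name is hypothetical, the defaults of 6700 loose objects (gc.auto) and 50 packs (gc.autoPackLimit) are git's documented defaults, and as far as I understand git estimates the loose object count rather than counting exactly.

```shell
#!/bin/sh
# Reads `git count-objects -v` output on stdin and succeeds if the loose
# object count or the pack count exceeds the given limits.
# $1 = loose object limit (default 6700), $2 = pack limit (default 50).
auto_gc_likely() {
    awk -v loose_limit="${1:-6700}" -v pack_limit="${2:-50}" '
        $1 == "count:" { loose = $2 }   # number of loose objects
        $1 == "packs:" { packs = $2 }   # number of pack files
        END { exit !(loose > loose_limit || packs > pack_limit) }
    '
}

# Usage (path is an assumption):
#   git -C /srv/git/postgresql.git count-objects -v | auto_gc_likely \
#       && echo "auto gc would probably run on the next push"
```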