Re: 10+hrs vs 15min because of just one index - Mailing list pgsql-performance

From Aaron Turner
Subject Re: 10+hrs vs 15min because of just one index
Date
Msg-id 1ca1c1410602121104p2c231d77u883871d7e2bddbfa@mail.gmail.com
Whole thread Raw
In response to Re: 10+hrs vs 15min because of just one index  ("Marc Morin" <marc@sandvine.com>)
List pgsql-performance
On 2/12/06, Marc Morin <marc@sandvine.com> wrote:
> From your config, a check point will be forced when
>
> (checkpoint_segments * 16 M) < rows * (8K/N*h + (1-h)*8K) * B
>
> Where h is the "hitrate" or correlation between the update scan and the
> index. Do you have a sense of what this is?

I know my checkpoints happen > 30 secs apart, since PG isn't
complaining in my log.  I have no clue what the correlation is.

> In the limits, we have 100%
> correlation or 0% correlation.  N is the lower cost of putting the
> change in the WAL entry, not sure what this is, but small, I am
> assuming, say N=100.  B is the average number of blocks changed per
> updated row (assume B=1.1 for your case, heap,serial index have very
> high correlation)
>
> In the 0% correlation case, each updated row will cause the index update
> to read/modify the block. The modified block will be entirely written to
> the WAL log.  After (30 * 16M) / (8K) / 1.1 ~ 55k rows, a checkpoint
> will be forced and all modified blocks in shared buffers will be written
> out.
>
> Increasing checkpoint_segments to 300 and seeing if that makes a
> difference. If so, the excessive WAL checkpoints are your issue. If
> performance is exactly the same, then I would assume that you have close
> to 0% correlation between the rows in the heap and index.

Ok, i'll have to give that a try.

> Can you increase shared_buffers? With a low correlation index, the only
> solution is to hold the working set of blocks in memory.  Also, make
> sure that the checkpoint segments are big enough for you to modify them
> in place, don't want checkpoints occurring....

I'll have to look at my memory usage on this server... with only 2GB
and a bunch of other processes running around I'm not sure if I can go
up much more without causing swapping.  Of course RAM is cheap...

> Note that the more updates you do, the larger the tables/index become
> and the worse the problem becomes.  Vacuuming the table is an "answer"
> but unfortunately, it tends to decrease correlation from our
> observations. :-(

Good to know.

> From our observations, dropping index and rebuilding them is not always
> practical, depends on your application; table will be exclusively locked
> during the transaction due to drop index.

Yep.  In my case it's not a huge problem right now, but I know it will
become a serious one sooner or later.

Thanks a lot Marc.  Lots of useful info.

--
Aaron Turner
http://synfin.net/

pgsql-performance by date:

Previous
From: "Marc Morin"
Date:
Subject: Re: 10+hrs vs 15min because of just one index
Next
From: Aaron Turner
Date:
Subject: Re: 10+hrs vs 15min because of just one index