Thread: lowering impact of checkpoints

lowering impact of checkpoints

From
hubert depesz lubaczewski
Date:
hi,
our system is handling between 600 and 2000 transactions per second. all
of them are very small, very fast. typical query runs in under 1ms.
yes - sometimes we get queries that take longer than then should get.
simple check shows that we have a very visible pattern of
every-5-minutes peak.
in the minute that there is checkpoint - we get usually 15-20 times more
queries "over 500 ms" than in other minutes.

we are using 8.2.4 (upgrade will be soon), with these settings:
# select name, setting, unit from pg_settings where name ~* 'bgwriter|wal|checkpoint';
         name          |  setting  |  unit
-----------------------+-----------+--------
 bgwriter_all_maxpages | 5         |
 bgwriter_all_percent  | 0.333     | [null]
 bgwriter_delay        | 200       | ms
 bgwriter_lru_maxpages | 5         |
 bgwriter_lru_percent  | 1         | [null]
 checkpoint_segments   | 32        |
 checkpoint_timeout    | 300       | s
 checkpoint_warning    | 30        | s
 wal_buffers           | 128       | 8kB
 wal_sync_method       | fdatasync | [null]
(10 rows)

is there anything i can change to make it "smoother"?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: lowering impact of checkpoints

From
Brad Nicholson
Date:
On Tue, 2007-09-25 at 11:58 +0200, hubert depesz lubaczewski wrote:
> hi,
> our system is handling between 600 and 2000 transactions per second. all
> of them are very small, very fast. typical query runs in under 1ms.
> yes - sometimes we get queries that take longer than then should get.
> simple check shows that we have a very visible pattern of
> every-5-minutes peak.
> in the minute that there is checkpoint - we get usually 15-20 times more
> queries "over 500 ms" than in other minutes.
>
> we are using 8.2.4 (upgrade will be soon), with these settings:
> # select name, setting, unit from pg_settings where name ~* 'bgwriter|wal|checkpoint';
>          name          |  setting  |  unit
> -----------------------+-----------+--------
>  bgwriter_all_maxpages | 5         |
>  bgwriter_all_percent  | 0.333     | [null]
>  bgwriter_delay        | 200       | ms
>  bgwriter_lru_maxpages | 5         |
>  bgwriter_lru_percent  | 1         | [null]
>  checkpoint_segments   | 32        |
>  checkpoint_timeout    | 300       | s
>  checkpoint_warning    | 30        | s
>  wal_buffers           | 128       | 8kB
>  wal_sync_method       | fdatasync | [null]
> (10 rows)
>
> is there anything i can change to make it "smoother"?

Sounds like bgwriter is not flushing dirty pages quickly enough, so
there is still a lot of work to do at checkpoint time.  You probably
need to tune it.  This can be a tough thing to do properly though.
There are no magic values to suggest, as what will work is highly
dependent on your hardware and your applications pattern of use.

If possible, up the settings for bgwriter_all_percent a *little* and
perhaps  bgwriter_all_maxpages and see if it helps.  You can change
these with a reload.  If you are doing this on a production system as
opposed to a test system, keep a close eye on what is going on, as it is
possible that you can make things worse.

I would start with something like 2% for bgwriter_all_maxpages and see
if that helps things out.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: lowering impact of checkpoints

From
Greg Smith
Date:
On Tue, 25 Sep 2007, hubert depesz lubaczewski wrote:

>         name          |  setting  |  unit
> -----------------------+-----------+--------
> bgwriter_all_maxpages | 5         |
> bgwriter_all_percent  | 0.333     | [null]
> bgwriter_delay        | 200       | ms
> bgwriter_lru_maxpages | 5         |
> bgwriter_lru_percent  | 1         | [null]

The background writer can help smooth out checkpoints a bit in 8.2.4, but
these settings are barely doing anything; they aren't even in the right
ballpark for a system that's doing 600-2000 TPS.  You need to be careful
here because making the background writer run too often can result in
things running slower all the time, and in some cases it can even make the
checkpoints worse.

I would suggest changing these parameters one at a time, in the following
order, and see what happens to the number of slow queries at checkpoint
time after each change:

bgwriter_lru_maxpages:  increase to 500
bgwriter_lru_percent:  increase to 5
bgwriter_all_maxpage:  increase to 250
bgwriter_all_percent:  increase to 2

If these are all positive changes, you might even want to increase these
further; potentially you could double all of the above and still not have
the settings high enough, and if that's the case you may have to adjust
bgwriter_delay downward.  Here is a message I'd suggest reading carefully
from someone who went through the process you're starting now and ended up
with a much more aggressive set of settings even than these:

http://archives.postgresql.org/pgsql-hackers/2006-12/msg00383.php

As Kevin suggests there, one thing that varies a bit based on the exact
work you're doing is the ratio between how heavily you balance the all vs.
lru weighting.  His final settings use the all writer a bit more heavily
than I'd normally recommend, but with his particular system that worked
out well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: lowering impact of checkpoints

From
Gregory Stark
Date:
"Greg Smith" <gsmith@gregsmith.com> writes:

> On Tue, 25 Sep 2007, hubert depesz lubaczewski wrote:
>
>>         name          |  setting  |  unit
>> -----------------------+-----------+--------
>> bgwriter_all_maxpages | 5         |
>> bgwriter_all_percent  | 0.333     | [null]
>> bgwriter_delay        | 200       | ms
>> bgwriter_lru_maxpages | 5         |
>> bgwriter_lru_percent  | 1         | [null]
>
> The background writer can help smooth out checkpoints a bit in 8.2.4,

Not to disagree with anything Greg says here but you should keep in mind that
all of these parameters will have to be retuned from scratch with 8.3 which
has been optimized somewhat for just this problem.

> If these are all positive changes, you might even want to increase these
> further; potentially you could double all of the above and still not have the
> settings high enough, and if that's the case you may have to adjust
> bgwriter_delay downward.

I'm surprised you don't start by suggesting lowering bgwriter_delay for a busy
dedicated system. Does it cause too much wasted cpu work in the "all" cycle in
8.2?

I also wonder if it doesn't make more sense in 8.2 if your goal is to avoid
drop-outs to just give up on the lru cycle entirely and set the delay to
something like 60s and the all_percent to 100. Effectively saying to flush all
dirty buffers once a minute to smooth the checkpoint. I haven't tried doing
anything like that though.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: lowering impact of checkpoints

From
Greg Smith
Date:
On Tue, 25 Sep 2007, Gregory Stark wrote:

> I'm surprised you don't start by suggesting lowering bgwriter_delay for a busy
> dedicated system. Does it cause too much wasted cpu work in the "all" cycle in
> 8.2?

I've just found it easier to sort through this class of problem by getting
the maxpages parameters into at least the 200-500 range before even
thinking about lowering the delay.  There may very well be a different way
to approach this problem by making the delay more of a primary tunable.
Certainly there's potentially an advantage to lowering the delay in that
it gets writes trickling out to disk more regularly.

> I also wonder if it doesn't make more sense in 8.2 if your goal is to avoid
> drop-outs to just give up on the lru cycle entirely and set the delay to
> something like 60s and the all_percent to 100.

There are some workloads where flushing the buffers that haven't been used
recently in the lru cycle is more useful than what the all scan does; it's
hard to figure out whether your system is such a case or not in 8.2
though.

In addition, the main problem with using a longer cycle/higher percentage
is that the way some operating systems buffer writes favors writing small
blocks more frequently.  In the Linux case there are situations where
writes sit there for a full 30 seconds so getting the physical disk
started earlier is a benefit.  I'd be concerned that all_percent=100 would
end up generating something close to a checkpoint I/O spike every cycle,
and that the background writer waiting for that big write to finish might
delay checkpoint requests from processing in a timely fashion.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD