Thread: checkpoint spikes

checkpoint spikes

From
Janning
Date:
Hi,

we currently encounter an increasing load on our website. With the increasing
load we see some problems on our database. so we checked what happens and we
saw spikes in our load when checkpoints are about to finish.

Our configuration:

max_connections = 125
ssl = false
shared_buffers = 500MB
work_mem = 15MB
maintenance_work_mem = 250MB
synchronous_commit = off
full_page_writes = off

checkpoint_segments = 10
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9

random_page_cost = 2
effective_cache_size = 5000MB

autovacuum = off

(we put autovacuum to off because we suspected it).

all the other parameters are set to default (beside log parameters and so on).

Our machine has 12 GB RAM, i7-975 CPU and a SW-Raid-1 for datadir and another
one for the rest of the server (including postgresql logfiles). Disk are
"Barracuda 7200.11 SATA 3Gb/s 1.5-TB" and we are running debian lenny.

these are our checkpoint log statements:

2010-06-09 17:24:27 CEST [6375]: [1-1] LOG:  checkpoint starting: time
2010-06-09 17:28:09 CEST [6375]: [2-1] LOG:  checkpoint complete: wrote 2861
buffers (4.5%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=193.057 s, sync=29.259 s, total=222.353 s
2010-06-09 17:34:27 CEST [6375]: [3-1] LOG:  checkpoint starting: time
2010-06-09 17:39:09 CEST [6375]: [4-1] LOG:  checkpoint complete: wrote 3247
buffers (5.1%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=255.255 s, sync=26.911 s, total=282.177 s
2010-06-09 17:44:27 CEST [6375]: [5-1] LOG:  checkpoint starting: time
2010-06-09 17:49:41 CEST [6375]: [6-1] LOG:  checkpoint complete: wrote 2746
buffers (4.3%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=280.743 s, sync=33.392 s, total=314.147 s
2010-06-09 17:54:27 CEST [6375]: [7-1] LOG:  checkpoint starting: time
2010-06-09 17:58:59 CEST [6375]: [8-1] LOG:  checkpoint complete: wrote 3118
buffers (4.9%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=253.293 s, sync=18.585 s, total=271.892 s
2010-06-09 18:04:27 CEST [6375]: [9-1] LOG:  checkpoint starting: time
2010-06-09 18:08:46 CEST [6375]: [10-1] LOG:  checkpoint complete: wrote 2695
buffers (4.2%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=225.173 s, sync=33.789 s, total=258.972 s
2010-06-09 18:14:27 CEST [6375]: [11-1] LOG:  checkpoint starting: time
2010-06-09 18:18:30 CEST [6375]: [12-1] LOG:  checkpoint complete: wrote 2868
buffers (4.5%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=215.561 s, sync=27.701 s, total=243.271 s

What we saw is a rather long sync time. And exactly at this time, our
responses become slow and the server load increases.

this is from pg_stat_bgwriter:

 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean |
maxwritten_clean | buffers_backend | buffers_alloc

-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
              3495 |               0 |            9070242 |      10798927 |
14421 |         6412707 |     208340755


We tried to increase checkpoint_timeout to 20 minutes but it was getting
worse:

2010-06-09 18:34:27 CEST [6375]: [13-1] LOG:  checkpoint starting: time
2010-06-09 18:42:49 CEST [6375]: [14-1] LOG:  checkpoint complete: wrote 2956
buffers (4.6%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=448.265 s, sync=54.087 s, total=502.377 s

here we see a 54 sec sync time and a much higher load on sync time compared to
the 10 minutes checkpoint_timeout.

do you have any hints for us how to tune our configuration to avoid spikes?

kind regards
Janning








Re: checkpoint spikes

From
Janning
Date:
Hi again,

nobody answered my question :-(, so i did some research. I was convinced to
set:

checkpoint_segments = 16
checkpoint_timeout = 60min

and
echo 2 > /proc/sys/vm/dirty_background_ratio

this helped a lot. Our freeze time was reduced from 10 seconds to 5 seconds.
But this is still way too long. I know the best fix would be good disks, but at
the moment it is not affordable.

Behalf of my research I have lots of questions concerning Checkpoints:

1. With raising checkpoint_timeout, is there any downgrade other than slower
after-crash recovery?

2. Is there a way to calculate the after-crash recovery time with a certain
checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60
minutes?

3. Is it sane to set checkpoint_timeout to 120min or even to 600min?

4. We have checkpoint_completion_target set to 0.9. The checkpoint is always
ready much earlier. Is it ought to be like this? I thought the work would be
spread across 90% of 60 minutes? Instead it takes only 10 Minutes for the
checkpoint to be finished.

5. Does anybody know if I can set dirty_background_ratio to 0.5? As we have 12
GB RAM and rather slow disks 0,5% would result in a maximum of 61MB dirty
pages.

Your help is very appreciated?

kind regards
Janning

PS: Do I need to post this question on pgsql-perfomance? If so, please let me
know.



On Wednesday 09 June 2010 18:53:23 Janning wrote:
> Hi,
>
> we currently encounter an increasing load on our website. With the
> increasing load we see some problems on our database. so we checked what
> happens and we saw spikes in our load when checkpoints are about to finish.
>
> Our configuration:
>
> max_connections = 125
> ssl = false
> shared_buffers = 500MB
> work_mem = 15MB
> maintenance_work_mem = 250MB
> synchronous_commit = off
> full_page_writes = off
>
> checkpoint_segments = 10
> checkpoint_timeout = 10min
> checkpoint_completion_target = 0.9
>
> random_page_cost = 2
> effective_cache_size = 5000MB
>
> autovacuum = off
>
> (we put autovacuum to off because we suspected it).
>
> all the other parameters are set to default (beside log parameters and so
> on).
>
> Our machine has 12 GB RAM, i7-975 CPU and a SW-Raid-1 for datadir and
> another one for the rest of the server (including postgresql logfiles).
> Disk are "Barracuda 7200.11 SATA 3Gb/s 1.5-TB" and we are running debian
> lenny.
>
> these are our checkpoint log statements:
>
> 2010-06-09 17:24:27 CEST [6375]: [1-1] LOG:  checkpoint starting: time
> 2010-06-09 17:28:09 CEST [6375]: [2-1] LOG:  checkpoint complete: wrote
> 2861 buffers (4.5%); 0 transaction log file(s) added, 0 removed, 1
> recycled; write=193.057 s, sync=29.259 s, total=222.353 s
> 2010-06-09 17:34:27 CEST [6375]: [3-1] LOG:  checkpoint starting: time
> 2010-06-09 17:39:09 CEST [6375]: [4-1] LOG:  checkpoint complete: wrote
> 3247 buffers (5.1%); 0 transaction log file(s) added, 0 removed, 2
> recycled; write=255.255 s, sync=26.911 s, total=282.177 s
> 2010-06-09 17:44:27 CEST [6375]: [5-1] LOG:  checkpoint starting: time
> 2010-06-09 17:49:41 CEST [6375]: [6-1] LOG:  checkpoint complete: wrote
> 2746 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 2
> recycled; write=280.743 s, sync=33.392 s, total=314.147 s
> 2010-06-09 17:54:27 CEST [6375]: [7-1] LOG:  checkpoint starting: time
> 2010-06-09 17:58:59 CEST [6375]: [8-1] LOG:  checkpoint complete: wrote
> 3118 buffers (4.9%); 0 transaction log file(s) added, 0 removed, 1
> recycled; write=253.293 s, sync=18.585 s, total=271.892 s
> 2010-06-09 18:04:27 CEST [6375]: [9-1] LOG:  checkpoint starting: time
> 2010-06-09 18:08:46 CEST [6375]: [10-1] LOG:  checkpoint complete: wrote
> 2695 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 2
> recycled; write=225.173 s, sync=33.789 s, total=258.972 s
> 2010-06-09 18:14:27 CEST [6375]: [11-1] LOG:  checkpoint starting: time
> 2010-06-09 18:18:30 CEST [6375]: [12-1] LOG:  checkpoint complete: wrote
> 2868 buffers (4.5%); 0 transaction log file(s) added, 0 removed, 2
> recycled; write=215.561 s, sync=27.701 s, total=243.271 s
>
> What we saw is a rather long sync time. And exactly at this time, our
> responses become slow and the server load increases.
>
> this is from pg_stat_bgwriter:
>
>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean |
> maxwritten_clean | buffers_backend | buffers_alloc
> -------------------+-----------------+--------------------+---------------+
>------------------+-----------------+--------------- 3495 |               0
> |            9070242 |      10798927 | 14421 |         6412707 |
> 208340755
>
>
> We tried to increase checkpoint_timeout to 20 minutes but it was getting
> worse:
>
> 2010-06-09 18:34:27 CEST [6375]: [13-1] LOG:  checkpoint starting: time
> 2010-06-09 18:42:49 CEST [6375]: [14-1] LOG:  checkpoint complete: wrote
> 2956 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 1
> recycled; write=448.265 s, sync=54.087 s, total=502.377 s
>
> here we see a 54 sec sync time and a much higher load on sync time compared
> to the 10 minutes checkpoint_timeout.
>
> do you have any hints for us how to tune our configuration to avoid spikes?
>
> kind regards
> Janning




Re: checkpoint spikes

From
Vick Khera
Date:
On Thu, Jun 10, 2010 at 12:49 PM, Janning <ml@planwerk6.de> wrote:
> 1. With raising checkpoint_timeout, is there any downgrade other than slower
> after-crash recovery?

Depends on how busy your DB is, and how many checkpoint segments you
have.  All the timeout does is say, "if we have not done a checkpoint
this long, do it now."  So, if you run out of checkpoint segments, a
checkpoint will be run.  How often you fill those segments depends on
how much writing you do and how many of them you have.

>
> 2. Is there a way to calculate the after-crash recovery time with a certain
> checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60
> minutes?

It depends on the number of segments with data that needs to be pushed
to permanent table storage.  So it is not so much dependent on the
timeout but the number of segments and how much writing you do, as
above.

>
> 3. Is it sane to set checkpoint_timeout to 120min or even to 600min?

I don't think so :-)

>
> 4. We have checkpoint_completion_target set to 0.9. The checkpoint is always
> ready much earlier. Is it ought to be like this? I thought the work would be
> spread across 90% of 60 minutes? Instead it takes only 10 Minutes for the
> checkpoint to be finished.

I use the following:

checkpoint_segments = 64
checkpoint_timeout = 15min
checkpoint_completion_target = 0.8

on a server with 20GB RAM and very big fast disks with Pg 8.3.

You should also investigate the bgwriter, which pushes dirty pages to
disk. Perhaps being more aggressive with that will reduce your need to
checkpoint.

Greg Smith has the best info on tuning these parameters that I have
found: http://www.westnet.com/~gsmith/gregsmith/content/postgresql/index.htm

Re: checkpoint spikes

From
Greg Smith
Date:
Janning wrote:
> 1. With raising checkpoint_timeout, is there any downgrade other than slower
> after-crash recovery?
>

Checkpoint spikes happen when too much I/O has been saved up for
checkpoint time than the server can handle.  While this is normally
handled by the checkpoint spreading logic, you may find that with your
limited disk configuration there's no other way to handle the problem
but to make checkpoints much more frequent, rather than slower.  At
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm I
write about how you had to tune PostgreSQL 8.0 to 8.2 in order to keep
checkpoint spikes from getting too big.  You may have to follow those
same techniques for your server.  Just don't try to increase the
background writer settings in your case--the 8.3 one is different enough
that you can't tune that the way was suggested for 8.2.

> 2. Is there a way to calculate the after-crash recovery time with a certain
> checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60
> minutes?
>

Simulate it.  No way to estimate.


> 3. Is it sane to set checkpoint_timeout to 120min or even to 600min?
>

Checkpoints happen when you reach either checkpoint_segments of WAL
written *or* reach checkpoint_timeout, whichever happens first.  You'd
have to set both to extremely large values to get checkpoints to happen
really infrequently.  Which I suspect is the exactly opposite of what
you want--you can't handle the spike from a long delayed checkpoint, and
probably want to tune for shorter and smaller ones instead.

Every now and then we run into someone who had to retune their system to
something like:

shared_buffers=512MB
checkpoint_segments=3

In order to avoid spikes from killing them.  That may be the direction
you have to head.  The longer the time between checkpoints, the bigger
the spike at the end is going to be to some extend; you can't completely
spread that out.


> 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we have 12
> GB RAM and rather slow disks 0,5% would result in a maximum of 61MB dirty
> pages.
>

Nope.  Linux has absolutely terrible controls for this critical
performance parameter.   The sort of multi-second spikes you're seeing
are extremely common and very difficult to get rid of.


> PS: Do I need to post this question on pgsql-perfomance? If so, please let me
> know.
>

That would have been the better list for it originally.  I also wrote
something about a technique that uses pg_stat_bgwriter snapshots to help
model what the server is doing in these cases better you might find
useful on the admin list, it's at
http://archives.postgresql.org/pgsql-admin/2010-06/msg00074.php

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: checkpoint spikes

From
Janning
Date:
On Thursday 10 June 2010 22:00:54 Greg Smith wrote:
> Janning wrote:
> > 1. With raising checkpoint_timeout, is there any downgrade other than
> > slower after-crash recovery?
>
> Checkpoint spikes happen when too much I/O has been saved up for
> checkpoint time than the server can handle.  While this is normally
> handled by the checkpoint spreading logic, you may find that with your
> limited disk configuration there's no other way to handle the problem
> but to make checkpoints much more frequent, rather than slower.

Uhh! I had so much success with less frequent checkpoints. At least the spike
is the same but it does not happen so often.

> At
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm I
> write about how you had to tune PostgreSQL 8.0 to 8.2 in order to keep
> checkpoint spikes from getting too big.  You may have to follow those
> same techniques for your server.  Just don't try to increase the
> background writer settings in your case--the 8.3 one is different enough
> that you can't tune that the way was suggested for 8.2.

we are runing 8.4 and of course I read your article. I just raised
bgwriter_lru_maxpage to 200 :-(

most docs I found relates to 8.2 and 8.3. In Things of checkpoints, is 8.4
comparable to 8.3? It would be nice if you update your article to reflect 8.4.

> > 2. Is there a way to calculate the after-crash recovery time with a
> > certain checkpoint_timeout? How long would be approx. for a
> > checkpoint_timeout of 60 minutes?
>
> Simulate it.  No way to estimate.

Ok. won't try it now :-)

> > 3. Is it sane to set checkpoint_timeout to 120min or even to 600min?
>
> Checkpoints happen when you reach either checkpoint_segments of WAL
> written *or* reach checkpoint_timeout, whichever happens first.

Sorry the question was not precise. I already raised checkpoint_segments to
reach that goal of less frequent checkpoints.

> You'd
> have to set both to extremely large values to get checkpoints to happen
> really infrequently.  Which I suspect is the exactly opposite of what
> you want--you can't handle the spike from a long delayed checkpoint, and
> probably want to tune for shorter and smaller ones instead.
>
> Every now and then we run into someone who had to retune their system to
> something like:
>
> shared_buffers=512MB
> checkpoint_segments=3
>
> In order to avoid spikes from killing them.  That may be the direction
> you have to head.  The longer the time between checkpoints, the bigger
> the spike at the end is going to be to some extend; you can't completely
> spread that out.

I am really afraid of doing it right now. In my experience the spike is the
same but we only have it once an hour.

> > 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we
> > have 12 GB RAM and rather slow disks 0,5% would result in a maximum of
> > 61MB dirty pages.
>
> Nope.  Linux has absolutely terrible controls for this critical
> performance parameter.   The sort of multi-second spikes you're seeing
> are extremely common and very difficult to get rid of.

ok, thanks.

> > PS: Do I need to post this question on pgsql-perfomance? If so, please
> > let me know.
>
> That would have been the better list for it originally.  I also wrote
> something about a technique that uses pg_stat_bgwriter snapshots to help
> model what the server is doing in these cases better you might find
> useful on the admin list, it's at
> http://archives.postgresql.org/pgsql-admin/2010-06/msg00074.php

thank you very much for your help!

best regards
Janning



Re: checkpoint spikes

From
Greg Smith
Date:
Janning wrote:
> most docs I found relates to 8.2 and 8.3. In Things of checkpoints, is 8.4
> comparable to 8.3? It would be nice if you update your article to reflect 8.4

There haven't been any changes made in this area since 8.3, that's why
there's been no update.  8.4 and 9.0 have exactly the same checkpoint
behavior and background writer behavior as 8.3 from the perspective of
regular use.  The changes made only impact how things happen on a
replication standby, not the master server.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: checkpoint spikes

From
Martijn van Oosterhout
Date:
On Thu, Jun 10, 2010 at 04:00:54PM -0400, Greg Smith wrote:
>> 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we
>> have 12 GB RAM and rather slow disks 0,5% would result in a maximum of
>> 61MB dirty pages.
>
> Nope.  Linux has absolutely terrible controls for this critical
> performance parameter.   The sort of multi-second spikes you're seeing
> are extremely common and very difficult to get rid of.

Another relevent parameter is /proc/sys/vm/dirty_writeback_centisecs.
By default linux only wakes up once every 5 seconds to check if there
is stuff to write out. I have found that reducing this tends to smooth
out bursty spikes. However, see:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

which indicates that kernel may try to defeat you here...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Attachment

Re: checkpoint spikes

From
Janning
Date:
Hi Martijn, hi Greg,

thanks you very much for your help. We finally got rid of these annoying
spikes.

First we tried to set
  checkpoint_segments = 3    # before 16
  checkpoint_timeout = 5min  # before: 60min
which didn't really help.

we had the same spikes but more often. Then we tried to lower
dirty_writeback_centisecs from 500 to 100

This helped a little bit, but we investigated this problem further.

we monitored "Dirty" memory from /proc/meminfo and we saw very long query
durations (>5s, sometimes 10s and more) correlating with the kernel writing
out Dirty buffer. When we saw a massive reduction in "Dirty" memory, there were
spikes in the query duration. As long as the kernel didn't write out the dirty
memory, everything did run fine.

So finally we tried
echo 0 > /proc/sys/vm/dirty_background_ratio

and now everything runs very smooth. We see a few longer query durations over
2 seconds but no more spikes of 5 or 10 seconds.

Our average response time from our our tomcat servers suddenly dropped from
300ms to 100ms. Great!!

We know that our limitation is cheap disks, but with /dirty_background_ratio =
0 you really have big advantages and much better performance.

So for further reference for other people reading this thread, I really
recommend trying this out.

best regards
Janning

On Friday 11 June 2010 21:48:54 Martijn van Oosterhout wrote:
> On Thu, Jun 10, 2010 at 04:00:54PM -0400, Greg Smith wrote:
> >> 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we
> >> have 12 GB RAM and rather slow disks 0,5% would result in a maximum of
> >> 61MB dirty pages.
> >
> > Nope.  Linux has absolutely terrible controls for this critical
> > performance parameter.   The sort of multi-second spikes you're seeing
> > are extremely common and very difficult to get rid of.
>
> Another relevent parameter is /proc/sys/vm/dirty_writeback_centisecs.
> By default linux only wakes up once every 5 seconds to check if there
> is stuff to write out. I have found that reducing this tends to smooth
> out bursty spikes. However, see:
>
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm
>
> which indicates that kernel may try to defeat you here...
>
> Have a nice day,