Thread: checkpoint segments

checkpoint segments

From
"David Parker"
Date:
I was recently running a test with multiple client shell processes running psql commands (inserts) when all the client processes appeared to hang simultaneously. I assumed that I had an application deadlock somewhere, but after a few seconds - less than a minute, but certainly noticeable - all the clients picked up again and went on their way.
 
In the database log at that time there was a "recycling transaction log" message which seems to correspond to the time when the clients were paused, though I don't have it concretely correlated.
 
I've seen these messages in the log before, and am aware of the need to increase checkpoint_segments, but I wasn't aware that recycling a transaction log could be that damaging to performance. There may have been some local hiccup in this case, but I'm wondering if recycling is known to be a big hit in general, and if I should strive to tune so that it never happens (if that's possible)?
 
Thanks.

- DAP
----------------------------------------------------------------------------------
David Parker    Tazz Networks    (401) 709-5130
 
Re: checkpoint segments

From
Josh Berkus
Date:
David,

> I've seen these messages in the log before, and am aware of the need to
> increase checkpoint_segments, but I wasn't aware that recycling a
> transaction log could be that damaging to performance. There may have
> been some local hiccup in this case, but I'm wondering if recycling is
> known to be a big hit in general, and if I should strive to tune so that
> it never happens (if that's possible)?

Yes, and yes.   Simply allocating more checkpoint segments (which can eat a
lot of disk space -- requirements are 16 MB * (2 * segments + 1)) will prevent
this problem.
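
To make the disk-space figure concrete, here is a minimal sketch of that formula in Python. The function name and the sample settings are made up for illustration; only the 16 MB segment size and the (2 * segments + 1) bound come from the message above.

```python
# Rough upper bound on WAL disk usage, using the formula quoted in this thread:
#   16 MB * (2 * checkpoint_segments + 1)
SEGMENT_SIZE_MB = 16  # WAL segment size as stated above


def max_wal_disk_mb(checkpoint_segments: int) -> int:
    """Return the approximate worst-case WAL footprint in megabytes."""
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be at least 1")
    return SEGMENT_SIZE_MB * (2 * checkpoint_segments + 1)


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the requirement grows.
    for segments in (3, 16, 64):
        print(f"checkpoint_segments={segments}: ~{max_wal_disk_mb(segments)} MB")
```

The requirement grows linearly, so each extra segment costs roughly another 32 MB of disk.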

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: checkpoint segments

From
Tom Lane
Date:
"David Parker" <dparker@tazznetworks.com> writes:
> I was recently running a test with multiple client shell processes
> running psql commands (inserts) when all the client processes appeared
> to hang simultaneously. I assumed that I had an application deadlock
> somewhere, but after a few seconds - less than a minute, but certainly
> noticeable - all the clients picked up again and went on their way.
>
> In the database log at that time there was a "recycling transaction log"
> message which seems to correspond to the time when the clients were
> paused, though I don't have it concretely correlated.

I think what you saw was the disk being hogged by checkpoint writes.
"Recycling transaction log" is a routine operation, and by itself is a
reasonably cheap operation, but it's only done as the last step in a
checkpoint (in fact, from a technical point of view, it's done after the
checkpoint finishes).  My guess is that the actual performance hit
occurred while the checkpoint was pushing out dirty buffers.

What you want is to reduce the amount of deferred I/O that has to happen
when a checkpoint occurs.  There is not any way to do that before PG
8.0 (the obvious idea of reducing the interval between checkpoints is
counterproductive, IMHO).  In 8.0 you can fool around with the bgwriter
parameters with an eye to "dribbling out" writes of dirty pages between
checkpoints.
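
As a rough illustration of the "dribbling out" idea, the toy model below estimates how many dirty pages are left for the checkpoint to flush under different background-writer settings. It is a sketch under loud assumptions, not a description of the real implementation: the parameter names mirror the 8.0 settings bgwriter_delay, bgwriter_percent and bgwriter_maxpages as I understand them (check the 8.0 documentation before relying on them), the dirtying rate is constant and invented, and re-dirtied pages and the OS cache are ignored.

```python
import math


def dirty_pages_at_checkpoint(dirty_rate_per_sec: float,
                              checkpoint_interval_sec: float,
                              delay_ms: int,
                              percent: float,
                              maxpages: int) -> float:
    """Toy estimate of dirty pages still unwritten when a checkpoint starts.

    Each background-writer round is assumed to write at most `percent`
    percent of the currently dirty pages (rounded up), capped at `maxpages`.
    """
    rounds = int(checkpoint_interval_sec * 1000 / delay_ms)
    dirtied_per_round = dirty_rate_per_sec * delay_ms / 1000
    dirty = 0.0
    for _ in range(rounds):
        dirty += dirtied_per_round          # pages dirtied since the last round
        written = min(math.ceil(dirty * percent / 100), maxpages)
        dirty = max(dirty - written, 0.0)   # pages trickled out by the bgwriter
    return dirty


if __name__ == "__main__":
    # Hypothetical workload: 500 pages dirtied per second, checkpoint every 300 s.
    workload = dict(dirty_rate_per_sec=500, checkpoint_interval_sec=300)
    for label, settings in {
        "no bgwriter":     dict(delay_ms=200, percent=0,  maxpages=100),
        "mild":            dict(delay_ms=200, percent=1,  maxpages=100),
        "more aggressive": dict(delay_ms=200, percent=10, maxpages=500),
    }.items():
        left = dirty_pages_at_checkpoint(**workload, **settings)
        print(f"{label}: about {left:.0f} dirty pages left for the checkpoint")
```

In this model a more aggressive setting leaves fewer pages for the checkpoint to push out at once, which is the effect described above; real behavior depends on the workload and hardware.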

            regards, tom lane

Re: checkpoint segments

From
Alvaro Herrera
Date:
On Sun, May 15, 2005 at 08:22:13PM -0400, David Parker wrote:

> In the database log at that time there was a "recycling transaction log"
> message which seems to correspond to the time when the clients were
> paused, though I don't have it concretely correlated.

Maybe what you need is to make the bgwriter more aggressive, so that I/O is
more evenly spread across the checkpoint interval -- that way, at
checkpoint there's less work to do.

> I've seen these messages in the log before, and am aware of the need to
> increase checkpoint_segments, but I wasn't aware that recycling a
> transaction log could be that damaging to performance. There may have
> been some local hiccup in this case, but I'm wondering if recycling is
> known to be a big hit in general, and if I should strive to tune so that
> it never happens (if that's possible)?

Well, recycling is actually a *good* thing -- it saves you from having
to remove WAL segment files and allocate new files for the new logs.  So
what you really want doesn't have anything to do with the recycling
itself, but rather with the checkpoint that's going on at the same time.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
Licensee shall have no right to use the Licensed Software
for productive or commercial use. (Licencia de StarOffice 6.0 beta)

Re: checkpoint segments

From
Alvaro Herrera
Date:
On Sun, May 15, 2005 at 08:26:02PM -0700, Josh Berkus wrote:
> David,
>
> > I've seen these messages in the log before, and am aware of the need to
> > increase checkpoint_segments, but I wasn't aware that recycling a
> > transaction log could be that damaging to performance. There may have
> > been some local hiccup in this case, but I'm wondering if recycling is
> > known to be a big hit in general, and if I should strive to tune so that
> > it never happens (if that's possible)?
>
> Yes, and yes.   Simply allocating more checkpoint segments (which can eat a
> lot of disk space -- requirements are 16 MB * (2 * segments + 1)) will prevent
> this problem.

Hmm?  I disagree -- it will only make things worse when the checkpoint
does occur.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Lo esencial es invisible para los ojos" (A. de Saint-Exupéry)

Re: checkpoint segments

From
Josh Berkus
Date:
Alvaro,

> > Yes, and yes.   Simply allocating more checkpoint segments (which can eat
> > a lot of disk space -- requirements are 16 MB * (2 * segments + 1)) will
> > prevent this problem.
>
> Hmm?  I disagree -- it will only make things worse when the checkpoint
> does occur.

Unless you allocate enough logs that you don't need to checkpoint until the
load is over with.   In multiple tests involving large data loads,
increasing the number of checkpoint segments and the checkpoint interval
has been a net benefit to overall load speed.   It's possible that the
checkpoints which do occur are worse, but they're not enough worse to
counterbalance their infrequency.

I have not yet been able to do a full scalability series on bgwriter.

--
Josh Berkus
Aglio Database Solutions
San Francisco