Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 1122414979.3670.96.camel@localhost.localdomain
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Checkpoint cost, looks like it is WAL/CRC  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote:
> Hmm.  Eyeballing the NOTPM trace for cases 302912 and 302909, it sure
> looks like the post-checkpoint performance recovery is *slower* in
> the latter.  And why is 302902 visibly slower overall than 302905?
> I thought for a bit that you had gotten "patch" vs "no patch" backwards,
> but the oprofile results linked to these pages look right: XLogInsert
> takes significantly more time in the "no patch" cases.
> 
> There's something awfully weird going on here.  I was prepared to see
> no statistically-significant differences, but not multiple cases that
> seem to be going the "wrong direction".

All of the tests have been performed with wal_buffers = 8, so there will
be massive contention for those buffers, leading to increased I/O...

All of the tests show that there is a CPU utilisation drop, and an I/O
wait increase immediately following checkpoints.

When we advance the insert pointer and a wal_buffer still needs writing,
we clean it by attempting to perform an I/O while holding WALInsertLock.
Very probably the WALWriteLock is currently held, so we wait on the
WALWriteLock and everybody else waits on us. Normally, it's fairly hard
for that to occur since we attempt to XLogWrite when wal_buffers are more
than half full, but we do this with a conditional acquire, so when we're
busy we just keep filling up wal_buffers. Normally, that's OK.
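
To make the conditional acquire concrete, here is a minimal sketch in Python rather than the backend's C. It is not the actual XLogInsert/XLogWrite code: maybe_write_out, write_lock and the "dirty" counter are hypothetical stand-ins, and the only thing it tries to show is the "try the lock, and if it is busy just carry on filling" behaviour described above.

```python
import threading

WAL_BUFFERS = 8  # the setting used in these tests


def maybe_write_out(write_lock, dirty, wal_buffers=WAL_BUFFERS):
    """Toy stand-in for the opportunistic write (hypothetical helper).

    Once more than half of the buffers are dirty, try to take the write
    lock without blocking. Returns the number of dirty buffers afterwards.
    """
    if dirty <= wal_buffers // 2:
        return dirty              # not yet more than half full: do nothing
    if not write_lock.acquire(blocking=False):
        return dirty              # lock is busy: skip the write, keep filling
    try:
        return 0                  # pretend every dirty buffer was written out
    finally:
        write_lock.release()


write_lock = threading.Lock()

# Lock is free: the conditional acquire succeeds and the buffers are cleaned.
idle_result = maybe_write_out(write_lock, dirty=5)

# Lock is held by "someone else": nothing is cleaned and nobody waits.
write_lock.acquire()
busy_result = maybe_write_out(write_lock, dirty=5)
write_lock.release()
```

In the busy case the caller returns immediately with the buffers still dirty, which is harmless only as long as free buffers remain.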

When we have a checkpoint, almost every xlog write has at least a whole
block appended to it. So we can easily fill up wal_buffers very quickly
while WALWriteLock is held. Once there is no space available, we then
effectively halt all transactions while we write out that buffer. 

My conjecture is that the removal of the CPU bottleneck has merely moved
the problem by allowing users to fill wal_buffers faster and go into a
wait state quicker than they did before. The beneficial effect of the
conditional acquire when wal_buffers are full never occurs, and
performance drops.

We should run tests with much higher wal_buffers numbers to nullify the
effect described above and reduce contention. That way we will move
towards the log disk speed being the limiting factor, patch or no patch.
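
As a back-of-envelope illustration of why more buffers should help, the sketch below counts how many records fit before every buffer is dirty, assuming nothing gets written out in the meantime. The numbers are assumptions, not measurements from these tests: 8192-byte buffers, a made-up 200-byte ordinary record, and a post-checkpoint record that carries one whole extra block. The name records_until_full is hypothetical.

```python
BLOCK = 8192          # bytes per WAL buffer; assumes the default block size
SMALL_RECORD = 200    # assumed size of an ordinary record, purely illustrative


def records_until_full(wal_buffers, record_bytes):
    """Records of the given size that fit before every buffer is dirty,
    assuming no buffer is written out in the meantime."""
    return (wal_buffers * BLOCK) // record_bytes


# Just after a checkpoint, each record is assumed to carry one whole block too.
after_checkpoint = BLOCK + SMALL_RECORD

for n in (8, 64, 512):
    print(n,
          records_until_full(n, SMALL_RECORD),
          records_until_full(n, after_checkpoint))
```

Under these assumptions, wal_buffers = 8 leaves room for only 7 post-checkpoint records before inserts must wait for a write, against 62 at wal_buffers = 64, so a larger setting gives the writer far more time to release WALWriteLock before anyone stalls.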

So, I think Tom's improvement of CRC/hole compression will prove itself
when we have higher values of wal_buffers.

Best Regards, Simon Riggs


