Thread: CheckpointStartLock starvation

CheckpointStartLock starvation

From: Heikki Linnakangas
Date: 02 April 2007, 16:00:18

I'm seeing a problem on my benchmark machine: checkpoints stop happening 
after the ramp-up period.

It looks like the bgwriter gets starved waiting on the 
CheckpointStartLock. The CheckpointStartLock is held in shared mode over 
an XLogFlush when committing, which on an extremely busy system like a 
benchmark is always long enough for a new transaction to acquire the 
CheckpointStartLock again.
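The starvation pattern can be illustrated with a toy discrete-time model. This is a hypothetical simplification, not PostgreSQL's LWLock code: it assumes commits request the lock in shared mode at a fixed interval and each holds it for a fixed time standing in for the XLogFlush. Whenever a new shared request arrives before the previous holder releases, and shared requests are granted while an exclusive waiter is queued, the lock never becomes free. The `writer_preference` flag models the alternative rule where new shared requests queue behind the exclusive waiter. The function and parameter names are illustrative only.

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical, not PostgreSQL's LWLock implementation).
 * One commit already holds the lock in shared mode at tick 0, and the
 * checkpointer starts waiting for exclusive mode at tick 0. A new commit
 * requests shared mode every `arrival_interval` ticks and holds it for
 * `hold_ticks` ticks.
 *
 * Returns the tick at which the exclusive request is granted, or -1 if it
 * is still waiting after `max_ticks` ticks.
 */
long exclusive_grant_tick(long arrival_interval, long hold_ticks,
                          long max_ticks, bool writer_preference)
{
    /* All holders hold for the same time, so the lock is free once the
     * most recently granted shared holder has released it. */
    long last_release = hold_ticks;

    for (long t = 0; t < max_ticks; t++) {
        if (t >= last_release)
            return t;           /* no shared holders left: grant exclusive */

        bool new_request = (t % arrival_interval) == 0;
        /* Reader preference: the shared request is granted at once because
         * the lock is already held in shared mode, even though an exclusive
         * waiter is queued. Writer preference: it waits behind the waiter. */
        if (new_request && !writer_preference)
            last_release = t + hold_ticks;
    }
    return -1;
}
```

With commits arriving every 3 ticks and each holding the lock for 5, `exclusive_grant_tick(3, 5, 100000, false)` returns -1 (starved), while the writer-preferring variant returns 5. If commits are spaced further apart than the hold time, both variants let the checkpointer in. The model deliberately ignores the side effects a different priority rule would have on the commit path.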

I'm running another test with more logging to confirm that's what's 
happening, but I'm pretty sure that's it...

As a proposed fix, instead of acquiring the CheckpointStartLock in 
RecordTransactionCommit, we set a flag in MyProc saying "commit in 
progress". Checkpoint will scan through the procarray and make note of 
any commit in progress transactions, after computing the new redo record 
ptr, and wait for all of them to finish before flushing clog.
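A minimal single-threaded sketch of the proposed flag protocol follows. `ProcSlot`, `MAX_PROCS` and all function names are hypothetical stand-ins for PGPROC and the procarray, and the locking or memory barriers real code would need are omitted. A backend sets the flag before inserting its commit record and clears it only after the clog update. The checkpoint, after computing the redo pointer, notes which xids are mid-commit and must not flush clog while any of them is still inside that section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 8   /* hypothetical fixed array size, for illustration */

/* Hypothetical, heavily simplified stand-in for a PGPROC entry. */
typedef struct {
    uint32_t xid;       /* xid being committed; 0 when idle */
    bool in_commit;     /* the proposed "commit in progress" flag */
} ProcSlot;

/* Backend side: set the flag before inserting the commit WAL record. */
static void commit_section_begin(ProcSlot *me, uint32_t xid)
{
    me->xid = xid;
    me->in_commit = true;
    /* ... insert commit record, XLogFlush, update clog ... */
}

/* Backend side: clear the flag only after the clog update is done. */
static void commit_section_end(ProcSlot *me)
{
    me->in_commit = false;
    me->xid = 0;
}

/* Checkpoint side, called after the new redo pointer is computed:
 * remember which xids were mid-commit at that moment. */
static size_t snapshot_in_commit(const ProcSlot *procs, size_t nprocs,
                                 uint32_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < nprocs && n < cap; i++)
        if (procs[i].in_commit)
            out[n++] = procs[i].xid;
    return n;
}

/* True while any remembered xid is still inside its commit section.
 * Matching on xid means a backend that has since started a new commit
 * (whose WAL record lies after the redo pointer) is not waited for. */
static bool any_still_in_commit(const ProcSlot *procs, size_t nprocs,
                                const uint32_t *xids, size_t nxids)
{
    for (size_t i = 0; i < nprocs; i++) {
        if (!procs[i].in_commit)
            continue;
        for (size_t j = 0; j < nxids; j++)
            if (procs[i].xid == xids[j])
                return true;
    }
    return false;
}
```

The design choice worth noting is that commits never block on the checkpoint: setting and clearing a flag in their own slot is all they do, and only the checkpoint pays for the wait.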

Unless someone has a better idea, I'll write a patch to do the above.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: CheckpointStartLock starvation

From: Tom Lane
Date: 02 April 2007, 16:46:10

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> As a proposed fix, instead of acquiring the CheckpointStartLock in 
> RecordTransactionCommit, we set a flag in MyProc saying "commit in 
> progress". Checkpoint will scan through the procarray and make note of 
> any commit in progress transactions, after computing the new redo record 
> ptr, and wait for all of them to finish before flushing clog.

What sort of "wait for finish" mechanism do you have in mind?  While
I've always thought CheckpointStartLock is a pretty ugly solution,
I'm not sure the above is better.
        regards, tom lane

Re: CheckpointStartLock starvation

From: Heikki Linnakangas
Date: 02 April 2007, 17:51:08

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> As a proposed fix, instead of acquiring the CheckpointStartLock in 
>> RecordTransactionCommit, we set a flag in MyProc saying "commit in 
>> progress". Checkpoint will scan through the procarray and make note of 
>> any commit in progress transactions, after computing the new redo record 
>> ptr, and wait for all of them to finish before flushing clog.
> 
> What sort of "wait for finish" mechanism do you have in mind?  While
> I've always thought CheckpointStartLock is a pretty ugly solution,
> I'm not sure the above is better.

I was thinking of XactLockTableWait.

Re: CheckpointStartLock starvation

From: Tom Lane
Date: 02 April 2007, 18:33:04

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> What sort of "wait for finish" mechanism do you have in mind?

> I was thinking of XactLockTableWait.

Ugh.  I don't think the bgwriter can participate in heavyweight-lockmgr
operations, or should become able to.

Nor will that work for prepared xacts --- you don't want to wait for the
eventual commit, only for PREPARE TRANSACTION to exit its critical
section.
        regards, tom lane

Re: CheckpointStartLock starvation

From: ITAGAKI Takahiro
Date: 02 April 2007, 21:55:28

Heikki Linnakangas <heikki@enterprisedb.com> wrote:

> It looks like the bgwriter gets starved waiting on the 
> CheckpointStartLock. The CheckpointStartLock is held in shared mode over 
> an XLogFlush when committing, which on an extremely busy system like a 
> benchmark is always long enough for a new transaction to acquire the 
> CheckpointStartLock again.

If the starvation comes from giving shared locks unfair priority over
exclusive locks, does the TODO item below help us?

| Locking
| Fix priority ordering of read and write light-weight locks (Neil) 
| http://archives.postgresql.org/pgsql-hackers/2004-11/msg00893.php
| http://archives.postgresql.org/pgsql-hackers/2004-11/msg00905.php 

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Re: CheckpointStartLock starvation

From: Tom Lane
Date: 02 April 2007, 22:16:31

ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> It looks like the bgwriter gets starved waiting on the 
>> CheckpointStartLock. The CheckpointStartLock is held in shared mode over 
>> an XLogFlush when committing, which on an extremely busy system like a 
>> benchmark is always long enough for a new transaction to acquire the 
>> CheckpointStartLock again.

> If the starvation comes from giving shared locks unfair priority over
> exclusive locks, does the TODO item below help us?

Tweaking the lock rules was my first thought too, but the side-effects
might be undesirable.  In this particular case it would certainly be
better to not have a lock at all, since having checkpoint block commits
even briefly is not what we'd like.  I think Heikki's plan of having
backends show in PGPROC that they're in a commit critical section is
basically sound, we just have to get the details straight.

Since checkpoint doesn't need to be instantaneous, it's probably
sufficient to just have it sleep 10 msec or so and recheck to see
if all the blockers are gone, instead of doing any kind of fancy
signaling.
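A minimal sketch of the sleep-and-recheck wait, assuming POSIX `nanosleep`; `wait_by_polling`, `sleep_ms` and the `done` callback are hypothetical names. In a checkpoint, `done` would report that none of the commits noted after computing the redo pointer is still in progress.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Callback reporting whether the condition being waited for now holds. */
typedef bool (*done_fn)(void *arg);

/* Sleep for `ms` milliseconds, resuming the remaining time after a signal. */
static void sleep_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;   /* interrupted: nanosleep stored the remaining time in req */
}

/*
 * Recheck `done` every `interval_ms` milliseconds until it returns true.
 * No signalling between processes is needed; the waiter simply polls.
 * Returns the number of sleeps performed.
 */
static long wait_by_polling(done_fn done, void *arg, long interval_ms)
{
    long sleeps = 0;
    while (!done(arg)) {
        sleep_ms(interval_ms);
        sleeps++;
    }
    return sleeps;
}
```

Called as `wait_by_polling(check, state, 10)`, this matches the "sleep 10 msec or so and recheck" suggestion. The trade-off is a few milliseconds of extra checkpoint latency in exchange for having no lock or wakeup mechanism on the commit path at all.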
        regards, tom lane

Re: CheckpointStartLock starvation

From: Heikki Linnakangas
Date: 03 April 2007, 05:56:08

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> What sort of "wait for finish" mechanism do you have in mind?
> 
>> I was thinking of XactLockTableWait.
> 
> Ugh.  I don't think the bgwriter can participate in heavyweight-lockmgr
> operations, or should become able to.

Oh, good point.

I suppose we could just poll and sleep; checkpoint is a heavy operation 
anyway, so a little delay wouldn't hurt.

> Nor will that work for prepared xacts --- you don't want to wait for the
> eventual commit, only for PREPARE TRANSACTION to exit its critical
> section.

PREPARE TRANSACTION wouldn't set the flag in MyProc; there are no clog 
changes to protect at that point. It would be set in 
RecordTransactionCommitPrepared when we're really committing, just as 
we use the CheckpointStartLock today.

Re: CheckpointStartLock starvation

From: Tom Lane
Date: 03 April 2007, 10:48:35

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> Nor will that work for prepared xacts --- you don't want to wait for the
>> eventual commit, only for PREPARE TRANSACTION to exit its critical
>> section.

> PREPARE TRANSACTION wouldn't set the flag in MyProc; there are no clog 
> changes to protect at that point. It would be set in 
> RecordTransactionCommitPrepared when we're really committing, just as 
> we use the CheckpointStartLock today.

Indeed --- you'd better take another look at where we use the
CheckpointStartLock today.
        regards, tom lane

Re: CheckpointStartLock starvation

From: Heikki Linnakangas
Date: 03 April 2007, 10:52:05

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Nor will that work for prepared xacts --- you don't want to wait for the
>>> eventual commit, only for PREPARE TRANSACTION to exit its critical
>>> section.
> 
>> PREPARE TRANSACTION wouldn't set the flag in MyProc; there are no clog 
>> changes to protect at that point. It would be set in 
>> RecordTransactionCommitPrepared when we're really committing, just as 
>> we use the CheckpointStartLock today.
> 
> Indeed --- you'd better take another look at where we use the
> CheckpointStartLock today.

Yeah, while writing the patch I noticed that we really do use it in 
EndPrepare to avoid a similar race condition with the twophase state file.

Re: CheckpointStartLock starvation

From: Simon Riggs
Date: 04 April 2007, 08:18:29

On Mon, 2007-04-02 at 21:16 -0400, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> >> It looks like the bgwriter gets starved waiting on the 
> >> CheckpointStartLock. The CheckpointStartLock is held in shared mode over 
> >> an XLogFlush when committing, which on an extremely busy system like a 
> >> benchmark is always long enough for a new transaction to acquire the 
> >> CheckpointStartLock again.
> 
> > If the starvation comes from giving shared locks unfair priority over
> > exclusive locks, does the TODO item below help us?
> 
> Tweaking the lock rules was my first thought too, but the side-effects
> might be undesirable.  In this particular case it would certainly be
> better to not have a lock at all, since having checkpoint block commits
> even briefly is not what we'd like. 

Itagaki-san:

I tried that way of handling the problem in June last year, and it just
moved the problem rather than removing it. A lock-free solution is the
only way, so Heikki's method is better, ISTM.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com