Thread: CheckpointStartLock starvation
I'm seeing a problem on my benchmark machine: checkpoints stop happening
after the ramp-up period.

It looks like the bgwriter gets starved waiting on the
CheckpointStartLock. The CheckpointStartLock is held in shared mode over
an XLogFlush when committing, which on an extremely busy system like a
benchmark is always long enough to have a new transaction to acquire the
CheckpointStartLock again. I'm running another test with more logging to
confirm that's what's happening, but I'm pretty sure that's it...

As a proposed fix, instead of acquiring the CheckpointStartLock in
RecordTransactionCommit, we set a flag in MyProc saying "commit in
progress". Checkpoint will scan through the procarray and make note of
any commit in progress transactions, after computing the new redo record
ptr, and wait for all of them to finish before flushing clog.

Unless someone has a better idea, I'll write a patch to do the above.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> As a proposed fix, instead of acquiring the CheckpointStartLock in
> RecordTransactionCommit, we set a flag in MyProc saying "commit in
> progress". Checkpoint will scan through the procarray and make note of
> any commit in progress transactions, after computing the new redo record
> ptr, and wait for all of them to finish before flushing clog.

What sort of "wait for finish" mechanism do you have in mind? While
I've always thought CheckpointStartLock is a pretty ugly solution,
I'm not sure the above is better.

regards, tom lane
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> As a proposed fix, instead of acquiring the CheckpointStartLock in
>> RecordTransactionCommit, we set a flag in MyProc saying "commit in
>> progress". Checkpoint will scan through the procarray and make note of
>> any commit in progress transactions, after computing the new redo record
>> ptr, and wait for all of them to finish before flushing clog.
>
> What sort of "wait for finish" mechanism do you have in mind? While
> I've always thought CheckpointStartLock is a pretty ugly solution,
> I'm not sure the above is better.

I was thinking of XactLockTableWait.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> What sort of "wait for finish" mechanism do you have in mind?
> I was thinking of XactLockTableWait.

Ugh. I don't think the bgwriter can participate in heavyweight-lockmgr
operations, or should become able to.

Nor will that work for prepared xacts --- you don't want to wait for the
eventual commit, only for PREPARE TRANSACTION to exit its critical
section.

regards, tom lane
Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> It looks like the bgwriter gets starved waiting on the
> CheckpointStartLock. The CheckpointStartLock is held in shared mode over
> an XLogFlush when committing, which on an extremely busy system like a
> benchmark is always long enough to have a new transaction to acquire the
> CheckpointStartLock again.

If the starvation comes from giving unfair priorities on shared locks
against exclusive locks, does the below TODO item help us?

| Locking
| Fix priority ordering of read and write light-weight locks (Neil)
| http://archives.postgresql.org/pgsql-hackers/2004-11/msg00893.php
| http://archives.postgresql.org/pgsql-hackers/2004-11/msg00905.php

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> It looks like the bgwriter gets starved waiting on the
>> CheckpointStartLock. The CheckpointStartLock is held in shared mode over
>> an XLogFlush when committing, which on an extremely busy system like a
>> benchmark is always long enough to have a new transaction to acquire the
>> CheckpointStartLock again.
> If the starvation comes from giving unfair priorities on shared locks
> against exclusive locks, does the below TODO item help us?

Tweaking the lock rules was my first thought too, but the side-effects
might be undesirable. In this particular case it would certainly be
better to not have a lock at all, since having checkpoint block commits
even briefly is not what we'd like.

I think Heikki's plan of having backends show in PGPROC that they're in
a commit critical section is basically sound, we just have to get the
details straight. Since checkpoint doesn't need to be instantaneous,
it's probably sufficient to just have it sleep 10 msec or so and recheck
to see if all the blockers are gone, instead of doing any kind of fancy
signaling.

regards, tom lane
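A minimal sketch of the flag-and-poll scheme described above, for illustration only. It is not the eventual patch: SketchProc, sketch_procs, in_commit and every sketch_* function are hypothetical stand-ins for PGPROC and the procarray, and the locking that a real shared-memory scan would need is left out. The point it tries to show is that the checkpoint waits only for the commits it noted when it scanned, so commits that start later cannot starve it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BACKENDS 8   /* hypothetical fixed-size stand-in for the procarray */

/* Hypothetical, heavily simplified stand-in for a PGPROC entry. */
typedef struct SketchProc
{
    unsigned int xid;        /* transaction id of the backend's current xact */
    bool         in_commit;  /* "commit in progress" flag (name is an assumption) */
} SketchProc;

static SketchProc sketch_procs[SKETCH_MAX_BACKENDS];

/* Backend side: bracket the commit critical section with the flag. */
void sketch_commit_begin(SketchProc *proc) { proc->in_commit = true; }
void sketch_commit_end(SketchProc *proc)   { proc->in_commit = false; }

/*
 * Checkpoint side, step 1: once the redo pointer has been computed, note
 * the xids of all transactions currently flagged as committing.
 * Returns the number of xids stored in xids[].
 */
size_t sketch_collect_in_commit(unsigned int *xids, size_t max_xids)
{
    size_t n = 0;

    for (size_t i = 0; i < SKETCH_MAX_BACKENDS && n < max_xids; i++)
        if (sketch_procs[i].in_commit)
            xids[n++] = sketch_procs[i].xid;
    return n;
}

/*
 * Checkpoint side, step 2: is any of the noted transactions still flagged?
 * Commits that began after the scan are deliberately ignored.
 */
bool sketch_any_still_in_commit(const unsigned int *xids, size_t nxids)
{
    for (size_t i = 0; i < SKETCH_MAX_BACKENDS; i++)
    {
        if (!sketch_procs[i].in_commit)
            continue;
        for (size_t j = 0; j < nxids; j++)
            if (sketch_procs[i].xid == xids[j])
                return true;
    }
    return false;
}

/*
 * Checkpoint side, step 3: poll until the noted commits are done.
 * sleep_fn is a caller-supplied delay (for example, about 10 msec).
 */
void sketch_wait_for_commits(const unsigned int *xids, size_t nxids,
                             void (*sleep_fn)(void))
{
    assert(sleep_fn != NULL);
    while (sketch_any_still_in_commit(xids, nxids))
        sleep_fn();
}
```

In this sketch a checkpoint would call sketch_collect_in_commit() once, then sketch_wait_for_commits() with a short sleep before flushing clog. No lock is held across the commit, so the checkpoint never blocks a committing backend.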
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> What sort of "wait for finish" mechanism do you have in mind?
>
>> I was thinking of XactLockTableWait.
>
> Ugh. I don't think the bgwriter can participate in heavyweight-lockmgr
> operations, or should become able to.

Oh, good point. I suppose we could just poll and sleep, checkpoint is a
heavy operation anyway so a little delay wouldn't hurt.

> Nor will that work for prepared xacts --- you don't want to wait for the
> eventual commit, only for PREPARE TRANSACTION to exit its critical
> section.

PREPARE TRANSACTION wouldn't set the flag in MyProc; there's no clog
changes to protect from at that point. It would be set in
RecordTransactionCommitPrepared when we're really committing. Just like
we use the CheckpointStartLock today.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> Nor will that work for prepared xacts --- you don't want to wait for the
>> eventual commit, only for PREPARE TRANSACTION to exit its critical
>> section.
> PREPARE TRANSACTION wouldn't set the flag in MyProc; there's no clog
> changes to protect from at that point. It would be set in
> RecordTransactionCommitPrepared when we're really committing. Just like
> we use the CheckpointStartLock today.

Indeed --- you'd better take another look at where we use the
CheckpointStartLock today.

regards, tom lane
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Nor will that work for prepared xacts --- you don't want to wait for the
>>> eventual commit, only for PREPARE TRANSACTION to exit its critical
>>> section.
>
>> PREPARE TRANSACTION wouldn't set the flag in MyProc; there's no clog
>> changes to protect from at that point. It would be set in
>> RecordTransactionCommitPrepared when we're really committing. Just like
>> we use the CheckpointStartLock today.
>
> Indeed --- you'd better take another look at where we use the
> CheckpointStartLock today.

Yeah, while writing the patch I noticed that we really do use it in
EndPrepare to avoid a similar race condition with the twophase state
file.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-04-02 at 21:16 -0400, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> >> It looks like the bgwriter gets starved waiting on the
> >> CheckpointStartLock. The CheckpointStartLock is held in shared mode over
> >> an XLogFlush when committing, which on an extremely busy system like a
> >> benchmark is always long enough to have a new transaction to acquire the
> >> CheckpointStartLock again.
>
> > If the starvation comes from giving unfair priorities on shared locks
> > against exclusive locks, does the below TODO item help us?
>
> Tweaking the lock rules was my first thought too, but the side-effects
> might be undesirable. In this particular case it would certainly be
> better to not have a lock at all, since having checkpoint block commits
> even briefly is not what we'd like.

Itagaki-san: Tried that way of handling the problem in June last year
and it just moved the problem, rather than removing it. A lock-free
solution is the only way, so Heikki's method is better, ISTM.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com