Thread: Hot standby, prepared xacts, locks

Hot standby, prepared xacts, locks

From
Heikki Linnakangas
Date:
The Hot Standby patch changes lock_twophase_recover() so that when we're
starting up from Hot Standby mode to normal operation, as opposed to
crash recovery, we assume that all AccessExcusiveLocks are already held
by the startup process and instead of acquiring them anew they are
transferred from the startup process to the dummy PGPROC entry.

As the patch stands, that works. However, if we don't see a
running-xacts record, but only some XLOG_RELATION_LOCK records followed
by a PREPARE record, the startup process will hold some of the
transactions locks but not all of them. That's OK because we're not
letting read-only backends in yet. If we then failover and bring the
standby server up as a new master, lock_twophase_recover() will grant
the locks directly for the dummy process. But the startup process is
already holding some of them, so we briefly have a situation where two
processes - the dummy process and the startup process - are both holding
the same AccessExclusiveLock. AFAICS, the lock manager can cope with
that situation just fine, even though it can't normally happen, but it's
worth a mention.

A while back in your branch you changed ProcArrayApplyRecoverInfo() so
that the standby is put into "pending" mode when it sees a running-xacts
record marked as 'locks overflowed', instead of just ignoring the
record. If we see such a record, go into pending mode, and then try to
bring up the standby as the new master, we will run into this error in
lock_twophase_recover():

>         /*
>          * We should already hold the desired lock.
>          */
>         if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
>             elog(ERROR, "lock %s on object %u/%u/%u by xid %u was not held by startup process",
>                  lockMethodTable->lockModeNames[lockmode],
>                  locktag->locktag_field1, locktag->locktag_field2,
>                  locktag->locktag_field3,
>                  xid);
> 

lock_twophase_recover() assumes that all locks held by the prepared
transactions are already held by the startup process, but that might not
be true in the pending state.

Given that even without that change in ProcArrayApplyRecoveryInfo(), we
can end up in a situation where both the dummy process and the startup
process hold the same AccessExclusiveLock, I think we could simply
revert all the changes in lock_twophase_recover() and rely on that
behavior whenever we end recovery.

BTW, this self-proclaimed ugly hack in lock_twophase_recover() looks
utterly broken:

>         /*
>          * Transfer the lock->procLocks entry so that the lock is now owned
>          * by the new gxact proc.  Nothing else need change.
>          * This can happen safely because the startup process has the lock
>          * and wishes to give the lock to a gxact dummy process that has
>          * no associated backend. So neither proc can change concurrently.
>          * This technique might normally be considered an ugly hack, yet
>          * managing dummy procs all the way through recovery resulted in
>          * more complex code, which was worse. Chiedere il permesso.
>          */
>         SHMQueueInit(&lock->procLocks);
>         SHMQueueInsertBefore(&lock->procLocks, &proclock->lockLink);
>         SHMQueueInsertBefore(&(proc->myProcLocks[partition]),
>                              &proclock->procLink);

It will blow away any PROCLOCKs belonging to other backends waiting for
the lock, and you can't just transfer a PROCLOCK entry from one backend
to another anyway, because the PGPROC pointer is part of its the hash
key. But then again, I don't think it's actually transferring the
startup process' PROCLOCK entry anyway, because the PROCLOCK entry
searched/created just above that belongs to the dummy proc, not the
startup process. So this is really just blowing away all other PROCLOCKs
from the lock's procLock queue, and possibly inserting a duplicate entry
into proc->myProcLocks[partition]. Ugh.

So, I'm quite eager to just revert all those lock_twophase_recover()
changes, and always rely on the "grant lock to dummy proc, then release
it in startup process" method. If we don't want to rely on that,
PostPrepare_Locks is an example of how to transfer lock ownership from
one process to another correctly.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, prepared xacts, locks

From
Simon Riggs
Date:
On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote:

> So, I'm quite eager to just revert all those lock_twophase_recover()
> changes, and always rely on the "grant lock to dummy proc, then
> release
> it in startup process" method. If we don't want to rely on that,
> PostPrepare_Locks is an example of how to transfer lock ownership from
> one process to another correctly.

Yes, I realised after I wrote it that PostPrepare already does that
switch, just been busy with other stuff to switch over the code.

I think we do need some special code because of handling whole lock
queues. i.e. if there is a backend requesting an AEL but a prepared xact
has it, the lock queue will initially be Backend->Startup and needs to
end up looking like Backend->DummyProc.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot standby, prepared xacts, locks

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote:
> 
>> So, I'm quite eager to just revert all those lock_twophase_recover()
>> changes, and always rely on the "grant lock to dummy proc, then
>> release
>> it in startup process" method. If we don't want to rely on that,
>> PostPrepare_Locks is an example of how to transfer lock ownership from
>> one process to another correctly.
> 
> Yes, I realised after I wrote it that PostPrepare already does that
> switch, just been busy with other stuff to switch over the code.
> 
> I think we do need some special code because of handling whole lock
> queues. i.e. if there is a backend requesting an AEL but a prepared xact
> has it, the lock queue will initially be Backend->Startup and needs to
> end up looking like Backend->DummyProc.

Hmm, dunno about that, but there is one problem with the "grant to dummy
proc, then release in startup process" approach. What if there isn't
enough shared memory available to re-acquire the lock for the dummy
proc? It would be rather unfortunate to throw an error and shut down the
standby, instead of promoting it to a new master.

In fact, what happens if you ran out of shared memory when replaying a
relation_redo_lock record? Panic?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, prepared xacts, locks

From
Simon Riggs
Date:
On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote:
> > 
> >> So, I'm quite eager to just revert all those lock_twophase_recover()
> >> changes, and always rely on the "grant lock to dummy proc, then
> >> release
> >> it in startup process" method. If we don't want to rely on that,
> >> PostPrepare_Locks is an example of how to transfer lock ownership from
> >> one process to another correctly.
> > 
> > Yes, I realised after I wrote it that PostPrepare already does that
> > switch, just been busy with other stuff to switch over the code.
> > 
> > I think we do need some special code because of handling whole lock
> > queues. i.e. if there is a backend requesting an AEL but a prepared xact
> > has it, the lock queue will initially be Backend->Startup and needs to
> > end up looking like Backend->DummyProc.
> 
> Hmm, dunno about that, but there is one problem with the "grant to dummy
> proc, then release in startup process" approach. What if there isn't
> enough shared memory available to re-acquire the lock for the dummy
> proc? It would be rather unfortunate to throw an error and shut down the
> standby, instead of promoting it to a new master.

Any error would be unfortunate at that point. That particular error
seems unlikely, since we are only talking about AccessExclusiveLocks. If
the server has a problem with that many locks, then it is severely in
danger from prepared transactions in the general case, since such errors
could be also be thrown by the current code in mildly different
circumstances.

Do you see any alternative approaches to the one taken?

I have documented the requirement for max_locks_per_transaction to be as
high or higher than on master, as is the case for other parameters.

> In fact, what happens if you ran out of shared memory when replaying a
> relation_redo_lock record? Panic?

An ERROR in the startup process will cause it to upgrade to FATAL,
AFAIK. That means the server will do a crash shutdown, AIUI. That is the
equivalent of a PANIC, I guess. How else could it behave?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot standby, prepared xacts, locks

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote:
>> Hmm, dunno about that, but there is one problem with the "grant to dummy
>> proc, then release in startup process" approach. What if there isn't
>> enough shared memory available to re-acquire the lock for the dummy
>> proc? It would be rather unfortunate to throw an error and shut down the
>> standby, instead of promoting it to a new master.
> 
> Any error would be unfortunate at that point. That particular error
> seems unlikely, since we are only talking about AccessExclusiveLocks. If
> the server has a problem with that many locks, then it is severely in
> danger from prepared transactions in the general case, since such errors
> could be also be thrown by the current code in mildly different
> circumstances.

Note that read-only backends also occupy lock space when they run
queries, taking AccessShareLocks.

> Do you see any alternative approaches to the one taken?

Making some effort to transfer locks instead of acquiring+releasing
would eliminate the need for having extra lock space available when
switching from hot standby mode to normal operation.

>> In fact, what happens if you ran out of shared memory when replaying a
>> relation_redo_lock record? Panic?
> 
> An ERROR in the startup process will cause it to upgrade to FATAL,
> AFAIK. That means the server will do a crash shutdown, AIUI. That is the
> equivalent of a PANIC, I guess. How else could it behave?

It could wait for backends to release locks. If there isn't any, then
you don't have much choice.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, prepared xacts, locks

From
Simon Riggs
Date:
On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote:
> >> Hmm, dunno about that, but there is one problem with the "grant to dummy
> >> proc, then release in startup process" approach. What if there isn't
> >> enough shared memory available to re-acquire the lock for the dummy
> >> proc? It would be rather unfortunate to throw an error and shut down the
> >> standby, instead of promoting it to a new master.
> > 
> > Any error would be unfortunate at that point. That particular error
> > seems unlikely, since we are only talking about AccessExclusiveLocks. If
> > the server has a problem with that many locks, then it is severely in
> > danger from prepared transactions in the general case, since such errors
> > could be also be thrown by the current code in mildly different
> > circumstances.
> 
> Note that read-only backends also occupy lock space when they run
> queries, taking AccessShareLocks.
> 
> > Do you see any alternative approaches to the one taken?
> 
> Making some effort to transfer locks instead of acquiring+releasing
> would eliminate the need for having extra lock space available when
> switching from hot standby mode to normal operation.

This isn't very clear. You started by saying you were quite eager to
always grant and then release; this sounds like you don't want that now,
but you now again like the approach I had already attempted to take.
Please explain a little more.

Well spotted on the pending bug, BTW.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot standby, prepared xacts, locks

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote:
>> Making some effort to transfer locks instead of acquiring+releasing
>> would eliminate the need for having extra lock space available when
>> switching from hot standby mode to normal operation.
> 
> This isn't very clear. You started by saying you were quite eager to
> always grant and then release; this sounds like you don't want that now,
> but you now again like the approach I had already attempted to take.

Yeah, I haven't made up my mind. What's in there now is certainly
broken, so we need to do something. The simplest approach would be to
revert the changes in lock_twophase_recover(), while transfering the
locks with something like AtPrepare_Locks() would be more robust in the
face of shared memory shortage.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, prepared xacts, locks

From
Simon Riggs
Date:
On Thu, 2009-10-22 at 09:41 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote:
> >> Making some effort to transfer locks instead of acquiring+releasing
> >> would eliminate the need for having extra lock space available when
> >> switching from hot standby mode to normal operation.
> > 
> > This isn't very clear. You started by saying you were quite eager to
> > always grant and then release; this sounds like you don't want that now,
> > but you now again like the approach I had already attempted to take.
> 
> Yeah, I haven't made up my mind. What's in there now is certainly
> broken, so we need to do something. 

Agreed

> The simplest approach 

is the best

> would be to
> revert the changes in lock_twophase_recover(), while transfering the
> locks with something like AtPrepare_Locks() would be more robust in the
> face of shared memory shortage.

Will look into it

-- Simon Riggs           www.2ndQuadrant.com