Thread: Hot standby, prepared xacts, locks
The Hot Standby patch changes lock_twophase_recover() so that when we're starting up from Hot Standby mode to normal operation, as opposed to crash recovery, we assume that all AccessExcusiveLocks are already held by the startup process and instead of acquiring them anew they are transferred from the startup process to the dummy PGPROC entry. As the patch stands, that works. However, if we don't see a running-xacts record, but only some XLOG_RELATION_LOCK records followed by a PREPARE record, the startup process will hold some of the transactions locks but not all of them. That's OK because we're not letting read-only backends in yet. If we then failover and bring the standby server up as a new master, lock_twophase_recover() will grant the locks directly for the dummy process. But the startup process is already holding some of them, so we briefly have a situation where two processes - the dummy process and the startup process - are both holding the same AccessExclusiveLock. AFAICS, the lock manager can cope with that situation just fine, even though it can't normally happen, but it's worth a mention. A while back in your branch you changed ProcArrayApplyRecoverInfo() so that the standby is put into "pending" mode when it sees a running-xacts record marked as 'locks overflowed', instead of just ignoring the record. If we see such a record, go into pending mode, and then try to bring up the standby as the new master, we will run into this error in lock_twophase_recover(): > /* > * We should already hold the desired lock. > */ > if (!(proclock->holdMask & LOCKBIT_ON(lockmode))) > elog(ERROR, "lock %s on object %u/%u/%u by xid %u was not held by startup process", > lockMethodTable->lockModeNames[lockmode], > locktag->locktag_field1, locktag->locktag_field2, > locktag->locktag_field3, > xid); > lock_twophase_recover() assumes that all locks held by the prepared transactions are already held by the startup process, but that might not be true in the pending state. Given that even without that change in ProcArrayApplyRecoveryInfo(), we can end up in a situation where both the dummy process and the startup process hold the same AccessExclusiveLock, I think we could simply revert all the changes in lock_twophase_recover() and rely on that behavior whenever we end recovery. BTW, this self-proclaimed ugly hack in lock_twophase_recover() looks utterly broken: > /* > * Transfer the lock->procLocks entry so that the lock is now owned > * by the new gxact proc. Nothing else need change. > * This can happen safely because the startup process has the lock > * and wishes to give the lock to a gxact dummy process that has > * no associated backend. So neither proc can change concurrently. > * This technique might normally be considered an ugly hack, yet > * managing dummy procs all the way through recovery resulted in > * more complex code, which was worse. Chiedere il permesso. > */ > SHMQueueInit(&lock->procLocks); > SHMQueueInsertBefore(&lock->procLocks, &proclock->lockLink); > SHMQueueInsertBefore(&(proc->myProcLocks[partition]), > &proclock->procLink); It will blow away any PROCLOCKs belonging to other backends waiting for the lock, and you can't just transfer a PROCLOCK entry from one backend to another anyway, because the PGPROC pointer is part of its the hash key. But then again, I don't think it's actually transferring the startup process' PROCLOCK entry anyway, because the PROCLOCK entry searched/created just above that belongs to the dummy proc, not the startup process. So this is really just blowing away all other PROCLOCKs from the lock's procLock queue, and possibly inserting a duplicate entry into proc->myProcLocks[partition]. Ugh. So, I'm quite eager to just revert all those lock_twophase_recover() changes, and always rely on the "grant lock to dummy proc, then release it in startup process" method. If we don't want to rely on that, PostPrepare_Locks is an example of how to transfer lock ownership from one process to another correctly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote: > So, I'm quite eager to just revert all those lock_twophase_recover() > changes, and always rely on the "grant lock to dummy proc, then > release > it in startup process" method. If we don't want to rely on that, > PostPrepare_Locks is an example of how to transfer lock ownership from > one process to another correctly. Yes, I realised after I wrote it that PostPrepare already does that switch, just been busy with other stuff to switch over the code. I think we do need some special code because of handling whole lock queues. i.e. if there is a backend requesting an AEL but a prepared xact has it, the lock queue will initially be Backend->Startup and needs to end up looking like Backend->DummyProc. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote: > >> So, I'm quite eager to just revert all those lock_twophase_recover() >> changes, and always rely on the "grant lock to dummy proc, then >> release >> it in startup process" method. If we don't want to rely on that, >> PostPrepare_Locks is an example of how to transfer lock ownership from >> one process to another correctly. > > Yes, I realised after I wrote it that PostPrepare already does that > switch, just been busy with other stuff to switch over the code. > > I think we do need some special code because of handling whole lock > queues. i.e. if there is a backend requesting an AEL but a prepared xact > has it, the lock queue will initially be Backend->Startup and needs to > end up looking like Backend->DummyProc. Hmm, dunno about that, but there is one problem with the "grant to dummy proc, then release in startup process" approach. What if there isn't enough shared memory available to re-acquire the lock for the dummy proc? It would be rather unfortunate to throw an error and shut down the standby, instead of promoting it to a new master. In fact, what happens if you ran out of shared memory when replaying a relation_redo_lock record? Panic? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Wed, 2009-10-21 at 19:37 +0300, Heikki Linnakangas wrote: > > > >> So, I'm quite eager to just revert all those lock_twophase_recover() > >> changes, and always rely on the "grant lock to dummy proc, then > >> release > >> it in startup process" method. If we don't want to rely on that, > >> PostPrepare_Locks is an example of how to transfer lock ownership from > >> one process to another correctly. > > > > Yes, I realised after I wrote it that PostPrepare already does that > > switch, just been busy with other stuff to switch over the code. > > > > I think we do need some special code because of handling whole lock > > queues. i.e. if there is a backend requesting an AEL but a prepared xact > > has it, the lock queue will initially be Backend->Startup and needs to > > end up looking like Backend->DummyProc. > > Hmm, dunno about that, but there is one problem with the "grant to dummy > proc, then release in startup process" approach. What if there isn't > enough shared memory available to re-acquire the lock for the dummy > proc? It would be rather unfortunate to throw an error and shut down the > standby, instead of promoting it to a new master. Any error would be unfortunate at that point. That particular error seems unlikely, since we are only talking about AccessExclusiveLocks. If the server has a problem with that many locks, then it is severely in danger from prepared transactions in the general case, since such errors could be also be thrown by the current code in mildly different circumstances. Do you see any alternative approaches to the one taken? I have documented the requirement for max_locks_per_transaction to be as high or higher than on master, as is the case for other parameters. > In fact, what happens if you ran out of shared memory when replaying a > relation_redo_lock record? Panic? An ERROR in the startup process will cause it to upgrade to FATAL, AFAIK. That means the server will do a crash shutdown, AIUI. That is the equivalent of a PANIC, I guess. How else could it behave? -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote: >> Hmm, dunno about that, but there is one problem with the "grant to dummy >> proc, then release in startup process" approach. What if there isn't >> enough shared memory available to re-acquire the lock for the dummy >> proc? It would be rather unfortunate to throw an error and shut down the >> standby, instead of promoting it to a new master. > > Any error would be unfortunate at that point. That particular error > seems unlikely, since we are only talking about AccessExclusiveLocks. If > the server has a problem with that many locks, then it is severely in > danger from prepared transactions in the general case, since such errors > could be also be thrown by the current code in mildly different > circumstances. Note that read-only backends also occupy lock space when they run queries, taking AccessShareLocks. > Do you see any alternative approaches to the one taken? Making some effort to transfer locks instead of acquiring+releasing would eliminate the need for having extra lock space available when switching from hot standby mode to normal operation. >> In fact, what happens if you ran out of shared memory when replaying a >> relation_redo_lock record? Panic? > > An ERROR in the startup process will cause it to upgrade to FATAL, > AFAIK. That means the server will do a crash shutdown, AIUI. That is the > equivalent of a PANIC, I guess. How else could it behave? It could wait for backends to release locks. If there isn't any, then you don't have much choice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Wed, 2009-10-21 at 23:02 +0300, Heikki Linnakangas wrote: > >> Hmm, dunno about that, but there is one problem with the "grant to dummy > >> proc, then release in startup process" approach. What if there isn't > >> enough shared memory available to re-acquire the lock for the dummy > >> proc? It would be rather unfortunate to throw an error and shut down the > >> standby, instead of promoting it to a new master. > > > > Any error would be unfortunate at that point. That particular error > > seems unlikely, since we are only talking about AccessExclusiveLocks. If > > the server has a problem with that many locks, then it is severely in > > danger from prepared transactions in the general case, since such errors > > could be also be thrown by the current code in mildly different > > circumstances. > > Note that read-only backends also occupy lock space when they run > queries, taking AccessShareLocks. > > > Do you see any alternative approaches to the one taken? > > Making some effort to transfer locks instead of acquiring+releasing > would eliminate the need for having extra lock space available when > switching from hot standby mode to normal operation. This isn't very clear. You started by saying you were quite eager to always grant and then release; this sounds like you don't want that now, but you now again like the approach I had already attempted to take. Please explain a little more. Well spotted on the pending bug, BTW. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote: >> Making some effort to transfer locks instead of acquiring+releasing >> would eliminate the need for having extra lock space available when >> switching from hot standby mode to normal operation. > > This isn't very clear. You started by saying you were quite eager to > always grant and then release; this sounds like you don't want that now, > but you now again like the approach I had already attempted to take. Yeah, I haven't made up my mind. What's in there now is certainly broken, so we need to do something. The simplest approach would be to revert the changes in lock_twophase_recover(), while transfering the locks with something like AtPrepare_Locks() would be more robust in the face of shared memory shortage. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-10-22 at 09:41 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote: > >> Making some effort to transfer locks instead of acquiring+releasing > >> would eliminate the need for having extra lock space available when > >> switching from hot standby mode to normal operation. > > > > This isn't very clear. You started by saying you were quite eager to > > always grant and then release; this sounds like you don't want that now, > > but you now again like the approach I had already attempted to take. > > Yeah, I haven't made up my mind. What's in there now is certainly > broken, so we need to do something. Agreed > The simplest approach is the best > would be to > revert the changes in lock_twophase_recover(), while transfering the > locks with something like AtPrepare_Locks() would be more robust in the > face of shared memory shortage. Will look into it -- Simon Riggs www.2ndQuadrant.com