Thread: Hot Standby and prepared transactions
I get this when testing prepared transactions, which looks like it is unrelated to Hot Standby. 2009-12-15 17:28:08 GMT[10385]LOG: archive recovery complete 2009-12-15 17:28:08 GMT[10428]LOG: checkpoint starting: end-of-recovery immediate wait 2009-12-15 17:28:08 GMT[10428]DEBUG: creating and filling new WAL file 2009-12-15 17:28:08 GMT[10428]DEBUG: done creating and filling new WAL file 2009-12-15 17:28:08 GMT[10428]LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.790 s 2009-12-15 17:28:08 GMT[10385]LOG: recovering prepared transaction 122944 TRAP: FailedAssertion("!(*ptr == ((TransactionId) 0))", File: "subtrans.c", Line: 87) 2009-12-15 17:28:09 GMT[10380]LOG: startup process (PID 10385) was terminated by signal 6: Aborted This looks like the code is setting the parent to be zero. That assertion wasn't added by the patch. 2-phase state file attached, zipped. I think we need some tools to inspect state files. Can you look at this please? -- Simon Riggs www.2ndQuadrant.com
Attachment
On Tue, 2009-12-15 at 18:49 +0000, Simon Riggs wrote: > This looks like the code is setting the parent to be zero. [Sorry, that comment is completely off, misread the line number.] The assertion is failing because the parent entry for that subxid had already been set. Investigating how that could come about, it looks like there is some fairly strange stuff going on here. StandbyRecoverPreparedTransactions() is never called at all. The problem itself seems to be that xid assignment records cause us to SubTransSetParent(), which is then repeated again during RecoverPreparedTransactions(). So in Hot Standby mode it is possible that we could mark the parent twice, just not in most cases. So proposed fix is * Include StandbyRecoverPreparedTransactions() somewhere in Hot Standby startup, presumably after StartSubtrans(). * Skip assert when doing SubTransSetParent in RecoverPreparedTransactions() when in Hot Standby mode because some subxids will have been marked already ISTM there is a further problem here: we log AccessExclusiveLocks for all transactions, even ones that eventually get prepared. So we have already separately logged the locks a prepared transaction will take. All we need to do is reconstruct the subtrans entries. So StandbyRecoverPreparedTransactions() can be edited down somewhat. If you think this is correct, please say. If you know or suspect other issues exist, let me know. Just for reference, the issues relate to a number of distinct cases: * xacts that prepare prior to starting point of HS * xacts that remain prepared at end of recovery * xacts that do both of the above things We also need to cover the cases where we have prepared xacts with AccessExclusiveLocks and/or subxacts (subcase of >64 subxids). AFAICS we have 2 bugs already and a enough complexity to make me believe there are more lurking. I'm about ready to disable Hot Standby completely when max_prepared_transactions > 0, at least for Alpha3 and possibly for 8.5 altogether. Comments? -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > Investigating how that could come about, it looks like there is some > fairly strange stuff going on here. StandbyRecoverPreparedTransactions() > is never called at all. I told you so: http://archives.postgresql.org/message-id/4B260B5E.3010401@enterprisedb.com StandbyRecoverPreparedTransactions() is only needed to pick up the stateof prepared transactions when starting standby froma shutdown checkpoint. At online checkpoints. they are included in the running-xacts record like all other transactions. > * Skip assert when doing SubTransSetParent in > RecoverPreparedTransactions() when in Hot Standby mode because some > subxids will have been marked already Seems reasonable. Let's still Assert that the parent is the same as last time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-12-16 at 09:10 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Investigating how that could come about, it looks like there is some > > fairly strange stuff going on here. StandbyRecoverPreparedTransactions() > > is never called at all. > > I told you so: > http://archives.postgresql.org/message-id/4B260B5E.3010401@enterprisedb.com OK, I didn't understand that. Removed. > > * Skip assert when doing SubTransSetParent in > > RecoverPreparedTransactions() when in Hot Standby mode because some > > subxids will have been marked already > > Seems reasonable. Let's still Assert that the parent is the same as last > time. Fix committed to git. --- That fixes or explains all known issues, from me. Are there any other things you know about that I haven't responded to? Do you think we have addressed every issue, except deferred items? I will be looking to commit to CVS later today; waiting on any objections. -- Simon Riggs www.2ndQuadrant.com
>That fixes or explains all known issues, from me. Are there any other >things you know about that I haven't responded to? Do you think we have >addressed every issue, except deferred items? > >I will be looking to commit to CVS later today; waiting on any >objections. > Is following problem reported or fixed ? ----- 1. configure with --enable-cassert option, then make, make install 2. initdb, enable WAL archiving 3. run the server 4. run pgbench -i, with scaling factor 10 or more 5. server dies with following backtrace (gdb) backtrace #0 0x009e17a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00a22815 in raise () from /lib/tls/libc.so.6 #2 0x00a24279 in abort () from /lib/tls/libc.so.6 #3 0x082dbf98 in ExceptionalCondition (conditionName=0x84201d4 "!(lock->nGranted == 1)", errorType=0x8308dd4 "FailedAssertion", fileName=0x8420fb2 "lock.c", lineNumber=2296) at assert.c:57 #4 0x08231127 in GetRunningTransactionLocks (nlocks=0x0) at lock.c:2296 #5 0x0822c110 in LogStandbySnapshot (oldestActiveXid=0x0, nextXid=0x0) at standby.c:578 #6 0x080cc13f in CreateCheckPoint (flags=32) at xlog.c:6826 #7 0x08204cf6 in BackgroundWriterMain () at bgwriter.c:490 #8 0x080ec291 in AuxiliaryProcessMain (argc=2, argv=0xbff25cc4) at bootstrap.c:413 #9 0x0820b0af in StartChildProcess (type=Variable "type" is not available. ) at postmaster.c:4218 #10 0x0820c722 in reaper (postgres_signal_arg=17) at postmaster.c:2322 #11 <signal handler called> #12 0x009e17a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #13 0x00abcbbd in ___newselect_nocancel () from /lib/tls/libc.so.6 #14 0x0820b2b8 in ServerLoop () at postmaster.c:1360 #15 0x0820d59e in PostmasterMain (argc=3, argv=0x8579860) at postmaster.c:1065 #16 0x081b78f8 in main (argc=3, argv=0x8579860) at main.c:188 ----- Also, is the problem reported in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01324.php fixed or deferred ? regrards, -- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net
On Wed, 2009-12-16 at 18:08 +0900, Hiroyuki Yamada wrote: > >That fixes or explains all known issues, from me. Are there any other > >things you know about that I haven't responded to? Do you think we have > >addressed every issue, except deferred items? > > > >I will be looking to commit to CVS later today; waiting on any > >objections. > > > > Is following problem reported or fixed ? That is fixed, as of a couple of days ago. Thanks for your vigilence. > Also, is the problem reported in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01324.php > fixed or deferred ? Wasn't aware of it. Am now. Will respond on that thread -- Simon Riggs www.2ndQuadrant.com
>On Wed, 2009-12-16 at 18:08 +0900, Hiroyuki Yamada wrote: >> >That fixes or explains all known issues, from me. Are there any other >> >things you know about that I haven't responded to? Do you think we have >> >addressed every issue, except deferred items? >> > >> >I will be looking to commit to CVS later today; waiting on any >> >objections. >> > >> >> Is following problem reported or fixed ? > >That is fixed, as of a couple of days ago. Thanks for your vigilence. > I tested somewhat older patch(the RC patch in this mailing list). Sorry for annoying you. By the way, reading LogStandbySnapshot() and GetRunningTransactionLocks() raised following questions. * There is a window beween gathering lock information in GetRunningTransactionLocks() and writing WAL in LogAccessExclusiveLocks().*In current lock redo algorithm, locks are released when the transaction holding the lock arecommited or aborted. ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the window ? Similary, * There is a window beween writing COMMIT WAL in RecordTransactionCommit() and releasing locks in ResourceOwnerRelease() ... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE locks whose holder has already writtenthe COMMIT WAL ? Are there any chances of releasing locks which have no COMMIT WAL for releasing them ? regards, -- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net
Hiroyuki Yamada wrote: > By the way, reading LogStandbySnapshot() and GetRunningTransactionLocks() > raised following questions. > > * There is a window beween gathering lock information in GetRunningTransactionLocks() > and writing WAL in LogAccessExclusiveLocks(). > * In current lock redo algorithm, locks are released when the transaction holding the lock > are commited or aborted. > > ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the > window ? Hmm, when replying the locks record, we should be ignoring entries belonging to already-committed transactions. But I don't actually see that in the code. That needs to be fixed, I think. > Similary, > > * There is a window beween writing COMMIT WAL in RecordTransactionCommit() and > releasing locks in ResourceOwnerRelease() > > ... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE > locks whose holder has already written the COMMIT WAL ? This is handled in standby in the same manner as the above case. > Are there any chances of releasing locks which have no COMMIT WAL for releasing them ? Yes. In normal operation, there should eventually be a commit or abort WAL record for every transaction. But as a safety valve, when a running-xacts record is replayed, we mark as aborted all transactions that are older than the oldest still-running XID according to the running-xacts record, and release their locks. See StandbyReleaseOldLocks(). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote: > Sorry for annoying you. Not at all! Good to get fresh eyes on this. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote: > * There is a window beween gathering lock information in GetRunningTransactionLocks() > and writing WAL in LogAccessExclusiveLocks(). > * In current lock redo algorithm, locks are released when the transaction holding the lock > are commited or aborted. > > ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the > window ? Yes, was a problem in that code. Fixed in git. We were doing it for prepared transactions but not for normal xacts. I will look again at that code. Thanks very much for reading the code. Any more?!? > Similary, > > * There is a window beween writing COMMIT WAL in RecordTransactionCommit() and > releasing locks in ResourceOwnerRelease() > > ... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE > locks whose holder has already written the COMMIT WAL ? Same issue, so fixed as well. Thanks to Heikki for making that the same code path, so we have nothing to do for that. > Are there any chances of releasing locks which have no COMMIT WAL for releasing them ? No -- Simon Riggs www.2ndQuadrant.com
>On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote: > >> * There is a window beween gathering lock information in GetRunningTransactionLocks() >> and writing WAL in LogAccessExclusiveLocks(). >> * In current lock redo algorithm, locks are released when the transaction holding the lock >> are commited or aborted. >> >> ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the >> window ? > >Yes, was a problem in that code. Fixed in git. > >We were doing it for prepared transactions but not for normal xacts. >I will look again at that code. > >Thanks very much for reading the code. Any more?!? > Well, I've read some more and have a question. The implementation assumes that transactions write COMMIT/ABORT WAL at the end of them, while it does not seem to write ABORT WAL in immediate shutdown. So, 1. acquire ACCESS EXCLUSIVE lock in table A in xact 1 2. execute immediate shutdown of the active node 3. restart it 4. acquire ACCESS EXCLUSIVE lock in table A in xact 2 ...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node. Am I missing something ? Or is this already reported ? regards, -- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net
Hiroyuki Yamada wrote: > The implementation assumes that transactions write COMMIT/ABORT WAL at the end > of them, while it does not seem to write ABORT WAL in immediate shutdown. So, > > 1. acquire ACCESS EXCLUSIVE lock in table A in xact 1 > 2. execute immediate shutdown of the active node > 3. restart it > 4. acquire ACCESS EXCLUSIVE lock in table A in xact 2 > > ...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node. > > Am I missing something ? Or is this already reported ? Replay of the shutdown checkpoint that is written when the server is restarted should abort the old transaction and release its locks. Hmm, looking at the code, I think Simon threw that baby with the bathwater when he removed support for starting standby from a shutdown checkpoint. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-12-17 at 19:55 +0900, Hiroyuki Yamada wrote: > Well, I've read some more and have a question. > > The implementation assumes that transactions write COMMIT/ABORT WAL at the end > of them, while it does not seem to write ABORT WAL in immediate shutdown. So, > > 1. acquire ACCESS EXCLUSIVE lock in table A in xact 1 > 2. execute immediate shutdown of the active node > 3. restart it > 4. acquire ACCESS EXCLUSIVE lock in table A in xact 2 > > ...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node. > > Am I missing something ? Or is this already reported ? This was a tricky point in the design because we already knew that abort records are not always written for every transaction. ProcArrayApplyRecoveryInfo() was specifically designed to prune away older transactions that might have become stuck in that state, which calls StandbyReleaseOldLocks() to remove any such locks. The pruning operation is also one of the reasons why we need to log XLOG_RUNNING_XACTS on a regular basis. Duplicate lock acquisition isn't specifically checked for, since it is blocked on primary, but lock release if duplicates did exist is handled. I've checked the code some more to see if the above is possible. At step (3) we perform a shutdown checkpoint, which would/should be responsible for performing the prune operation that would prevent your case from being a problem. The code around shutdown checkpoint has changed a few times and it looks like that bug has been introduced by my edit on Dec 6 to prevent shutdown checkpoint as starting places. They need to be handled, since we now don't write a XLOG_RUNNING_XACTS record during a shutdown checkpoint. I will re-add this now. Thanks again: keep going, you're on a roll. -- Simon Riggs www.2ndQuadrant.com
On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote: > Hmm, looking at the code, I think Simon threw that baby with the > bathwater when he removed support for starting standby from a shutdown > checkpoint. Hmm, I think that code was just for starting points only. It would not have been executed in normal running of the standby, so it appears the bug was older than that. Absence of baby now appears inconclusive. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote: > >> Hmm, looking at the code, I think Simon threw that baby with the >> bathwater when he removed support for starting standby from a shutdown >> checkpoint. > > Hmm, I think that code was just for starting points only. It would not > have been executed in normal running of the standby, so it appears the > bug was older than that. Absence of baby now appears inconclusive. This is the piece of code we're talking about, in xlog_redo(): --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7340,41 +7340,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record) ShmemVariableCache->oldestXid = checkPoint.oldestXid; ShmemVariableCache->oldestXidDB = checkPoint.oldestXidDB; - /* - * We know nothing was running on the master at this point, though - * we can only use it as a starting point iff wal_standby_info was - * enabled, otherwise we may not get further information about changes - * from the master. - */ - if (standbyState >= STANDBY_UNINITIALIZED && checkPoint.XLogStandbyInfoMode) - { - RunningTransactionsData running; - TransactionId oldestRunningXid; - TransactionId *xids; - int nxids; - - oldestRunningXid = PrescanPreparedTransactions(&xids, &nxids); - - /* - * Construct a RunningTransactions snapshot representing a shut - * down server, with only prepared transactions still alive. - * - * StandbyRecoverPreparedTransactions will call SubTransSetParent - * for any subtransactions, so we consider this a non-overflowed - * snapshot. - */ - running.xcnt = nxids; - running.subxid_overflow = false; - running.nextXid = checkPoint.nextXid; - running.oldestRunningXid = oldestRunningXid; - running.xids = xids; - - ProcArrayInitRecoveryInfo(oldestRunningXid); - ProcArrayApplyRecoveryInfo(&running); - - StandbyRecoverPreparedTransactions(); - } - /* Check to see if any changes to max_connections give problems */ if (standbyState != STANDBY_DISABLED) CheckRequiredParameterValues(checkPoint); That removed piece of code was executed in the standby whenever we saw a shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which calls ExpireOldKnownAssignedTransactionIds() and StandbyReleaseOldLocks() to clean up known-assigned-xid entries and locks of the implicitly-aborted transactions. I see now that in the presence of prepared transactions, we would fail to clean up failed transations with XID > the oldest prepared transaction, but other than that it looks fine to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-12-17 at 15:18 +0200, Heikki Linnakangas wrote: > That removed piece of code was executed in the standby whenever we saw a > shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which calls > ExpireOldKnownAssignedTransactionIds() and StandbyReleaseOldLocks() to > clean up known-assigned-xid entries and locks of the implicitly-aborted > transactions. OK, I was presuming that running StandbyRecoverPreparedTransactions() and ProcArrayInitRecoveryInfo() twice would cause problems. > I see now that in the presence of prepared transactions, we would fail > to clean up failed transations with XID > the oldest prepared > transaction Good! I just spotted that also, just prior to posting my fix, so rewriting it again now. -- Simon Riggs www.2ndQuadrant.com
On Thu, 2009-12-17 at 13:38 +0000, Simon Riggs wrote: > > I see now that in the presence of prepared transactions, we would fail > > to clean up failed transations with XID > the oldest prepared > > transaction > > Good! I just spotted that also, just prior to posting my fix, so > rewriting it again now. solution pushed to git -- Simon Riggs www.2ndQuadrant.com