Thread: Hot Standby and prepared transactions

Hot Standby and prepared transactions

From
Simon Riggs
Date:
I get this when testing prepared transactions, which looks like it is
unrelated to Hot Standby.

2009-12-15 17:28:08 GMT[10385]LOG:  archive recovery complete
2009-12-15 17:28:08 GMT[10428]LOG:  checkpoint starting: end-of-recovery
immediate wait
2009-12-15 17:28:08 GMT[10428]DEBUG:  creating and filling new WAL file
2009-12-15 17:28:08 GMT[10428]DEBUG:  done creating and filling new WAL
file
2009-12-15 17:28:08 GMT[10428]LOG:  checkpoint complete: wrote 0 buffers
(0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=0.000 s, sync=0.000 s, total=0.790 s
2009-12-15 17:28:08 GMT[10385]LOG:  recovering prepared transaction
122944
TRAP: FailedAssertion("!(*ptr == ((TransactionId) 0))", File:
"subtrans.c", Line: 87)
2009-12-15 17:28:09 GMT[10380]LOG:  startup process (PID 10385) was
terminated by signal 6: Aborted

This looks like the code is setting the parent to be zero. That
assertion wasn't added by the patch.

2-phase state file attached, zipped. I think we need some tools to
inspect state files.

Can you look at this please?

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Tue, 2009-12-15 at 18:49 +0000, Simon Riggs wrote:
> This looks like the code is setting the parent to be zero.

[Sorry, that comment is completely off, misread the line number.]

The assertion is failing because the parent entry for that subxid had
already been set.

Investigating how that could come about, it looks like there is some
fairly strange stuff going on here. StandbyRecoverPreparedTransactions()
is never called at all.

The problem itself seems to be that xid assignment records cause us to
SubTransSetParent(), which is then repeated again during
RecoverPreparedTransactions(). So in Hot Standby mode it is possible
that we could mark the parent twice, just not in most cases.

So proposed fix is

* Include StandbyRecoverPreparedTransactions() somewhere in Hot Standby
startup, presumably after StartSubtrans().

* Skip assert when doing SubTransSetParent in
RecoverPreparedTransactions() when in Hot Standby mode because some
subxids will have been marked already

ISTM there is a further problem here: we log AccessExclusiveLocks for
all transactions, even ones that eventually get prepared. So we have
already separately logged the locks a prepared transaction will take.
All we need to do is reconstruct the subtrans entries. So
StandbyRecoverPreparedTransactions() can be edited down somewhat.

If you think this is correct, please say. If you know or suspect other
issues exist, let me know.

Just for reference, the issues relate to a number of distinct cases:
* xacts that prepare prior to starting point of HS
* xacts that remain prepared at end of recovery
* xacts that do both of the above things

We also need to cover the cases where we have prepared xacts with
AccessExclusiveLocks and/or subxacts (subcase of >64 subxids).

AFAICS we have 2 bugs already and a enough complexity to make me believe
there are more lurking. I'm about ready to disable Hot Standby
completely when max_prepared_transactions > 0, at least for Alpha3 and
possibly for 8.5 altogether.

Comments?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Investigating how that could come about, it looks like there is some
> fairly strange stuff going on here. StandbyRecoverPreparedTransactions()
> is never called at all.

I told you so:
http://archives.postgresql.org/message-id/4B260B5E.3010401@enterprisedb.com

StandbyRecoverPreparedTransactions() is only needed to pick up the stateof prepared transactions when starting standby
froma shutdown
 
checkpoint. At online checkpoints. they are included in the
running-xacts record like all other transactions.

> * Skip assert when doing SubTransSetParent in
> RecoverPreparedTransactions() when in Hot Standby mode because some
> subxids will have been marked already

Seems reasonable. Let's still Assert that the parent is the same as last
time.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Wed, 2009-12-16 at 09:10 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Investigating how that could come about, it looks like there is some
> > fairly strange stuff going on here. StandbyRecoverPreparedTransactions()
> > is never called at all.
> 
> I told you so:
> http://archives.postgresql.org/message-id/4B260B5E.3010401@enterprisedb.com

OK, I didn't understand that. Removed.


> > * Skip assert when doing SubTransSetParent in
> > RecoverPreparedTransactions() when in Hot Standby mode because some
> > subxids will have been marked already
> 
> Seems reasonable. Let's still Assert that the parent is the same as last
> time.

Fix committed to git.

---

That fixes or explains all known issues, from me. Are there any other
things you know about that I haven't responded to? Do you think we have
addressed every issue, except deferred items?

I will be looking to commit to CVS later today; waiting on any
objections.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Hiroyuki Yamada
Date:
>That fixes or explains all known issues, from me. Are there any other
>things you know about that I haven't responded to? Do you think we have
>addressed every issue, except deferred items?
>
>I will be looking to commit to CVS later today; waiting on any
>objections.
>

Is following problem reported or fixed ?

-----
1. configure with --enable-cassert option, then make, make install
2. initdb, enable WAL archiving
3. run the server
4. run pgbench -i, with scaling factor 10 or more
5. server dies with following backtrace

(gdb) backtrace
#0  0x009e17a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a22815 in raise () from /lib/tls/libc.so.6
#2  0x00a24279 in abort () from /lib/tls/libc.so.6
#3  0x082dbf98 in ExceptionalCondition (conditionName=0x84201d4 "!(lock->nGranted == 1)", errorType=0x8308dd4
"FailedAssertion",  fileName=0x8420fb2 "lock.c", lineNumber=2296) at assert.c:57
 
#4  0x08231127 in GetRunningTransactionLocks (nlocks=0x0) at lock.c:2296
#5  0x0822c110 in LogStandbySnapshot (oldestActiveXid=0x0, nextXid=0x0) at standby.c:578
#6  0x080cc13f in CreateCheckPoint (flags=32) at xlog.c:6826
#7  0x08204cf6 in BackgroundWriterMain () at bgwriter.c:490
#8  0x080ec291 in AuxiliaryProcessMain (argc=2, argv=0xbff25cc4) at bootstrap.c:413
#9  0x0820b0af in StartChildProcess (type=Variable "type" is not available.
) at postmaster.c:4218
#10 0x0820c722 in reaper (postgres_signal_arg=17) at postmaster.c:2322
#11 <signal handler called>
#12 0x009e17a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#13 0x00abcbbd in ___newselect_nocancel () from /lib/tls/libc.so.6
#14 0x0820b2b8 in ServerLoop () at postmaster.c:1360
#15 0x0820d59e in PostmasterMain (argc=3, argv=0x8579860) at postmaster.c:1065
#16 0x081b78f8 in main (argc=3, argv=0x8579860) at main.c:188
-----

Also, is the problem reported in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01324.php
fixed or deferred ?


regrards,


-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Wed, 2009-12-16 at 18:08 +0900, Hiroyuki Yamada wrote:
> >That fixes or explains all known issues, from me. Are there any other
> >things you know about that I haven't responded to? Do you think we have
> >addressed every issue, except deferred items?
> >
> >I will be looking to commit to CVS later today; waiting on any
> >objections.
> >
> 
> Is following problem reported or fixed ?

That is fixed, as of a couple of days ago. Thanks for your vigilence.

> Also, is the problem reported in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01324.php
> fixed or deferred ?

Wasn't aware of it. Am now. Will respond on that thread

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Hiroyuki Yamada
Date:
>On Wed, 2009-12-16 at 18:08 +0900, Hiroyuki Yamada wrote:
>> >That fixes or explains all known issues, from me. Are there any other
>> >things you know about that I haven't responded to? Do you think we have
>> >addressed every issue, except deferred items?
>> >
>> >I will be looking to commit to CVS later today; waiting on any
>> >objections.
>> >
>> 
>> Is following problem reported or fixed ?
>
>That is fixed, as of a couple of days ago. Thanks for your vigilence.
>

I tested somewhat older patch(the RC patch in this mailing list). Sorry for annoying you.


By the way, reading LogStandbySnapshot() and GetRunningTransactionLocks()
raised following questions.
* There is a window beween gathering lock information in GetRunningTransactionLocks()  and writing WAL in
LogAccessExclusiveLocks().*In current lock redo algorithm, locks are released when the transaction holding the lock
arecommited or aborted.
 

... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the   window ?

Similary,
* There is a window beween writing COMMIT WAL in RecordTransactionCommit() and   releasing locks in
ResourceOwnerRelease()

... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE   locks whose holder has already
writtenthe COMMIT WAL ?
 


Are there any chances of releasing locks which have no COMMIT WAL for releasing them ?


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: Hot Standby and prepared transactions

From
Heikki Linnakangas
Date:
Hiroyuki Yamada wrote:
> By the way, reading LogStandbySnapshot() and GetRunningTransactionLocks()
> raised following questions.
> 
>  * There is a window beween gathering lock information in GetRunningTransactionLocks()
>    and writing WAL in LogAccessExclusiveLocks().
>  * In current lock redo algorithm, locks are released when the transaction holding the lock
>    are commited or aborted.
> 
> ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the 
>    window ?

Hmm, when replying the locks record, we should be ignoring entries
belonging to already-committed transactions. But I don't actually see
that in the code. That needs to be fixed, I think.

> Similary,
> 
>  * There is a window beween writing COMMIT WAL in RecordTransactionCommit() and
>     releasing locks in ResourceOwnerRelease()
> 
> ... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE 
>    locks whose holder has already written the COMMIT WAL ?

This is handled in standby in the same manner as the above case.

> Are there any chances of releasing locks which have no COMMIT WAL for releasing them ?

Yes. In normal operation, there should eventually be a commit or abort
WAL record for every transaction. But as a safety valve, when a
running-xacts record is replayed, we mark as aborted all transactions
that are older than the oldest still-running XID according to the
running-xacts record, and release their locks. See StandbyReleaseOldLocks().

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote:
> Sorry for annoying you.

Not at all! Good to get fresh eyes on this.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote:

>  * There is a window beween gathering lock information in GetRunningTransactionLocks()
>    and writing WAL in LogAccessExclusiveLocks().
>  * In current lock redo algorithm, locks are released when the transaction holding the lock
>    are commited or aborted.
> 
> ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the 
>    window ?

Yes, was a problem in that code. Fixed in git.

We were doing it for prepared transactions but not for normal xacts.
I will look again at that code.

Thanks very much for reading the code. Any more?!?

> Similary,
> 
>  * There is a window beween writing COMMIT WAL in RecordTransactionCommit() and
>     releasing locks in ResourceOwnerRelease()
> 
> ... then what happens when GetRunningTransactionLocks() gathers ACCESS EXCLUSIVE 
>    locks whose holder has already written the COMMIT WAL ?

Same issue, so fixed as well. Thanks to Heikki for making that the same
code path, so we have nothing to do for that.

> Are there any chances of releasing locks which have no COMMIT WAL for releasing them ?

No

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Hiroyuki Yamada
Date:
>On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote:
>
>>  * There is a window beween gathering lock information in GetRunningTransactionLocks()
>>    and writing WAL in LogAccessExclusiveLocks().
>>  * In current lock redo algorithm, locks are released when the transaction holding the lock
>>    are commited or aborted.
>> 
>> ... then what happens if any transaction holding ACCESS EXCLUSIVE lock commits in the 
>>    window ?
>
>Yes, was a problem in that code. Fixed in git.
>
>We were doing it for prepared transactions but not for normal xacts.
>I will look again at that code.
>
>Thanks very much for reading the code. Any more?!?
>

Well, I've read some more and have a question.

The implementation assumes that transactions write COMMIT/ABORT WAL at the end
of them, while it does not seem to write ABORT WAL in immediate shutdown. So,

1. acquire ACCESS EXCLUSIVE lock in table A in xact 1
2. execute immediate shutdown of the active node
3. restart it
4. acquire ACCESS EXCLUSIVE lock in table A in xact 2

...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node.

Am I missing something ? Or is this already reported ?


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: Hot Standby and prepared transactions

From
Heikki Linnakangas
Date:
Hiroyuki Yamada wrote:
> The implementation assumes that transactions write COMMIT/ABORT WAL at the end
> of them, while it does not seem to write ABORT WAL in immediate shutdown. So,
> 
> 1. acquire ACCESS EXCLUSIVE lock in table A in xact 1
> 2. execute immediate shutdown of the active node
> 3. restart it
> 4. acquire ACCESS EXCLUSIVE lock in table A in xact 2
> 
> ...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node.
> 
> Am I missing something ? Or is this already reported ?

Replay of the shutdown checkpoint that is written when the server is
restarted should abort the old transaction and release its locks.

Hmm, looking at the code, I think Simon threw that baby with the
bathwater when he removed support for starting standby from a shutdown
checkpoint.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Thu, 2009-12-17 at 19:55 +0900, Hiroyuki Yamada wrote:

> Well, I've read some more and have a question.
> 
> The implementation assumes that transactions write COMMIT/ABORT WAL at the end
> of them, while it does not seem to write ABORT WAL in immediate shutdown. So,
> 
> 1. acquire ACCESS EXCLUSIVE lock in table A in xact 1
> 2. execute immediate shutdown of the active node
> 3. restart it
> 4. acquire ACCESS EXCLUSIVE lock in table A in xact 2
> 
> ...then, duplicate lock acquisition by two diffrent transactions can occur in the standby node.
> 
> Am I missing something ? Or is this already reported ?

This was a tricky point in the design because we already knew that abort
records are not always written for every transaction.

ProcArrayApplyRecoveryInfo() was specifically designed to prune away
older transactions that might have become stuck in that state, which
calls StandbyReleaseOldLocks() to remove any such locks. The pruning
operation is also one of the reasons why we need to log
XLOG_RUNNING_XACTS on a regular basis.

Duplicate lock acquisition isn't specifically checked for, since it is
blocked on primary, but lock release if duplicates did exist is handled.

I've checked the code some more to see if the above is possible. At step
(3) we perform a shutdown checkpoint, which would/should be responsible
for performing the prune operation that would prevent your case from
being a problem.

The code around shutdown checkpoint has changed a few times and it looks
like that bug has been introduced by my edit on Dec 6 to prevent
shutdown checkpoint as starting places. They need to be handled, since
we now don't write a XLOG_RUNNING_XACTS record during a shutdown
checkpoint.

I will re-add this now.

Thanks again: keep going, you're on a roll.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote:

> Hmm, looking at the code, I think Simon threw that baby with the
> bathwater when he removed support for starting standby from a shutdown
> checkpoint.

Hmm, I think that code was just for starting points only. It would not
have been executed in normal running of the standby, so it appears the
bug was older than that. Absence of baby now appears inconclusive.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote:
> 
>> Hmm, looking at the code, I think Simon threw that baby with the
>> bathwater when he removed support for starting standby from a shutdown
>> checkpoint.
> 
> Hmm, I think that code was just for starting points only. It would not
> have been executed in normal running of the standby, so it appears the
> bug was older than that. Absence of baby now appears inconclusive.

This is the piece of code we're talking about, in xlog_redo():

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7340,41 +7340,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)        ShmemVariableCache->oldestXid =
checkPoint.oldestXid;       ShmemVariableCache->oldestXidDB = checkPoint.oldestXidDB;
 

-        /*
-         * We know nothing was running on the master at this point, though
-         * we can only use it as a starting point iff wal_standby_info was
-         * enabled, otherwise we may not get further information about changes
-         * from the master.
-         */
-        if (standbyState >= STANDBY_UNINITIALIZED &&
checkPoint.XLogStandbyInfoMode)
-        {
-            RunningTransactionsData running;
-            TransactionId oldestRunningXid;
-            TransactionId *xids;
-            int nxids;
-
-            oldestRunningXid = PrescanPreparedTransactions(&xids, &nxids);
-
-            /*
-             * Construct a RunningTransactions snapshot representing a shut
-             * down server, with only prepared transactions still alive.
-             *
-             * StandbyRecoverPreparedTransactions will call SubTransSetParent
-             * for any subtransactions, so we consider this a non-overflowed
-             * snapshot.
-             */
-            running.xcnt = nxids;
-            running.subxid_overflow = false;
-            running.nextXid = checkPoint.nextXid;
-            running.oldestRunningXid = oldestRunningXid;
-            running.xids = xids;
-
-            ProcArrayInitRecoveryInfo(oldestRunningXid);
-            ProcArrayApplyRecoveryInfo(&running);
-
-            StandbyRecoverPreparedTransactions();
-        }
-        /* Check to see if any changes to max_connections give problems */        if (standbyState !=
STANDBY_DISABLED)           CheckRequiredParameterValues(checkPoint);
 


That removed piece of code was executed in the standby whenever we saw a
shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which calls
ExpireOldKnownAssignedTransactionIds() and StandbyReleaseOldLocks() to
clean up known-assigned-xid entries and locks of the implicitly-aborted
transactions.

I see now that in the presence of prepared transactions, we would fail
to clean up failed transations with XID > the oldest prepared
transaction, but other than that it looks fine to me.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Thu, 2009-12-17 at 15:18 +0200, Heikki Linnakangas wrote:

> That removed piece of code was executed in the standby whenever we saw a
> shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which calls
> ExpireOldKnownAssignedTransactionIds() and StandbyReleaseOldLocks() to
> clean up known-assigned-xid entries and locks of the implicitly-aborted
> transactions.

OK, I was presuming that running StandbyRecoverPreparedTransactions()
and ProcArrayInitRecoveryInfo() twice would cause problems.

> I see now that in the presence of prepared transactions, we would fail
> to clean up failed transations with XID > the oldest prepared
> transaction

Good! I just spotted that also, just prior to posting my fix, so
rewriting it again now.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and prepared transactions

From
Simon Riggs
Date:
On Thu, 2009-12-17 at 13:38 +0000, Simon Riggs wrote:

> > I see now that in the presence of prepared transactions, we would fail
> > to clean up failed transations with XID > the oldest prepared
> > transaction
> 
> Good! I just spotted that also, just prior to posting my fix, so
> rewriting it again now.

solution pushed to git

-- Simon Riggs           www.2ndQuadrant.com