Thread: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

[HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

23 April 2017, 05:55:18

Now that we've got consistent failure reports about the 009_twophase.pl
recovery test, I set out to find out why it's failing.  It looks to me
like the reason is that this (twophase.c:2145):
           SubTransSetParent(xid, subxid, overwriteOK);

ought to be this:
           SubTransSetParent(subxid, xid, overwriteOK);

because the definition of SubTransSetParent is

void
SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)

not the other way 'round.  

While "git blame" blames this line on the recent commit 728bd991c,
that just moved the call from somewhere else.  AFAICS this has actually
been wrong since StandbyRecoverPreparedTransactions was written,
in 361bd1662 of 2010-04-13.

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance.  Discuss.

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent.  The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.
        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Andres Freund

Date:

23 April 2017, 06:19:31

On 2017-04-22 19:55:18 -0400, Tom Lane wrote:
> Now that we've got consistent failure reports about the 009_twophase.pl
> recovery test, I set out to find out why it's failing.  It looks to me
> like the reason is that this (twophase.c:2145):
>
>             SubTransSetParent(xid, subxid, overwriteOK);
>
> ought to be this:
>
>             SubTransSetParent(subxid, xid, overwriteOK);
>
> because the definition of SubTransSetParent is
>
> void
> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>
> not the other way 'round.
>
> While "git blame" blames this line on the recent commit 728bd991c,
> that just moved the call from somewhere else.  AFAICS this has actually
> been wrong since StandbyRecoverPreparedTransactions was written,
> in 361bd1662 of 2010-04-13.

> Also, when I fix that, it gets further but still crashes at the same
> Assert in SubTransSetParent.  The proximate cause this time seems to be
> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
> it's computing that as "false", but in reality the subtrans link in
> question has already been set.
>

Yikes.  This is clearly way undertested.  It's also pretty scary that
the code has recently been whacked out quite heavily (both 9.6 and
master), without hitting anything around this - doesn't seem to bode
well for how in-depth the testing was.

> It's not clear to me how much potential this has to create user data
> corruption, but it doesn't look good at first glance.  Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed.  Hm.

- Andres

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

23 April 2017, 15:59:32

On 23 April 2017 at 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now that we've got consistent failure reports about the 009_twophase.pl
> recovery test, I set out to find out why it's failing.  It looks to me
> like the reason is that this (twophase.c:2145):
>
>             SubTransSetParent(xid, subxid, overwriteOK);
>
> ought to be this:
>
>             SubTransSetParent(subxid, xid, overwriteOK);
>
> because the definition of SubTransSetParent is
>
> void
> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>
> not the other way 'round.
>
> While "git blame" blames this line on the recent commit 728bd991c,
> that just moved the call from somewhere else.  AFAICS this has actually
> been wrong since StandbyRecoverPreparedTransactions was written,
> in 361bd1662 of 2010-04-13.

Thanks for finding that.

> It's not clear to me how much potential this has to create user data
> corruption, but it doesn't look good at first glance.  Discuss.

Well, strangely for different reason I was looking at that particular
call a couple of days back. I didn't spot that issue, but I was
thinking why we even make that call at that time. My conclusion was
that it could be optimized away mostly, since it isn't needed very
often, but its not really worth optimizing.

SubTransSetParent() only matters for the case where we are half-way
through a commit with a large commit. Since we do atomic updates of
commit and subcommmit when on same page, this problem would only
matter when top level xid and other subxids were separated across
multiple pages and the issue would only affect a read only query
checking visibility during the commit for that prepared transaction.
Furthermore, the nature of the corruption is that we set the xid to
point to the subxid; since we never mark a top-level commit as
subcommitted, subtrans would never be consulted and so the corruption
would not lead to any incorrect answer even during the window of
exposure. So it looks to me like this error shouldn't cause a problem.

We can fix that, but...

> Also, when I fix that, it gets further but still crashes at the same
> Assert in SubTransSetParent.  The proximate cause this time seems to be
> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
> it's computing that as "false", but in reality the subtrans link in
> question has already been set.

Not sure about that, investigating.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

23 April 2017, 16:05:44

On 23 April 2017 at 01:19, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-22 19:55:18 -0400, Tom Lane wrote:
>> Now that we've got consistent failure reports about the 009_twophase.pl
>> recovery test, I set out to find out why it's failing.  It looks to me
>> like the reason is that this (twophase.c:2145):
>>
>>             SubTransSetParent(xid, subxid, overwriteOK);
>>
>> ought to be this:
>>
>>             SubTransSetParent(subxid, xid, overwriteOK);
>>
>> because the definition of SubTransSetParent is
>>
>> void
>> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>>
>> not the other way 'round.
>>
>> While "git blame" blames this line on the recent commit 728bd991c,
>> that just moved the call from somewhere else.  AFAICS this has actually
>> been wrong since StandbyRecoverPreparedTransactions was written,
>> in 361bd1662 of 2010-04-13.
>
>> Also, when I fix that, it gets further but still crashes at the same
>> Assert in SubTransSetParent.  The proximate cause this time seems to be
>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>> it's computing that as "false", but in reality the subtrans link in
>> question has already been set.
>>
>
> Yikes.  This is clearly way undertested.  It's also pretty scary that
> the code has recently been whacked out quite heavily (both 9.6 and
> master), without hitting anything around this - doesn't seem to bode
> well for how in-depth the testing was.

Obviously if there is a bug it's because tests didn't find it and
therefore it is by definition undertested for that specific bug.

I'm not sure what other conclusion you wish to draw, if any?


>> It's not clear to me how much potential this has to create user data
>> corruption, but it doesn't look good at first glance.  Discuss.
>
> Hm. I think it can cause wrong tqual.c results in some edge cases.
> During HS, lastOverflowedXid will be set in some cases, and then
> XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
> turn cause lookups snapshot->subxip (where all HS xids reside)
> potentially return wrong results.
>
> I was about to say that I don't see how it could result in persistent
> corruption however - the subtrans lookups are only done for
> (snapshot->xmin, snapshot->xmax] and subtrans is truncated
> regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
> anymore, so that might be delayed.  Hm.

I've not found any reason, yet, to believe we return wrong answers in
any case, even though the transient data structure pg_subtrans is
corrupted by the switched call Tom discovers.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

23 April 2017, 22:17:12

Simon Riggs <simon@2ndquadrant.com> writes:
>> Also, when I fix that, it gets further but still crashes at the same
>> Assert in SubTransSetParent.  The proximate cause this time seems to be
>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>> it's computing that as "false", but in reality the subtrans link in
>> question has already been set.

> Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests.  Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.
        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

23 April 2017, 23:33:21

Simon Riggs <simon@2ndquadrant.com> writes:
> On 23 April 2017 at 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's not clear to me how much potential this has to create user data
>> corruption, but it doesn't look good at first glance.  Discuss.

> SubTransSetParent() only matters for the case where we are half-way
> through a commit with a large commit. Since we do atomic updates of
> commit and subcommmit when on same page, this problem would only
> matter when top level xid and other subxids were separated across
> multiple pages and the issue would only affect a read only query
> checking visibility during the commit for that prepared transaction.
> Furthermore, the nature of the corruption is that we set the xid to
> point to the subxid; since we never mark a top-level commit as
> subcommitted, subtrans would never be consulted and so the corruption
> would not lead to any incorrect answer even during the window of
> exposure. So it looks to me like this error shouldn't cause a problem.

Still trying to wrap my head around this argument ... I agree that
incorrectly setting the parent's pg_subtrans entry can't cause a
visible problem, because it will never be consulted.  However, failing
to set the child's entry seems like it could cause transient problems.
There would only be a risk during the eventual commit of the prepared
transaction, and only when the pg_xact entries span multiple pages,
but then there would be a window where the child xact is marked
subcommitted but it has a zero entry in pg_subtrans.  That would
result in a WARNING from TransactionIdDidCommit, and a false result,
which depending on timing might mean that commit of the overall
transaction appears non-atomic to onlookers.  (That is, the parent
xact might already appear committed to them, but the subtransaction
not yet.)

I can't find any indication in the archives that anyone's ever reported
seeing that WARNING, though that might only mean that they'd not noticed
it.  But in any case, it seems like the worst consequence is a narrow
window for seeing non-atomic commit of a prepared transaction on a standby
server.  That seems pretty unlikely to cause real-world problems.

The other point about overwriteOK not being set when it needs to be
also seems like it could not cause any problems in production builds,
since that flag is only examined by an Assert.
        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

23 April 2017, 23:41:41

On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>>> Also, when I fix that, it gets further but still crashes at the same
>>> Assert in SubTransSetParent.  The proximate cause this time seems to be
>>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>>> it's computing that as "false", but in reality the subtrans link in
>>> question has already been set.
>
>> Not sure about that, investigating.
>
> As a quick hack, I just hotwired RecoverPreparedTransactions to set
> overwriteOK = true always, and with that and the SubTransSetParent
> argument-order fix, HEAD passes the recovery tests.  Maybe we can
> be smarter than that, but this might be a good short-term fix to get
> the buildfarm green again.

That would work. I've been looking into a fix I can explain, but "do
it always" may actually be it.

OK, I'll do that.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

24 April 2017, 03:15:36

On 23 April 2017 at 18:41, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>>> Also, when I fix that, it gets further but still crashes at the same
>>>> Assert in SubTransSetParent.  The proximate cause this time seems to be
>>>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>>>> it's computing that as "false", but in reality the subtrans link in
>>>> question has already been set.
>>
>>> Not sure about that, investigating.
>>
>> As a quick hack, I just hotwired RecoverPreparedTransactions to set
>> overwriteOK = true always, and with that and the SubTransSetParent
>> argument-order fix, HEAD passes the recovery tests.  Maybe we can
>> be smarter than that, but this might be a good short-term fix to get
>> the buildfarm green again.
>
> That would work. I've been looking into a fix I can explain, but "do
> it always" may actually be it.
>
> OK, I'll do that.

Done, tests pass again for me now.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Andres Freund

Date:

24 April 2017, 05:25:04

Hi,

On 2017-04-23 11:05:44 +0100, Simon Riggs wrote:
> > Yikes.  This is clearly way undertested.  It's also pretty scary that
> > the code has recently been whacked out quite heavily (both 9.6 and
> > master), without hitting anything around this - doesn't seem to bode
> > well for how in-depth the testing was.
> 
> Obviously if there is a bug it's because tests didn't find it and
> therefore it is by definition undertested for that specific bug.

Sure. But there's bugs that are hard to catch, and there's bugs that are
easily testable. This seems to largely fall into the "relatively easy to
test" camp.

> I'm not sure what other conclusion you wish to draw, if any?

That the changes around whacking around twophase.c in several of the
last releases, didn't yield enough verified testing infrastructure.

> >> It's not clear to me how much potential this has to create user data
> >> corruption, but it doesn't look good at first glance.  Discuss.
> >
> > Hm. I think it can cause wrong tqual.c results in some edge cases.
> > During HS, lastOverflowedXid will be set in some cases, and then
> > XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
> > turn cause lookups snapshot->subxip (where all HS xids reside)
> > potentially return wrong results.
> >
> > I was about to say that I don't see how it could result in persistent
> > corruption however - the subtrans lookups are only done for
> > (snapshot->xmin, snapshot->xmax] and subtrans is truncated
> > regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
> > anymore, so that might be delayed.  Hm.
> 
> I've not found any reason, yet, to believe we return wrong answers in
> any case, even though the transient data structure pg_subtrans is
> corrupted by the switched call Tom discovers.

I think I pointed out a danger above. Consider what happens if query on
a standby has a suboverflowed snapshot:
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...    if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))        suboverflowed = true;}
..snapshot->suboverflowed = suboverflowed;
}

In that case we rely on pg_subtrans for visibility determinations:
bool
HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,                   Buffer buffer)
{
...if (!HeapTupleHeaderXminCommitted(tuple)){
...    else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))        return false;

and
static bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
...if (!snapshot->takenDuringRecovery){
...else{
...    if (snapshot->suboverflowed)    {        /*         * Snapshot overflowed, so convert xid to top-level.  This is
safe        * because we eliminated too-old XIDs above.         */        xid = SubTransGetTopmostTransaction(xid);

if the subxid->xid mapping doesn't actually exist - as it's the case
with this bug afaics - we'll not get the correct toplevel
transaction. Which'll mean the following block:    /*     * We now have either a top-level xid higher than xmin or an
 * indeterminate xid. We don't know whether it's top level or subxact     * but it doesn't matter. If it's present, the
xidis visible.     */    for (j = 0; j < snapshot->subxcnt; j++)    {        if (TransactionIdEquals(xid,
snapshot->subxip[j]))           return true;    }

won't work correctly if suboverflowed.

I don't see anything prevent wrong results here?

Andres Freund

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

24 April 2017, 18:29:11

On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:

>> >> It's not clear to me how much potential this has to create user data
>> >> corruption, but it doesn't look good at first glance.  Discuss.
>> >
>> > Hm. I think it can cause wrong tqual.c results in some edge cases.
>> > During HS, lastOverflowedXid will be set in some cases, and then
>> > XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
>> > turn cause lookups snapshot->subxip (where all HS xids reside)
>> > potentially return wrong results.
>> >
>> > I was about to say that I don't see how it could result in persistent
>> > corruption however - the subtrans lookups are only done for
>> > (snapshot->xmin, snapshot->xmax] and subtrans is truncated
>> > regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
>> > anymore, so that might be delayed.  Hm.
>>
>> I've not found any reason, yet, to believe we return wrong answers in
>> any case, even though the transient data structure pg_subtrans is
>> corrupted by the switched call Tom discovers.
>
> I think I pointed out a danger above. Consider what happens if query on
> a standby has a suboverflowed snapshot:
> Snapshot
> GetSnapshotData(Snapshot snapshot)
> {
> ...
>                 if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
>                         suboverflowed = true;
>         }
> ..
>         snapshot->suboverflowed = suboverflowed;
> }
>
> In that case we rely on pg_subtrans for visibility determinations:
> bool
> HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
>                                            Buffer buffer)
> {
> ...
>         if (!HeapTupleHeaderXminCommitted(tuple))
>         {
> ...
>                 else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
>                         return false;
>
> and
> static bool
> XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
> {
> ...
>         if (!snapshot->takenDuringRecovery)
>         {
> ...
>         else
>         {
> ...
>                 if (snapshot->suboverflowed)
>                 {
>                         /*
>                          * Snapshot overflowed, so convert xid to top-level.  This is safe
>                          * because we eliminated too-old XIDs above.
>                          */
>                         xid = SubTransGetTopmostTransaction(xid);
>
> if the subxid->xid mapping doesn't actually exist - as it's the case
> with this bug afaics - we'll not get the correct toplevel
> transaction.

The nature of the corruption is that in some cases
* a subxid will point to nothing (even though in most cases it was
already set correctly)
* the parent will point to a subxid

So both wrong.

> Which'll mean the following block:
>                 /*
>                  * We now have either a top-level xid higher than xmin or an
>                  * indeterminate xid. We don't know whether it's top level or subxact
>                  * but it doesn't matter. If it's present, the xid is visible.
>                  */
>                 for (j = 0; j < snapshot->subxcnt; j++)
>                 {
>                         if (TransactionIdEquals(xid, snapshot->subxip[j]))
>                                 return true;
>                 }
> won't work correctly if suboverflowed.

Your example of snapshots taken during recovery is not correct.

Note that SubTransGetTopmostTransaction() returns a valid, running
xid, even though it is the wrong one.

Snapshots work differently on standbys - we store all known running
xids, so the test still passes correctly, even when overflowed.

I'd call that just plain luck. We behave correctly, but for the wrong
reasons, at least in this case.


> I don't see anything prevent wrong results here?

I've had an even better look around now and I think I've found
something but need to turn it into a repeatable test case so I can
double-check this before reporting in full, so I don't cry wolf.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Nikhil Sontakke

Date:

24 April 2017, 21:01:32

>>>> Also, when I fix that, it gets further but still crashes at the same
>>>> Assert in SubTransSetParent. The proximate cause this time seems to be
>>>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>>>> it's computing that as "false", but in reality the subtrans link in
>>>> question has already been set.
>>
>>> Not sure about that, investigating.
>>
>> As a quick hack, I just hotwired RecoverPreparedTransactions to set
>> overwriteOK = true always, and with that and the SubTransSetParent
>> argument-order fix, HEAD passes the recovery tests. Maybe we can
>> be smarter than that, but this might be a good short-term fix to get
>> the buildfarm green again.
>
> That would work. I've been looking into a fix I can explain, but "do
> it always" may actually be it.

Here's what's happening:

On Master:

BEGIN;

INSERT INTO t_009_tbl VALUES (42);

SAVEPOINT s1;

INSERT INTO t_009_tbl VALUES (43);

PREPARE TRANSACTION 'xact_009_1';

On Master:

Do a fast shutdown

On Slave:

Restart the slave. This causes StandbyRecoverPreparedTransactions() to be invoked which causes ProcessTwoPhaseBuffer() to be invoked with setParent set to true. This sets up parent xid for the above subtransaction.

On Slave:

Promote the slave. This causes RecoverPreparedTransactions() to be invoked which ends up calling SubTransSetParent() for the above subxid. The overwriteOK bool remains false since we don't go over the PGPROC_MAX_CACHED_SUBXIDS limit. Since the parent xid was already set on restart above, we hit the assert.

-------------------

I am wondering if StandbyRecoverPreparedTransactions() needs the parent linkage at all? I don't see SubTransGetParent() and related functions being called on the standby but we need to be careful here. As a quick test, If I don't call SubTransSetParent() as part of StandbyRecoverPreparedTransactions() then all recovery tests pass ok. But the fact that StandbyRecoverPreparedTransactions() takes overwriteOK as a parameter means it might not be so simple..

Regards,

Nikhils

Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Andres Freund

Date:

25 April 2017, 00:59:44

On 2017-04-24 13:29:11 +0100, Simon Riggs wrote:
> On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:
> > if the subxid->xid mapping doesn't actually exist - as it's the case
> > with this bug afaics - we'll not get the correct toplevel
> > transaction.
> 
> The nature of the corruption is that in some cases
> * a subxid will point to nothing (even though in most cases it was
> already set correctly)
> * the parent will point to a subxid

Right. Those cases aren't that different from the point of trying to
find the parent of an subxid.

> > Which'll mean the following block:
> >                 /*
> >                  * We now have either a top-level xid higher than xmin or an
> >                  * indeterminate xid. We don't know whether it's top level or subxact
> >                  * but it doesn't matter. If it's present, the xid is visible.
> >                  */
> >                 for (j = 0; j < snapshot->subxcnt; j++)
> >                 {
> >                         if (TransactionIdEquals(xid, snapshot->subxip[j]))
> >                                 return true;
> >                 }
> > won't work correctly if suboverflowed.
> 
> Your example of snapshots taken during recovery is not correct.

Oh?

> Note that SubTransGetTopmostTransaction() returns a valid, running
> xid, even though it is the wrong one.

Sure.

> Snapshots work differently on standbys - we store all known running
> xids, so the test still passes correctly, even when overflowed.

I don't think that's generally true.  Isn't that precisely what
ProcArrayStruct->lastOverflowedXid is about?  If we have a snapshot
that's suboverflowed due to the lastOverflowedXid cutoff, then we the
subxip array does *not* contain all known running xids anymore, we rely
on pg_subtrans to only guarantee that toplevel xids are stored in the
KnownAssignedXids machinery.

See:* When we throw away subXIDs from KnownAssignedXids, we need to keep track of* that, similarly to tracking overflow
ofa PGPROC's subxids array.  We do* that by remembering the lastOverflowedXID, ie the last thrown-away subXID.* As long
asthat is within the range of interesting XIDs, we have to assume* that subXIDs are missing from snapshots.  (Note that
subXIDoverflow occurs* on primary when 65th subXID arrives, whereas on standby it occurs when 64th* subXID arrives -
thatis not an error.)

/* * Highest subxid that has been removed from KnownAssignedXids array to * prevent overflow; or InvalidTransactionId
ifnone.  We track this for * similar reasons to tracking overflowing cached subxids in PGXACT * entries.  Must hold
exclusiveProcArrayLock to change this, and shared * lock to read it. */TransactionId lastOverflowedXid;

Greetings,

Andres Freund

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

25 April 2017, 12:47:38

On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>>> Also, when I fix that, it gets further but still crashes at the same
>>> Assert in SubTransSetParent.  The proximate cause this time seems to be
>>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
>>> it's computing that as "false", but in reality the subtrans link in
>>> question has already been set.
>
>> Not sure about that, investigating.
>
> As a quick hack, I just hotwired RecoverPreparedTransactions to set
> overwriteOK = true always, and with that and the SubTransSetParent
> argument-order fix, HEAD passes the recovery tests.  Maybe we can
> be smarter than that, but this might be a good short-term fix to get
> the buildfarm green again.

So my thoughts are now that this is indeed the long term fix.

I can't think of why it would be necessary to call SubTransSetParent()
twice with two different values. A subtransaction's top level xid
never changes, so pg_subtrans should be either zero or the xid of the
parent and never anything else.

I can't see any reason now why overwriteOK should exist at all. I'm
guessing that the whole "overwriteOK" idea was an incorrect response
to xids appearing where they shouldn't have done because of the
mistake you just corrected. So I will now remove the parameter from
the call.

(I'll address the discussion of the impact of that bug in separate post).

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

2PC_remove_overwriteOK.v1.patch

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

25 April 2017, 21:28:48

Simon Riggs <simon@2ndquadrant.com> writes:
> I can't see any reason now why overwriteOK should exist at all. I'm
> guessing that the whole "overwriteOK" idea was an incorrect response
> to xids appearing where they shouldn't have done because of the
> mistake you just corrected. So I will now remove the parameter from
> the call.

Seems reasonable, but I don't like the logic change you made in
SubTransSetParent; you broke the former invariant, for non-Assert
builds, that the target pg_subtrans entry is guaranteed to have
the correct value on exit.  I do like fixing it to not dirty the
page unnecessarily, but I'd suggest that we write it like
if (*ptr != parent){    Assert(*ptr == InvalidTransactionId);    *ptr = parent;
SubTransCtl->shared->page_dirty[slotno]= true;}

        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

26 April 2017, 01:53:58

On 25 April 2017 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> I can't see any reason now why overwriteOK should exist at all. I'm
>> guessing that the whole "overwriteOK" idea was an incorrect response
>> to xids appearing where they shouldn't have done because of the
>> mistake you just corrected. So I will now remove the parameter from
>> the call.
>
> Seems reasonable, but I don't like the logic change you made in
> SubTransSetParent; you broke the former invariant, for non-Assert
> builds, that the target pg_subtrans entry is guaranteed to have
> the correct value on exit.  I do like fixing it to not dirty the
> page unnecessarily, but I'd suggest that we write it like
>
>         if (*ptr != parent)
>         {
>                 Assert(*ptr == InvalidTransactionId);
>                 *ptr = parent;
>                 SubTransCtl->shared->page_dirty[slotno] = true;
>         }

OK, thanks. I'll commit that tomorrow.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

26 April 2017, 02:22:44

On 24 April 2017 at 19:59, Andres Freund <andres@anarazel.de> wrote:

> I don't think that's generally true.

If you think that, from a risk perspective it is enough for me to
continue to investigate and I have been doing that.

As I said before I thought I had found a problem.

I'm suggesting we take the approach that if there is a problem we can
recreate it as a way of exploring what conditions are required and
therefore work out the impact. Nikhil Sontakke appears to have
re-created something, but not quite what I had expected. I think he
will post here tomorrow with an update for us to discuss.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Andres Freund

Date:

26 April 2017, 02:56:44

On 2017-04-25 21:22:44 +0100, Simon Riggs wrote:
> On 24 April 2017 at 19:59, Andres Freund <andres@anarazel.de> wrote:
> 
> > I don't think that's generally true.
> 
> If you think that, from a risk perspective it is enough for me to
> continue to investigate and I have been doing that.

I'm not sure it's worth digging it all that much - I think we just
shouldn't claim in the release notes that the bug doesn't have any
consequences, but that it's though to only matter in unlikely corner
cases.

- Andres

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Michael Paquier

Date:

26 April 2017, 08:03:18

On Wed, Apr 26, 2017 at 4:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 April 2017 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> I can't see any reason now why overwriteOK should exist at all. I'm
>>> guessing that the whole "overwriteOK" idea was an incorrect response
>>> to xids appearing where they shouldn't have done because of the
>>> mistake you just corrected. So I will now remove the parameter from
>>> the call.
>>
>> Seems reasonable, but I don't like the logic change you made in
>> SubTransSetParent; you broke the former invariant, for non-Assert
>> builds, that the target pg_subtrans entry is guaranteed to have
>> the correct value on exit.  I do like fixing it to not dirty the
>> page unnecessarily, but I'd suggest that we write it like
>>
>>         if (*ptr != parent)
>>         {
>>                 Assert(*ptr == InvalidTransactionId);
>>                 *ptr = parent;
>>                 SubTransCtl->shared->page_dirty[slotno] = true;
>>         }
>
> OK, thanks. I'll commit that tomorrow.

A couple of comments about the last patch of this thread.

Nit: there is a complain about a trailing whitespace:
$ git diff --check
src/backend/access/transam/twophase.c:2011: trailing whitespace.
+                     bool fromdisk,

+        * It's possible that SubTransSetParent has been set before, if
+        * the prepared transaction generated xid assignment records.        */       for (i = 0; i < hdr->nsubxacts;
i++)
-           SubTransSetParent(subxids[i], xid, true);
+           SubTransSetParent(subxids[i], xid);
You can remove this part in RecoverPreparedTransactions() and just
switch ProcessTwoPhaseBuffer()'s setParent to true to do the same
thing. The existing comment block should be moved before
ProcessTwoPhaseBuffer() call. It is critical to mention that
pg_subtrans is not kept after a restart.
-- 
Michael

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Nikhil Sontakke

Date:

26 April 2017, 20:00:50

I'm suggesting we take the approach that if there is a problem we can
recreate it as a way of exploring what conditions are required and
therefore work out the impact. Nikhil Sontakke appears to have
re-created something, but not quite what I had expected. I think he
will post here tomorrow with an update for us to discuss.

So, I reverted commit 0874d4f3e183757ba15a4b3f3bf563e0393dd9c2 to go back to the earlier bad swapped arguments to SubTransSetParent resulting in incorrect parent linkages and used the attached TAP test patch.

The test prepares a 2PC with more than 64 subtransactions. It then stops the master and promotes the standby.

A SELECT query on the newly promoted master on any of the tables involved in the 2PC hangs. The hang is due to a loop in SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a circular reference in parentxid <-> subxid inducing the infinite loop.

Any further DML on these objects which will need to check visibility of these tuples hangs as well. All unrelated objects and new transactions are ok AFAICS.

I do not see any data loss, which is good. However tables involved in the 2PC are inaccessible till after a hard restart.

The attached TAP test patch can be considered for commit to test handling 2PC with large subtransactions on promoted standby instances.

Regards,

Nikhils

Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment

subxid_bug_with_test_case_v1.0.patch

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

26 April 2017, 20:28:46

Nikhil Sontakke <nikhils@2ndquadrant.com> writes:
> A SELECT query on the newly promoted master on any of the tables involved
> in the 2PC hangs. The hang is due to a loop in
> SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
> circular reference in parentxid <-> subxid inducing the infinite loop.

I'm inclined to propose that

(1) SubTransSetParent should contain an assert that Assert(TransactionIdFollows(xid, parent));
There is a similar assertion near one of the call sites, but that's
obviously too far away from the action.

(2) SubTransGetTopmostTransaction ought to contain an actual runtime
test and ereport (not just Assert) that the parent XID it got from
pg_subtrans precedes the child XID that was looked up.  This would
protect us against similar infinite loops in the future.  Even without
bugs, such a loop could arise due to corrupted data.
        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Simon Riggs

Date:

26 April 2017, 23:01:28

On 26 April 2017 at 15:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Nikhil Sontakke <nikhils@2ndquadrant.com> writes:
>> A SELECT query on the newly promoted master on any of the tables involved
>> in the 2PC hangs. The hang is due to a loop in
>> SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
>> circular reference in parentxid <-> subxid inducing the infinite loop.
>
> I'm inclined to propose that
>
> (1) SubTransSetParent should contain an assert that
>         Assert(TransactionIdFollows(xid, parent));
> There is a similar assertion near one of the call sites, but that's
> obviously too far away from the action.
>
> (2) SubTransGetTopmostTransaction ought to contain an actual runtime
> test and ereport (not just Assert) that the parent XID it got from
> pg_subtrans precedes the child XID that was looked up.  This would
> protect us against similar infinite loops in the future.  Even without
> bugs, such a loop could arise due to corrupted data.

Pretty much what I was thinking.


I've added code following Michael and Tom's comments to the previous
patch. New patch attached.

Passes with and without Nikhil's new test.

I plan to apply both patches tomorrow morning my time to give time for
further comments.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

2PC_rework.v2.patch

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From

Tom Lane

Date:

26 April 2017, 23:25:39

Simon Riggs <simon@2ndquadrant.com> writes:
> I've added code following Michael and Tom's comments to the previous
> patch. New patch attached.

Couple of minor suggestions:

* Rather than deleting the comment for SubTransSetParent entirely,
maybe make it say "It's possible that the parent was already recorded.
However, we should never be asked to change an already-set entry to
something else."

* In SubTransGetTopmostTransaction, maybe it'd be better to spell
"TransactionIdFollowsOrEquals" as "!TransactionIdPrecedes" to make
it look more like the test just above.  Matter of taste though.

* Less a matter of taste is that I think that should be just elog(ERROR);
there's no good reason to make it FATAL.

* Also, I think there should be a comment there, along the lines of
"check for reversed linkage to ensure this isn't an infinite loop".
        regards, tom lane

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

From

Michael Paquier

Date:

27 April 2017, 08:10:34

On Thu, Apr 27, 2017 at 2:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> I've added code following Michael and Tom's comments to the previous
>> patch. New patch attached.
>
> Couple of minor suggestions:
>
> * Rather than deleting the comment for SubTransSetParent entirely,
> maybe make it say "It's possible that the parent was already recorded.
> However, we should never be asked to change an already-set entry to
> something else."
>
> * In SubTransGetTopmostTransaction, maybe it'd be better to spell
> "TransactionIdFollowsOrEquals" as "!TransactionIdPrecedes" to make
> it look more like the test just above.  Matter of taste though.
>
> * Less a matter of taste is that I think that should be just elog(ERROR);
> there's no good reason to make it FATAL.
>
> * Also, I think there should be a comment there, along the lines of
> "check for reversed linkage to ensure this isn't an infinite loop".

No more comments from here, thanks for working on the patch.
-- 
Michael