Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly
Date
Msg-id 20170423001931.wk4os4x6dwtdfwgd@alap3.anarazel.de
Whole thread Raw
In response to [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On 2017-04-22 19:55:18 -0400, Tom Lane wrote:
> Now that we've got consistent failure reports about the 009_twophase.pl
> recovery test, I set out to find out why it's failing.  It looks to me
> like the reason is that this (twophase.c:2145):
>
>             SubTransSetParent(xid, subxid, overwriteOK);
>
> ought to be this:
>
>             SubTransSetParent(subxid, xid, overwriteOK);
>
> because the definition of SubTransSetParent is
>
> void
> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>
> not the other way 'round.
>
> While "git blame" blames this line on the recent commit 728bd991c,
> that just moved the call from somewhere else.  AFAICS this has actually
> been wrong since StandbyRecoverPreparedTransactions was written,
> in 361bd1662 of 2010-04-13.

> Also, when I fix that, it gets further but still crashes at the same
> Assert in SubTransSetParent.  The proximate cause this time seems to be
> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
> it's computing that as "false", but in reality the subtrans link in
> question has already been set.
>

Yikes.  This is clearly way undertested.  It's also pretty scary that
the code has recently been whacked out quite heavily (both 9.6 and
master), without hitting anything around this - doesn't seem to bode
well for how in-depth the testing was.


> It's not clear to me how much potential this has to create user data
> corruption, but it doesn't look good at first glance.  Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed.  Hm.


- Andres



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly
Next
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] PostgresNode::append_conf considered dangerous