On 2017-04-24 13:29:11 +0100, Simon Riggs wrote:
> On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:
> > if the subxid->xid mapping doesn't actually exist - as it's the case
> > with this bug afaics - we'll not get the correct toplevel
> > transaction.
>
> The nature of the corruption is that in some cases
> * a subxid will point to nothing (even though in most cases it was
> already set correctly)
> * the parent will point to a subxid
Right. Those cases aren't that different from the point of trying to
find the parent of an subxid.
> > Which'll mean the following block:
> > /*
> > * We now have either a top-level xid higher than xmin or an
> > * indeterminate xid. We don't know whether it's top level or subxact
> > * but it doesn't matter. If it's present, the xid is visible.
> > */
> > for (j = 0; j < snapshot->subxcnt; j++)
> > {
> > if (TransactionIdEquals(xid, snapshot->subxip[j]))
> > return true;
> > }
> > won't work correctly if suboverflowed.
>
> Your example of snapshots taken during recovery is not correct.
Oh?
> Note that SubTransGetTopmostTransaction() returns a valid, running
> xid, even though it is the wrong one.
Sure.
> Snapshots work differently on standbys - we store all known running
> xids, so the test still passes correctly, even when overflowed.
I don't think that's generally true. Isn't that precisely what
ProcArrayStruct->lastOverflowedXid is about? If we have a snapshot
that's suboverflowed due to the lastOverflowedXid cutoff, then we the
subxip array does *not* contain all known running xids anymore, we rely
on pg_subtrans to only guarantee that toplevel xids are stored in the
KnownAssignedXids machinery.
See:* When we throw away subXIDs from KnownAssignedXids, we need to keep track of* that, similarly to tracking overflow
ofa PGPROC's subxids array. We do* that by remembering the lastOverflowedXID, ie the last thrown-away subXID.* As long
asthat is within the range of interesting XIDs, we have to assume* that subXIDs are missing from snapshots. (Note that
subXIDoverflow occurs* on primary when 65th subXID arrives, whereas on standby it occurs when 64th* subXID arrives -
thatis not an error.)
/* * Highest subxid that has been removed from KnownAssignedXids array to * prevent overflow; or InvalidTransactionId
ifnone. We track this for * similar reasons to tracking overflowing cached subxids in PGXACT * entries. Must hold
exclusiveProcArrayLock to change this, and shared * lock to read it. */TransactionId lastOverflowedXid;
Greetings,
Andres Freund