Hi,
On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:
> > On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:
> >
> > 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> > update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> > subsequent calls) to determine whether epoch needs to be reduced.
>
> Can you say a bit more about your analysis here, preferably with pointers to
> the lines of code you are analyzing? Does the problem exist in amcheck as
> currently committed, or are you thinking about a problem that arises only
> after applying your patch? I'm a bit fuzzy on where xid 0 gets used.
The problems exist in the code as currently committed. I'm not sure what
exactly the consequences are, the result is that oldest_fxid will be, at least
temporarily, bogus.
Consider the first call to update_cached_xid_range():
/*
* Update our cached range of valid transaction IDs.
*/
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
/* Make cached copies */
LWLockAcquire(XidGenLock, LW_SHARED);
ctx->next_fxid = ShmemVariableCache->nextXid;
ctx->oldest_xid = ShmemVariableCache->oldestXid;
LWLockRelease(XidGenLock);
/* And compute alternate versions of the same */
ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}
The problem is that the call to FullTransactionIdFromXidAndCtx() happens
before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx()
uses ctx->next_xid.
static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
uint32 epoch;
if (!TransactionIdIsNormal(xid))
return FullTransactionIdFromEpochAndXid(0, xid);
epoch = EpochFromFullTransactionId(ctx->next_fxid);
if (xid > ctx->next_xid)
epoch--;
return FullTransactionIdFromEpochAndXid(epoch, xid);
}
Because ctx->next_xid is 0, due to not having been set yet, "xid > ctx->next_xid"
will always be true, leading to epoch being reduced by one.
In the common case of there never having been an xid wraparound, we'll thus
underflow epoch, generating an xid far into the future.
The tests encounter the issue today. If you add
Assert(TransactionIdIsValid(ctx->next_xid));
Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.
Greetings,
Andres Freund