Re: Snapshot related assert failure on skink - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Snapshot related assert failure on skink
Date
Msg-id 697c7314-ed97-491f-815e-0f58064b68a4@iki.fi
Whole thread Raw
In response to Re: Snapshot related assert failure on skink  (Andres Freund <andres@anarazel.de>)
Responses Re: Snapshot related assert failure on skink
List pgsql-hackers
On 21/03/2025 17:16, Andres Freund wrote:
> Am I right in understanding that the only scenario (when in
> STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds() would
> "legally" remove a transaction, rather than the commit / abort records doing
> so, is if the primary crash-restarted while transactions were ongoing?
> 
> Those transactions won't have a commit/abort records and thus won't trigger
> ExpireTreeKnownAssignedTransactionIds(), which otherwise would have updated
> ->xactCompletionCount?

Correct.

> When writing the snapshot caching patch, I tried to make sure that all the
> places that maintain ->latestCompletedXid also update
> ->xactCompletionCount. Afaict that's still the case. Which, I think, means
> that we're also missing calls to MaintainLatestCompletedXidRecovery()?

Yep, I was just looking at that too.

> If latestCompletedXid is incorrect visibility determinations end up wrong...

I think it happens to work, because the transactions are effectively 
aborted. latestCompletedXid is used to initialize xmax in 
GetSnapshotData. If, for example, latestCompletedXid is incorrectly set 
to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000 
still correctly considers XID 1001 as "not visible". As soon as a 
transaction commits, latestCompletedXid is fixed.

AFAICS we could skip updating latestCompletedXid on aborts altogether 
and rename it to latestCommittedXid. But it's hardly worth optimizing 
for aborts.

For the same reason, I believe the assertion failure we're discussing 
here is also harmless. Even though the reused snapshot has a smaller 
xmin than expected, all those transactions aborted and are thus not 
visible anyway.

In any case, let's be tidy and fix both latestCompletedXid and 
xactCompletionCount.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: AIO v2.5
Next
From: Andres Freund
Date:
Subject: Re: AIO v2.5