Re: Snapshot related assert failure on skink - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Snapshot related assert failure on skink
Date
Msg-id 82a7e769-8189-401c-a666-248e9cef2a04@vondra.me
Whole thread Raw
In response to Re: Snapshot related assert failure on skink  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Snapshot related assert failure on skink
List pgsql-hackers

On 3/23/25 17:43, Heikki Linnakangas wrote:
> On 21/03/2025 17:16, Andres Freund wrote:
>> Am I right in understanding that the only scenario (when in
>> STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds()
>> would
>> "legally" remove a transaction, rather than the commit / abort records
>> doing
>> so, is if the primary crash-restarted while transactions were ongoing?
>>
>> Those transactions won't have a commit/abort records and thus won't
>> trigger
>> ExpireTreeKnownAssignedTransactionIds(), which otherwise would have
>> updated
>> ->xactCompletionCount?
> 
> Correct.
> 
>> When writing the snapshot caching patch, I tried to make sure that all
>> the
>> places that maintain ->latestCompletedXid also update
>> ->xactCompletionCount. Afaict that's still the case. Which, I think,
>> means
>> that we're also missing calls to MaintainLatestCompletedXidRecovery()?
> 
> Yep, I was just looking at that too.
> 
>> If latestCompletedXid is incorrect visibility determinations end up
>> wrong...
> 
> I think it happens to work, because the transactions are effectively
> aborted. latestCompletedXid is used to initialize xmax in
> GetSnapshotData. If, for example, latestCompletedXid is incorrectly set
> to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000
> still correctly considers XID 1001 as "not visible". As soon as a
> transaction commits, latestCompletedXid is fixed.
> 
> AFAICS we could skip updating latestCompletedXid on aborts altogether
> and rename it to latestCommittedXid. But it's hardly worth optimizing
> for aborts.
> 
> For the same reason, I believe the assertion failure we're discussing
> here is also harmless. Even though the reused snapshot has a smaller
> xmin than expected, all those transactions aborted and are thus not
> visible anyway.
> 
> In any case, let's be tidy and fix both latestCompletedXid and
> xactCompletionCount.
> 

Thanks for looking into this and pushing the fix.

Would it make sense to add a comment documenting this reasoning about
not handling aborts? Otherwise someone will get to rediscover this in
the future ...


regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: vacuum_truncate configuration parameter and isset_offset
Next
From: Nikolay Shaplov
Date:
Subject: Re: vacuum_truncate configuration parameter and isset_offset