On 3/23/25 17:43, Heikki Linnakangas wrote:
> On 21/03/2025 17:16, Andres Freund wrote:
>> Am I right in understanding that the only scenario (when in
>> STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds()
>> would
>> "legally" remove a transaction, rather than the commit / abort records
>> doing
>> so, is if the primary crash-restarted while transactions were ongoing?
>>
>> Those transactions won't have a commit/abort records and thus won't
>> trigger
>> ExpireTreeKnownAssignedTransactionIds(), which otherwise would have
>> updated
>> ->xactCompletionCount?
>
> Correct.
>
>> When writing the snapshot caching patch, I tried to make sure that all
>> the
>> places that maintain ->latestCompletedXid also update
>> ->xactCompletionCount. Afaict that's still the case. Which, I think,
>> means
>> that we're also missing calls to MaintainLatestCompletedXidRecovery()?
>
> Yep, I was just looking at that too.
>
>> If latestCompletedXid is incorrect visibility determinations end up
>> wrong...
>
> I think it happens to work, because the transactions are effectively
> aborted. latestCompletedXid is used to initialize xmax in
> GetSnapshotData. If, for example, latestCompletedXid is incorrectly set
> to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000
> still correctly considers XID 1001 as "not visible". As soon as a
> transaction commits, latestCompletedXid is fixed.
>
> AFAICS we could skip updating latestCompletedXid on aborts altogether
> and rename it to latestCommittedXid. But it's hardly worth optimizing
> for aborts.
>
> For the same reason, I believe the assertion failure we're discussing
> here is also harmless. Even though the reused snapshot has a smaller
> xmin than expected, all those transactions aborted and are thus not
> visible anyway.
>
> In any case, let's be tidy and fix both latestCompletedXid and
> xactCompletionCount.
>
Thanks for looking into this and pushing the fix.
Would it make sense to add a comment documenting this reasoning about
not handling aborts? Otherwise someone will get to rediscover this in
the future ...
regards
--
Tomas Vondra