Re: Snapshot related assert failure on skink - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Snapshot related assert failure on skink
Msg-id 605d6217-1050-43c8-83f5-7c52598c54cc@iki.fi
In response to Re: Snapshot related assert failure on skink  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On 19/03/2025 04:22, Tomas Vondra wrote:
> I kept stress-testing this, and while the frequency massively increased
> on PG18, I managed to reproduce this all the way back to PG14. I see
> ~100x more corefiles on PG18.
> 
> That is not a proof the issue was introduced in PG14, maybe it's just
> the assert that was added there or something. Or maybe there's another
> bug in PG18, making the impact worse.
> 
> But I'd suspect this is a bug in
> 
> commit 623a9ba79bbdd11c5eccb30b8bd5c446130e521c
> Author: Andres Freund <andres@anarazel.de>
> Date:   Mon Aug 17 21:07:10 2020 -0700
> 
>      snapshot scalability: cache snapshots using a xact completion counter.
> 
>      Previous commits made it faster/more scalable to compute snapshots. But not
>      building a snapshot is still faster. Now that GetSnapshotData() does not
>      maintain RecentGlobal* anymore, that is actually not too hard:
> 
>      ...

Looking at the code, shouldn't ExpireAllKnownAssignedTransactionIds() 
and ExpireOldKnownAssignedTransactionIds() update xactCompletionCount? 
This can happen during hot standby:

1. Backend acquires snapshot A with xmin 1000
2. Startup process calls ExpireOldKnownAssignedTransactionIds(), which
leaves xactCompletionCount unchanged
3. Backend acquires snapshot B with xmin 1050
4. Backend releases snapshot A, updating TransactionXmin to 1050
5. Backend acquires new snapshot, calls GetSnapshotDataReuse(), reusing 
snapshot A's data.

Because xactCompletionCount is not updated in step 2, the
GetSnapshotDataReuse() call in step 5 reuses snapshot A's data. But
snapshot A has xmin 1000, which is lower than the TransactionXmin of
1050 set in step 4.
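
To make the sequence concrete, here is a toy model of steps 1-5. It is
only a sketch, not PostgreSQL code: the Toy* types, the toy_* functions
and the bump_counter flag are all made up for illustration, a single
oldestRunningXid field stands in for the whole KnownAssignedXids
machinery, and XID wraparound is ignored. The flag models the change I'm
asking about, not necessarily the right fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;           /* toy XID; wraparound is ignored here */

/* Hypothetical stand-in for shared state, not PostgreSQL's real structs. */
typedef struct
{
    uint64_t xactCompletionCount;   /* bumped when the running-xact set changes */
    Xid      oldestRunningXid;      /* xmin a freshly built snapshot would get */
} ToyShared;

typedef struct
{
    bool     valid;
    Xid      xmin;
    uint64_t snapXactCompletionCount;   /* counter value when contents were built */
} ToySnapshot;

/* Reuse check: cached contents are trusted iff the counter has not moved. */
static bool
toy_snapshot_reuse(const ToyShared *sh, const ToySnapshot *snap)
{
    return snap->valid &&
        snap->snapXactCompletionCount == sh->xactCompletionCount;
}

/* Returns the snapshot's xmin, rebuilding the contents only if needed. */
static Xid
toy_get_snapshot(const ToyShared *sh, ToySnapshot *snap)
{
    if (!toy_snapshot_reuse(sh, snap))
    {
        snap->valid = true;
        snap->xmin = sh->oldestRunningXid;
        snap->snapXactCompletionCount = sh->xactCompletionCount;
    }
    return snap->xmin;
}

/* Step 2: old XIDs are expired; optionally bump the completion counter. */
static void
toy_expire_old_xids(ToyShared *sh, Xid new_oldest, bool bump_counter)
{
    assert(new_oldest >= sh->oldestRunningXid);
    sh->oldestRunningXid = new_oldest;
    if (bump_counter)
        sh->xactCompletionCount++;
}

/*
 * Replays steps 1-5.  Returns true if the snapshot obtained in step 5 has
 * an xmin that is not older than the backend's TransactionXmin.
 */
static bool
toy_replay(bool bump_counter)
{
    ToyShared   sh = {.xactCompletionCount = 1, .oldestRunningXid = 1000};
    ToySnapshot a = {0};
    ToySnapshot b = {0};
    Xid         transaction_xmin;
    Xid         reused_xmin;

    transaction_xmin = toy_get_snapshot(&sh, &a);   /* 1: A, xmin 1000 */
    toy_expire_old_xids(&sh, 1050, bump_counter);   /* 2: startup process */
    transaction_xmin = toy_get_snapshot(&sh, &b);   /* 3+4: B, xmin 1050; A
                                                     * released, so
                                                     * TransactionXmin = 1050 */
    reused_xmin = toy_get_snapshot(&sh, &a);        /* 5: may reuse A's data */

    return transaction_xmin <= reused_xmin;
}
```

In this model toy_replay(false) returns false, because step 5 hands back
A's stale xmin of 1000 while TransactionXmin is already 1050, and
toy_replay(true) returns true, because the bumped counter forces the
contents to be rebuilt.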

-- 
Heikki Linnakangas
Neon (https://neon.tech)



