Re: table_tuple_lock's snapshot argument - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: table_tuple_lock's snapshot argument
Date
Msg-id CAEudQAr-0oLQ3WdmEdN854EE2BhPzS_ftKbHJ1hw2trdK=oQFA@mail.gmail.com
Whole thread Raw
In response to Re: table_tuple_lock's snapshot argument  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: table_tuple_lock's snapshot argument
List pgsql-hackers
Em seg., 10 de mar. de 2025 às 12:22, Heikki Linnakangas <hlinnaka@iki.fi> escreveu:
On 21/01/2025 12:05, Alexander Korotkov wrote:
> On Tue, Jan 21, 2025 at 12:03 PM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
>> On Sun, Dec 29, 2024 at 3:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> However, I think GetLatestSnapshot() is wrong here anyway. None of this
>>> matters for heapam which just ignores the 'snapshot' argument, but let's
>>> assume a different AM that would use the snapshot to identify the tuple
>>> version. The TID was fetched from an index using an index scan with
>>> SnapshotDirty. There's no guarantee that the LatestSnapshot would match
>>> the same tuple version that the index scan found. If an MVCC snapshot is
>>> needed, surely it should be acquired before the index scan, and used for
>>> the index scan as well.
>>>
>>> I see three options:
>>>
>>> 1. Remove the 'snapshot' argument, since it's not used by heapam. If we
>>> get a table AM where a single TID represents multiple row versions, this
>>> will need to be revisited.
>>>
>>> 2. Rewrite the recheck code in execReplication.c so that it uses the
>>> snapshot in a more consistent fashion. Call GetLatestSnapshot() first,
>>> and use the same snapshot in the index scan and table_tuple_lock().
>>> Acquiring a snapshot isn't free though, so it would be nice to avoid
>>> doing that when the heapam is just going to ignore it anyway. If we go
>>> with this option, I think we could reuse the snapshot that is already
>>> active in most cases, and only take a new snapshot if the tuple was
>>> concurrently updated.
>>>
>>> 3. Modify the tableam interface so that the index scan can return a more
>>> unique identifier of the tuple version. In heapam, it could be the TID
>>> like today, but a different AM could return some other token. Continue
>>> to use SnapshotDirty in the index scan, but in the call to
>>> table_tuple_lock(), instead of passing GetLatestSnapshot() and TID, pass
>>> the token you got index_getnext_slot().
>>
>> Thank you for your finding and analysis.  I would vote for #3.
>> Currently snapshot argument is not properly used, and the heapam code
>> doesn't check it.  I think allowing flexible tuple identifier is
>> essential for future of table AM API.  Therefore, any
>> snapshot/visibility information could be put inside that tuple
>> identifier if needed.
>
> To be more specific, I vote for #1 + #3. Remove the snapshot argument
> for now, then work on flexible tuple identifier.

I committed and backpatched the trivial fix for not, just replacing
GetLatestSnapshot() with GetActiveSnapshot(). I got cold feet on just
removing the argument without providing a replacement, because it's not
100% clear to me if the current situation is broken or not.

To summarize the issue, RelationFindReplTupleByIndex() performs an index
scan with DirtySnapshot, and then calls table_tuple_lock() with a fresh
MVCC snapshot to lock the row. There's no guarantee that the index scan
found the same row version that table_tuple_lock() will lock, if the TID
alone doesn't uniquely identify it. But that's still OK as long as the
key column hasn't changed, like with heapam's HOT updates. I couldn't
convince myself that it's broken nor that it's guaranteed to be correct
with other AMs.
Just for curiosity.
*RelationFindReplTupleSeq* on the same source, does not suffer
from the same issue?

PushActiveSnapshot(GetLatestSnapshot());

res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),

best regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Printing window function OVER clauses in EXPLAIN
Next
From: Heikki Linnakangas
Date:
Subject: Re: table_tuple_lock's snapshot argument