It seems like I've identified the cause of the issue.
Currently, any DirtySnapshot (or SnapshotSelf) scan over a B-tree index may skip (not find the TID for) some records in the case of parallel updates.
The following scenario is possible:
* Session 1 reads a B-tree page using SnapshotDirty and copies item X to the buffer.
* Session 2 updates item X, inserting a new TID Y into the same page.
* Session 2 commits its transaction.
* Session 1 starts to fetch from the heap and tries to fetch X, but it was already deleted by session 2. So, it goes to the B-tree for the next TID.
* The B-tree goes to the next page, skipping Y.
* Therefore, the search finds nothing, but tuple Y is still alive.
This situation is somewhat controversial. DirtySnapshot might seem to show more (or more recent, even uncommitted) data than MVCC, but not less. So, DirtySnapshot scan over a B-tree does not provide any guarantees, as far as I understand.
Why does it work for MVCC? Because tuple X will be visible due to the snapshot, making Y unnecessary.
This might be "as designed," but I think it needs to be clearly documented (I couldn't find any documentation on this particular case, only _bt_drop_lock_and_maybe_pin - related).
Here are the potential consequences of the issue:
* check_exclusion_or_unique_constraint
It may not find a record in a UNIQUE index during INSERT ON CONFLICT UPDATE. However, this is just a minor performance issue.
* Exclusion constraints with B-tree, like ADD CONSTRAINT exclusion_data EXCLUDE USING btree (data WITH =)
It should work correctly because the first inserter may "skip" the TID from a concurrent inserter, but the second one should still find the TID from the first.
* RelationFindReplTupleByIndex
Amit, this is why I've included you in this previously solo thread :)
RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records if they are updated by a parallel transaction. This could lead to lost deletes/updates, especially in the case of streaming=parallel mode.
I'm not familiar with how parallel workers apply transactions, so maybe this isn't possible.
Best regards,
Mikhail