Re: [BUG?] check_exclusion_or_unique_constraint false negative - Mailing list pgsql-hackers

From Mihail Nikalayeu
Subject Re: [BUG?] check_exclusion_or_unique_constraint false negative
Date
Msg-id CADzfLwWuXh8KO=OZvB71pZnQ8nH0NYXfuGbFU6FBiVZUbmuFGg@mail.gmail.com
Whole thread Raw
In response to Re: [BUG?] check_exclusion_or_unique_constraint false negative  (Michail Nikolaev <michail.nikolaev@gmail.com>)
List pgsql-hackers
Hello, everyone and Peter!

Peter, I have added you because you may be interested in (or already know about) this btree-related issue.

Short description of the problem:

I noticed a concurrency issue in btree index scans that affects SnapshotDirty and SnapshotSelf scan types.
When using these non-MVCC snapshot types, a scan could miss tuples if concurrent transactions delete existing tuples and insert new one with different TIDs on the same page.

The problem occurs because:
  1. The scan reads a page and caches its tuples in backend-local storage
  2. A concurrent transaction deletes a tuple and inserts a new one with a different TID
  3. The scan misses the new tuple because it was already deleted by a committed transaction and does not pass visibility check
  4. But new version on the page is missed, because not in cached tuples

This may cause issues with:
  - logical replication (RelationFindReplTupleByIndex fail) - invalid conflict message (MISSING instead of ORIGIN_DIFFERS), probably other issues with upcoming conflict resolution for logical replication
  - check_exclusion_or_unique_constraint false negative (but currently it does not cause any real issues as far as I can see)

The fix implemented in this version of the patch:
  - Retains the read lock on a page for SnapshotDirty and SnapshotSelf scans until we're completely done with all tuples from that page
  - Introduces a new 'extra_unlock' field in BTScanPos to track when a lock is being held longer than usual
  - Updates documentation to explain this special locking behavior

Yes, it may cause some degradation in performance because of that additional lock.
Another possible idea is to use a fresh MVCC snapshot for such cases (but I think it is still better to fix or at least document that issue anyway).

Best regards,
Mikhail.
Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Conflict detection for multiple_unique_conflicts in logical replication
Next
From: Noah Misch
Date:
Subject: Re: Back-patch of: avoid multiple hard links to same WAL file after a crash