Thread: [BUG?] check_exclusion_or_unique_constraint false negative

[BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

14 July 2024, 18:01:50

Hello, everyone!

While reviewing [1], I noticed that check_exclusion_or_unique_constraint occasionally returns false negatives for btree unique indexes during UPSERT operations.

Although this doesn't cause any real issues with INSERT ON CONFLICT, I wanted to bring it to your attention, as it might indicate an underlying problem.

Attached is a patch to reproduce the issue.

make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'

....

Failed test 'concurrent INSERTs status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 26.

# Failed test 'concurrent INSERTs stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 26.
# 'pgbench: error: client 34 script 0 aborted in command 0 query 0: ERROR: we know 31337 in the index!

Best regards,

Mikhail,

[1]: https://www.postgresql.org/message-id/flat/CANtu0ogs10w%3DDgbYzZ8MswXE3PUC3J4SGDc0YEuZZeWbL0b6HA%40mail.gmail.com#8c01dcf6051e28c47d25e9471736947e

Attachment

test_+_assert_.pat

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

21 July 2024, 15:27:01

Hello, Andres.

Sorry to bother you, but I feel it's necessary to validate the possible issue regarding someone who can decide whether it is okay or not.
The issue is reproducible with the first UPSERT implementation (your commit 168d5805e4c08bed7b95d351bf097cff7c07dd65 from 2015) and up to now.

The problem appears as follows:
* A unique index contains a specific value (in the test, it is the only value for the entire index).
* check_exclusion_or_unique_constraint returns FALSE for that value in some random cases.
* Technically, this means index_getnext finds 0 records, even though we know the value exists in the index.

I was able to reproduce this only with an UNLOGGED table.
I can't find any scenarios that are actually broken (since the issue is resolved by speculative insertion later), but this looks suspicious to me. It could be a symptom of some tricky race condition in the btree.

Best regards,
Mikhail

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

24 July 2024, 20:01:23

Hello, everyone!

Updates so far:

* issue happens with both LOGGED and UNLOGGED relations

* issue happens with DirtySnapshot

* not happens with SnapshotSelf

* not happens with SnapshotAny

* not related to speculative inserted tuples - I have commented the code of its insertion - and the issue continues to occur.

Best regards,

Mikhail.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

31 July 2024, 20:57:00

It seems like I've identified the cause of the issue.

Currently, any DirtySnapshot (or SnapshotSelf) scan over a B-tree index may skip (not find the TID for) some records in the case of parallel updates.

The following scenario is possible:

* Session 1 reads a B-tree page using SnapshotDirty and copies item X to the buffer.
* Session 2 updates item X, inserting a new TID Y into the same page.
* Session 2 commits its transaction.
* Session 1 starts to fetch from the heap and tries to fetch X, but it was already deleted by session 2. So, it goes to the B-tree for the next TID.
* The B-tree goes to the next page, skipping Y.
* Therefore, the search finds nothing, but tuple Y is still alive.

This situation is somewhat controversial. DirtySnapshot might seem to show more (or more recent, even uncommitted) data than MVCC, but not less. So, DirtySnapshot scan over a B-tree does not provide any guarantees, as far as I understand.
Why does it work for MVCC? Because tuple X will be visible due to the snapshot, making Y unnecessary.
This might be "as designed," but I think it needs to be clearly documented (I couldn't find any documentation on this particular case, only _bt_drop_lock_and_maybe_pin - related).

Here are the potential consequences of the issue:

* check_exclusion_or_unique_constraint

It may not find a record in a UNIQUE index during INSERT ON CONFLICT UPDATE. However, this is just a minor performance issue.

* Exclusion constraints with B-tree, like ADD CONSTRAINT exclusion_data EXCLUDE USING btree (data WITH =)

It should work correctly because the first inserter may "skip" the TID from a concurrent inserter, but the second one should still find the TID from the first.

* RelationFindReplTupleByIndex

Amit, this is why I've included you in this previously solo thread :)
RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records if they are updated by a parallel transaction. This could lead to lost deletes/updates, especially in the case of streaming=parallel mode.
I'm not familiar with how parallel workers apply transactions, so maybe this isn't possible.

Best regards,
Mikhail

RE: [BUG?] check_exclusion_or_unique_constraint false negative

From

"Hayato Kuroda (Fujitsu)"

Date:

01 August 2024, 05:54:37

Dear Michail,

Thanks for pointing out the issue!

>* RelationFindReplTupleByIndex
>
>Amit, this is why I've included you in this previously solo thread :)
>RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records
>if they are updated by a parallel transaction. This could lead to lost
>deletes/updates, especially in the case of streaming=parallel mode. 
>I'm not familiar with how parallel workers apply transactions, so maybe this
>isn't possible.

IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
the same tuples, which is not specific to the parallel apply. Consider that two
subscriptions exist and publishers modify the same tuple of the same table.
In this case, two workers access the tuple, so one of the changes may be missed
by the scenario you said. I feel we do not need special treatments for parallel
apply.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

01 August 2024, 09:25:36

Hello, Hayato!

> Thanks for pointing out the issue!

Thanks for your attention!

> IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
> the same tuples, which is not specific to the parallel apply

Not exactly, it happens for any DirtySnapshot scan over a B-tree index with some other transaction updating the same index page (even using the MVCC snapshot).

So, logical replication related scenario looks like this:

* subscriber worker receives a tuple update\delete from the publisher
* it calls RelationFindReplTupleByIndex to find the tuple in the local table
* some other transaction updates the tuple in the local table (on subscriber side) in parallel
* RelationFindReplTupleByIndex may not find the tuple because it uses DirtySnapshot
* update\delete is lost

Parallel apply mode looks like more dangerous because it uses multiple workers on the subscriber side, so the probability of the issue is higher.
In that case, "some other transaction" is just another worker applying changes of different transaction in parallel.

Best regards,
Mikhail.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Amit Kapila

Date:

02 August 2024, 04:56:40

On Thu, Aug 1, 2024 at 2:55 PM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> > Thanks for pointing out the issue!
>
> Thanks for your attention!
>
> > IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
> > the same tuples, which is not specific to the parallel apply
>
> Not exactly, it happens for any DirtySnapshot scan over a B-tree index with some other transaction updating the same
indexpage (even using the MVCC snapshot). 
>
> So, logical replication related scenario looks like this:
>
> * subscriber worker receives a tuple update\delete from the publisher
> * it calls RelationFindReplTupleByIndex to find the tuple in the local table
> * some other transaction updates the tuple in the local table (on subscriber side) in parallel
> * RelationFindReplTupleByIndex may not find the tuple because it uses DirtySnapshot
> * update\delete is lost
>
> Parallel apply mode looks like more dangerous because it uses multiple workers on the subscriber side, so the
probabilityof the issue is higher. 
> In that case, "some other transaction" is just another worker applying changes of different transaction in parallel.
>

I think it is rather less likely or not possible in a parallel apply
case because such conflicting updates (updates on the same tuple)
should be serialized at the publisher itself. So one of the updates
will be after the commit that has the second update.

I haven't tried the test based on your description of the general
problem with DirtySnapshot scan. In case of logical replication, we
will LOG update_missing type of conflict and the user may need to take
some manual action based on that. I have not tried a test so I could
be wrong as well. I am not sure we can do anything specific to logical
replication for this but feel free to suggest if you have ideas to
solve this problem in general or specific to logical replication.

--
With Regards,
Amit Kapila.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

02 August 2024, 17:08:39

Hello, Amit!

> I think it is rather less likely or not possible in a parallel apply
> case because such conflicting updates (updates on the same tuple)
> should be serialized at the publisher itself. So one of the updates
> will be after the commit that has the second update.

Glad to hear! But anyway, such logic looks very fragile to me.

> I haven't tried the test based on your description of the general
> problem with DirtySnapshot scan. In case of logical replication, we
> will LOG update_missing type of conflict and the user may need to take
> some manual action based on that.

Current it is just DEBUG1, so it will be probably missed by the user.

> * XXX should this be promoted to ereport(LOG) perhaps?
> */
> elog(DEBUG1,
> "logical replication did not find row to be updated "
> "in replication target relation \"%s\"",
> RelationGetRelationName(localrel));
> }

> I have not tried a test so I could
> be wrong as well. I am not sure we can do anything specific to logical
> replication for this but feel free to suggest if you have ideas to
> solve this problem in general or specific to logical replication.

I've implemented a solution to address the problem more generally, attached the patch (and also the link [1]).

Here's a summary of the changes:

* For each tuple skipped because it was deleted, we now accumulate the maximum xmax.
* Before the scan begins, we store the value of the latest completed transaction.
* If no tuples are found in the index, we check the max(xmax) value. If this value is newer than the latest completed transaction stored before the scan, it indicates that a tuple was deleted by another transaction after the scan started. To ensure all tuples are correctly processed we then rescan the index.

Also added a test case to cover this scenario using the new injection point mechanism and
updated the b-tree index documentation to include a description of this case.

I'll add this into the next commitfest.

Best regards,
Mikhail.

[1]: https://github.com/postgres/postgres/compare/master...michail-nikolaev:postgres:concurrent_unique

Attachment

v1-0001-fix-for-lost-record-in-case-of-DirtySnapshot-inde.patch

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Amit Kapila

Date:

05 August 2024, 10:16:20

On Fri, Aug 2, 2024 at 10:38 PM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> > I think it is rather less likely or not possible in a parallel apply
> > case because such conflicting updates (updates on the same tuple)
> > should be serialized at the publisher itself. So one of the updates
> > will be after the commit that has the second update.
>
> Glad to hear! But anyway, such logic looks very fragile to me.
>
> > I haven't tried the test based on your description of the general
> > problem with DirtySnapshot scan. In case of logical replication, we
> > will LOG update_missing type of conflict and the user may need to take
> > some manual action based on that.
>
> Current it is just DEBUG1, so it will be probably missed by the user.
>
> > * XXX should this be promoted to ereport(LOG) perhaps?
> > */
> > elog(DEBUG1,
> > "logical replication did not find row to be updated "
> > "in replication target relation \"%s\"",
> > RelationGetRelationName(localrel));
> > }
>

Right, but we are extending this functionality to detect and resolve
such conflicts [1][2]. I am hoping after that such updates won't be
missed.

[1] - https://commitfest.postgresql.org/49/5064/
[2] - https://commitfest.postgresql.org/49/5021/


--
With Regards,
Amit Kapila.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

05 August 2024, 11:11:50

Hello!

> Right, but we are extending this functionality to detect and resolve
> such conflicts [1][2]. I am hoping after that such updates won't be
> missed.

Yes, this is a nice feature. However, without the DirtySnapshot index scan fix, it will fail in numerous instances, especially in master-master replication.

The update_missing feature is helpful in this case, but it is still not the correct event because a real tuple exists, and we should receive update_differ instead. As a result, some conflict resolution systems may malfunction. For example, if the resolution method is set to apply_or_skip, it will insert the new row, causing two rows to exist. This system is quite fragile, and I am sure there are many more complicated scenarios that could arise.

Best regards,

Mikhail.

RE: [BUG?] check_exclusion_or_unique_constraint false negative

From

"Zhijie Hou (Fujitsu)"

Date:

12 August 2024, 03:32:32

Hi,

Thanks for reporting the issue !

I tried to reproduce this in logical replication but failed. If possible,
could you please share some steps to reproduce it in logicalrep context ?

In my test, if the tuple is updated and new tuple is in the same page,
heapam_index_fetch_tuple should find the new tuple using HOT chain. So, it's a
bit unclear to me how the updated tuple is missing. Maybe I missed some other
conditions for this issue.

It would be better if we can reproduce this by adding some breakpoints using
gdb, which may help us to write a tap test using injection point to reproduce
this reliably. I see the tap test you shared used pgbench to reproduce this,
it works, but It would be great if we can analyze the issue more deeply by
debugging the code.

And I have few questions related the steps you shared:

> * Session 1 reads a B-tree page using SnapshotDirty and copies item X to the buffer.
> * Session 2 updates item X, inserting a new TID Y into the same page.
> * Session 2 commits its transaction.
> * Session 1 starts to fetch from the heap and tries to fetch X, but it was
>   already deleted by session 2. So, it goes to the B-tree for the next TID.
> * The B-tree goes to the next page, skipping Y.
> * Therefore, the search finds nothing, but tuple Y is still alive.

I am wondering at which point should the update happen ? should it happen after
calling index_getnext_tid and before index_fetch_heap ? It would be great if
you could give more details in above steps. Thanks !

Best Regards,
Hou zj

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

12 August 2024, 11:11:26

Hello, Hou zj!

> In my test, if the tuple is updated and new tuple is in the same page,
> heapam_index_fetch_tuple should find the new tuple using HOT chain. So, it's a
> bit unclear to me how the updated tuple is missing. Maybe I missed some other
> conditions for this issue.

Yeah, I think the pgbench-based reproducer may also cause page splits in btree.
But we may add an index to the table to disable HOT.

I have attached a reproducer for this case using a spec and injection points.

I hope it helps, check the attached files.

Best regards,
Mikhail.

Attachment

RE: [BUG?] check_exclusion_or_unique_constraint false negative

From

"Zhijie Hou (Fujitsu)"

Date:

15 August 2024, 11:06:03

On Monday, August 12, 2024 7:11 PM Michail Nikolaev <michail.nikolaev@gmail.com>  wrote:
> > In my test, if the tuple is updated and new tuple is in the same page,
> > heapam_index_fetch_tuple should find the new tuple using HOT chain. So, it's a
> > bit unclear to me how the updated tuple is missing. Maybe I missed some other
> > conditions for this issue.
> 
> Yeah, I think the pgbench-based reproducer may also cause page splits in btree.
> But we may add an index to the table to disable HOT.
> 
> I have attached a reproducer for this case using a spec and injection points.
> 
> I hope it helps, check the attached files.

Thanks a lot for the steps!

I successfully reproduced the issue you mentioned in the context of logical
replication[1]. As you said, it could increase the possibility of tuple missing
when applying updates or deletes in the logical apply worker. I think this is a
long-standing issue and I will investigate the fix you proposed.

In addition, I think the bug is not a blocker for the conflict detection
feature. As the feature simply reports the current behavior of the logical
apply worker (either unique violation or tuple missing) without introducing any
new functionality. Furthermore, I think that the new ExecCheckIndexConstraints
call after ExecInsertIndexTuples() is not affected by the dirty snapshot bug.
This is because a tuple has already been inserted into the btree before the
dirty snapshot scan, which means that a concurrent non-HOT update would not be
possible (it would be blocked after finding the just inserted tuple and wait
for the apply worker to commit the current transaction).

It would be good if others could also share their opinion on this.

[1] The steps to reproduce the tuple missing in logical replication.

1. setup pub/sub env, and publish a table with 1 row.

pub:
CREATE TABLE t(a int primary key, b int);
INSERT INTO t VALUES(1,1);
CREATE PUBLICATION pub FOR TABLE t;

sub:
CREATE TABLE t (a int primary key, b int check (b < 5));
CREATE INDEX t_b_idx ON t(b);
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=$port_publisher' PUBLICATION pub;

2. Execute an UPDATE(UPDATE t set b = b + 1) on the publisher and use gdb to
stop the apply worker at the point after index_getnext_tid() and before
index_fetch_heap().

3. execute a concurrent update(UPDATE t set b = b + 100) on the subscriber to
update a non-key column value and commit the update.

4. release the apply worker and it would report the update_missing conflict.

Best Regards,
Hou zj

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

16 August 2024, 16:31:55

Hello!

> In addition, I think the bug is not a blocker for the conflict detection
> feature. As the feature simply reports the current behavior of the logical
> apply worker (either unique violation or tuple missing) without introducing any
> new functionality. Furthermore, I think that the new ExecCheckIndexConstraints
> call after ExecInsertIndexTuples() is not affected by the dirty snapshot bug.
> This is because a tuple has already been inserted into the btree before the
> dirty snapshot scan, which means that a concurrent non-HOT update would not be
> possible (it would be blocked after finding the just inserted tuple and wait
> for the apply worker to commit the current transaction).

> It would be good if others could also share their opinion on this.

Yes, you are right. At least, I can't find any scenario for that case.

Best regards,

Mikhail.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

21 October 2024, 20:06:59

Hello, Hou!

I have sent [0] reproducer within the context of conflict detection and resolution to the original thread.

[0]: https://www.postgresql.org/message-id/flat/CANtu0ojMjAwMRJK%3DH8y0YBB0ZEcN%2BJbdZeoXQn8dWO5F67jgsA%40mail.gmail.com#f5d1baf4702685aedf23daa9addc012e

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Michail Nikolaev

Date:

12 February, 02:39:09

Hello!

I realize proposed solution does not guarantee absent of false negative cases...

It happens because I am looking just at XID values, but them have nothing with transaction commitment order in the common case.

I'll look for some other option.

Best regards,

Mikhail.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Mihail Nikalayeu

Date:

12 March, 04:05:50

Hello, everyone and Peter!

Peter, I have added you because you may be interested in (or already know about) this btree-related issue.

Short description of the problem:

I noticed a concurrency issue in btree index scans that affects SnapshotDirty and SnapshotSelf scan types.

When using these non-MVCC snapshot types, a scan could miss tuples if concurrent transactions delete existing tuples and insert new one with different TIDs on the same page.

The problem occurs because:
1. The scan reads a page and caches its tuples in backend-local storage
2. A concurrent transaction deletes a tuple and inserts a new one with a different TID
3. The scan misses the new tuple because it was already deleted by a committed transaction and does not pass visibility check

4. But new version on the page is missed, because not in cached tuples

This may cause issues with:

- logical replication (RelationFindReplTupleByIndex fail) - invalid conflict message (MISSING instead of ORIGIN_DIFFERS), probably other issues with upcoming conflict resolution for logical replication

- check_exclusion_or_unique_constraint false negative (but currently it does not cause any real issues as far as I can see)

The fix implemented in this version of the patch:
- Retains the read lock on a page for SnapshotDirty and SnapshotSelf scans until we're completely done with all tuples from that page
- Introduces a new 'extra_unlock' field in BTScanPos to track when a lock is being held longer than usual
- Updates documentation to explain this special locking behavior

Yes, it may cause some degradation in performance because of that additional lock.

Another possible idea is to use a fresh MVCC snapshot for such cases (but I think it is still better to fix or at least document that issue anyway).

Best regards,

Mikhail.

Attachment

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Mihail Nikalayeu

Date:

18 May, 17:36:00

Hello, everyone!

Rebased + fix for compilation due the new INEJCTION_POINT signature.

Best regards,
Mikhail.

Attachment

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From

Mihail Nikalayeu

Date:

10 June, 09:00:00

Hello!

Rebased\reworked to align with the changes of [0].

Best regards.

[0]:
https://www.postgresql.org/message-id/flat/CAH2-WznZBhWqDBDVGh1VhVBLgLqaYHEkPhmVV7mJCr1Y3ZQhQQ%40mail.gmail.com#d6f1debb1405d1b4a983cbb46b24f41b