Thread: [BUG?] check_exclusion_or_unique_constraint false negative

[BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, everyone!

While reviewing [1], I noticed that check_exclusion_or_unique_constraint occasionally returns false negatives for btree unique indexes during UPSERT operations.
Although this doesn't cause any real issues with INSERT ON CONFLICT, I wanted to bring it to your attention, as it might indicate an underlying problem.

Attached is a patch to reproduce the issue.

make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
....
#   Failed test 'concurrent INSERTs status (got 2 vs expected 0)'
#   at t/006_concurrently_unique_fail.pl line 26.

#   Failed test 'concurrent INSERTs stderr /(?^:^$)/'
#   at t/006_concurrently_unique_fail.pl line 26.
#                   'pgbench: error: client 34 script 0 aborted in command 0 query 0: ERROR:  we know 31337 in the index!

Best regards,
Mikhail

Attachment

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, Andres.

Sorry to bother you, but I feel it's necessary to have this possible issue validated by someone who can decide whether it is okay or not.
The issue is reproducible starting from the first UPSERT implementation (your commit 168d5805e4c08bed7b95d351bf097cff7c07dd65 from 2015) and up to now.

The problem appears as follows:
* A unique index contains a specific value (in the test, it is the only value for the entire index).
* check_exclusion_or_unique_constraint returns FALSE for that value in some random cases.
* Technically, this means index_getnext finds 0 records, even though we know the value exists in the index.

I was able to reproduce this only with an UNLOGGED table.
I can't find any scenarios that are actually broken (since the issue is resolved by speculative insertion later), but this looks suspicious to me. It could be a symptom of some tricky race condition in the btree.

Best regards,
Mikhail

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, everyone!

Updates so far:
* issue happens with both LOGGED and UNLOGGED relations
* issue happens with DirtySnapshot
* does not happen with SnapshotSelf
* does not happen with SnapshotAny
* not related to speculatively inserted tuples - I commented out their insertion code, and the issue still occurs.

Best regards,
Mikhail.

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
It seems like I've identified the cause of the issue.

Currently, any DirtySnapshot (or SnapshotSelf) scan over a B-tree index may skip (not find the TID for) some records in the case of parallel updates.

The following scenario is possible:

* Session 1 reads a B-tree page using SnapshotDirty and copies item X to the buffer.
* Session 2 updates item X, inserting a new TID Y into the same page.
* Session 2 commits its transaction.
* Session 1 starts to fetch from the heap and tries to fetch X, but it was already deleted by session 2. So, it goes to the B-tree for the next TID.
* The B-tree goes to the next page, skipping Y.
* Therefore, the search finds nothing, but tuple Y is still alive.
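
For reference, the window sits between the two halves of the regular scan loop. Here is a simplified sketch of index_getnext_slot() (paraphrased from src/backend/access/index/indexam.c; the comments mark where the race fits, and the code itself is only illustrative):

/* Paraphrased sketch of index_getnext_slot(), indexam.c */
bool
index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *slot)
{
    for (;;)
    {
        if (!scan->xs_heap_continue)
        {
            /*
             * (1) Take the next TID from the B-tree.  The page content was
             * already copied into backend-local memory, so a TID Y inserted
             * behind the current position by a concurrent update is not seen.
             */
            if (index_getnext_tid(scan, direction) == NULL)
                break;          /* no more index entries */
        }

        /*
         * (2) Fetch the heap tuple.  Under SnapshotDirty the old version X,
         * deleted and committed by session 2 between (1) and (2), is not
         * visible, so we simply loop for the next TID - but the scan has
         * already moved past Y.
         */
        if (index_fetch_heap(scan, slot))
            return true;
    }

    return false;
}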

This situation is somewhat controversial. One might expect a DirtySnapshot to show more data than MVCC (more recent, even uncommitted rows), but never less. As far as I understand, though, a DirtySnapshot scan over a B-tree does not actually provide that guarantee.
Why does it work for MVCC? Because tuple X remains visible under the snapshot, so Y is not needed.
This might be "as designed," but I think it needs to be clearly documented (I couldn't find any documentation for this particular case, only comments related to _bt_drop_lock_and_maybe_pin).

Here are the potential consequences of the issue:

* check_exclusion_or_unique_constraint

It may fail to find a record in a UNIQUE index during INSERT ... ON CONFLICT DO UPDATE. However, this is just a minor performance issue, since the conflict is still caught by speculative insertion later.

* Exclusion constraints with B-tree, like ADD CONSTRAINT exclusion_data EXCLUDE USING btree (data WITH =)

It should work correctly because the first inserter may "skip" the TID from a concurrent inserter, but the second one should still find the TID from the first.

* RelationFindReplTupleByIndex

Amit, this is why I've included you in this previously solo thread :)
RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records if they are updated by a parallel transaction. This could lead to lost deletes/updates, especially in the case of streaming=parallel mode.
I'm not familiar with how parallel workers apply transactions, so maybe this isn't possible.

Best regards,
Mikhail

RE: [BUG?] check_exclusion_or_unique_constraint false negative

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Michail,

Thanks for pointing out the issue!

>* RelationFindReplTupleByIndex
>
>Amit, this is why I've included you in this previously solo thread :)
>RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records
>if they are updated by a parallel transaction. This could lead to lost
>deletes/updates, especially in the case of streaming=parallel mode. 
>I'm not familiar with how parallel workers apply transactions, so maybe this
>isn't possible.

IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
the same tuples, which is not specific to the parallel apply. Consider that two
subscriptions exist and their publishers modify the same tuple of the same table.
In this case, two workers access the tuple, so one of the changes may be missed
via the scenario you described. I feel we do not need special treatment for parallel
apply.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, Hayato!

> Thanks for pointing out the issue!

Thanks for your attention!

> IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
> the same tuples, which is not specific to the parallel apply

Not exactly, it happens for any DirtySnapshot scan over a B-tree index with some other transaction updating the same index page (even using the MVCC snapshot).

So, logical replication related scenario looks like this:

* the subscriber worker receives a tuple update/delete from the publisher
* it calls RelationFindReplTupleByIndex to find the tuple in the local table
* some other transaction updates the tuple in the local table (on the subscriber side) in parallel
* RelationFindReplTupleByIndex may not find the tuple because it uses a DirtySnapshot
* the update/delete is lost
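
For reference, this is roughly what the lookup looks like (paraphrased and heavily simplified from RelationFindReplTupleByIndex in src/backend/executor/execReplication.c):

/* Roughly paraphrased from RelationFindReplTupleByIndex(), execReplication.c */
InitDirtySnapshot(snap);
scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);

retry:
    found = false;
    index_rescan(scan, skey, skey_attoff, NULL, 0);

    /* This is the DirtySnapshot B-tree scan that can miss a row whose
     * index entry is concurrently replaced by an update. */
    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
    {
        /* ... equality re-check, then lock the tuple; on a concurrent
         * update of the *found* tuple we go back to retry ... */
        found = true;
        break;
    }

As far as I can see, the existing retry there only helps once a tuple has been found and locked; it does nothing when the scan returns no tuple at all.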

Parallel apply mode looks more dangerous because it uses multiple workers on the subscriber side, so the probability of hitting the issue is higher.
In that case, "some other transaction" is just another worker applying the changes of a different transaction in parallel.

Best regards,
Mikhail.
 

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Amit Kapila
Date:
On Thu, Aug 1, 2024 at 2:55 PM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> > Thanks for pointing out the issue!
>
> Thanks for your attention!
>
> > IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
> > the same tuples, which is not specific to the parallel apply
>
> Not exactly, it happens for any DirtySnapshot scan over a B-tree index with some other transaction updating the same index page (even using the MVCC snapshot).
>
> So, logical replication related scenario looks like this:
>
> * the subscriber worker receives a tuple update/delete from the publisher
> * it calls RelationFindReplTupleByIndex to find the tuple in the local table
> * some other transaction updates the tuple in the local table (on the subscriber side) in parallel
> * RelationFindReplTupleByIndex may not find the tuple because it uses a DirtySnapshot
> * the update/delete is lost
>
> Parallel apply mode looks more dangerous because it uses multiple workers on the subscriber side, so the probability of hitting the issue is higher.
> In that case, "some other transaction" is just another worker applying the changes of a different transaction in parallel.
>

I think it is rather less likely or not possible in a parallel apply
case because such conflicting updates (updates on the same tuple)
should be serialized at the publisher itself. So one of the updates
will arrive only after the commit of the transaction containing the other.

I haven't tried the test based on your description of the general
problem with DirtySnapshot scan. In case of logical replication, we
will LOG update_missing type of conflict and the user may need to take
some manual action based on that. I have not tried a test so I could
be wrong as well. I am not sure we can do anything specific to logical
replication for this but feel free to suggest if you have ideas to
solve this problem in general or specific to logical replication.

--
With Regards,
Amit Kapila.



Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, Amit!

> I think it is rather less likely or not possible in a parallel apply
> case because such conflicting updates (updates on the same tuple)
> should be serialized at the publisher itself. So one of the updates
> will arrive only after the commit of the transaction containing the other.

Glad to hear! But anyway, such logic looks very fragile to me.

> I haven't tried the test based on your description of the general
> problem with DirtySnapshot scan. In case of logical replication, we
> will LOG update_missing type of conflict and the user may need to take
> some manual action based on that.

Currently it is just DEBUG1, so it will probably be missed by the user.

> * XXX should this be promoted to ereport(LOG) perhaps?
> */
> elog(DEBUG1,
> "logical replication did not find row to be updated "
> "in replication target relation \"%s\"",
> RelationGetRelationName(localrel));
> }

> I have not tried a test so I could
> be wrong as well. I am not sure we can do anything specific to logical
> replication for this but feel free to suggest if you have ideas to
> solve this problem in general or specific to logical replication.

I've implemented a solution that addresses the problem more generally; the patch is attached (see also the branch at [1]).

Here's a summary of the changes:

* For each tuple skipped because it was deleted, we now accumulate the maximum xmax.
* Before the scan begins, we store the value of the latest completed transaction.
* If no tuples are found in the index, we check the max(xmax) value. If it is newer than the latest completed transaction recorded before the scan, a tuple was deleted by another transaction after the scan started, so we rescan the index to make sure all tuples are correctly processed.
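
In pseudo-C, the idea looks roughly like this (an illustrative sketch only, not the attached patch; GetLatestCompletedXid() and the max_skipped_xmax bookkeeping are hypothetical stand-ins for what the patch actually does):

/* Illustrative sketch of the rescan idea; the names below are hypothetical. */
TransactionId   limit_xid;
TransactionId   max_skipped_xmax;
bool            found;

/* hypothetical: remember the newest completed xid before the scan starts */
limit_xid = GetLatestCompletedXid();

retry:
    found = false;
    max_skipped_xmax = InvalidTransactionId;

    index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
    while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
    {
        found = true;
        /* ... existing uniqueness / exclusion checking ... */
    }

    /*
     * Assume the scan accumulated into max_skipped_xmax the xmax of every
     * tuple it skipped as deleted (hypothetical bookkeeping).  If one of
     * those deleters completed after limit_xid was taken, a concurrent
     * update may have moved the row past our scan position, so look again.
     */
    if (!found &&
        TransactionIdIsValid(max_skipped_xmax) &&
        TransactionIdFollows(max_skipped_xmax, limit_xid))
        goto retry;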


I've also added a test case covering this scenario using the new injection point mechanism, and
updated the B-tree index documentation to describe this case.

I'll add this to the next commitfest.

Best regards,
Mikhail.

[1]: https://github.com/postgres/postgres/compare/master...michail-nikolaev:postgres:concurrent_unique
Attachment

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Amit Kapila
Date:
On Fri, Aug 2, 2024 at 10:38 PM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> > I think it is rather less likely or not possible in a parallel apply
> > case because such conflicting updates (updates on the same tuple)
> > should be serialized at the publisher itself. So one of the updates
> > will arrive only after the commit of the transaction containing the other.
>
> Glad to hear! But anyway, such logic looks very fragile to me.
>
> > I haven't tried the test based on your description of the general
> > problem with DirtySnapshot scan. In case of logical replication, we
> > will LOG update_missing type of conflict and the user may need to take
> > some manual action based on that.
>
> Currently it is just DEBUG1, so it will probably be missed by the user.
>
> > * XXX should this be promoted to ereport(LOG) perhaps?
> > */
> > elog(DEBUG1,
> > "logical replication did not find row to be updated "
> > "in replication target relation \"%s\"",
> > RelationGetRelationName(localrel));
> > }
>

Right, but we are extending this functionality to detect and resolve
such conflicts [1][2]. I am hoping after that such updates won't be
missed.

[1] - https://commitfest.postgresql.org/49/5064/
[2] - https://commitfest.postgresql.org/49/5021/


--
With Regards,
Amit Kapila.



Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello!

> Right, but we are extending this functionality to detect and resolve
> such conflicts [1][2]. I am hoping after that such updates won't be
> missed.

Yes, this is a nice feature. However, without the DirtySnapshot index scan fix, it will fail in numerous instances, especially in master-master replication.

The update_missing feature is helpful in this case, but it is still not the correct event because a real tuple exists, and we should receive update_differ instead. As a result, some conflict resolution systems may malfunction. For example, if the resolution method is set to apply_or_skip, it will insert the new row, causing two rows to exist. This system is quite fragile, and I am sure there are many more complicated scenarios that could arise.

Best regards,
Mikhail.



RE: [BUG?] check_exclusion_or_unique_constraint false negative

From
"Zhijie Hou (Fujitsu)"
Date:
Hi,

Thanks for reporting the issue!

I tried to reproduce this in logical replication but failed. If possible,
could you please share some steps to reproduce it in a logical replication context?

In my test, if the tuple is updated and the new tuple is on the same page,
heapam_index_fetch_tuple should find the new tuple via the HOT chain. So, it's a
bit unclear to me how the updated tuple goes missing. Maybe I missed some other
conditions for this issue.

It would be better if we could reproduce this by adding some breakpoints using
gdb, which may help us write a TAP test using injection points to reproduce
this reliably. I see the TAP test you shared uses pgbench to reproduce this;
it works, but it would be great if we could analyze the issue more deeply by
debugging the code.

And I have a few questions related to the steps you shared:

> * Session 1 reads a B-tree page using SnapshotDirty and copies item X to the buffer.
> * Session 2 updates item X, inserting a new TID Y into the same page.
> * Session 2 commits its transaction.
> * Session 1 starts to fetch from the heap and tries to fetch X, but it was
>   already deleted by session 2. So, it goes to the B-tree for the next TID.
> * The B-tree goes to the next page, skipping Y.
> * Therefore, the search finds nothing, but tuple Y is still alive.

I am wondering at which point the update should happen. Should it happen after
calling index_getnext_tid and before index_fetch_heap? It would be great if
you could give more details on the above steps. Thanks!

Best Regards,
Hou zj

Re: [BUG?] check_exclusion_or_unique_constraint false negative

From
Michail Nikolaev
Date:
Hello, Hou zj!

> In my test, if the tuple is updated and the new tuple is on the same page,
> heapam_index_fetch_tuple should find the new tuple via the HOT chain. So, it's a
> bit unclear to me how the updated tuple goes missing. Maybe I missed some other
> conditions for this issue.

Yeah, I think the pgbench-based reproducer may also cause page splits in the btree.
But we may add an extra index covering the updated column to the table to disable HOT updates.

I have attached a reproducer for this case using a spec and injection points.

I hope it helps; check the attached files.

Best regards,
Mikhail.
Attachment