RE: [BUG?] check_exclusion_or_unique_constraint false negative - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: [BUG?] check_exclusion_or_unique_constraint false negative
Date
Msg-id OS0PR01MB5716E30952F542E256DD72E294802@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [BUG?] check_exclusion_or_unique_constraint false negative  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Responses Re: [BUG?] check_exclusion_or_unique_constraint false negative
List pgsql-hackers
On Monday, August 12, 2024 7:11 PM Michail Nikolaev <michail.nikolaev@gmail.com>  wrote:
> > In my test, if the tuple is updated and new tuple is in the same page,
> > heapam_index_fetch_tuple should find the new tuple using HOT chain. So, it's a
> > bit unclear to me how the updated tuple is missing. Maybe I missed some other
> > conditions for this issue.
> 
> Yeah, I think the pgbench-based reproducer may also cause page splits in btree.
> But we may add an index to the table to disable HOT.
> 
> I have attached a reproducer for this case using a spec and injection points.
> 
> I hope it helps, check the attached files.

Thanks a lot for the steps!

I successfully reproduced the issue you mentioned in the context of logical
replication[1]. As you said, it could increase the possibility of tuple missing
when applying updates or deletes in the logical apply worker. I think this is a
long-standing issue and I will investigate the fix you proposed.

In addition, I think the bug is not a blocker for the conflict detection
feature. As the feature simply reports the current behavior of the logical
apply worker (either unique violation or tuple missing) without introducing any
new functionality. Furthermore, I think that the new ExecCheckIndexConstraints
call after ExecInsertIndexTuples() is not affected by the dirty snapshot bug.
This is because a tuple has already been inserted into the btree before the
dirty snapshot scan, which means that a concurrent non-HOT update would not be
possible (it would be blocked after finding the just inserted tuple and wait
for the apply worker to commit the current transaction).

It would be good if others could also share their opinion on this.


[1] The steps to reproduce the tuple missing in logical replication.

1. setup pub/sub env, and publish a table with 1 row.

pub:
CREATE TABLE t(a int primary key, b int);
INSERT INTO t VALUES(1,1);
CREATE PUBLICATION pub FOR TABLE t;

sub:
CREATE TABLE t (a int primary key, b int check (b < 5));
CREATE INDEX t_b_idx ON t(b);
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=$port_publisher' PUBLICATION pub;

2. Execute an UPDATE(UPDATE t set b = b + 1) on the publisher and use gdb to
stop the apply worker at the point after index_getnext_tid() and before
index_fetch_heap().

3. execute a concurrent update(UPDATE t set b = b + 100) on the subscriber to
update a non-key column value and commit the update.

4. release the apply worker and it would report the update_missing conflict.

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: Enable data checksums by default
Next
From: Thomas Munro
Date:
Subject: Re: Remaining dependency on setlocale()