RE: Conflict detection and logging in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection and logging in logical replication |
Date | |
Msg-id | OS0PR01MB5716837A0DFA3B4FA903DA9C94BD2@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Conflict detection and logging in logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Conflict detection and logging in logical replication
|
List | pgsql-hackers |
On Friday, July 26, 2024 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 26, 2024 at 9:39 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Jul 11, 2024 at 7:47 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Wednesday, July 10, 2024 5:39 PM shveta malik > <shveta.malik@gmail.com> wrote: > > > > > > > > > > 2) > > > > Another case which might confuse user: > > > > > > > > CREATE TABLE t1 (pk integer primary key, val1 integer, val2 > > > > integer); > > > > > > > > On PUB: insert into t1 values(1,10,10); insert into t1 > > > > values(2,20,20); > > > > > > > > On SUB: update t1 set pk=3 where pk=2; > > > > > > > > Data on PUB: {1,10,10}, {2,20,20} > > > > Data on SUB: {1,10,10}, {3,20,20} > > > > > > > > Now on PUB: update t1 set val1=200 where val1=20; > > > > > > > > On Sub, I get this: > > > > 2024-07-10 14:44:00.160 IST [648287] LOG: conflict update_missing > > > > detected on relation "public.t1" > > > > 2024-07-10 14:44:00.160 IST [648287] DETAIL: Did not find the row > > > > to be updated. > > > > 2024-07-10 14:44:00.160 IST [648287] CONTEXT: processing remote > > > > data for replication origin "pg_16389" during message type > > > > "UPDATE" for replication target relation "public.t1" in > > > > transaction 760, finished at 0/156D658 > > > > > > > > To user, it could be quite confusing, as val1=20 exists on sub but > > > > still he gets update_missing conflict and the 'DETAIL' is not > > > > sufficient to give the clarity. I think on HEAD as well (have not > > > > tested), we will get same behavior i.e. update will be ignored as > > > > we make search based on RI (pk in this case). So we are not > > > > worsening the situation, but now since we are detecting conflict, is it > possible to give better details in 'DETAIL' section indicating what is actually > missing? > > > > > > I think It's doable to report the row value that cannot be found in > > > the local relation, but the concern is the potential risk of > > > exposing some sensitive data in the log. This may be OK, as we are > > > already reporting the key value for constraints violation, so if > > > others also agree, we can add the row value in the DETAIL as well. > > > > This is still awaiting some feedback. I feel it will be good to add > > some pk value at-least in DETAIL section, like we add for other > > conflict types. > > > > I agree that displaying pk where applicable should be okay as we display it at > other places but the same won't be possible when we do sequence scan to > fetch the required tuple. So, the message will be different in that case, right? After some research, I think we can report the key values in DETAIL if the apply worker uses any unique indexes to find the tuple to update/delete. Otherwise, we can try to output all column values in DETAIL if the current user of apply worker has SELECT access to these columns. This is consistent with what we do when reporting table constraint violation (e.g. when violating a check constraint, it could output all the column value if the current has access to all the column): - First, use super user to create a table. CREATE TABLE t1 (c1 int, c2 int, c3 int check (c3 < 5)); - 1) using super user to insert a row that violates the constraint. We should see all the column value. INSERT INTO t1(c3) VALUES (6); ERROR: new row for relation "t1" violates check constraint "t1_c3_check" DETAIL: Failing row contains (null, null, 6). - 2) use a user without access to all the columns. We can only see the inserted column and CREATE USER regress_priv_user2; GRANT INSERT (c1, c2, c3) ON t1 TO regress_priv_user2; SET SESSION AUTHORIZATION regress_priv_user2; INSERT INTO t1 (c3) VALUES (6); ERROR: new row for relation "t1" violates check constraint "t1_c3_check" DETAIL: Failing row contains (c3) = (6). To achieve this, I think we can expose the ExecBuildSlotValueDescription function and use it in conflict reporting. What do you think ? Best Regards, Hou zj
pgsql-hackers by date: