RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | OS0PR01MB571694B5F7FFB9ECEF5FCB3294132@OS0PR01MB5716.jpnprd01.prod.outlook.com |
In response to | Re: Conflict detection for update_deleted in logical replication (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Wednesday, January 8, 2025 7:03 PM vignesh C <vignesh21@gmail.com> wrote:

Hi,

> Consider a LR setup with retain_conflict_info=true for a table t1:
> Publisher:
> insert into t1 values(1);
> -- Have a open transaction before delete operation in subscriber
> begin;
>
> Subscriber:
> -- delete the record that was replicated
> delete from t1;
>
> -- Now commit the transaction in publisher
> Publisher:
> update t1 set c1 = 2;
> commit;
>
> In normal case update_deleted conflict is detected
> 2025-01-08 15:41:38.529 IST [112744] LOG: conflict detected on relation
> "public.t1": conflict=update_deleted
> 2025-01-08 15:41:38.529 IST [112744] DETAIL: The row to be updated was
> deleted locally in transaction 751 at 2025-01-08 15:41:29.811566+05:30.
> Remote tuple (2); replica identity full (1).
> 2025-01-08 15:41:38.529 IST [112744] CONTEXT: processing remote data for
> replication origin "pg_16387" during message type "UPDATE" for replication
> target relation "public.t1" in transaction 747, finished at 0/16FBCA0
>
> Now execute the same above case by having a presetup to consume all the
> replication slots in the system by executing pg_create_logical_replication_slot
> before the subscription is created, in this case the conflict is not detected
> correctly.
> 2025-01-08 15:39:17.931 IST [112551] LOG: conflict detected on relation
> "public.t1": conflict=update_missing
> 2025-01-08 15:39:17.931 IST [112551] DETAIL: Could not find the row to be
> updated.
> Remote tuple (2); replica identity full (1).
> 2025-01-08 15:39:17.931 IST [112551] CONTEXT: processing remote data for
> replication origin "pg_16387" during message type "UPDATE" for replication
> target relation "public.t1" in transaction 747, finished at 0/16FBC68
> 2025-01-08 15:39:18.266 IST [112582] ERROR: all replication slots are in use
> 2025-01-08 15:39:18.266 IST [112582] HINT: Free one or increase
> "max_replication_slots".
>
> This is because even though we say create subscription is successful, the
> launcher has not yet created the replication slot.

I think missing some detections at the beginning, right after enabling the
option, is acceptable. Even if we let the launcher create the slot before
starting workers, some dead tuples could already have been removed during this
period, so update_missing could still be detected. I have added some
documentation to clarify that the information can be safely retained only
after the slot is created.

>
> There are few observations from this test:
> 1) Create subscription does not wait for the slot to be created by the launcher
> and starts applying the changes. Should create a subscription wait till the slot
> is created by the launcher process.

I think the DDL cannot wait for the slot creation, because the launcher would
not create the slot until the DDL is committed. Instead, I have changed the
code to create the slot before starting workers, so that at least the worker
would not unnecessarily maintain the oldest non-removable xid.

> 2) Currently launcher is exiting continuously and trying to create replication
> slots. Should the launcher wait for wal_retrieve_retry_interval configuration
> before trying to create the slot instead of filling the logs continuously.

Since the launcher already has a 5s (bgw_restart_time) restart interval, I
feel it would not consume too many resources in this case.

> 3) If we try to create a similar subscription with retain_conflict_info and
> disable_on_error option and there is an error in replication slot creation,
> currently the subscription does not get disabled. Should we consider
> disable_on_error for these cases and disable the subscription if we are not able
> to create the slots.

Currently, only ERRORs in the apply worker trigger disable_on_error. I am not
sure it's worth the effort to teach the apply worker to catch the launcher's
error, because this doesn't seem like a common scenario.

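For anyone reproducing the slot-exhaustion case, a minimal sketch of how one might check on the subscriber whether all slots are already consumed before the subscription is created. It uses only the standard pg_replication_slots view and the max_replication_slots setting; the column aliases are made up for illustration, and the query is not part of the patch.

```sql
-- Compare the number of existing replication slots with the configured limit.
-- If slots_in_use equals max_slots, no further slot can be created and the
-- "all replication slots are in use" error is expected.
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots')::int AS max_slots
FROM pg_replication_slots;
```
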
Best Regards, Hou zj