Re: Logical replication fails when adding multiple replicas - Mailing list pgsql-general

From Kyotaro Horiguchi
Subject Re: Logical replication fails when adding multiple replicas
Date
Msg-id 20230323.171742.1357157542021128059.horikyota.ntt@gmail.com
In response to Re: Logical replication fails when adding multiple replicas  (Will Roper <will.roper@democracyclub.org.uk>)
Responses Re: Logical replication fails when adding multiple replicas  (Will Roper <will.roper@democracyclub.org.uk>)
List pgsql-general
At Wed, 22 Mar 2023 09:25:37 +0000, Will Roper <will.roper@democracyclub.org.uk> wrote in 
> Thanks for the response Hou,
> 
> I've had a look and when the tablesync workers are spinning up there are
> some errors of the form:
> 
> "2023-03-17 18:37:06.900 UTC [4071] LOG:  logical replication table
> synchronization worker for subscription
> ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
> started"
> "2023-03-17 18:37:06.976 UTC [4071] ERROR:  could not create replication
> slot ""pg_37986_sync_37922_7210774007126708177"": ERROR:  replication slot
> ""pg_37986_sync_37922_7210774007126708177"" already exists"

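The slot name format is "pg_<suboid>_sync_<relid>_<systemid>", where
<systemid> is the subscriber's system identifier. It's no surprise
this happens if the subscribers were created from the same backup:
they then share the system identifier and, most likely, the
subscription and relation OIDs as well.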
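
To illustrate, here is a minimal standalone sketch of that naming scheme. It is not the PostgreSQL source: the helper names are made up for this example, and the values are taken from the log line quoted above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical imitation of the tablesync slot-name format
 * "pg_<suboid>_sync_<relid>_<systemid>". Illustration only.
 */
static void
sketch_tablesync_slot_name(uint32_t suboid, uint32_t relid, uint64_t sysid,
                           char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen,
                     "pg_%" PRIu32 "_sync_%" PRIu32 "_%" PRIu64,
                     suboid, relid, sysid);

    /* The longest possible name is 50 characters, so 64 bytes suffice. */
    assert(n > 0 && (size_t) n < buflen);
}

/*
 * Returns 1 if two subscribers that share relid and system identifier
 * would request the same slot name on the publisher, 0 otherwise.
 */
static int
sketch_names_collide(uint32_t suboid_a, uint32_t suboid_b,
                     uint32_t relid, uint64_t sysid)
{
    char a[64];
    char b[64];

    sketch_tablesync_slot_name(suboid_a, relid, sysid, a, sizeof(a));
    sketch_tablesync_slot_name(suboid_b, relid, sysid, b, sizeof(b));
    return strcmp(a, b) == 0;
}
```

With suboid 37986, relid 37922 and system identifier 7210774007126708177, the first helper reproduces the slot name from the error message. Two clones with identical inputs collide, and a different subscription OID alone is enough to separate them.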

If that's true, the simplest workaround would be to drop and recreate
the subscription a different number of times on each subscriber, so
that every subscriber ends up with a subscription whose OID differs
from the others'.



I believe it's not prohibited for subscribers to have the same system
identifier, but the slot name generation logic for tablesync doesn't
account for cases like this.  We might need some server-wide value
that's unique among subscribers and stable while table sync is
running.  I can't think of a better place than pg_subscription, but I
don't like it because the value isn't really needed for most of the
subscription's life.

Do you think using the postmaster's startup time would work for this
purpose?  I'm assuming that the slot name doesn't need to persist
across server restarts, but I'm not sure that's really true.
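
As a rough standalone sketch of that idea and of the restart concern (again not PostgreSQL code: the helper names are hypothetical, and the start time is simply assumed to be a 64-bit microsecond timestamp like PgStartTime):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical variant of the slot-name format in which the server start
 * time takes the place of the system identifier. The cast to unsigned
 * mirrors the UINT64_FORMAT conversion used for the last component.
 */
static void
sketch_slot_name_with_start_time(uint32_t suboid, uint32_t relid,
                                 int64_t start_time_us,
                                 char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen,
                     "pg_%" PRIu32 "_sync_%" PRIu32 "_%" PRIu64,
                     suboid, relid, (uint64_t) start_time_us);

    assert(n > 0 && (size_t) n < buflen);
}

/*
 * Returns 1 if the names computed for two start times are identical.
 * The two start times can be read either as two cloned subscribers or
 * as one subscriber before and after a restart.
 */
static int
sketch_same_slot_name(uint32_t suboid, uint32_t relid,
                      int64_t start_a_us, int64_t start_b_us)
{
    char a[64];
    char b[64];

    sketch_slot_name_with_start_time(suboid, relid, start_a_us, a, sizeof(a));
    sketch_slot_name_with_start_time(suboid, relid, start_b_us, b, sizeof(b));
    return strcmp(a, b) == 0;
}
```

Two clones started at different times get different names, which is the point of the change. By the same logic, a single subscriber computes a different name after a restart, so anything that has to find an existing sync slot again would no longer match it.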


diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 07eea504ba..a5b4f7cf7c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1214,7 +1214,7 @@ ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
                                 char *syncslotname, Size szslot)
 {
     snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
-             relid, GetSystemIdentifier());
+             relid, PgStartTime);
 }
 
 /*


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


