RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Kumar, Sachin
Subject RE: Initial Schema Sync for Logical Replication
Date
Msg-id e378fb636a694c81b354d3c405f0179d@amazon.com
In response to Re: Initial Schema Sync for Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Initial Schema Sync for Logical Replication  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
Hi Alvaro,

> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> On 2023-Mar-15, Kumar, Sachin wrote:
> 
> > 1. In  CreateSubscription()  when we create replication
> > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> can use this snapshot later in the pg_dump.
> >
> > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> 
> Overall I'm not on board with the idea that logical replication would depend on
> pg_dump; that seems like it could run into all sorts of trouble (what if calling
> external binaries requires additional security setup?  what about pg_hba
> connection requirements? what about max_connections in tight
> circumstances?).
> what if calling external binaries requires additional security setup
I am not sure what kind of security restrictions would apply in this case; maybe the pg_dump
binary could be changed to accommodate them?
> what about pg_hba connection requirements?
We will use the same connection string that the subscriber process uses to connect to
the publisher.
> what about max_connections in tight circumstances?
Right, that might be an issue, but I don't think it will be a big one. We will create the dump
of the database in CreateSubscription() itself, before the tableSync processes even start; so if
max_connections has already been reached when pg_dump is called, tableSync would not succeed either.
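For illustration, the flow proposed above (create the slot with an exported snapshot, then reuse that snapshot in pg_dump) could be sketched roughly as below. The connection strings, slot name, and snapshot value are made up, and the exact CREATE_REPLICATION_SLOT option syntax varies by server version:

```shell
# Open a replication connection ("replication=database" in the conninfo)
# and create a logical slot that exports its snapshot.
psql "host=publisher dbname=postgres replication=database" -c \
  "CREATE_REPLICATION_SLOT mysub_slot LOGICAL pgoutput (SNAPSHOT 'export');"

# The command returns a snapshot_name column, e.g. '00000003-00000002-1'.
# pg_dump can then dump the schema as of exactly that snapshot:
pg_dump "host=publisher dbname=postgres" --schema-only \
  --snapshot='00000003-00000002-1' --file=schema.sql
```

The key point is that pg_dump's --snapshot option lets the dump see the same consistent state the slot was created at, so tableSync can start from the dumped schema without a gap.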
> It would be much better, I think, to handle this internally in the publisher instead:
> similar to how DDL sync would work, except it'd somehow generate the CREATE
> statements from the existing tables instead of waiting for DDL events to occur.  I
> grant that this does require writing a bunch of new code for each object type, a
> lot of which would duplicate the pg_dump logic, but it would probably be a lot
> more robust.
Agreed, but we might end up with a lot of code duplication; essentially almost all of the
pg_dump code would need to be duplicated, which might cause issues when modifying or adding
new DDLs.
I am not sure if it is possible, but maybe the dependent pg_dump code could be moved to the
common/ folder to avoid duplication.
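To illustrate the duplication concern: some object types already have server-side definition functions that a publisher-side generator could call, but plain tables do not, so that part of pg_dump's logic would have to be rewritten. A rough sketch (object names here are just examples from the system catalogs):

```sql
-- Views and indexes: server-side deparse functions already exist.
SELECT pg_get_viewdef('pg_stat_activity'::regclass);
SELECT pg_get_indexdef('pg_class_oid_index'::regclass);

-- Tables: no pg_get_tabledef() exists in core, so CREATE TABLE
-- statements would have to be rebuilt from pg_attribute,
-- pg_constraint, etc., duplicating what pg_dump already does.
```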

Regards
Sachin
