Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers
From | Euler Taveira |
---|---|
Subject | Re: Initial Schema Sync for Logical Replication |
Date | |
Msg-id | fb7894e4-b44e-4ae3-a74d-7c5650f69f1a@app.fastmail.com Whole thread Raw |
In response to | Re: Initial Schema Sync for Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
RE: Initial Schema Sync for Logical Replication
("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
|
List | pgsql-hackers |
On Thu, Mar 23, 2023, at 8:44 AM, Amit Kapila wrote:
On Thu, Mar 23, 2023 at 2:48 AM Euler Taveira <euler@eulerto.com> wrote:>> On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:>> Now, how do we avoid these problems even if we have our own version of> functionality similar to pg_dump for selected objects? I guess we will> face similar problems. If so, we may need to deny schema sync in any> such case.>> There are 2 approaches for initial DDL synchronization:>> 1) generate the DDL command on the publisher, stream it and apply it as-is on> the subscriber;> 2) generate a DDL representation (JSON, for example) on the publisher, stream> it, transform it into a DDL command on subscriber and apply it.>> The option (1) is simpler and faster than option (2) because it does not> require an additional step (transformation). However, option (2) is more> flexible than option (1) because it allow you to create a DDL command even if a> feature was removed from the subscriber and the publisher version is less than> the subscriber version or a feature was added to the publisher and the> publisher version is greater than the subscriber version.>Is this practically possible? Say the publisher has a higher versionthat has introduced a new object type corresponding to which it haseither a new catalog or some new columns in the existing catalog. Now,I don't think the older version of the subscriber can modify thecommand received from the publisher so that the same can be applied tothe subscriber because it won't have any knowledge of the new feature.In the other case where the subscriber is of a newer version, weanyway should be able to support it with pg_dump as there doesn'tappear to be any restriction with that, am, I missing something?
I think so (with some limitations). Since the publisher knows the subscriber
version, publisher knows that the subscriber does not contain the new object
type then publisher can decide if this case is critical (and reject the
replication) or optional (and silently not include the feature X -- because it
is not essential for logical replication). If required, the transformation
should be done on the publisher.
Even if we decide to use deparse approach, it would still need tomimic stuff from pg_dump to construct commands based on only catalogcontents. I am not against using this approach but we shouldn't ignorethe duplicity required in this approach.
It is fine to share code between pg_dump and this new infrastructure. However,
the old code should coexist to support older versions because the new set of
functions don't exist in older server versions. Hence, duplicity should exist
for a long time (if you consider that the current policy is to allow dump from
9.2, we are talking about 10 years or so). There are some threads [1][2] that
discussed this topic: provide a SQL command based on the catalog
representation. You can probably find other discussions searching for "pg_dump
library" or "getddl".
pgsql-hackers by date: