Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAA4eK1+YCpPtMnMqOUy1pa7_2f7SRivpfc6PQ=CUf=NG76QzEA@mail.gmail.com
In response to Re: Initial Schema Sync for Logical Replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: Initial Schema Sync for Logical Replication  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Thu, Mar 23, 2023 at 2:48 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
>
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems. If so, we may need to deny schema sync in any
> such case.
>
> There are 2 approaches for initial DDL synchronization:
>
> 1) generate the DDL command on the publisher, stream it and apply it as-is on
> the subscriber;
> 2) generate a DDL representation (JSON, for example) on the publisher, stream
> it, transform it into a DDL command on subscriber and apply it.
>
> The option (1) is simpler and faster than option (2) because it does not
> require an additional step (transformation). However, option (2) is more
> flexible than option (1) because it allows you to create a DDL command even if a
> feature was removed from the subscriber and the publisher version is less than
> the subscriber version or a feature was added to the publisher and the
> publisher version is greater than the subscriber version.
>

Is this practically possible? Say the publisher has a higher version
that has introduced a new object type corresponding to which it has
either a new catalog or some new columns in the existing catalog. Now,
I don't think the older version of the subscriber can modify the
command received from the publisher so that the same can be applied to
the subscriber because it won't have any knowledge of the new feature.
In the other case where the subscriber is of a newer version, we
anyway should be able to support it with pg_dump as there doesn't
appear to be any restriction with that. Am I missing something?
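
To make the concern concrete, here is a minimal sketch of option (2) as seen from an older subscriber. It is illustrative only: the JSON field names, the `RENDERERS` table, and the `shiny_new_object` type are all hypothetical and are not the format used by the DDL deparse patch. The subscriber can only turn a representation into a DDL command for object types it already has a renderer for.

```python
import json

# Hypothetical JSON representation of a table, as a publisher might stream it.
# The field names are assumptions made for this sketch.
TABLE_REP = json.dumps({
    "object_type": "table",
    "schema": "public",
    "name": "t1",
    "columns": [
        {"name": "id", "type": "integer"},
        {"name": "payload", "type": "text"},
    ],
})

# Hypothetical object type that only a newer publisher knows about.
NEW_REP = json.dumps({"object_type": "shiny_new_object", "name": "x"})


class UnsupportedObject(Exception):
    """Raised when the subscriber has no renderer for an object type."""


def render_table(rep):
    """Build a CREATE TABLE command from the hypothetical representation."""
    cols = ", ".join(f'{c["name"]} {c["type"]}' for c in rep["columns"])
    return f'CREATE TABLE {rep["schema"]}.{rep["name"]} ({cols});'


# Object types this (older) subscriber version knows how to render.
RENDERERS = {"table": render_table}


def transform(rep_json):
    """Transform a streamed representation into a DDL command, if possible."""
    rep = json.loads(rep_json)
    renderer = RENDERERS.get(rep["object_type"])
    if renderer is None:
        # The subscriber has no knowledge of this feature, so it cannot
        # construct an equivalent command.
        raise UnsupportedObject(rep["object_type"])
    return renderer(rep)
```

Calling `transform(TABLE_REP)` yields a command the subscriber can apply, whereas `transform(NEW_REP)` raises `UnsupportedObject`, which is the case where schema sync would presumably have to be denied.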

> One of the main use cases for logical replication is migration (X -> Y where X
> < Y).
>

I don't think we need to restrict this case even if we decide to use pg_dump.

>
> Per discussion [1], I think if we agree that Alvaro's DDL deparse patch is
> the way to go with DDL replication, it seems wise that it should be used for
> initial DDL synchronization as well.
>

Even if we decide to use the deparse approach, it would still need to
mimic pg_dump's logic to construct commands based only on catalog
contents. I am not against using this approach, but we shouldn't ignore
the duplication it requires.

--
With Regards,
Amit Kapila.


