Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAA4eK1J43qN0ixCv_mcc6mAe2PJ8_s=_UMMs-8e1FAGbB6XkBA@mail.gmail.com
Whole thread Raw
In response to Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Initial Schema Sync for Logical Replication
RE: Initial Schema Sync for Logical Replication
Re: Initial Schema Sync for Logical Replication
Re: Initial Schema Sync for Logical Replication
List pgsql-hackers
On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > > You should
> > > exclude them removing these objects from the TOC before running pg_restore or
> > > adding a few pg_dump options to exclude these objects. Another issue is related
> > > to different version. Let's say the publisher has a version ahead of the
> > > subscriber version, a new table syntax can easily break your logical
> > > replication setup. IMO pg_dump doesn't seem like a good solution for initial
> > > synchronization.
> > >
> > > Instead, the backend should provide infrastructure to obtain the required DDL
> > > commands for the specific (set of) tables. This can work around the issues from
> > > the previous paragraph:
> > >
> > ...
> > > * don't need to worry about different versions.
> > >
> >
> > AFAICU some of the reasons why pg_dump is not allowed to dump from the
> > newer version are as follows: (a) there could be more columns in the
> > newer version of the system catalog and then Select * type of stuff
> > won't work because the client won't have knowledge of additional
> > columns. (b) the newer version could have new features (represented by
> > say new columns in existing catalogs or new catalogs) that the older
> > version of pg_dump has no knowledge of and will fail to get that data
> > and hence an inconsistent dump. The subscriber will easily be not in
> > sync due to that.
> >
> > Now, how do we avoid these problems even if we have our own version of
> > functionality similar to pg_dump for selected objects? I guess we will
> > face similar problems.
>
> Right. I think that such functionality needs to return DDL commands
> that can be executed on the requested version.
>
> > If so, we may need to deny schema sync in any such case.
>
> Yes. Do we have any concrete use case where the subscriber is an older
> version, in the first place?
>

As per my understanding, it is mostly due to the reason that it can
work today. Today, during an off-list discussion with Jonathan on this
point, he pointed me to a similar incompatibility in MySQL
replication. See the "SQL incompatibilities" section in doc[1]. Also,
please note that this applies not only to initial sync but also to
schema sync during replication. I don't think it would be feasible to
keep such cross-version compatibility for DDL replication.

Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this
stage is to primarily write a POC patch to see what difficulties we
may face. The other options that we could try out are (a) try to
duplicate parts of pg_dump code in some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code
(probably as a library or some other way) for the required
functionality. There could be more possibilities that we may not have
thought of yet. But the main point is that for approaches other than
using pg_dump, we should consider ways to avoid duplicity of various
parts of its code. Due to this, I think before ruling out using
pg_dump, we should be clear about its risks and limitations.

Thoughts?

[1] - https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html
[2] - https://www.postgresql.org/message-id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: CREATE DATABASE ... STRATEGY WAL_LOG issues
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Error "initial slot snapshot too large" in create replication slot