Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAD21AoBeHERZFdK1mrU5G5doKA3iY8COaLD0ViZJzBYoePSRzg@mail.gmail.com
Whole thread Raw
In response to Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Mar 22, 2023 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > > You should
> > > > exclude them removing these objects from the TOC before running pg_restore or
> > > > adding a few pg_dump options to exclude these objects. Another issue is related
> > > > to different version. Let's say the publisher has a version ahead of the
> > > > subscriber version, a new table syntax can easily break your logical
> > > > replication setup. IMO pg_dump doesn't seem like a good solution for initial
> > > > synchronization.
> > > >
> > > > Instead, the backend should provide infrastructure to obtain the required DDL
> > > > commands for the specific (set of) tables. This can work around the issues from
> > > > the previous paragraph:
> > > >
> > > ...
> > > > * don't need to worry about different versions.
> > > >
> > >
> > > AFAICU some of the reasons why pg_dump is not allowed to dump from the
> > > newer version are as follows: (a) there could be more columns in the
> > > newer version of the system catalog and then Select * type of stuff
> > > won't work because the client won't have knowledge of additional
> > > columns. (b) the newer version could have new features (represented by
> > > say new columns in existing catalogs or new catalogs) that the older
> > > version of pg_dump has no knowledge of and will fail to get that data
> > > and hence an inconsistent dump. The subscriber will easily be not in
> > > sync due to that.
> > >
> > > Now, how do we avoid these problems even if we have our own version of
> > > functionality similar to pg_dump for selected objects? I guess we will
> > > face similar problems.
> >
> > Right. I think that such functionality needs to return DDL commands
> > that can be executed on the requested version.
> >
> > > If so, we may need to deny schema sync in any such case.
> >
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
>
> As per my understanding, it is mostly due to the reason that it can
> work today. Today, during an off-list discussion with Jonathan on this
> point, he pointed me to a similar incompatibility in MySQL
> replication. See the "SQL incompatibilities" section in doc[1]. Also,
> please note that this applies not only to initial sync but also to
> schema sync during replication. I don't think it would be feasible to
> keep such cross-version compatibility for DDL replication.

Makes sense to me.

> Having said above, I don't intend that we must use pg_dump from the
> subscriber for the purpose of initial sync. I think the idea at this
> stage is to primarily write a POC patch to see what difficulties we
> may face. The other options that we could try out are (a) try to
> duplicate parts of pg_dump code in some way (by extracting required
> code) for the subscription's initial sync, or (b) have a common code
> (probably as a library or some other way) for the required
> functionality. There could be more possibilities that we may not have
> thought of yet. But the main point is that for approaches other than
> using pg_dump, we should consider ways to avoid duplicity of various
> parts of its code. Due to this, I think before ruling out using
> pg_dump, we should be clear about its risks and limitations.
>
> Thoughts?
>

Agreed. My biggest concern about approaches other than using pg_dump
is the same; the code duplication that could increase the maintenance
costs. We should clarify what points of using pg_dump is not a good
idea, and also analyze alternative ideas in depth.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: torikoshia
Date:
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Next
From: Robert Haas
Date:
Subject: Re: CREATE DATABASE ... STRATEGY WAL_LOG issues