Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Initial Schema Sync for Logical Replication |
Date | |
Msg-id | CAD21AoBeHERZFdK1mrU5G5doKA3iY8COaLD0ViZJzBYoePSRzg@mail.gmail.com Whole thread Raw |
In response to | Re: Initial Schema Sync for Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Wed, Mar 22, 2023 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote: > > > > > > > You should > > > > exclude them removing these objects from the TOC before running pg_restore or > > > > adding a few pg_dump options to exclude these objects. Another issue is related > > > > to different version. Let's say the publisher has a version ahead of the > > > > subscriber version, a new table syntax can easily break your logical > > > > replication setup. IMO pg_dump doesn't seem like a good solution for initial > > > > synchronization. > > > > > > > > Instead, the backend should provide infrastructure to obtain the required DDL > > > > commands for the specific (set of) tables. This can work around the issues from > > > > the previous paragraph: > > > > > > > ... > > > > * don't need to worry about different versions. > > > > > > > > > > AFAICU some of the reasons why pg_dump is not allowed to dump from the > > > newer version are as follows: (a) there could be more columns in the > > > newer version of the system catalog and then Select * type of stuff > > > won't work because the client won't have knowledge of additional > > > columns. (b) the newer version could have new features (represented by > > > say new columns in existing catalogs or new catalogs) that the older > > > version of pg_dump has no knowledge of and will fail to get that data > > > and hence an inconsistent dump. The subscriber will easily be not in > > > sync due to that. > > > > > > Now, how do we avoid these problems even if we have our own version of > > > functionality similar to pg_dump for selected objects? I guess we will > > > face similar problems. > > > > Right. I think that such functionality needs to return DDL commands > > that can be executed on the requested version. > > > > > If so, we may need to deny schema sync in any such case. > > > > Yes. Do we have any concrete use case where the subscriber is an older > > version, in the first place? > > > > As per my understanding, it is mostly due to the reason that it can > work today. Today, during an off-list discussion with Jonathan on this > point, he pointed me to a similar incompatibility in MySQL > replication. See the "SQL incompatibilities" section in doc[1]. Also, > please note that this applies not only to initial sync but also to > schema sync during replication. I don't think it would be feasible to > keep such cross-version compatibility for DDL replication. Makes sense to me. > Having said above, I don't intend that we must use pg_dump from the > subscriber for the purpose of initial sync. I think the idea at this > stage is to primarily write a POC patch to see what difficulties we > may face. The other options that we could try out are (a) try to > duplicate parts of pg_dump code in some way (by extracting required > code) for the subscription's initial sync, or (b) have a common code > (probably as a library or some other way) for the required > functionality. There could be more possibilities that we may not have > thought of yet. But the main point is that for approaches other than > using pg_dump, we should consider ways to avoid duplicity of various > parts of its code. Due to this, I think before ruling out using > pg_dump, we should be clear about its risks and limitations. > > Thoughts? > Agreed. My biggest concern about approaches other than using pg_dump is the same; the code duplication that could increase the maintenance costs. We should clarify what points of using pg_dump is not a good idea, and also analyze alternative ideas in depth. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: