Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Initial Schema Sync for Logical Replication |
Date | |
Msg-id | CAA4eK1J43qN0ixCv_mcc6mAe2PJ8_s=_UMMs-8e1FAGbB6XkBA@mail.gmail.com Whole thread Raw |
In response to | Re: Initial Schema Sync for Logical Replication (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
RE: Initial Schema Sync for Logical Replication
RE: Initial Schema Sync for Logical Replication Re: Initial Schema Sync for Logical Replication Re: Initial Schema Sync for Logical Replication |
List | pgsql-hackers |
On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote: > > > > > You should > > > exclude them removing these objects from the TOC before running pg_restore or > > > adding a few pg_dump options to exclude these objects. Another issue is related > > > to different version. Let's say the publisher has a version ahead of the > > > subscriber version, a new table syntax can easily break your logical > > > replication setup. IMO pg_dump doesn't seem like a good solution for initial > > > synchronization. > > > > > > Instead, the backend should provide infrastructure to obtain the required DDL > > > commands for the specific (set of) tables. This can work around the issues from > > > the previous paragraph: > > > > > ... > > > * don't need to worry about different versions. > > > > > > > AFAICU some of the reasons why pg_dump is not allowed to dump from the > > newer version are as follows: (a) there could be more columns in the > > newer version of the system catalog and then Select * type of stuff > > won't work because the client won't have knowledge of additional > > columns. (b) the newer version could have new features (represented by > > say new columns in existing catalogs or new catalogs) that the older > > version of pg_dump has no knowledge of and will fail to get that data > > and hence an inconsistent dump. The subscriber will easily be not in > > sync due to that. > > > > Now, how do we avoid these problems even if we have our own version of > > functionality similar to pg_dump for selected objects? I guess we will > > face similar problems. > > Right. I think that such functionality needs to return DDL commands > that can be executed on the requested version. > > > If so, we may need to deny schema sync in any such case. > > Yes. Do we have any concrete use case where the subscriber is an older > version, in the first place? > As per my understanding, it is mostly due to the reason that it can work today. Today, during an off-list discussion with Jonathan on this point, he pointed me to a similar incompatibility in MySQL replication. See the "SQL incompatibilities" section in doc[1]. Also, please note that this applies not only to initial sync but also to schema sync during replication. I don't think it would be feasible to keep such cross-version compatibility for DDL replication. Having said above, I don't intend that we must use pg_dump from the subscriber for the purpose of initial sync. I think the idea at this stage is to primarily write a POC patch to see what difficulties we may face. The other options that we could try out are (a) try to duplicate parts of pg_dump code in some way (by extracting required code) for the subscription's initial sync, or (b) have a common code (probably as a library or some other way) for the required functionality. There could be more possibilities that we may not have thought of yet. But the main point is that for approaches other than using pg_dump, we should consider ways to avoid duplicity of various parts of its code. Due to this, I think before ruling out using pg_dump, we should be clear about its risks and limitations. Thoughts? [1] - https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html [2] - https://www.postgresql.org/message-id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com -- With Regards, Amit Kapila.
pgsql-hackers by date: