Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Initial Schema Sync for Logical Replication
Msg-id CAD21AoAPCFQW87RZEvH6iL8JqrAqj48Vcdhz8mjSfbWfn2GevA@mail.gmail.com
In response to RE: Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
Responses RE: Initial Schema Sync for Logical Replication
List pgsql-hackers
On Mon, Jul 10, 2023 at 8:06 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
>
>
>
> > From: Amit Kapila <amit.kapila16@gmail.com>
> > On Wed, Jul 5, 2023 at 7:45 AM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Jun 19, 2023 at 5:29 PM Peter Smith <smithpb2250@gmail.com>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Below are my review comments for the PoC patch 0001.
> > > >
> > > > In addition,  the patch needed rebasing, and, after I rebased it
> > > > locally in my private environment there were still test failures:
> > > > a) The 'make check' tests fail but only in a minor way due to
> > > > changes colname
> > > > b) the subscription TAP test did not work at all for me -- many errors.
> > >
> > > Thank you for reviewing the patch.
> > >
> > > While updating the patch, I realized that the current approach won't
> > > work well or at least has the problem with partition tables. If a
> > > publication has a partitioned table with publish_via_root = false, the
> > > subscriber launches tablesync workers for its partitions so that each
> > > tablesync worker copies data of each partition. Similarly, if it has a
> > > partition table with publish_via_root = true, the subscriber launches
> > > a tablesync worker for the parent table. With the current design,
> > > since the tablesync worker is responsible for both schema and data
> > > synchronization for the target table, it won't be possible to
> > > synchronize both the parent table's schema and partitions' schema.
> > >
> >
> > I think one possibility to make this design work is that when publish_via_root
> > is false, then we assume that subscriber already has parent table and then
> > the individual tablesync workers can sync the schema of partitions and their
> > data.
>
> Since publish_via_partition_root is false by default users have to create parent table by themselves
> which I think is not a good user experience.

I have the same concern. I think users normally set
publish_via_partition_root = false when the partitioned table on the
subscriber consists of the same set of partitions as the one on the
publisher. Such users would expect both the partitioned table and its
partitions to be synchronized.
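
To make the gap concrete, here is a toy model of the design discussed
above. It is only a sketch, not PostgreSQL code: the class and function
names are hypothetical, and it assumes each tablesync worker syncs the
schema of its own target table and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    """Hypothetical stand-in for a published partitioned table."""
    name: str
    partitions: list = field(default_factory=list)


def tablesync_targets(table, publish_via_partition_root):
    """Tables that get a tablesync worker, per the behavior described above."""
    if publish_via_partition_root:
        # One worker for the parent (root) table.
        return [table.name]
    # One worker per partition; no worker for the parent.
    return list(table.partitions)


def unsynced_schemas(table, publish_via_partition_root):
    """Schemas no worker covers if each worker syncs only its own target."""
    needed = {table.name, *table.partitions}
    covered = set(tablesync_targets(table, publish_via_partition_root))
    return sorted(needed - covered)


orders = PartitionedTable("orders", ["orders_2022", "orders_2023"])
print(unsynced_schemas(orders, publish_via_partition_root=False))
print(unsynced_schemas(orders, publish_via_partition_root=True))
```

With publish_via_partition_root = false the parent table is the one left
without a schema, which is the case where users would otherwise have to
create it by hand.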

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


