Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | CAA4eK1KM1ewGS0Y6VjxBebA4J3EWA=krQMob6WJCQHJ=S7X4kA@mail.gmail.com Whole thread Raw |
In response to | Re: Data is copied twice when specifying both child and parent table in publication (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Data is copied twice when specifying both child and parent table in publication
|
List | pgsql-hackers |
On Mon, Oct 18, 2021 at 2:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Oct 18, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 18, 2021 at 8:27 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > On Sat, Oct 16, 2021 at 5:30 PM houzj.fnst@fujitsu.com > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Friday, October 15, 2021 7:23 PM houzj.fnst@fujitsu.com wrote: > > > > > Attach a patch to fix it. > > > > Attach a new version patch which refactor the fix code in a cleaner way. > > > > > > > > > > I have not debugged it yet to find out why, but with the patch > > > applied, the original double-publish problem that I reported > > > (converted to just use TABLE rather than ALL TABLES IN SCHEMA) still > > > occurs. > > > > > > > Yeah, I think this is a variant of the problem being fixed by > > Hou-San's patch. I think one possible idea to investigate is that on > > the subscriber-side, after fetching tables, we check the already > > subscribed tables and if the child tables already exist then we ignore > > the parent table and vice versa. We might want to consider the case > > where a user has toggled the "publish_via_partition_root" parameter. > > > > It seems both these behaviours/problems exist since commit 17b9e7f9 > > (Support adding partitioned tables to publication). Adding Amit L and > > Peter E (people involved in this work) to know their opinion? > > I can imagine that the behavior seen here may look surprising, but not > sure if I would call it a bug as such. I do remember thinking about > this case and the current behavior is how I may have coded it to be. > > Looking at this command in Hou-san's email: > > create publication pub for table tbl1, tbl1_part1 with > (publish_via_partition_root=on); > > It's adding both the root partitioned table and the leaf partition > *explicitly*, and it's not clear to me if the latter's inclusion in > the publication should be assumed because the former is found to have > been added to the publication, that is, as far as the latter's > visibility to the subscriber is concerned. It's not a stretch to > imagine that a user may write the command this way to account for a > subscriber node on which tbl1 and tbl1_part1 are unrelated tables. > > I don't think we assume anything on the publisher side regarding the > state/configuration of tables on the subscriber side, at least with > publication commands where tables are added to a publication > explicitly, so it is up to the user to make sure that the tables are > not added duplicatively. One may however argue that the way we've > decided to handle FOR ALL TABLES does assume something about > partitions where it skips advertising them to subscribers when > publish_via_partition_root flag is set to true, but that is exactly to > avoid the duplication of data that goes to a subscriber. > I think the same confusion will then apply to the new feature (For All Tables In Schema) being discussed [1] (that is a bit long thread so shared the email where the latest patch version is posted). There also, the partitioned table and partition can be in a different schema. We either need to follow "For All Tables" or "For Table" behavior. Then, there is also an argument that such behavior is not documented, and by reading "publish_via_partition_root", it is not clear why would the user expect the current behavior? Also, what about Greg's case [2], where I think it is clear that the subscriber also has partitions? [1] - https://www.postgresql.org/message-id/OS0PR01MB5716B523961FE338EB9B3F9A94BC9%40OS0PR01MB5716.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com -- With Regards, Amit Kapila.
pgsql-hackers by date: