RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB627554B7BC63125FB1641F1E9E009@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Oct 21, 2022 at 17:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are my review comments for HEAD patches v13*

Thanks for your comments.
 
> Patch HEAD_v13-0002
> 
> 1. Commit message
> 
> The following usage scenarios are not described in detail in the manual:
> If one subscription subscribes multiple publications, and these publications
> publish a partitioned table and its partitions respectively. When we specify
> this parameter on one or more of these publications, which identity and schema
> should be used to publish the changes?
> 
> In these cases, I think the parameter publish_via_partition_root behave as
> follows:
> 
> ~
> 
> It seemed worded a bit strangely. Also, you said "on one or more of
> these publications" but the examples only show only one publication
> having 'publish_via_partition_root'.
> 
> SUGGESTION (I've modified the wording slightly but the examples are
> unchanged).
> 
> Assume a subscription is subscribing to multiple publications, and
> these publications publish a partitioned table and its partitions
> respectively:
> 
> [publisher-side]
> create table parent (a int primary key) partition by range (a);
> create table child partition of parent default;
> 
> create publication pub1 for table parent;
> create publication pub2 for table child;
> 
> [subscriber-side]
> create subscription sub connection 'xxxx' publication pub1, pub2;
> 
> The manual does not clearly describe the behaviour when the user had
> specified the parameter 'publish_via_partition_root' on just one of
> the publications. This patch modifies documentation to clarify the
> following rules:
> 
> - If the parameter publish_via_partition_root is specified only in pub1,
> changes will be published using the identity and schema of the table 'parent'.
> 
> - If the parameter publish_via_partition_root is specified only in pub2,
> changes will be published using the identity and schema of the table 'child'.

Improved as suggested.

> ~~~
> 
> 2.
> 
> - If the parameter publish_via_partition_root is specified only in pub2,
> changes will be published using the identity and schema of the table child.
> 
> ~
> 
> Is that right though? This rule seems 100% contrary to the meaning of
> 'publish_via_partition_root=true'.

Yes, I think this behaviour fits the meaning of publish_via_partition_root.
Please refer to this description in the document:
```
This parameter determines whether changes in a partitioned table (or on its
partitions) contained in the publication will be published ...
```

So I think for 'publish_via_partition_root' to work, the partitioned table must
be specified in this publication.

Since only one member (partition table 'child') of this partition tree
('parent', 'child') is specified in 'pub2', even if 'pub2' specifies the
parameter 'publish_via_partition_root', 'pub2' will publish changes using the
identity and schema of the table 'child'.

> ------
> 
> 3. doc/src/sgml/ref/create_publication.sgml
> 
> +         <para>
> +          If a root partitioned table is published by any subscribed
> publications which
> +          set publish_via_partition_root = true, changes on this root
> partitioned table
> +          (or on its partitions) will be published using the identity
> and schema of this
> +          root partitioned table rather than that of the individual partitions.
> +         </para>
> 
> This seems to only describe the first example from the commit message.
> What about some description to explain the second example?

I think the second example is already described in the pg-doc (please see the
reply to #2). I am not quite sure if additional modifications are required. Do
you have any suggestions?

The new patch set was attached in [1].

[1] -
https://www.postgresql.org/message-id/OS3PR01MB6275FB5397C6A647F262A3A69E009%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Data is copied twice when specifying both child and parent table in publication