Thread: tablesync copy ignores publication actions
The logical replication tablesync ignores the publication 'publish' operations during the initial data copy. This is current/known PG behaviour (e.g. as recently mentioned [1]) but it was not documented anywhere. This patch just documents the existing behaviour and gives some examples. ------ [1] https://www.postgresql.org/message-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X%2B8OeGrAkLw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:
The logical replication tablesync ignores the publication 'publish'operations during the initial data copy.This is current/known PG behaviour (e.g. as recently mentioned [1])but it was not documented anywhere.
initial data synchronization != replication. publish parameter is a replication
property; it is not a initial data synchronization property. Maybe we should
make it clear like you are suggesting.
This patch just documents the existing behaviour and gives some examples.
Why did you add this information to that specific paragraph? IMO it belongs to
a separate paragraph; I would add it as the first paragraph in that subsection.
I suggest the following paragraph:
<para>
The initial data synchronization does not take into account the
<literal>publish</literal> parameter to copy the existing data.
</para>
There is no point to link the Initial Snapshot subsection. That subsection is
explaining the initial copy steps and you want to inform about the effect of a
publication parameter on the initial copy. Although both are talking about the
same topic (initial copy), that link to Initial Snapshot subsection won't add
additional information about the publish parameter. You could expand the
suggested sentence to make it clear what publish parameter is or even add a
link to the CREATE PUBLICATION synopsis (that explains about publish
parameter).
You add an empty paragraph. Remove it.
I'm not sure it deserves an example. It is an easy-to-understand concept and a
good description is better than ~ 80 new lines.
On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote: > > On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote: > > The logical replication tablesync ignores the publication 'publish' > operations during the initial data copy. > > This is current/known PG behaviour (e.g. as recently mentioned [1]) > but it was not documented anywhere. > > initial data synchronization != replication. publish parameter is a replication > property; it is not a initial data synchronization property. Maybe we should > make it clear like you are suggesting. > +1 to document it. We respect some other properties of publication like the publish_via_partition_root parameter, column lists, and row filters. So it is better to explain about 'publish' parameter which we ignore during the initial sync. > This patch just documents the existing behaviour and gives some examples. > > Why did you add this information to that specific paragraph? IMO it belongs to > a separate paragraph; I would add it as the first paragraph in that subsection. > > I suggest the following paragraph: > > <para> > The initial data synchronization does not take into account the > <literal>publish</literal> parameter to copy the existing data. > </para> > > There is no point to link the Initial Snapshot subsection. That subsection is > explaining the initial copy steps and you want to inform about the effect of a > publication parameter on the initial copy. Although both are talking about the > same topic (initial copy), that link to Initial Snapshot subsection won't add > additional information about the publish parameter. > Here, we are explaining the behavior of row filters during initial sync so adding a link to the Initial Snapshot section makes sense to me. > You could expand the > suggested sentence to make it clear what publish parameter is or even add a > link to the CREATE PUBLICATION synopsis (that explains about publish > parameter). > +1. I suggest that we should add some text about the behavior of initial sync in CREATE PUBLICATION doc (along with the 'publish' parameter) or otherwise, we can explain it where we are talking about publications [1]. > You add an empty paragraph. Remove it. > > I'm not sure it deserves an example. It is an easy-to-understand concept and a > good description is better than ~ 80 new lines. > I don't think it is very clear that "initial data synchronization != replication" as mentioned by you nor does our docs does a good job in explaining it otherwise the confusion wouldn't have arisen in the email link shared by Peter. Personally, I think such things can be better explained by example and in that regards the example shared by Peter does half the job because it doesn't explain the replication part. I don't think "Initial Snapshot" is the right place for these examples considering we want to show the replication based on the publish actions. We can extend it to show one example with row filters as well. How about showing these examples in the Subscription section [2]? [1]: https://www.postgresql.org/docs/devel/logical-replication-publication.html [2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html -- With Regards, Amit Kapila.
On Wed, Jun 8, 2022 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote: > > > > On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote: > > > > The logical replication tablesync ignores the publication 'publish' > > operations during the initial data copy. > > > > This is current/known PG behaviour (e.g. as recently mentioned [1]) > > but it was not documented anywhere. > > > > initial data synchronization != replication. publish parameter is a replication > > property; it is not a initial data synchronization property. Maybe we should > > make it clear like you are suggesting. > > > > +1 to document it. We respect some other properties of publication > like the publish_via_partition_root parameter, column lists, and row > filters. So it is better to explain about 'publish' parameter which we > ignore during the initial sync. > I also agree to add it to the document. > > You could expand the > > suggested sentence to make it clear what publish parameter is or even add > a > > link to the CREATE PUBLICATION synopsis (that explains about publish > > parameter). > > > > +1. I suggest that we should add some text about the behavior of > initial sync in CREATE PUBLICATION doc (along with the 'publish' > parameter) or otherwise, we can explain it where we are talking about > publications [1]. > I noticed that CREATE SUBSCRIPTION doc mentions that row filter will affect initial sync along with "copy_data" parameter.[1] Maybe we can add some text for "publish" parameter there. [1] https://www.postgresql.org/docs/devel/sql-createsubscription.html Regards, Shi yu
PSA v2 of the patch, based on all feedback received. ~~~ Main differences from v1: * Rewording and more explanatory text. * The examples were moved to the "Subscription" [1] page and also extended to show some normal replication and row filter examples, from [Amit]. * Added some text to CREATE PUBLICATION 'publish' param [2], from [Euler][Amit]. * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san]. * Added some text to the "Publication page" [4] to say the 'publish' is only for DML operations. * I changed the note in "Row Filter - Initial Data Synchronization" [5] to be a warning because I felt users could be surprised to see data exposed by the initial copy, which a DML operation would not expose. ------ [1] https://www.postgresql.org/docs/devel/logical-replication-subscription.html [2] https://www.postgresql.org/docs/devel/sql-createpublication.html [3] https://www.postgresql.org/docs/devel/sql-createsubscription.html [4] https://www.postgresql.org/docs/devel/logical-replication-publication.html [5] https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-INITIAL-DATA-SYNC [Euler] https://www.postgresql.org/message-id/bd49c14d-7a01-4ae3-b424-8c49630fec57%40www.fastmail.com [Amit] https://www.postgresql.org/message-id/CAA4eK1Lb5QpWCQU8qkELnX6t8z7JeVtGantmKptxkkpxnYnpHA%40mail.gmail.com [Shi-san] https://www.postgresql.org/message-id/OSZPR01MB631026B8428422EAC1BFB8A4FDAA9%40OSZPR01MB6310.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Jun 14, 2022 3:36 PM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA v2 of the patch, based on all feedback received. > > ~~~ > > Main differences from v1: > > * Rewording and more explanatory text. > > * The examples were moved to the "Subscription" [1] page and also > extended to show some normal replication and row filter examples, from > [Amit]. > > * Added some text to CREATE PUBLICATION 'publish' param [2], from > [Euler][Amit]. > > * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san]. > > * Added some text to the "Publication page" [4] to say the 'publish' > is only for DML operations. > > * I changed the note in "Row Filter - Initial Data Synchronization" > [5] to be a warning because I felt users could be surprised to see > data exposed by the initial copy, which a DML operation would not > expose. > Thanks for updating the patch. Two comments: 1. + it means the copied table <literal>t3</literal> contains all rows even when + they do not patch the row filter of publication <literal>pub3b</literal>. Typo. I think "they do not patch the row filter" should be "they do not match the row filter", right? 2. @@ -500,7 +704,6 @@ </para> </listitem> </itemizedlist></para> - </sect2> <sect2 id="logical-replication-row-filter-examples"> It seems we should remove this change. Regards, Shi yu
On Wed, Jun 15, 2022 at 5:05 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > ... > Thanks for updating the patch. Two comments: > > 1. > + it means the copied table <literal>t3</literal> contains all rows even when > + they do not patch the row filter of publication <literal>pub3b</literal>. > > Typo. I think "they do not patch the row filter" should be "they do not match > the row filter", right? > > 2. > @@ -500,7 +704,6 @@ > </para> > </listitem> > </itemizedlist></para> > - > </sect2> > > <sect2 id="logical-replication-row-filter-examples"> > > It seems we should remove this change. > Thank you for your review comments. Those reported mistakes are fixed in the attached patch v3. ------ Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Thank you for your review comments. Those reported mistakes are fixed > in the attached patch v3. > This patch looks mostly good to me except for a few minor comments which are mentioned below. It is not very clear in which branch(es) we should commit this patch? As per my understanding, this is a pre-existing behavior but we want to document it because (a) It was not already documented, and (b) we followed it for row filters in PG-15 it seems that should be explained. So, we have the following options (a) commit it only for PG-15, (b) commit for PG-15 and backpatch the relevant sections, or (c) commit it when branch opens for PG-16. What do you or others think? Few comments: ============== 1. > - particular event types. By default, all operation types are replicated. - (Row filters have no effect for <command>TRUNCATE</command>. See - <xref linkend="logical-replication-row-filter"/>). + particular event types. By default, all operation types are replicated. + These are DML operation limitations only; they do not affect the initial + data synchronization copy. > Using limitations in the above sentence can be misleading. Can we change it to something like: "These publication specifications apply only for DML operations; they do ... ". 2. + operations. The publication <literal>pub3b</literal> has a row filter. In the Examples section, you have used row filter whereas that section is later in the docs. So, it is better if you give reference to that section in the above sentence (see Section ...). 3. + <para> + This parameter only affects DML operations. In particular, the + subscription initial data synchronization does not take this parameter + into account when copying existing table data. + </para> In the second sentence: "... subscription initial data synchronization ..." doesn't sound appropriate. Can we change it to something like: "In particular, the initial data synchronization (see Section ..) in logical replication does not take this parameter into account when copying existing table data."? -- With Regards, Amit Kapila.
On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > Thank you for your review comments. Those reported mistakes are fixed > > in the attached patch v3. > > > > This patch looks mostly good to me except for a few minor comments > which are mentioned below. It is not very clear in which branch(es) we > should commit this patch? As per my understanding, this is a > pre-existing behavior but we want to document it because (a) It was > not already documented, and (b) we followed it for row filters in > PG-15 it seems that should be explained. So, we have the following > options (a) commit it only for PG-15, (b) commit for PG-15 and > backpatch the relevant sections, or (c) commit it when branch opens > for PG-16. What do you or others think? Even though this is a very old docs omission, AFAIK nobody ever raised it as a problem before. It only became more important because of the PG15 row-filters. So I think option (a) is ok. > > Few comments: > ============== > 1. > > > - particular event types. By default, all operation types are replicated. > - (Row filters have no effect for <command>TRUNCATE</command>. See > - <xref linkend="logical-replication-row-filter"/>). > + particular event types. By default, all operation types are replicated. > + These are DML operation limitations only; they do not affect the initial > + data synchronization copy. > > > > Using limitations in the above sentence can be misleading. Can we > change it to something like: "These publication specifications apply > only for DML operations; they do ... ". > OK - modified as suggested. > 2. > + operations. The publication <literal>pub3b</literal> has a row filter. > > In the Examples section, you have used row filter whereas that section > is later in the docs. So, it is better if you give reference to that > section in the above sentence (see Section ...). > OK - added xref as suggested. > 3. > + <para> > + This parameter only affects DML operations. In particular, the > + subscription initial data synchronization does not take > this parameter > + into account when copying existing table data. > + </para> > > In the second sentence: "... subscription initial data synchronization > ..." doesn't sound appropriate. Can we change it to something like: > "In particular, the initial data synchronization (see Section ..) in > logical replication does not take this parameter into account when > copying existing table data."? > OK - modified and added xref as suggested. ~~ PSA patch v4 to address all the above review comments. ------ Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > > > > > Thank you for your review comments. Those reported mistakes are fixed > > > in the attached patch v3. > > > > > > > This patch looks mostly good to me except for a few minor comments > > which are mentioned below. It is not very clear in which branch(es) we > > should commit this patch? As per my understanding, this is a > > pre-existing behavior but we want to document it because (a) It was > > not already documented, and (b) we followed it for row filters in > > PG-15 it seems that should be explained. So, we have the following > > options (a) commit it only for PG-15, (b) commit for PG-15 and > > backpatch the relevant sections, or (c) commit it when branch opens > > for PG-16. What do you or others think? > > Even though this is a very old docs omission, AFAIK nobody ever raised > it as a problem before. It only became more important because of the > PG15 row-filters. So I think option (a) is ok. > I also think option (a) is ok. > > PSA patch v4 to address all the above review comments. > Thanks for updating the patch. It looks good to me. Besides, I tested the examples in the patch, and there's no problem. Regards, Shi yu
On Thu, Jun 23, 2022 at 8:43 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > This patch looks mostly good to me except for a few minor comments > > > which are mentioned below. It is not very clear in which branch(es) we > > > should commit this patch? As per my understanding, this is a > > > pre-existing behavior but we want to document it because (a) It was > > > not already documented, and (b) we followed it for row filters in > > > PG-15 it seems that should be explained. So, we have the following > > > options (a) commit it only for PG-15, (b) commit for PG-15 and > > > backpatch the relevant sections, or (c) commit it when branch opens > > > for PG-16. What do you or others think? > > > > Even though this is a very old docs omission, AFAIK nobody ever raised > > it as a problem before. It only became more important because of the > > PG15 row-filters. So I think option (a) is ok. > > > > I also think option (a) is ok. > > > > > PSA patch v4 to address all the above review comments. > > > > Thanks for updating the patch. It looks good to me. > The patch looks good to me as well. I will push this patch in HEAD (as per option (a)) tomorrow unless I see any more suggestions/comments. -- With Regards, Amit Kapila.
On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > The patch looks good to me as well. I will push this patch in HEAD (as > per option (a)) tomorrow unless I see any more suggestions/comments. The example seems to demonstrate the point quite well but one thing that I notice is that it is quite long. I don't really see an obvious way of making it shorter without making it less clear, so perhaps that's fine. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jun 24, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > The patch looks good to me as well. I will push this patch in HEAD (as > > per option (a)) tomorrow unless I see any more suggestions/comments. > > The example seems to demonstrate the point quite well but one thing > that I notice is that it is quite long. I don't really see an obvious > way of making it shorter without making it less clear, so perhaps > that's fine. > Thanks for looking into it. Pushed! -- With Regards, Amit Kapila.