Thread: PG DOCS - logical replication filtering

PG DOCS - logical replication filtering

From

Peter Smith

Date:

02 March 2022, 04:47:46

Hi.

PSA a PG docs patch that is associated with the logical replication
Row Filters feature which was recently pushed [1].

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The main new content for this page is the "Row Filters" section. This
gives a full overview of the new row filter feature, plus examples.

------
[1] https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v1-0001-PG-docs-Logical-Replication-Filtering.patch

Re: PG DOCS - logical replication filtering

From

Aleksander Alekseev

Date:

02 March 2022, 09:07:09

Hi Peter,

> PSA a PG docs patch that is associated with the logical replication
> Row Filters feature which was recently pushed [1].

The patch looks mostly OK, but I have several nitpicks.

```
By default, all data from all published tables will be replicated to the
appropriate subscribers.
[...]
By default, all operation types are replicated.
```

The second sentence seems to be redundant.

```
(This feature is available since PostgreSQL 15)
```

Please correct me if I'm wrong, but I don't think we say that in the docs. When the user opens the documentation for version X he or she sees everything that is available in this version.

```
31.3. Filtering
[...]
There are 3 different ways to filter what data gets replicated.
31.3.1. Operation Filters
[...]
31.3.2. Row Filters
[...]
```
It looks like there are 2 different ways after all.

I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.

--
Best regards,
Aleksander Alekseev

Re: PG DOCS - logical replication filtering

From

Aleksander Alekseev

Date:

02 March 2022, 09:14:02

Hi again,

> The second sentence seems to be redundant.

Actually, I'm wrong on this one.

Best regards,

Aleksander Alekseev

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

02 March 2022, 09:25:55

On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
>
> I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different
Kindsof Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part
shouldbe removed before committing. 
>

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

--
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Aleksander Alekseev

Date:

02 March 2022, 09:42:52

Hi hackers,

> I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.
>

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

Here is an updated version of the patch.

Best regards,

Aleksander Alekseev

Attachment

v2-0001-Update-the-documentation-for-logical-replication.patch

Re: PG DOCS - logical replication filtering

From

Peter Eisentraut

Date:

02 March 2022, 14:29:59

On 02.03.22 05:47, Peter Smith wrote:
> This patch introduces a new "Filtering" page to give a common place
> where all kinds of logical replication filtering can be described.
> (e.g. It is envisaged that a "Column Filters" section can be added
> sometime in the future).

The pending feature to select a subset of table columns to replicate is 
not "column filtering".  The thread might still be still called that, 
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values.  The row filtering 
feature does that.  The column list feature is a static DDL-time 
configuration.  It is no more filtering than specifying a list of tables 
in a publication is table filtering.

So please consider organizing the documentation differently to not 
create this confusion.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

03 March 2022, 00:52:16

On Wed, Mar 2, 2022 at 8:43 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
...
> Here is an updated version of the patch.

Thanks for your review comments and fixes. The updated v2 patch looks
good to me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

03 March 2022, 00:56:27

On Wed, Mar 2, 2022 at 8:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> >
> > I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different
Kindsof Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part
shouldbe removed before committing. 
> >
>
> I think we can remove any Column Filters related information
> (placeholders), if that patch gets committed, we can always extend the
> existing docs.
>

+1

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

03 March 2022, 03:15:17

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 02.03.22 05:47, Peter Smith wrote:
> > This patch introduces a new "Filtering" page to give a common place
> > where all kinds of logical replication filtering can be described.
> > (e.g. It is envisaged that a "Column Filters" section can be added
> > sometime in the future).
>
> The pending feature to select a subset of table columns to replicate is
> not "column filtering".  The thread might still be still called that,
> but we have changed the patch to not use that terminology.
>
> Filtering is a dynamic action based on actual values.  The row filtering
> feature does that.  The column list feature is a static DDL-time
> configuration.  It is no more filtering than specifying a list of tables
> in a publication is table filtering.
>
> So please consider organizing the documentation differently to not
> create this confusion.
>

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.
2. I think we can add an example or two for row filters actions (like
Insert, Update).
3.
 Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.

By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

[1] - https://www.postgresql.org/docs/devel/logical-replication.html

-- 
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

03 March 2022, 04:10:36

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> >
> > On 02.03.22 05:47, Peter Smith wrote:
> > > This patch introduces a new "Filtering" page to give a common place
> > > where all kinds of logical replication filtering can be described.
> > > (e.g. It is envisaged that a "Column Filters" section can be added
> > > sometime in the future).
> >
> > The pending feature to select a subset of table columns to replicate is
> > not "column filtering".  The thread might still be still called that,
> > but we have changed the patch to not use that terminology.
> >
> > Filtering is a dynamic action based on actual values.  The row filtering
> > feature does that.  The column list feature is a static DDL-time
> > configuration.  It is no more filtering than specifying a list of tables
> > in a publication is table filtering.
> >
> > So please consider organizing the documentation differently to not
> > create this confusion.
> >
>
> +1. I think Row Filters can directly be a section just before
> Conflicts on the logical replication page [1].
>

OK. I will reorganize the page as suggested, and also attend to the
other comments below.

> Some comments on the patch:
> 1. I think we can extend/add the example to have filters on more than
> one table. This has been noticed multiple times during development
> that people are not very clear on it.
> 2. I think we can add an example or two for row filters actions (like
> Insert, Update).
> 3.
>  Publications can choose to limit the changes they produce to
>     any combination of <command>INSERT</command>, <command>UPDATE</command>,
> -   <command>DELETE</command>, and <command>TRUNCATE</command>,
> similar to how triggers are fired by
> -   particular event types.  By default, all operation types are replicated.
> +   <command>DELETE</command>, and <command>TRUNCATE</command> by using
> +   <quote>operation filters</quote>.
>
> By this, one can imply that row filters are used for Truncate as well
> but that is not true. I know that that patch later specifies that "Row
> filters have no effect for <command>TRUNCATE</command> commands." but
> the above modification is not very clear.
>
> [1] - https://www.postgresql.org/docs/devel/logical-replication.html
>

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

04 March 2022, 03:41:58

PSA patch v3 to address all comments received so far.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v3-0001-Update-the-documentation-for-logical-replication.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

04 March 2022, 03:47:56

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> >
> > On 02.03.22 05:47, Peter Smith wrote:
> > > This patch introduces a new "Filtering" page to give a common place
> > > where all kinds of logical replication filtering can be described.
> > > (e.g. It is envisaged that a "Column Filters" section can be added
> > > sometime in the future).
> >
> > The pending feature to select a subset of table columns to replicate is
> > not "column filtering".  The thread might still be still called that,
> > but we have changed the patch to not use that terminology.
> >
> > Filtering is a dynamic action based on actual values.  The row filtering
> > feature does that.  The column list feature is a static DDL-time
> > configuration.  It is no more filtering than specifying a list of tables
> > in a publication is table filtering.
> >
> > So please consider organizing the documentation differently to not
> > create this confusion.
> >
>
> +1. I think Row Filters can directly be a section just before
> Conflicts on the logical replication page [1].

Done as suggested in v3. [1]

>
> Some comments on the patch:
> 1. I think we can extend/add the example to have filters on more than
> one table. This has been noticed multiple times during development
> that people are not very clear on it.

Added example in v3 [1]

> 2. I think we can add an example or two for row filters actions (like
> Insert, Update).

Added examples of INSERT and UPDATE in v3 [1]

> 3.
>  Publications can choose to limit the changes they produce to
>     any combination of <command>INSERT</command>, <command>UPDATE</command>,
> -   <command>DELETE</command>, and <command>TRUNCATE</command>,
> similar to how triggers are fired by
> -   particular event types.  By default, all operation types are replicated.
> +   <command>DELETE</command>, and <command>TRUNCATE</command> by using
> +   <quote>operation filters</quote>.
>
> By this, one can imply that row filters are used for Truncate as well
> but that is not true. I know that that patch later specifies that "Row
> filters have no effect for <command>TRUNCATE</command> commands." but
> the above modification is not very clear.
>

Fixed in v3 [1]. Restored original text, and added a note about row filters.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtwp0FscVpmMjHLV6_%3DSHR5ndwvWdX_gb41_3H2UA9ecA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

Re: PG DOCS - logical replication filtering

From

"Euler Taveira"

Date:

10 March 2022, 23:06:38

On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:

PSA patch v3 to address all comments received so far.

Peter,

I started reviewing this documentation for row filter and I noticed that I was

changing too much lines to submit a patch on the top of it. Hence, I'm

attaching a new version based on this one.

Here as some of the changes that I did:

* links: I renamed the ids to use "logical-replication-row-filter" instead of

"logical-replication-rf" because it is used in the anchors. A compound word

is better than an abbreviation.

* titles: I changed all titles because of some stylish issue (words are

generally capitalized) or because it reads better.

* sections: I moved the "Restrictions" section to the top but it seems

important than the other sections. I removed the "psql" section. The psql

commands are shown in the "Examples" section so I don't think we should

provide a section for it.

* sentences: I expanded several sentences to provide details about the specific

topic. I also reordered some sentences and removed some duplicated sentences.

* replica identity: I added a link to replica identity. It is a key concept for

row filter.

* transformations: I added a few sentences explaining when/why a transformation

is required. I removed the "Cases" part (same as in the source code) because

it is redundant with the new sentences.

* partitioned table: the title should be _partitioned_ table. I replaced the

bullets with sentences in the same paragraph.

* initial data sync: I removed the "subscriber" from the section title. When

you say "initial data synchronization" it seems clear we're talking about the

subscriber. I also add a sentence saying why pre-15 does not consider row

filters.

* combining row filters: I renamed the section and decided to remove "for the

same table". When reading the first sentences from this section, it is clear

that row filtering is per table. So if you are combining multiple row

filters, it should be for the same table. I also added a sentence saying why

some clauses make the row filter irrelevant.

* examples: I combined the psql commands that shows row filter information

together (\dRp+ and \d). I changed to connection string to avoid "localhost".

Why? It does not seem a separate service (there is no port) and setup pub/sub

in the same cluster requires additional steps. It is better to illustrate

different clusters (at least it seems so since we don't provide details from

publisher). I changed a value in an UPDATE because both UPDATEs use 999.

We could probably reduce the number of rows in the example but I didn't bother

to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION because we

have a new section that explains all of it. I think the link that was added by

this patch is sufficient.

Euler Taveira

EDB https://www.enterprisedb.com/

Attachment

v4-0001-doc-new-section-for-row-filter.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

24 March 2022, 06:17:46

On Fri, Mar 11, 2022 at 9:37 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:
>
> PSA patch v3 to address all comments received so far.
>
> Peter,
>
> I started reviewing this documentation for row filter and I noticed that I was
> changing too much lines to submit a patch on the top of it. Hence, I'm
> attaching a new version based on this one.

Sorry for my long delay in replying. I have been away.

Thanks very much for the updated patch with all your suggestions.
There were many changes in your v4. I have not merged everything
exactly, but I did take the majority of your suggestions on-board in
the attached v5.

I responded below to each change.

>
> Here as some of the changes that I did:
>
> * links: I renamed the ids to use "logical-replication-row-filter" instead of
>   "logical-replication-rf" because it is used in the anchors. A compound word
>   is better than an abbreviation.

OK, done as suggested.

> * titles: I changed all titles because of some stylish issue (words are
>   generally capitalized) or because it reads better.

OK, most titles changed as suggested.

> * sections: I moved the "Restrictions" section to the top but it seems
>   important than the other sections. I removed the "psql" section. The psql
>   commands are shown in the "Examples" section so I don't think we should
>   provide a section for it.

OK, moved the "Restrictions" section and removed the "psql" section.

> * sentences: I expanded several sentences to provide details about the specific
>   topic. I also reordered some sentences and removed some duplicated sentences.

I did not take everything exactly as in your v4, but wherever your
suggested changes added some more information I tried to include them
using similar wording to yours.

> * replica identity: I added a link to replica identity. It is a key concept for
>   row filter.

OK, done as suggested.

> * transformations: I added a few sentences explaining when/why a transformation
>   is required. I removed the "Cases" part (same as in the source code) because
>   it is redundant with the new sentences.

OK, I incorporated most of your new descriptions for the
transformations, however I still wanted to keep the summary of "cases"
part because IMO it makes the rules so much clearer than just having
the text descriptions.

> * partitioned table: the title should be _partitioned_ table. I replaced the
>   bullets with sentences in the same paragraph.

OK. The title was changed, but I kept the bullets.

> * initial data sync: I removed the "subscriber" from the section title. When
>   you say "initial data synchronization" it seems clear we're talking about the
>   subscriber. I also add a sentence saying why pre-15 does not consider row
>   filters.

OK. Title is changed. Also I added your sentence about the pre-15. But
I kept all the HTML note rendering that you'd removed in v4. I think
this information was important enough to be a "note" instead of just
buried in a paragraph.

> * combining row filters: I renamed the section and decided to remove "for the
>   same table". When reading the first sentences from this section, it is clear
>   that row filtering is per table. So if you are combining multiple row
>   filters, it should be for the same table. I also added a sentence saying why
>   some clauses make the row filter irrelevant.

OK. Title is changed. Your extra sentence was added.

> * examples: I combined the psql commands that shows row filter information
>   together (\dRp+ and \d). I changed to connection string to avoid "localhost".
>   Why? It does not seem a separate service (there is no port) and setup pub/sub
>   in the same cluster requires additional steps. It is better to illustrate
>   different clusters (at least it seems so since we don't provide details from
>   publisher). I changed a value in an UPDATE because both UPDATEs use 999.
>

I did not combine those \dRp+ and \d. I kept them separate so I could
write some separate notes about them.

I'm unsure about the connection change. This documentation is for "Row
Filters" so the examples were only intended to demonstrate row filters
and nothing else. I wanted to have a trivial connection such that a
user can just cut/paste directly from the example and get something
they can immediately test without having to change it. I don't mind
changing this later but probably I'd like to get some other opinions
about it first.

About the UPDATE (555 value) - OK, I changed that value as you suggested.

> We could probably reduce the number of rows in the example but I didn't bother
> to remove them.
>
> It seems we can remove some sentences from the CREATE PUBLICATION because we
> have a new section that explains all of it. I think the link that was added by
> this patch is sufficient.
>

Yeah, maybe some sentences can be removed. But even though some
information is duplicated it might be useful to have a few things
still mentioned on the CREATE PUBLICATION page just so the user does
not have to chase links too much. I will wait to see if other people
have an opinion about it before removing any content from that page.
Meanwhile, I have made the create_publication.sgml identical to your
v4.

~~~

There should be much fewer v4/v5 differences now although there might
be a few things I missed that you want to re-comment on. Hopefully, it
will now be easier to post review comments as BEFORE/AFTER text
fragments - that would help other people to see the suggestions more
easily and give their opinions too.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

v5-0001-This-patch-introduces-a-new-documentation-page-fo.patch

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

06 April 2022, 10:57:49

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
>

Review comments:
===============
1.
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>.

Can we use () around i.e. sentence? It looks bit odd to me now.
The <literal>WHERE</literal> clause expression is evaluated with the
same role used for the replication connection (i.e., the role
specified in the <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>).

2.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).

Can we write the below part slightly differently?
Before:
(i.e. before and after the data is updated).
After:
(i.e the data before and after the update).

3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>

I think we can combine these three separate paragraphs.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>

I think we can combine this into single para as Euler had in his version.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>

I don't see the need for the first <note> here, the second one seems
to convey it.

6.
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both
with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>

I think it is better to use the corresponding content from Euler's version.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.

This looks unnecessary to me. Let's remove this part.

8.
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.

Again, it is better to use Euler's version for this and at the place,
he had in his version. Similarly, adjust other notices if any like
this one.

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

-- 
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

08 April 2022, 05:53:38

PSA patch v6 to address some of Amit's review comments [1].

------
[1] https://www.postgresql.org/message-id/CAA4eK1JdwQQsxa%2BzpsBW5rCxEfXopYx381nwcCyeXk6mpF8ChQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v6-0001-PG-DOCS-page-for-row-filters.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

08 April 2022, 06:12:07

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> Review comments:
> ===============
> 1.
> +    The <literal>WHERE</literal> clause expression is evaluated with the same
> +    role used for the replication connection. i.e. the role specified in the
> +    <literal>CONNECTION</literal> clause of the <xref
> linkend="sql-createsubscription"/>.
>
> Can we use () around i.e. sentence? It looks bit odd to me now.
> The <literal>WHERE</literal> clause expression is evaluated with the
> same role used for the replication connection (i.e., the role
> specified in the <literal>CONNECTION</literal> clause of the <xref
> linkend="sql-createsubscription"/>).

OK. Modified in v6 [1].

> 2.
> +   <para>
> +    Whenever an <command>UPDATE</command> is processed, the row filter
> +    expression is evaluated for both the old and new row (i.e. before
> +    and after the data is updated).
>
> Can we write the below part slightly differently?
> Before:
> (i.e. before and after the data is updated).
> After:
> (i.e the data before and after the update).

OK. Modified in v6 [1].

> 3.
> +   <para>
> +    Whenever an <command>UPDATE</command> is processed, the row filter
> +    expression is evaluated for both the old and new row (i.e. before
> +    and after the data is updated).
> +   </para>
> +
> +   <para>
> +    If both evaluations are <literal>true</literal>, it replicates the
> +    <command>UPDATE</command> change.
> +   </para>
> +
> +   <para>
> +    If both evaluations are <literal>false</literal>, it doesn't replicate
> +    the change.
> +   </para>
>
> I think we can combine these three separate paragraphs.

The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.

> 4.
> +   <para>
> +    If the publication contains a partitioned table, the publication parameter
> +    <literal>publish_via_partition_root</literal> determines which row filter
> +    is used.
> +    <itemizedlist>
> +
> +     <listitem>
> +      <para>
> +       If <literal>publish_via_partition_root</literal> is
> <literal>false</literal>
> +       (default), each <emphasis>partition's</emphasis> row filter is used.
> +      </para>
> +     </listitem>
> +
> +     <listitem>
> +      <para>
> +       If <literal>publish_via_partition_root</literal> is
> <literal>true</literal>,
> +       the <emphasis>root partitioned table's</emphasis> row filter is used.
> +      </para>
> +     </listitem>
> +
> +    </itemizedlist>
> +   </para>
>
> I think we can combine this into single para as Euler had in his version.

We can do it, but I am not sure if your review was looking at the
rendered HTML or just looking at the SGML text? IMO using bullets here
ended up being more readable (it is also consistent with other bullet
usages on this page). Please confirm if you still want this changed.

> 5.
> +   <note>
> +    <para>
> +     Publication <literal>publish</literal> operations are ignored
> when copying pre-existing table data.
> +    </para>
> +   </note>
> +
> +   <note>
> +    <para>
> +     If the subscriber is in a release prior to 15, copy pre-existing data
> +     doesn't use row filters even if they are defined in the publication.
> +     This is because old releases can only copy the entire table data.
> +    </para>
> +   </note>
>
> I don't see the need for the first <note> here, the second one seems
> to convey it.

Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.

> 6.
> +   <para>
> +    Create some tables to be used in the following examples.
> +<programlisting>
> +testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
> +CREATE TABLE
> +testpub=# CREATE TABLE t2(d int primary key, e int, f int);
> +CREATE TABLE
> +testpub=# CREATE TABLE t3(g int primary key, h int, i int);
> +CREATE TABLE
> +</programlisting>
> +   </para>
> +
> +   <para>
> +    Create some publications.
> +   </para>
> +   <para>
> +    - notice publication <literal>p1</literal> has 1 table with a row filter.
> +   </para>
> +   <para>
> +    - notice publication <literal>p2</literal> has 2 tables, one without a row
> +    filter, and one with a row filter.
> +   </para>
> +   <para>
> +    - notice publication <literal>p3</literal> has 2 tables, both
> with row filters.
> +<programlisting>
> +testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
> +CREATE PUBLICATION
> +testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
> +CREATE PUBLICATION
> +testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
> +CREATE PUBLICATION
> +</programlisting>
> +   </para>
>
> I think it is better to use the corresponding content from Euler's version.

OK. Modified in v6 [1]. I changed the primary key syntax to be the
same as Euler had. I also moved all the 'notice' parts to below the
corresponding example and modified the text.

>
> 7.
> +   <para>
> +    The PSQL command <command>\d</command> shows what publications the table is
> +    a member of, as well as that table's row filter expression (if defined) in
> +    those publications.
> +   </para>
> +   <para>
> +    - notice table <literal>t1</literal> is a member of 2 publications, but
> +    has a row filter only in <literal>p1</literal>.
> +   </para>
> +   <para>
> +    - notice table <literal>t2</literal> is a member of 2 publications, and
> +    has a different row filter in each of them.
>
> This looks unnecessary to me. Let's remove this part.

I thought something is needed to explain/demonstrate how the user can
know the different row filters for all the publications that the same
table is a member of. Otherwise, the user has to guess (??) what
publications are using their table and then use \dRp+ to dig at all
those publications to find the row filters.

I can remove all this part from the Examples, but I think at least the
\d should still be mentioned somewhere. IMO I should put that "PSQL
commands"  section back (which existed in an earlier version) and just
add a sentence about this. Then this examples part can be removed.
What do you think?

> 8.
> +   <para>
> +    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
> +    clause of publication <literal>p1</literal> are replicated.
>
> Again, it is better to use Euler's version for this and at the place,
> he had in his version. Similarly, adjust other notices if any like
> this one.

OK. Modified in v6 [1]. Every “notice” has now been moved to follow
the associated example (how Euler had them)


> 9. I suggest adding an example for partition tables showing how the
> row filter is used based on the 'publish_via_partition_root'
> parameter.

OK - I am working on this and will add it in a future patch version.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

Alvaro Herrera

Date:

09 April 2022, 10:45:31

On 2022-Apr-08, Peter Smith wrote:

> PSA patch v6 to address some of Amit's review comments [1].

I think the text is good (didn't read it all, just about the first
half), but why is there one paragraph per sentence?  Seems a bit too
sparse.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

11 April 2022, 03:27:30

On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > 3.
> > +   <para>
> > +    Whenever an <command>UPDATE</command> is processed, the row filter
> > +    expression is evaluated for both the old and new row (i.e. before
> > +    and after the data is updated).
> > +   </para>
> > +
> > +   <para>
> > +    If both evaluations are <literal>true</literal>, it replicates the
> > +    <command>UPDATE</command> change.
> > +   </para>
> > +
> > +   <para>
> > +    If both evaluations are <literal>false</literal>, it doesn't replicate
> > +    the change.
> > +   </para>
> >
> > I think we can combine these three separate paragraphs.
>
> The first sentence is the explanation, then there are the 3 separate
> “If …” cases mentioned. It doesn’t seem quite right to group just the
> first 2 “if…” cases into one paragraph, while leaving the 3rd one
> separated. OTOH combining everything into one big paragraph seems
> worse. Please confirm if you still want this changed.
>

Yeah, I think we should have something like what Euler's version had
and maybe keep a summary section from the current patch.

> > 4.
> > +   <para>
> > +    If the publication contains a partitioned table, the publication parameter
> > +    <literal>publish_via_partition_root</literal> determines which row filter
> > +    is used.
> > +    <itemizedlist>
> > +
> > +     <listitem>
> > +      <para>
> > +       If <literal>publish_via_partition_root</literal> is
> > <literal>false</literal>
> > +       (default), each <emphasis>partition's</emphasis> row filter is used.
> > +      </para>
> > +     </listitem>
> > +
> > +     <listitem>
> > +      <para>
> > +       If <literal>publish_via_partition_root</literal> is
> > <literal>true</literal>,
> > +       the <emphasis>root partitioned table's</emphasis> row filter is used.
> > +      </para>
> > +     </listitem>
> > +
> > +    </itemizedlist>
> > +   </para>
> >
> > I think we can combine this into single para as Euler had in his version.
>
> We can do it, but I am not sure if your review was looking at the
> rendered HTML or just looking at the SGML text? IMO using bullets here
> ended up being more readable (it is also consistent with other bullet
> usages on this page). Please confirm if you still want this changed.
>

Fair enough. We can keep this part as it is.

> > 5.
> > +   <note>
> > +    <para>
> > +     Publication <literal>publish</literal> operations are ignored
> > when copying pre-existing table data.
> > +    </para>
> > +   </note>
> > +
> > +   <note>
> > +    <para>
> > +     If the subscriber is in a release prior to 15, copy pre-existing data
> > +     doesn't use row filters even if they are defined in the publication.
> > +     This is because old releases can only copy the entire table data.
> > +    </para>
> > +   </note>
> >
> > I don't see the need for the first <note> here, the second one seems
> > to convey it.
>
> Well, the 2nd note is only about compatibility with older versions
> doing the subscribe. But the 1st note is not version-specific at all.
> It is saying that the COPY does not take the “publish” option into
> account. If you know of some other docs already mentioning this subtle
> behaviour of the COPY then I can remove this note and just
> cross-reference to the other place. But I did not know anywhere this
> is already mentioned, so that is why I wrote the note about it.
>

I don't see the need to say about general initial sync (COPY) behavior
here. It is already defined at [1]. If we want to enhance, we can do
that as a separate patch to make changes where the initial sync is
explained. I am not sure that is required though.

> >
> > 7.
> > +   <para>
> > +    The PSQL command <command>\d</command> shows what publications the table is
> > +    a member of, as well as that table's row filter expression (if defined) in
> > +    those publications.
> > +   </para>
> > +   <para>
> > +    - notice table <literal>t1</literal> is a member of 2 publications, but
> > +    has a row filter only in <literal>p1</literal>.
> > +   </para>
> > +   <para>
> > +    - notice table <literal>t2</literal> is a member of 2 publications, and
> > +    has a different row filter in each of them.
> >
> > This looks unnecessary to me. Let's remove this part.
>
> I thought something is needed to explain/demonstrate how the user can
> know the different row filters for all the publications that the same
> table is a member of. Otherwise, the user has to guess (??) what
> publications are using their table and then use \dRp+ to dig at all
> those publications to find the row filters.
>
> I can remove all this part from the Examples, but I think at least the
> \d should still be mentioned somewhere. IMO I should put that "PSQL
> commands"  section back (which existed in an earlier version) and just
> add a sentence about this. Then this examples part can be removed.
> What do you think?
>

I think the way it is changed in the current patch by moving that
explanation down seems okay to me.

I feel in the initial "Row Filters" and "Row Filter Rules" sections,
we don't need to have separate paragraphs. I think the same is pointed
out by Alvaro as well.

--
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

11 April 2022, 07:03:55

PSA patch v7 which addresses all the remaining review comments (from
Amit [1a][1b], and from Alvaro [2]).

------
[1a] https://www.postgresql.org/message-id/CAHut%2BPvDFWGUOBugYMtcXhAiViZu%2BQ6P-kxw2%2BU83VOGx0Osdg%40mail.gmail.com
[1b] https://www.postgresql.org/message-id/CAA4eK1JPyVoc1dUjeqbPd9D0_uYxWyyx-8fcsrgiZ5Tpr9OAuw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/202204091045.v2ei4yupxqso%40alvherre.pgsql

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v7-0001-PG-DOCS-page-for-row-filters.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

11 April 2022, 07:09:02

On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> >
> > Review comments:
> > ===============
[snip]
>
> > 9. I suggest adding an example for partition tables showing how the
> > row filter is used based on the 'publish_via_partition_root'
> > parameter.
>
> OK - I am working on this and will add it in a future patch version.
>

OK. Added in v7 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPt1X%3D3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

11 April 2022, 07:17:27

On Mon, Apr 11, 2022 at 1:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > 3.
> > > +   <para>
> > > +    Whenever an <command>UPDATE</command> is processed, the row filter
> > > +    expression is evaluated for both the old and new row (i.e. before
> > > +    and after the data is updated).
> > > +   </para>
> > > +
> > > +   <para>
> > > +    If both evaluations are <literal>true</literal>, it replicates the
> > > +    <command>UPDATE</command> change.
> > > +   </para>
> > > +
> > > +   <para>
> > > +    If both evaluations are <literal>false</literal>, it doesn't replicate
> > > +    the change.
> > > +   </para>
> > >
> > > I think we can combine these three separate paragraphs.
> >
> > The first sentence is the explanation, then there are the 3 separate
> > “If …” cases mentioned. It doesn’t seem quite right to group just the
> > first 2 “if…” cases into one paragraph, while leaving the 3rd one
> > separated. OTOH combining everything into one big paragraph seems
> > worse. Please confirm if you still want this changed.
> >
>
> Yeah, I think we should have something like what Euler's version had
> and maybe keep a summary section from the current patch.
>

Modified in v7 [1]. Removed the bullets. Structured the text
paragraphs the same way that Euler had it. The summary is kept as-is.


> > > 5.
> > > +   <note>
> > > +    <para>
> > > +     Publication <literal>publish</literal> operations are ignored
> > > when copying pre-existing table data.
> > > +    </para>
> > > +   </note>
> > > +
> > > +   <note>
> > > +    <para>
> > > +     If the subscriber is in a release prior to 15, copy pre-existing data
> > > +     doesn't use row filters even if they are defined in the publication.
> > > +     This is because old releases can only copy the entire table data.
> > > +    </para>
> > > +   </note>
> > >
> > > I don't see the need for the first <note> here, the second one seems
> > > to convey it.
> >
> > Well, the 2nd note is only about compatibility with older versions
> > doing the subscribe. But the 1st note is not version-specific at all.
> > It is saying that the COPY does not take the “publish” option into
> > account. If you know of some other docs already mentioning this subtle
> > behaviour of the COPY then I can remove this note and just
> > cross-reference to the other place. But I did not know anywhere this
> > is already mentioned, so that is why I wrote the note about it.
> >
>
> I don't see the need to say about general initial sync (COPY) behavior
> here. It is already defined at [1]. If we want to enhance, we can do
> that as a separate patch to make changes where the initial sync is
> explained. I am not sure that is required though.
>

Did you miss providing the link URL? Anyway, I removed the note in v7
[1]. This information can be done as a separate patch one day (or not
at all).

> I feel in the initial "Row Filters" and "Row Filter Rules" sections,
> we don't need to have separate paragraphs. I think the same is pointed
> out by Alvaro as well.
>

Modified in v7 [1] those sections as suggested. I also assumed these
were the same sections that Alvaro was referring to.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPt1X%3D3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

11 April 2022, 10:40:10

On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> OK. Added in v7 [1]
>

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

-- 
With Regards,
Amit Kapila.

Attachment

v8-0001-Add-additional-documentation-for-row-filters.patch

Re: PG DOCS - logical replication filtering

From

"Euler Taveira"

Date:

11 April 2022, 17:32:53

On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:

On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> OK. Added in v7 [1]
>

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

I have a few comments.

> If a publication publishes UPDATE and/or DELETE operations ...

If we are talking about operations, use lowercase like I suggested in the

previous version. See the publish parameter [1]. If we are talking about

commands, the word "operations" should be removed or replaced by "commands".

The "and/or" isn't required, "or" implies the same. If you prefer "operations",

my suggestion is to adjust the last sentence that says "only INSERT" to "only

<italic>insert</italic> operation".

> If the old row satisfies the row filter expression (it was sent to the

> subscriber) but the new row doesn't, then from a data consistency perspective

> the old row should be removed from the subscriber.

I suggested small additions to this sentence. We should at least add a comma

after "then" and "perspective". The same applies to the next paragraph too.

Regarding the "Summary", it is redundant as I said before. We already described

all 4 cases. I vote to remove it. However, if we would go with a table, I

suggest to improve the formatting: add borders, "old row" and "new row" should

be titles, "no match" and "match" should be represented by symbols ("" and "X",

for example), and "Case X" column should be removed because this extra column

adds nothing.

Regarding the "Partitioned Tables", I suggested to remove the bullets. We

generally have paragraphs with a few sentences. I tend to avoid short

paragraphs. I also didn't like the 2 bullets using almost the same words. In

the previous version, I suggested something like

If the publication contains a partitioned table, the parameter

publish_via_partition_root determines which row filter expression is used. If

the parameter publish_via_partition_root is true, the row filter expression

associated with the partitioned table is used. Otherwise, the row filter

expression associated with the individual partition is used.

> will be copied. (see Section 31.3.6 for details).

There is an extra period after "copied" that should be removed. The other

option is to remove the parentheses and have another sentence for "See ...".

> those expressions get OR'ed together

I prefer plain English here. This part of the sentence is also redundant with

the rest of the sentence so I suggested to remove it in the previous version.

rows that satisfy any of the row filter expressions is replicated.

instead of

those expressions get OR'ed together, so that rows satisfying any of the

expressions will be replicated.

I also didn't use a different paragraph (like I suggested in the previous

version) because we are talking about the same thing.

The bullets in the example sounds strange, that's why I suggested removing it.

We can even combine the 3 sentences into one paragraph.

> The PSQL command \dRp+ shows the row filter expressions (if defined) for each

> table of the publications.

Well, we don't use PSQL (upppercase) in the documentation. I suggested a

different sentence:

The psql shows the row filter expressions (if defined) for each table.

> The PSQL command \d shows what publications the table is a member of, as well

> as that table's row filter expression (if defined) in those publications.

It is not logical replication business to explain about psql behavior. If, for

some reason, someone decided to change it, this section will contain obsolete

information. The psql output is fine, the explanation is not. That's why I

suggested this modification.

> Only the rows satisfying the t1 WHERE clause of publication p1 are

> replicated.

Again, no bullets. This sentence is useful *before* the psql output. We're

presenting the results. Let's follow the pattern, describe the action and show

the results.

> The UPDATE replicates the change as normal.

This sentence should be *before* the psql output (see my previous version).

Regarding the new examples (for partitioned tables), shouldn't we move the

parent / child definitions to the beginning of the Examples section? It seems

confusing use the same code snippet to show repeated table definitions

(publisher and subscriber). I checked fast and after a few seconds I realized

that the example is not wrong but the database name has a small difference (one

letter "s" x "p"). The publication and subscription definitions are fine there.

I think reusing the same tables and publication introduces complexity.

Shouldn't we just use different tables and publication to provide an "easy"

example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.

> Do the inserts same as before.

We should indicate the node (publisher) to be clear.

[1] https://www.postgresql.org/docs/devel/sql-createpublication.html

Euler Taveira

EDB https://www.enterprisedb.com/

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

12 April 2022, 03:52:32

On Mon, Apr 11, 2022 at 11:03 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
>
> Regarding the new examples (for partitioned tables), shouldn't we move the
> parent / child definitions to the beginning of the Examples section?
>

I think that will make examples less clear.

> It seems
> confusing use the same code snippet to show repeated table definitions
> (publisher and subscriber). I checked fast and after a few seconds I realized
> that the example is not wrong but the database name has a small difference (one
> letter "s" x "p").
>

Can you be more specific? AFAICS, dbname used (testpub) is same.

> The publication and subscription definitions are fine there.
>
> I think reusing the same tables and publication introduces complexity.
> Shouldn't we just use different tables and publication to provide an "easy"
> example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.
>

I don't know. I find the current way understandable. I feel using
different names won't gain much and make the example difficult to
follow.

-- 
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

12 April 2022, 07:50:52

PSA patch v9 which addresses most of Euler's review comments [1]

------
[1] https://www.postgresql.org/message-id/1c78ebd4-b38d-4b5d-a6ea-d583efe87d97%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v9-0001-Add-additional-documentation-for-row-filters.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

12 April 2022, 08:30:40

On Tue, Apr 12, 2022 at 3:33 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
>
> On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > OK. Added in v7 [1]
> >
>
> Thanks, this looks mostly good to me. I didn't like the new section
> added for partitioned tables examples, so I removed it and added some
> explanation of the tests. I have slightly changed a few other lines. I
> am planning to commit the attached tomorrow unless there are more
> comments.
>
> I have a few comments.

Thanks for your review comments! I addressed most of them as suggested
- see the details below.

>
> > If a publication publishes UPDATE and/or DELETE operations ...
> >
>
> If we are talking about operations, use lowercase like I suggested in the
> previous version. See the publish parameter [1]. If we are talking about
> commands, the word "operations" should be removed or replaced by "commands".
> The "and/or" isn't required, "or" implies the same. If you prefer "operations",
> my suggestion is to adjust the last sentence that says "only INSERT" to "only
> <italic>insert</italic> operation".

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

>
> > If the old row satisfies the row filter expression (it was sent to the
> > subscriber) but the new row doesn't, then from a data consistency perspective
> > the old row should be removed from the subscriber.
> >
>
> I suggested small additions to this sentence. We should at least add a comma
> after "then" and "perspective". The same applies to the next paragraph too.

Modified the commas in [v9] as suggested.

>
> Regarding the "Summary", it is redundant as I said before. We already described
> all 4 cases. I vote to remove it. However, if we would go with a table, I
> suggest to improve the formatting: add borders, "old row" and "new row" should
> be titles, "no match" and "match" should be represented by symbols ("" and "X",
> for example), and "Case X" column should be removed because this extra column
> adds nothing.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

>
> Regarding the "Partitioned Tables", I suggested to remove the bullets. We
> generally have paragraphs with a few sentences. I tend to avoid short
> paragraphs. I also didn't like the 2 bullets using almost the same words. In
> the previous version, I suggested something like
>
> If the publication contains a partitioned table, the parameter
> publish_via_partition_root determines which row filter expression is used. If
> the parameter publish_via_partition_root is true, the row filter expression
> associated with the partitioned table is used. Otherwise, the row filter
> expression associated with the individual partition is used.
>

Modified in [v9] to remove the bullets.

> > will be copied. (see Section 31.3.6 for details).
>
> There is an extra period after "copied" that should be removed. The other
> option is to remove the parentheses and have another sentence for "See ...".
>

Modified in [v9] as suggested.

> > those expressions get OR'ed together
>
> I prefer plain English here. This part of the sentence is also redundant with
> the rest of the sentence so I suggested to remove it in the previous version.
>
> rows that satisfy any of the row filter expressions is replicated.
>
> instead of
>
> those expressions get OR'ed together, so that rows satisfying any of the
> expressions will be replicated.

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

>
> I also didn't use a different paragraph (like I suggested in the previous
> version) because we are talking about the same thing.
>

Modified in [v9] to use a single paragraph.

> The bullets in the example sounds strange, that's why I suggested removing it.
> We can even combine the 3 sentences into one paragraph.

Modified [v9] so the whole example now has no bullets. Also combined
all these 3 sentences as suggested.

>
> > The PSQL command \dRp+ shows the row filter expressions (if defined) for each
> > table of the publications.
>
> Well, we don't use PSQL (upppercase) in the documentation. I suggested a
> different sentence:
>
> The psql shows the row filter expressions (if defined) for each table.
>

Modified the sentence in [v9]. Now it uses lowercase psql.

> > The PSQL command \d shows what publications the table is a member of, as well
> > as that table's row filter expression (if defined) in those publications.
>
> It is not logical replication business to explain about psql behavior. If, for
> some reason, someone decided to change it, this section will contain obsolete
> information. The psql output is fine, the explanation is not. That's why I
> suggested this modification.

Modified [v9] this sentence also to use lowercase psql.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

>
> > Only the rows satisfying the t1 WHERE clause of publication p1 are
> > replicated.
>
> Again, no bullets. This sentence is useful *before* the psql output. We're
> presenting the results. Let's follow the pattern, describe the action and show
> the results.
>

OK. Modified all the [v9]  example now has all the bullets removed and
follows the suggested pattern (e.g. where the notes always come
*before* the results)

> > The UPDATE replicates the change as normal.
>
> This sentence should be *before* the psql output (see my previous version).
>

Modified [v9] as suggested.

> Regarding the new examples (for partitioned tables), shouldn't we move the
> parent / child definitions to the beginning of the Examples section?

Not changed. IMO if we moved those CREATE TABLE as suggested then they
will then be too far away from where they are being used.

> It seems
> confusing use the same code snippet to show repeated table definitions
> (publisher and subscriber). I checked fast and after a few seconds I realized
> that the example is not wrong but the database name has a small difference (one
> letter "s" x "p"). The publication and subscription definitions are fine there.
>

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

> I think reusing the same tables and publication introduces complexity.
> Shouldn't we just use different tables and publication to provide an "easy"
> example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.
>

Not changed. Those same tables were re-used *deliberately* so that the
examples could use identical inserts, and to emphasize that the
different behaviour was caused only by the
"publish_via_partition_root" setting.

> > Do the inserts same as before.
>
> We should indicate the node (publisher) to be clear.

OK. Modified [v9] as suggested.

------
[v9] https://www.postgresql.org/message-id/CAHut%2BPvYqo77rwg_vHC%3DOyQ7hCHZGVm%3DNi%2BJQbf8VyBz8hoo2w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

"Euler Taveira"

Date:

12 April 2022, 14:07:27

On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

Hmm. You are checking the Notes. I'm referring the the publish parameter. IMO

this sentence should use operations in lowercase letters too because even if

you create it with uppercase letters, Postgres will provide a lowercase word

when you dump it.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

Isn't it better to use a table instead of synopsis?

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's

keep the consistence.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

You are describing a psql command here. My point is keep psql explanation in

the psql section. This section is to describe the row filter feature. If we

start describing features in other sections, we will have outdated information

when the referred feature is changed and someone fails to find all references.

I tend to concentrate detailed explanation in the feature section. If I have to

add links in other sections, I use "Seee Section 1.23 for details ...".

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

The difference is extra space. By default, the CSS does not include a border

for programlisting. That's why I complained about it. I noticed that the

website CSS includes it. However, the PDF will not include the border. I would

add a separate description for the subscriber just to be clear.

One last suggestion, you are using identifiers in uppercase letters but

"primary key" is in lowercase.

Euler Taveira

EDB https://www.enterprisedb.com/

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

13 April 2022, 03:24:53

PSA patch v10 which addresses the remaining review comments from Euler [1]

------
[1] https://www.postgresql.org/message-id/3cd8d622-6a26-4eaf-a5aa-ac78030e5f50%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v10-0001-Add-additional-documentation-for-row-filters.patch

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

13 April 2022, 03:34:33

On Wed, Apr 13, 2022 at 12:08 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:
>
> Not changed. Because in fact, I copied most of this sentence
> (including the uppercase, "operations", "and/or") from existing
> documentation [1]
> e.g. see "The tables added to a publication that publishes UPDATE
> and/or DELETE operations must ..."
> [1] https://www.postgresql.org/docs/current/sql-createpublication.html
>
> Hmm. You are checking the Notes. I'm referring the the publish parameter. IMO
> this sentence should use operations in lowercase letters too because even if
> you create it with uppercase letters, Postgres will provide a lowercase word
> when you dump it.

IIRC the row filter replication identity checking is at run-time based
on the operation (not at DDL time based on the publish_parameter). For
this reason, and also because this is the same format/wording used in
many places already on the create_publication.sgml, I did not change
this formatting. But I did modify [v10] as earlier suggested to
replace the “and/or” with “or”, and also added another word
“operation”.

>
> Yeah, I know the information is the same in the summary and in the
> text. Personally, I find big slabs of technical text difficult to
> digest, so I'd have to spend 5 minutes of careful reading and drawing
> the exact same summary on a piece of paper anyway just to visualize
> what the text says. The summary makes it easy to understand at a
> glance. But I have modified the summary in [v9] to remove the "case"
> and to add other column headings as suggested.
>
> Isn't it better to use a table instead of synopsis?

Modified [v10] as suggested.

>
> Not changed. The readers of this docs page are all users who will be
> familiar with the filter expressions, so I felt the "OR'ed together"
> part will be perfectly clear to the intended audience.
>
> If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
> keep the consistence.

Modified [v10] as suggested.

>
> But I did not understand your point about “If, for some reason,
> someone decided to change it, this section will contain obsolete
> information”, because IIUC that will be equally true for both the
> explanation and the output, so I did not understand why you say "psql
> output is fine, the explanation is not". It is the business of this
> documentation to help the user to know how and where they can find the
> row filter information they may need to know.
>
> You are describing a psql command here. My point is keep psql explanation in
> the psql section. This section is to describe the row filter feature. If we
> start describing features in other sections, we will have outdated information
> when the referred feature is changed and someone fails to find all references.
> I tend to concentrate detailed explanation in the feature section. If I have to
> add links in other sections, I use "Seee Section 1.23 for details ...".

Modified [v10] so the psql descriptions are now very generic.

>
> Not changed. The publisher and subscriber programlistings are always
> separated. If you are looking at the rendered HTML I think it is quite
> clear that one is at the publisher and one is at the subscriber. OTOH,
> if we omitted creating the tables on the subscriber then I think that
> really would cause some confusion.
>
> The difference is extra space. By default, the CSS does not include a border
> for programlisting. That's why I complained about it. I noticed that the
> website CSS includes it. However, the PDF will not include the border. I would
> add a separate description for the subscriber just to be clear.
>

Modified [v10] as suggested to add a separate description for the subscriber.

> One last suggestion, you are using identifiers in uppercase letters but
> "primary key" is in lowercase.
>

Modified [v10] as suggested to make this uppercase

------
[v10] https://www.postgresql.org/message-id/CAHut%2BPvMEYkCRWDoZSpFnP%2B5SExus7YzWAd%3D6ah9vwkfRhOnSg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: PG DOCS - logical replication filtering

From

"Euler Taveira"

Date:

13 April 2022, 19:59:30

On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:

PSA patch v10 which addresses the remaining review comments from Euler [1]

Looks good to me.

Euler Taveira

EDB https://www.enterprisedb.com/

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

13 April 2022, 23:01:46

I've changed the CF entry [1] status to "ready for committer".

------
[1] https://commitfest.postgresql.org/38/3605/

Kind Regards,
Peter Smith.
Fujitsu Australia

RE: PG DOCS - logical replication filtering

From

"houzj.fnst@fujitsu.com"

Date:

14 April 2022, 00:58:42

On Wednesday, April 13, 2022 11:25 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA patch v10 which addresses the remaining review comments from Euler [1]

Thanks for the patch, it looks good to me.

Best regards,
Hou zj

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

14 April 2022, 03:25:18

On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:
>
> PSA patch v10 which addresses the remaining review comments from Euler [1]
>
> Looks good to me.
>

Thanks, this looks good to me as well. I'll check this again early
next week and push unless I find something or there are more comments.

-- 
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Amit Kapila

Date:

18 April 2022, 05:52:56

On Thu, Apr 14, 2022 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:
> >
> > PSA patch v10 which addresses the remaining review comments from Euler [1]
> >
> > Looks good to me.
> >
>
> Thanks, this looks good to me as well. I'll check this again early
> next week and push unless I find something or there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.

Re: PG DOCS - logical replication filtering

From

Peter Smith

Date:

18 April 2022, 23:15:37

On Mon, Apr 18, 2022 at 3:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 14, 2022 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:
> > >
> > > PSA patch v10 which addresses the remaining review comments from Euler [1]
> > >
> > > Looks good to me.
> > >
> >
> > Thanks, this looks good to me as well. I'll check this again early
> > next week and push unless I find something or there are more comments.
> >
>
> Pushed.
>

Thanks for pushing. I updated the CF entry [1] to say "committed'.

------
[1] https://commitfest.postgresql.org/38/3605/

Kind Regards,
Peter Smith.
Fujitsu Australia