Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1JdLzJEmxxzEEYAOg41Om3Y88uL+7CgXdvnAaj7hkw8BQ@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: row filtering for logical replication
Re: row filtering for logical replication
List pgsql-hackers
On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Dec 7, 2021 at 5:48 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should be
> > treated as no filter, and table tbl should have no filter in subscription sub. Thoughts?
> >
> > But for now, the filter(a > 10) works both when copying initial data and later changes.
> >
> > To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> > publication or published as part of schema in function pgoutput_row_filter_init
> > (which was introduced in v44-0003 patch), also we need to make some changes in
> > tablesync.c.
> >
>
> Partly fixed in v46-0005  [1]
>
> NOTE
> - The initial COPY part of the tablesync does not take the publish
> operation into account so it means that if any of the subscribed
> publications have "puballtables" flag then all data will be copied
> sans filters.
>

I think this should be okay but the way you have implemented it in the
patch doesn't appear to be the optimal way. Can't we fetch
allpubtables info and qual info as part of one query instead of using
separate queries?

> I guess this is consistent with the other decision to
> ignore publication operations [2].
>
> TODO
> - Documentation
> - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA
>

Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
case, the difference would be that we need to check the presence of
schema corresponding to the table (for which we are fetching
row_filter information) is there in pg_publication_namespace. If it
exists then we don't need to apply row_filter for the table. I feel it
is better to fetch all this information as part of the query which you
are using to fetch row_filter info. The idea is to avoid the extra
round-trip between subscriber and publisher.

Few other comments:
===================
1.
@@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
Relation relation, RelationSyncEntr
  bool rfisnull;

  /*
+ * If the publication is FOR ALL TABLES then it is treated same as if this
+ * table has no filters (even if for some other publication it does).
+ */
+ if (pub->alltables)
+ {
+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;
+
+ continue;
+ }

Is there a reason to continue checking the other publications if
no_filter is true for all kind of pubactions?

2.
+ * All row filter expressions will be discarded if there is one
+ * publication-relation entry without a row filter. That's because
+ * all expressions are aggregated by the OR operator. The row
+ * filter absence means replicate all rows so a single valid
+ * expression means publish this row.

This same comment is at two places, remove from one of the places. I
think keeping it atop for loop is better.

3.
+ {
+ int idx;
+ bool found_filters = false;

I am not sure if starting such ad-hoc braces in the code to localize
the scope of variables is a regular practice. Can we please remove
this?


-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Michael Paquier
Date:
Subject: Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set