Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id 532a18d8-ce90-4444-8570-8a9fcf09f329@www.fastmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
Hi.

I have been looking at the latest patch set (v16). Below are my review
comments and some patches.

Peter, thanks for your detailed review. Comments are inline.

1. Patch 0001 comment - typo

you can optionally filter rows that does not satisfy a WHERE condition

typo: does/does
Fixed.


2. Patch 0001 comment - typo

The WHERE clause should probably contain only columns that are part of
the primary key or that are covered by REPLICA IDENTITY. Otherwise,
and DELETEs won't be replicated.

typo: "Otherwise, and DELETEs" ??
Fixed.

3. Patch 0001 comment - typo and clarification

If your publication contains partitioned table, the parameter
publish_via_partition_root determines if it uses the partition row filter (if
the parameter is false -- default) or the partitioned table row filter.

Typo: "contains partitioned table" -> "contains a partitioned table"
Fixed.

Also, perhaps the text "or the partitioned table row filter." should
say "or the root partitioned table row filter." to disambiguate the
case where there are more levels of partitions like A->B->C. e.g. What
filter does C use?
I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too.

4. src/backend/catalog/pg_publication.c - misleading names

-publication_add_relation(Oid pubid, Relation targetrel,
+publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
  bool if_not_exists)

Leaving this parameter name as "targetrel" seems a bit misleading now
in the function code. Maybe this should be called something like "pri"
which is consistent with other places where you have declared
PublicationRelationInfo.

Also, consider declaring some local variables so that the patch may
have less impact on existing code. e.g.
Oid relid = pri->relid
Relation *targetrel = relationinfo->relation
Done.

5. src/backend/commands/publicationcmds.c - simplify code

- rels = OpenTableList(stmt->tables);
+ if (stmt->tableAction == DEFELEM_DROP)
+ rels = OpenTableList(stmt->tables, true);
+ else
+ rels = OpenTableList(stmt->tables, false);

Consider writing that code more simply as just:

rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);
It is not a common pattern to use an expression as a function argument in
Postgres. I prefer to use a variable with a suggestive name.

6. src/backend/commands/publicationcmds.c - bug?

- CloseTableList(rels);
+ CloseTableList(rels, false);
}

Is this a potential bug? When you called OpenTableList the 2nd param
was maybe true/false, so is it correct to be unconditionally false
here? I am not sure.
Good catch.

7. src/backend/commands/publicationcmds.c - OpenTableList function comment.

  * Open relations specified by a RangeVar list.
+ * AlterPublicationStmt->tables has a different list element, hence, is_drop
+ * indicates if it has a RangeVar (true) or PublicationTable (false).
  * The returned tables are locked in ShareUpdateExclusiveLock mode in order to
  * add them to a publication.

I am not sure about this. Should that comment instead say "indicates
if it has a Relation (true) or PublicationTable (false)"?
Fixed.

8. src/backend/commands/publicationcmds.c - OpenTableList
>8

For some reason it feels kind of clunky to me for this function to be
processing the list differently according to the 2nd param. e.g. the
name "is_drop" seems quite unrelated to the function code, and more to
do with where it was called from. Sorry, I don't have any better ideas
for improvement atm.
My suggestion is to rename it to "pub_drop_table".

9. src/backend/commands/publicationcmds.c - OpenTableList bug?
>8

I felt maybe this is a possible bug here because there seems no code
explicitly assigning the whereClause = NULL  if "is_drop" is true so
maybe it can have a garbage value which could cause problems later.
Maybe this is fixed by using palloc0.
Fixed.

10. src/backend/commands/publicationcmds.c - CloseTableList function comment
>8

Probably the meaning of "is_drop" should be described in this function comment.
Done.

11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry signature.

-static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
+static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation rel);

I see that this function signature is modified but I did not see how
this parameter refactoring is actually related to the RowFilter patch.
Perhaps I am mistaken, but IIUC this only changes the relid =
RelationGetRelid(rel); to be done inside this function instead of
being done outside by the callers.
It is not critical for this patch so I removed it.

12. src/backend/replication/pgoutput/pgoutput.c - missing function comments

The static functions create_estate_for_relation and
pgoutput_row_filter_prepare_expr probably should be commented.
Done.

13. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_prepare_expr function name

+static ExprState *pgoutput_row_filter_prepare_expr(Node *rfnode,
EState *estate);

This function has an unfortunate name with the word "prepare" in it. I
wonder if a different name can be found for this function to avoid any
confusion with pgoutput functions (coming soon) which are related to
the two-phase commit "prepare".
The word "prepare" is related to the executor context. The function name
contains "row_filter" that is sufficient to distinguish it from any other
function whose context is "prepare". I replaced "prepare" with "init".

14. src/bin/psql/describe.c

+ if (!PQgetisnull(tabres, j, 2))
+ appendPQExpBuffer(&buf, " WHERE (%s)",
+   PQgetvalue(tabres, j, 2));

Because the where-clause value already has enclosing parentheses so
using " WHERE (%s)" seems overkill here. e.g. you can see the effect
in your src/test/regress/expected/publication.out file. I think this
should be changed to " WHERE %s" to give better output.
Peter E suggested that extra parenthesis be added. See 0005 [1].

15. src/include/catalog/pg_publication.h - new typedef

+typedef struct PublicationRelationInfo
+{
+ Oid relid;
+ Relation relation;
+ Node    *whereClause;
+} PublicationRelationInfo;
+

The new PublicationRelationInfo should also be added
src/tools/pgindent/typedefs.list
Patches usually don't update typedefs.list. Check src/tools/pgindent/README.

16. src/include/nodes/parsenodes.h - new typedef

+typedef struct PublicationTable
+{
+ NodeTag type;
+ RangeVar   *relation; /* relation to be published */
+ Node    *whereClause; /* qualifications */
+} PublicationTable;

The new PublicationTable should also be added src/tools/pgindent/typedefs.list
Idem.

17. sql/publication.sql - show more output

+CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
+RESET client_min_messages;
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000
AND e < 2000);
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
+-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another
WHERE expression)
+ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300
AND e < 500);
+-- fail - functions disallowed
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6);
+-- fail - WHERE not allowed in DROP
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
+\dRp+ testpub5

I felt that it would be better to have a "\dRp+ testpub5" after each
of the valid ALTER PUBLICATION steps to show the intermediate results
also; not just the final one at the end.
Done.

18. src/test/subscription/t/020_row_filter.pl - rename file

I think this file should be renamed to 021_row_filter.pl as there is
already an 020 TAP test present.
Done.

19. src/test/subscription/t/020_row_filter.pl - test comments

AFAIK the test cases are all OK, but it was really quite hard to
review these TAP tests to try to determine what the expected results
should be.
I included your comments but heavily changed it.

20. src/test/subscription/t/020_row_filter.pl - missing test case?

There are some partition tests, but I did not see any test that was
like 3 levels deep like A->B->C, so I was not sure if there is any
case C would ever make use of the filter of its parent B, or would it
only use the filter of the root A?
I didn't include it yet. There is an issue with initial synchronization and
partitioned table when you set publish_via_partition_root. I'll start another
thread for this issue.

21. src/test/subscription/t/020_row_filter.pl - missing test case?

If the same table is in multiple publications they can each have a row
filter. And a subscription might subscribe to some but not all of
those publications. I think this scenario is only partly tested.
8<
e.g.
pub_1 has tableX with RowFilter1
pub_2 has tableX with RowFilter2

Then sub_12 subscribes to pub_1, pub_2
This is already tested in your TAP test (I think) and it makes sure
both filters are applied

But if there was also
pub_3 has tableX with RowFilter3

Then sub_12 still should only be checking the filtered RowFilter1 AND
RowFilter2 (but NOT row RowFilter3). I think this scenario is not
tested.
I added a new publication tap_pub_not_used to cover this case.

POC PATCH FOR PLAN CACHE
========================

PSA a POC patch for a plan cache which gets used inside the
pgoutput_row_filter function instead of calling prepare for every row.
I think this is implementing something like Andes was suggesting a
while back [1].
I also had a WIP patch for it (that's very similar to your patch) so I merged
it.

This cache mechanism consists of caching ExprState and avoid calling
pgoutput_row_filter_init_expr() for every single row. Greg N suggested in
another email that tuple table slot should also be cached to avoid a few cycles
too. It is also included in this new patch.

Measurements with/without this plan cache:

Time spent processing within the pgoutput_row_filter function
- Data was captured using the same technique as the
0002-Measure-row-filter-overhead.patch.
- Inserted 1000 rows, sampled data for the first 100 times in this function.
not cached: average ~ 28.48 us
cached: average ~ 9.75 us

Replication times:
- Using tables and row filters same as in Onder's commands_to_test_perf.sql [2]
100K rows - not cached: ~ 42sec, 43sec, 44sec
100K rows - cached: ~ 41sec, 42sec, 42 sec.

There does seem to be a tiny gain achieved by having the plan cache,
but I think the gain might be a lot less than what people were
expecting.
I did another measure using as baseline the previous patch (v16).

without cache (v16)
---------------------------

mean:           1.46 us
stddev:         2.13 us
median:         1.39 us
min-max:        [0.69 .. 1456.69] us
percentile(99): 3.15 us
mode:           0.91 us

with cache (v18)
-----------------------

mean:           0.63 us
stddev:         1.07 us
median:         0.55 us
min-max:        [0.29 .. 844.87] us
percentile(99): 1.38 us
mode:           0.41 us

It represents -57%. It is a really good optimization for just a few extra lines
of code.




--
Euler Taveira

Attachment

pgsql-hackers by date:

Previous
From: Soumyadeep Chakraborty
Date:
Subject: Re: pg_stats and range statistics
Next
From: "Euler Taveira"
Date:
Subject: Re: row filtering for logical replication