Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id c56e001e-313a-4056-a289-4ea33da92322@www.fastmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: row filtering for logical replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: row filtering for logical replication  (Greg Nancarrow <gregn4422@gmail.com>)
RE: row filtering for logical replication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jul 13, 2021, at 12:25 AM, Peter Smith wrote:
I have reviewed the latest v18 patch. Below are some more review
comments and patches.
Peter, thanks for quickly check the new patch. I'm attaching a new patch (v19)
that addresses (a) this new review, (b) Tomas' review and (c) Greg's review. I
also included the copy/equal node support for the new node (PublicationTable)
mentioned by Tomas in another email.

1. Commit comment - wording
8<

=>

I think this means to say: "Rows that don't satisfy an optional WHERE
clause will be filtered out."
Agreed.

2. Commit comment - wording

"The row filter is per table, which allows different row filters to be
defined for different tables."

=>

I think all that is the same as just saying: "The row filter is per table."
Agreed.

3. PG docs - independent improvement

You wrote (ref [1] point 3):

"I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too."

I agree, but that PG docs improvement is independent of your RowFilter
patch; please make another thread for that idea.
I will. And I will also include the next item that I removed from the patch.

4. doc/src/sgml/ref/create_publication.sgml - independent improvement

@@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable
class="parameter">name</replaceable>
           on its partitions) contained in the publication will be published
           using the identity and schema of the partitioned table rather than
           that of the individual partitions that are actually changed; the
-          latter is the default.  Enabling this allows the changes to be
-          replicated into a non-partitioned table or a partitioned table
-          consisting of a different set of partitions.
+          latter is the default (<literal>false</literal>).  Enabling this
+          allows the changes to be replicated into a non-partitioned table or a
+          partitioned table consisting of a different set of partitions.
          </para>

I think that Tomas wrote (ref [2] point 2) that this change seems
unrelated to your RowFilter patch.

I agree; I liked the change, but IMO you need to propose this one in
another thread too.
Reverted.

5. doc/src/sgml/ref/create_subscription.sgml - wording
8<

I felt that the sentence: "If any table in the publications has a
<literal>WHERE</literal> clause, data synchronization does not use it
if the subscriber is a <productname>PostgreSQL</productname> version
before 15."

Could be expressed more simply like: "If the subscriber is a
<productname>PostgreSQL</productname> version before 15 then any row
filtering is ignored."
Agreed.

6. src/backend/commands/publicationcmds.c - wrong function comment
8<

/*
  * Close all relations in the list.
+ *
+ * Publication node can have a different list element, hence, pub_drop_table
+ * indicates if it has a Relation (true) or PublicationTable (false).
  */
static void
CloseTableList(List *rels)

=>

The 2nd parameter does not exist in v18, so that comment about
pub_drop_table seems to be a cut/paste error from the OpenTableList.
Oops. Removed.

src/backend/replication/logical/tablesync.c - bug ?

@@ -829,16 +883,23 @@ copy_table(Relation rel)
  relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
  Assert(rel == relmapentry->localrel);

+ /* List of columns for COPY */
+ attnamelist = make_copy_attnamelist(relmapentry);
+
  /* Start copy on the publisher. */
=>

I did not understand the above call to make_copy_attnamelist. The
result seems unused before it is overwritten later in this same
function (??)
Good catch. This seems to be a leftover from an ancient version.

7. src/backend/replication/logical/tablesync.c  -
fetch_remote_table_info enhancement

+ /* Get relation qual */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
+ {
+ resetStringInfo(&cmd);
+ appendStringInfo(&cmd,
+ "SELECT pg_get_expr(prqual, prrelid) "
+ "  FROM pg_publication p "
+ "  INNER JOIN pg_publication_rel pr "
+ "       ON (p.oid = pr.prpubid) "
+ " WHERE pr.prrelid = %u "
+ "   AND p.pubname IN (", lrel->remoteid);

=>

I think a small improvement is possible in this SQL.

If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
then it avoids the copy SQL from having multiple WHERE clauses which
are all identical. This could happen when subscribed to multiple
publications which had the same filter for the same table.
Good catch!

8. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant

@@ -99,6 +108,9 @@ typedef struct RelationSyncEntry

  bool replicate_valid;
  PublicationActions pubactions;
+ List    *qual; /* row filter */
+ List    *exprstate; /* ExprState for row filter */
+ TupleTableSlot *scantuple; /* tuple table slot for row filter */

=>

Now that the exprstate is introduced I think that the other member
"qual" is redundant, so it can be removed.
I was thinking about it for the next patch. Removed.

9. src/backend/replication/pgoutput/pgoutput.c - comment typo?
8<

typo: it/that ?

I think it ought to say "This is the same code as ExecPrepareExpr()
but that is not used because"...
Fixed.

10. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?

+ /* Evaluates row filter */
+ result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
+
+ elog(DEBUG3, "row filter %smatched", result ? "" : "not ");

The above debug logging is really only a repeat (with different
wording) of the same information already being logged inside the
pgoutput_row_filter_exec_expr function isn't it? Consider removing the
redundant logging.
Agreed. Removed.


--
Euler Taveira

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reducing memory consumption for pending inval messages
Next
From: "Euler Taveira"
Date:
Subject: Re: row filtering for logical replication