Re: row filtering for logical replication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: row filtering for logical replication
Date
Msg-id CAHut+PuLoZuHD_A=n8GshC84Nc=8guReDsTmV1RFsCYojssD8Q@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Hi Euler,

Greg noticed that your patch set was missing any implementation of the
psql tab auto-complete for the new row filter WHERE syntax.

So I have added a POC patch for this missing feature.

Unfortunately, there is an existing HEAD problem overlapping with this
exact same code. I reported this already in another thread [1].

So there are 2 patches attached here:
0001 - Fixes the other reported problem (I hope this may be pushed soon)
0002 - Adds the tab-completion code for your row filter WHERE's

------
[1] https://www.postgresql.org/message-id/CAHut+Ps-vkmnWAShWSRVCB3gx8aM=bFoDqWgBNTzofK0q1LpwA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Tue, Jul 13, 2021 at 1:25 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 5:39 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
> >
> > Hi.
> >
> > I have been looking at the latest patch set (v16). Below are my review
> > comments and some patches.
> >
> > Peter, thanks for your detailed review. Comments are inline.
> >
>
> Hi Euler,
>
> Thanks for addressing my previous review comments.
>
> I have reviewed the latest v18 patch. Below are some more review
> comments and patches.
>
> (The patches 0003,0004 are just examples of what is mentioned in my
> comments; The patches 0001,0002 are there only to try to keep cfbot
> green).
>
> //////////
>
> 1. Commit comment - wording
>
> "When a publication is defined or modified, rows that don't satisfy a
> WHERE clause may be
> optionally filtered out."
>
> =>
>
> I think this means to say: "Rows that don't satisfy an optional WHERE
> clause will be filtered out."
>
> ------
>
> 2. Commit comment - wording
>
> "The row filter is per table, which allows different row filters to be
> defined for different tables."
>
> =>
>
> I think all that is the same as just saying: "The row filter is per table."
>
> ------
>
> 3. PG docs - independent improvement
>
> You wrote (ref [1] point 3):
>
> "I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
> root partitioned table is used. We should improve that sentence too."
>
> I agree, but that PG docs improvement is independent of your RowFilter
> patch; please make another thread for that idea.
>
> ------
>
> 4. doc/src/sgml/ref/create_publication.sgml - independent improvement
>
> @@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable
> class="parameter">name</replaceable>
>            on its partitions) contained in the publication will be published
>            using the identity and schema of the partitioned table rather than
>            that of the individual partitions that are actually changed; the
> -          latter is the default.  Enabling this allows the changes to be
> -          replicated into a non-partitioned table or a partitioned table
> -          consisting of a different set of partitions.
> +          latter is the default (<literal>false</literal>).  Enabling this
> +          allows the changes to be replicated into a non-partitioned table or a
> +          partitioned table consisting of a different set of partitions.
>           </para>
>
> I think that Tomas wrote (ref [2] point 2) that this change seems
> unrelated to your RowFilter patch.
>
> I agree; I liked the change, but IMO you need to propose this one in
> another thread too.
>
> ------
>
> 5. doc/src/sgml/ref/create_subscription.sgml - wording
>
> @@ -102,7 +102,16 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
>           <para>
>            Specifies whether the existing data in the publications that are
>            being subscribed to should be copied once the replication starts.
> -          The default is <literal>true</literal>.
> +          The default is <literal>true</literal>. If any table in the
> +          publications has a <literal>WHERE</literal> clause, rows that do not
> +          satisfy the <replaceable class="parameter">expression</replaceable>
> +          will not be copied. If the subscription has several publications in
> +          which a table has been published with different
> +          <literal>WHERE</literal> clauses, rows must satisfy all expressions
> +          to be copied. If any table in the publications has a
> +          <literal>WHERE</literal> clause, data synchronization does not use it
> +          if the subscriber is a <productname>PostgreSQL</productname> version
> +          before 15.
>
> I felt that the sentence: "If any table in the publications has a
> <literal>WHERE</literal> clause, data synchronization does not use it
> if the subscriber is a <productname>PostgreSQL</productname> version
> before 15."
>
> Could be expressed more simply like: "If the subscriber is a
> <productname>PostgreSQL</productname> version before 15 then any row
> filtering is ignored."
>
> ------
>
> 6. src/backend/commands/publicationcmds.c - wrong function comment
>
> @@ -585,6 +611,9 @@ OpenTableList(List *tables)
>
>  /*
>   * Close all relations in the list.
> + *
> + * Publication node can have a different list element, hence, pub_drop_table
> + * indicates if it has a Relation (true) or PublicationTable (false).
>   */
>  static void
>  CloseTableList(List *rels)
>
> =>
>
> The 2nd parameter does not exist in v18, so that comment about
> pub_drop_table seems to be a cut/paste error from the OpenTableList.
>
> ------
>
> src/backend/replication/logical/tablesync.c - bug ?
>
> @@ -829,16 +883,23 @@ copy_table(Relation rel)
>   relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
>   Assert(rel == relmapentry->localrel);
>
> + /* List of columns for COPY */
> + attnamelist = make_copy_attnamelist(relmapentry);
> +
>   /* Start copy on the publisher. */
> =>
>
> I did not understand the above call to make_copy_attnamelist. The
> result seems unused before it is overwritten later in this same
> function (??)
>
> ------
>
> 7. src/backend/replication/logical/tablesync.c  -
> fetch_remote_table_info enhancement
>
> + /* Get relation qual */
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
> + {
> + resetStringInfo(&cmd);
> + appendStringInfo(&cmd,
> + "SELECT pg_get_expr(prqual, prrelid) "
> + "  FROM pg_publication p "
> + "  INNER JOIN pg_publication_rel pr "
> + "       ON (p.oid = pr.prpubid) "
> + " WHERE pr.prrelid = %u "
> + "   AND p.pubname IN (", lrel->remoteid);
>
> =>
>
> I think a small improvement is possible in this SQL.
>
> If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
> then it avoids the copy SQL from having multiple WHERE clauses which
> are all identical. This could happen when subscribed to multiple
> publications which had the same filter for the same table.
>
> I attached a tmp POC patch for this change and it works as expected.
> For example, I subscribe to 3 publications, but 2 of them have the
> same filter for the table.
>
> BEFORE
> COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
> (key > 1000) AND (key > 1000)) TO STDOUT
>
> AFTER
> COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
> (key > 1000) ) TO STDOUT
>
> ------
>
> 8. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant
>
> @@ -99,6 +108,9 @@ typedef struct RelationSyncEntry
>
>   bool replicate_valid;
>   PublicationActions pubactions;
> + List    *qual; /* row filter */
> + List    *exprstate; /* ExprState for row filter */
> + TupleTableSlot *scantuple; /* tuple table slot for row filter */
>
> =>
>
> Now that the exprstate is introduced I think that the other member
> "qual" is redundant, so it can be removed.
>
> FYI - I attached a tmp patch with all the qual references deleted and
> everything is fine.
>
> ------
>
> 9. src/backend/replication/pgoutput/pgoutput.c - comment typo?
>
> + /*
> + * Cache ExprState using CacheMemoryContext. This is the same code as
> + * ExecPrepareExpr() but it is not used because it doesn't use an EState.
> + * It should probably be another function in the executor to handle the
> + * execution outside a normal Plan tree context.
> + */
>
> =>
>
> typo: it/that ?
>
> I think it ought to say "This is the same code as ExecPrepareExpr()
> but that is not used because"...
>
> ------
>
> 10. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?
>
> + /* Evaluates row filter */
> + result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
> +
> + elog(DEBUG3, "row filter %smatched", result ? "" : "not ");
>
> The above debug logging is really only a repeat (with different
> wording) of the same information already being logged inside the
> pgoutput_row_filter_exec_expr function isn't it? Consider removing the
> redundant logging.
>
> e.g. This is already getting logged by pgoutput_row_filter_exec_expr:
>
>     elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
>          DatumGetBool(ret) ? "true" : "false",
>          isnull ? "true" : "false");
>
>
> ------
> [1] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
> [2] https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com
> [3] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Failed transaction statistics to measure the logical replication progress