RE: row filtering for logical replication - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: row filtering for logical replication
Date
Msg-id OS0PR01MB5716A7B5A66C225A8499FC0094559@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: row filtering for logical replication
Re: row filtering for logical replication
Re: row filtering for logical replication
List pgsql-hackers
On Friday, January 14, 2022 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jan 13, 2022 at 6:46 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the V64 patch set which addressed Alvaro, Amit and Peter's comments.
> >
> 
> Few more comments:
> ===================
> 1.
> "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
> + "  FROM pg_publication p"
> + "  LEFT OUTER JOIN pg_publication_rel pr"
> + "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
> + "  LATERAL pg_get_publication_tables(p.pubname) GPT"
> + " WHERE GPT.relid = %u"
> + "   AND p.pubname IN ( %s );",
> 
> Use all aliases either in CAPS or in lower case. Seeing the nearby
> code, it is better to use lower case for aliases.
> 
> 2.
> -
> +extern Oid GetTopMostAncestorInPublication(Oid puboid, List *ancestors);
> 
> It seems like a spurious line removal. I think you should declare it
> immediately after GetPubPartitionOptionRelations() to match the order
> of functions as they are in pg_publication.c
> 
> 3.
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid -
> + * i.e. when all referenced columns are part of REPLICA IDENTITY, or the
> 
> There is no need for a comma after REPLICA IDENTITY.
> 
> 4.
> + /*
> + * Find what are the cols that are part of the REPLICA IDENTITY.
> 
> Let's change this comment as: "Remember columns that are part of the
> REPLICA IDENTITY."
> 
> 5. The function name rowfilter_column_walker sounds goo generic for
> its purpose. Can we rename it contain_invalid_rfcolumn_walker() and
> move it to publicationcmds.c? Also, can we try to rearrange the code
> in GetRelationPublicationInfo() such that row filter validation
> related code is moved to a new function contain_invalid_rfcolumn()
> which will internally call contain_invalid_rfcolumn_walker(). This new
> functions can also be defined in publicationcmds.c.
> 
> 6.
> + *
> + * If the cached validation result is true, we assume that the cached
> + * publication actions are also valid.
> + */
> +AttrNumber
> +GetRelationPublicationInfo(Relation relation, bool validate_rowfilter)
> 
> Instead of having the above comment, can we have an Assert for valid
> relation->rd_pubactions when we are returning in the function due to
> rd_rfcol_valid. Then, you can add a comment (publication actions must
> be valid) before Assert.
> 
> 7. I think we should have a function check_simple_rowfilter_expr()
> which internally should call rowfilter_walker. See
> check_nested_generated/check_nested_generated_walker. If you agree
> with this, we can probably change the name of row_filter function to
> check_simple_rowfilter_expr_walker().
> 
> 8.
> + if (pubobj->pubtable && pubobj->pubtable->whereClause)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("WHERE clause for schema not allowed"),
> 
> Will it be better to write the above message as: "WHERE clause not
> allowed for schema"?
> 
> 9.
> --- a/src/backend/replication/logical/proto.c
> +++ b/src/backend/replication/logical/proto.c
> @@ -15,6 +15,7 @@
>  #include "access/sysattr.h"
>  #include "catalog/pg_namespace.h"
>  #include "catalog/pg_type.h"
> +#include "executor/executor.h"
> 
> Do we really need this include now? Please check includes in other
> files as well and remove if anything is not required.
> 
> 10.
>  /*
> - * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * Get information about a remote relation, in a similar fashion to how the
> + * RELATION message provides information during replication.
> 
> Why this part of the comment needs to be changed?
> 
> 11.
> /*
>   * For non-tables, we need to do COPY (SELECT ...), but we can't just
> - * do SELECT * because we need to not copy generated columns.
> + * do SELECT * because we need to not copy generated columns.
> 
> I think here comment should say: "For non-tables and tables with row
> filters, we need to do...."
> 
> Apart from the above, I have modified a few comments which you can
> find in the attached patch v64-0002-Modify-comments. Kindly, review
> those and if you are okay with them then merge those into the main
> patch.

Thanks for the comments.
Attach the V65 patch set which addressed the above comments and Peter's comments[1].
I also fixed some typos and removed some unused code.

[1] https://www.postgresql.org/message-id/CAHut%2BPvDKLrkT_nmPXd1cKfi7Cq8dVR7HGEKOyjrMwe65FdZ7Q%40mail.gmail.com

Best regards,
Hou zj



Attachment

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: TAP test to cover "EndOfLogTLI != replayTLI" case
Next
From: Justin Pryzby
Date:
Subject: Re: Add 64-bit XIDs into PostgreSQL 15