RE: row filtering for logical replication - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: row filtering for logical replication
Date
Msg-id OS0PR01MB57164BD1F23A83C1601880A994519@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: row filtering for logical replication
Re: row filtering for logical replication
List pgsql-hackers
On Mon, Jan 10, 2022 2:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jan 10, 2022 at 8:41 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the v61 patch set.
> >
> 
> Few comments:
> ==============
> 1.
> pgoutput_row_filter()
> {
> ..
> +
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
> linitial(rfnodes[idx]);
> + entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
> + MemoryContextSwitchTo(oldctx);
> ..
> }
> 
> rel_sync_cache_relation_cb()
> {
> ..
> + if (entry->exprstate[idx] != NULL)
> + {
> + pfree(entry->exprstate[idx]);
> + entry->exprstate[idx] = NULL;
> + }
> ..
> }
> 
> I think this can leak memory as just freeing 'exprstate' is not
> sufficient. It contains other allocated memory as well like for
> 'steps'. Apart from that we might allocate other memory as well for
> generating expression state. I think it would be better if we can have
> another memory context (say cache_expr_cxt) in RelationSyncEntry and
> allocate it the first time we need it and then reset it instead of
> doing pfree of 'exprstate'. Also, we can free this new context in
> pgoutput_shutdown before destroying RelationSyncCache.
> 2. If we do the above, we can use this new context at all other places
> in the patch where it is using CacheMemoryContext.

Changed.

> 3.
> @@ -1365,6 +1785,7 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
>  {
>   HASH_SEQ_STATUS status;
>   RelationSyncEntry *entry;
> + MemoryContext oldctx;
> 
>   /*
>   * We can get here if the plugin was used in SQL interface as the
> @@ -1374,6 +1795,8 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
>   if (RelationSyncCache == NULL)
>   return;
> 
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
>   /*
>   * There is no way to find which entry in our cache the hash belongs to so
>   * mark the whole cache as invalid.
> @@ -1392,6 +1815,8 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
>   entry->pubactions.pubdelete = false;
>   entry->pubactions.pubtruncate = false;
>   }
> +
> + MemoryContextSwitchTo(oldctx);
>  }
> 
> Is there a reason for the above change?

Reverted this change.

> 4.
> +#define SET_NO_FILTER_FOR_CURRENT_PUBACTIONS \
> + if (pub->pubactions.pubinsert) \
> + no_filter[idx_ins] = true; \
> + if (pub->pubactions.pubupdate) \
> + no_filter[idx_upd] = true; \
> + if (pub->pubactions.pubdelete) \
> + no_filter[idx_del] = true
> 
> I don't see the need for this macro and it makes code less readable. I
> think we can instead move this code to a function to avoid duplicate
> code.

I slightly refactor the code in this function to avoid duplicate code.

> 5.
> Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
> 
> There seems to be a typo in the above sentence.
> /which row filters/to which row filters

Changed.

> 6.
> + /*
> + * Find if there are any row filters for this relation. If there are,
> + * then prepare the necessary ExprState and cache it in
> + * entry->exprstate.
> + *
> + * NOTE: All publication-table mappings must be checked.
> + *
> + * NOTE: If the relation is a partition and pubviaroot is true, use
> + * the row filter of the topmost partitioned table instead of the row
> + * filter of its own partition.
> + *
> + * NOTE: Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
> + *
> + * NOTE: FOR ALL TABLES implies "don't use row filter expression" so
> + * it takes precedence.
> + *
> + * NOTE: ALL TABLES IN SCHEMA implies "don't use row filter
> + * expression" if the schema is the same as the table schema.
> + */
> + foreach(lc, data->publications)
> 
> Let's not add NOTE for each of these points but instead expand the
> first sentence as "Find if there are any row filters for this
> relation. If there are, then prepare the necessary ExprState and cache
> it in entry->exprstate. To build an expression state, we need to
> ensure the following:"

Changed.

Attach the v62 patch set which address the above comments and slightly
adjust the commit message in 0002 patch.

Best regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Tatsuro Yamada
Date:
Subject: Re: \dP and \dX use ::regclass without "pg_catalog."
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side