Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CAA4eK1K5pkrPT9z5TByUPptExian5c18g6GnfNf9Cr97QdPbjw@mail.gmail.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>
> > OK, I reworked this to do the same thing as the row filtering patch.
> >
>
> Thanks, I'll check this.
>

Some assorted comments:
=====================
1. We don't need to send a column list for the old tuple in case of an
update (similar to delete). It is not required to apply a column
filter for those cases because we ensure that RI must be part of the
column list for updates and deletes.
2.
+ /*
+ * Check if all columns referenced in the column filter are part of
+ * the REPLICA IDENTITY index or not.

I think this comment is reverse. The rule we follow here is that
attributes that are part of RI must be there in a specified column
list. This is used at two places in the patch.
3. get_rel_sync_entry()
{
/* XXX is there a danger of memory leak here? beware */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ for (int i = 0; i < nelems; i++)
...
}

Similar to the row filter, I think we need to use
entry->cache_expr_cxt to allocate this. There are other usages of
CacheMemoryContext in this part of the code but I think those need to
be also changed and we can do that as a separate patch. If we do the
suggested change then we don't need to separately free columns.
4. I think we don't need the DDL changes in AtExecDropColumn. Instead,
we can change the dependency of columns to NORMAL during publication
commands.
5. There is a reference to check_publication_columns but that function
is removed from the patch.
6.
/*
* If we know everything is replicated and the row filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
break;

/*
* If we know everything is replicated and the column filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->cf_valid_for_update && !pubdesc->cf_valid_for_delete)
break;

Can we combine these two checks?

I feel this patch needs a more thorough review.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Adding CI to our tree
Next
From: Andres Freund
Date:
Subject: Re: Adding CI to our tree