Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id 5af7066b-15ac-1989-9afe-4e5dc1d60154@enterprisedb.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 3/10/22 04:09, Amit Kapila wrote:
> On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>
>>> OK, I reworked this to do the same thing as the row filtering patch.
>>>
>>
>> Thanks, I'll check this.
>>
> 
> Some assorted comments:
> =====================
> 1. We don't need to send a column list for the old tuple in case of an
> update (similar to delete). It is not required to apply a column
> filter for those cases because we ensure that RI must be part of the
> column list for updates and deletes.

I'm not sure which part of the code does this refer to?

> 2.
> + /*
> + * Check if all columns referenced in the column filter are part of
> + * the REPLICA IDENTITY index or not.
> 
> I think this comment is reverse. The rule we follow here is that
> attributes that are part of RI must be there in a specified column
> list. This is used at two places in the patch.

Yeah, you're right. Will fix.

> 3. get_rel_sync_entry()
> {
> /* XXX is there a danger of memory leak here? beware */
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + for (int i = 0; i < nelems; i++)
> ...
> }
> 
> Similar to the row filter, I think we need to use
> entry->cache_expr_cxt to allocate this. There are other usages of
> CacheMemoryContext in this part of the code but I think those need to
> be also changed and we can do that as a separate patch. If we do the
> suggested change then we don't need to separately free columns.

I agree a shorter-lived context would be better than CacheMemoryContext,
but "expr" seems to indicate it's for the expression only, so maybe we
should rename that. But do we really want a memory context for every
single entry?

> 4. I think we don't need the DDL changes in AtExecDropColumn. Instead,
> we can change the dependency of columns to NORMAL during publication
> commands.

I'll think about that.

> 5. There is a reference to check_publication_columns but that function
> is removed from the patch.

Right, will fix.

> 6.
> /*
> * If we know everything is replicated and the row filter is invalid
> * for update and delete, there is no point to check for other
> * publications.
> */
> if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> !pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
> break;
> 
> /*
> * If we know everything is replicated and the column filter is invalid
> * for update and delete, there is no point to check for other
> * publications.
> */
> if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> !pubdesc->cf_valid_for_update && !pubdesc->cf_valid_for_delete)
> break;
> 
> Can we combine these two checks?
> 

I was worried it'd get too complex / hard to understand, but I'll think
about maybe simplifying the conditions a bit.

> I feel this patch needs a more thorough review.
> 

I won't object to more review, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: Robert Haas
Date:
Subject: Re: role self-revocation