Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id f987cbd5-89b1-9afb-4c6d-824eae0eec40@enterprisedb.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Column Filtering in Logical Replication
Re: Column Filtering in Logical Replication
List pgsql-hackers

On 3/18/22 15:43, Tomas Vondra wrote:
> 
> 
> On 3/18/22 06:52, Amit Kapila wrote:
>>
>>  ...
>> I think it is in good shape but apart from cleanup, there are issues
>> with dependency handling which I have analyzed and reported as one of
>> the comments in the email [2]. I was getting some weird behavior
>> during my testing due to that. Apart from that still the patch has DDL
>> handling code in tablecmds.c which probably is not required.
>> Similarly, Shi-San has reported an issue with replica full in her
>> email [3]. It is up to you what to do here but it would be good if you
>> can once share the patch after fixing these issues so that we can
>> re-test/review it.
> 
> Ah, thanks for reminding me - it's hard to keep track of all the issues
> in threads as long as this one.
> 

Attached is an updated patch, hopefully addressing these issues.

Firstly, I've reverted the changes in tablecmds.c, instead relying on
regular dependency behavior. I've also switched from DEPENDENCY_AUTO to
DEPENDENCY_NORMAL. This makes the code simpler, and the behavior should
be the same as for row filters, which makes it more consistent.

As for the SET COLUMNS breaking behaviors, I've decided to drop this
feature entirely, for the reasons outlined earlier today. We don't have
that for row filters either, etc. This means the dependency issue simply
disappears.

Without SET COLUMNS, if you want to change the column list you have to
remove the table from the subscription, and add it back (with the new
column list). Perhaps inconvenient, but the behavior is clearly defined.
Maybe we need a more convenient way to tweak column lists, but I'd say
we should have the same thing for row filters too.


As for the issue reported by Shi-San about replica identity full and
column filters, presumably you're referring to this:

  create table tbl (a int, b int, c int);
  create publication pub for table tbl (a, b, c);
  alter table tbl replica identity full;

  postgres=# delete from tbl;
  ERROR:  cannot delete from table "tbl"
  DETAIL:  Column list used by the publication does not cover the
           replica identity.

I believe not allowing column lists with REPLICA IDENTITY FULL is
expected / correct behavior. I mean, for that to work the column list
has to always include all columns anyway, so it's pretty pointless. Of
course, we might check that the column list contains everything, but
considering the list does always have to contain all columns, and it
break as soon as you add any columns, it seems reasonable (cheaper) to
just require no column lists.

I also went through the patch and made the naming more consistent. The
comments used both "column filter" and "column list" randomly, and I
think the agreement is to use "list" so I adopted that wording.


However, while looking at how pgoutput, I realized one thing - for row
filters we track them "per operation", depending on which operations are
defined for a given publication. Shouldn't we do the same thing for
column lists, really?

I mean, if there are two publications with different column lists, one
for inserts and the other one for updates, isn't it wrong to merge these
two column lists?

Also, doesn't this mean publish_as_relid should be "per operation" too?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Out-of-tree certificate interferes ssltest
Next
From: Andrew Dunstan
Date:
Subject: Re: speed up a logical replica setup