Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Column Filtering in Logical Replication |
Date | |
Msg-id | f987cbd5-89b1-9afb-4c6d-824eae0eec40@enterprisedb.com Whole thread Raw |
In response to | Re: Column Filtering in Logical Replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Column Filtering in Logical Replication
Re: Column Filtering in Logical Replication |
List | pgsql-hackers |
On 3/18/22 15:43, Tomas Vondra wrote: > > > On 3/18/22 06:52, Amit Kapila wrote: >> >> ... >> I think it is in good shape but apart from cleanup, there are issues >> with dependency handling which I have analyzed and reported as one of >> the comments in the email [2]. I was getting some weird behavior >> during my testing due to that. Apart from that still the patch has DDL >> handling code in tablecmds.c which probably is not required. >> Similarly, Shi-San has reported an issue with replica full in her >> email [3]. It is up to you what to do here but it would be good if you >> can once share the patch after fixing these issues so that we can >> re-test/review it. > > Ah, thanks for reminding me - it's hard to keep track of all the issues > in threads as long as this one. > Attached is an updated patch, hopefully addressing these issues. Firstly, I've reverted the changes in tablecmds.c, instead relying on regular dependency behavior. I've also switched from DEPENDENCY_AUTO to DEPENDENCY_NORMAL. This makes the code simpler, and the behavior should be the same as for row filters, which makes it more consistent. As for the SET COLUMNS breaking behaviors, I've decided to drop this feature entirely, for the reasons outlined earlier today. We don't have that for row filters either, etc. This means the dependency issue simply disappears. Without SET COLUMNS, if you want to change the column list you have to remove the table from the subscription, and add it back (with the new column list). Perhaps inconvenient, but the behavior is clearly defined. Maybe we need a more convenient way to tweak column lists, but I'd say we should have the same thing for row filters too. As for the issue reported by Shi-San about replica identity full and column filters, presumably you're referring to this: create table tbl (a int, b int, c int); create publication pub for table tbl (a, b, c); alter table tbl replica identity full; postgres=# delete from tbl; ERROR: cannot delete from table "tbl" DETAIL: Column list used by the publication does not cover the replica identity. I believe not allowing column lists with REPLICA IDENTITY FULL is expected / correct behavior. I mean, for that to work the column list has to always include all columns anyway, so it's pretty pointless. Of course, we might check that the column list contains everything, but considering the list does always have to contain all columns, and it break as soon as you add any columns, it seems reasonable (cheaper) to just require no column lists. I also went through the patch and made the naming more consistent. The comments used both "column filter" and "column list" randomly, and I think the agreement is to use "list" so I adopted that wording. However, while looking at how pgoutput, I realized one thing - for row filters we track them "per operation", depending on which operations are defined for a given publication. Shouldn't we do the same thing for column lists, really? I mean, if there are two publications with different column lists, one for inserts and the other one for updates, isn't it wrong to merge these two column lists? Also, doesn't this mean publish_as_relid should be "per operation" too? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: