Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id d26d24dd-2fab-3c48-0162-2b7f84a9c893@enterprisedb.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 3/4/22 11:42, Amit Kapila wrote:
> On Wed, Mar 2, 2022 at 5:43 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Attached is an updated patch, addressing most of the issues reported so
>> far. There are various minor tweaks, but the main changes are:
> ...
>>
>> 3) checks of column filter vs. publish_via_partition_root and replica
>> identity, following the same logic as the row-filter patch (hopefully,
>> it touches the same places, using the same logic, ...)
>>
>> That means - with "publish_via_partition_root=false" it's not allowed to
>> specify column filters on partitioned tables, only for leaf partitions.
>>
>> And we check column filter vs. replica identity when adding tables to
>> publications, or whenever we change the replica identity.
>>
> 
> This handling is different from row filter work and I see problems
> with it.

By different, I assume you mean I tried to enfoce the rules in ALTER
PUBLICATION and other ALTER commands, instead of when modifying the
data? OK, I reworked this to do the same thing as the row filtering patch.

> The column list validation w.r.t primary key (default replica
> identity) is missing. The handling of column list vs. partitions has
> multiple problems: (a) In attach partition, the patch is just checking
> ancestors for RI validation but what if the table being attached has
> further subpartitions; (b) I think the current locking also seems to
> have problems because it is quite possible that while it validates the
> ancestors here, concurrently someone changes the column list. I think
> it won't be enough to just change the locking mode because with the
> current patch strategy during attach, we will be first taking locks
> for child tables of current partition and then parent tables which can
> pose deadlock hazards.
> > The columns list validation also needs to be done when we change
> publication action.
>
I believe those issues should be solved by adopting the same approach as
the row-filtering patch, right?

> There could be more similar problems which I might have missed. For
> some of these (except for concurrency issues), my colleague Shi-San
> has done testing and the results are below [1]. I feel we should do RI
> vs. column list handling similar to row filter work (at one place) to
> avoid all such hazards and possibly similar handling at various
> places, there is a good chance that we will miss some places or make
> mistakes that are not easy to catch.

I agree if both patches use the same approach, that would reduce the
risk of missing the handling in one place, etc.

> Do let me know if you think it makes sense for me or one of the 
> people who work on row filter patch to try this (make the handling of
> RI checks similar to row filter work) and then we can see if that
> turns out to be a simple way to deal with all these problems?
> 

If someone who is more familiar with the design conclusions from the row
filtering patch, that would be immensely useful. Especially now, when I
reworked it to the same approach as the row filtering patch.


> Some other miscellaneous comments:
> =============================
> *
> In get_rel_sync_entry(), the handling for partitioned tables doesn't
> seem to be correct. It can publish a different set of columns based on
> the order of publications specified in the subscription.
> 
> For example:
> ----
> create table parent (a int, b int, c int) partition by range (a);
> create table test_part1 (like parent);
> alter table parent attach partition test_part1 for values from (1) to (10);
> 
> create publication pub for table parent(a) with (PUBLISH_VIA_PARTITION_ROOT);
> create publication pub2 for table test_part1(b);
> ---
> 
> Now, depending on the order of publications in the list while defining
> subscription, the column list will change
> ----
> create subscription sub connection 'port=10000 dbname=postgres'
> publication pub, pub2;
> 
> For the above, column list will be: (a)
> 
> create subscription sub connection 'port=10000 dbname=postgres'
> publication pub2, pub;
> 
> For this one, the column list will be: (a, b)
> ----
> 
> To avoid this, the column list should be computed based on the final
> publish_as_relid as we are doing for the row filter.
> 

Hmm, yeah. That seems like a genuine problem - it should not depend on
the order of publications in the subscription, I guess.

But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
the problem is that we initialize publish_as_relid=relid before the loop
over publications, and then just update it. So the first iteration
starts with relid, but the second iteration ends with whatever value is
set by the first iteration (e.g. the root).

So with the example you posted, we start with

  publish_as_relid = relid = test_part1

but then if the first publication is pubviaroot=true, we update it to
parent. And in the second iteration, we fail to find the column filter,
because "parent" (publish_as_relid) is not part of the pub2.

If we do it in the other order, we leave the publish_as_relid value as
is (and find the filter), and then update it in the second iteration
(and find the column filter too).

Now, this can be resolved by re-calculating the publish_as_relid from
scratch in each iteration (start with relid, then maybe update it). But
that's just half the story - the issue is there even without column
filters. Consider this example:

create table t (a int, b int, c int) partition by range (a);

create table t_1 partition of t for values from (1) to (10)
       partition by range (a);

create table t_2 partition of t_1 for values from (1) to (10);

create publication pub1 for table t(a)
  with (PUBLISH_VIA_PARTITION_ROOT);

create publication pub2 for table t_1(a)
  with (PUBLISH_VIA_PARTITION_ROOT);


Now, is you change subscribe to "pub1, pub2" and "pub2, pub1", we'll end
up with different publish_as_relid values (t or t_1). Which seems like
the same ambiguity issue.


> *
> Fetching column filter info in tablesync.c is quite expensive. It
> seems to be using four round-trips to get the complete info whereas
> for row-filter we use just one round trip. I think we should try to
> get both row filter and column filter info in just one round trip.
> 

Maybe, but I really don't think this is an issue. The sync happens only
very rarely, and the rest of the sync (starting workers, copying data)
is likely way more expensive than this.

> [1] -
> Test-1:
> The patch doesn't check when the primary key changes.
> 
> e.g.
> -- publisher --
> create table tbl(a int primary key, b int);
> create publication pub for table tbl(a);
> alter table tbl drop CONSTRAINT tbl_pkey;
> alter table tbl add primary key (b);
> insert into tbl values (1,1);
> 
> -- subscriber --
> create table tbl(a int, b int);
> create subscription sub connection 'port=5432 dbname=postgres' publication pub;
> update tbl set b=1 where a=1;
> alter table tbl add primary key (b);
> 
> -- publisher --
> delete from tbl;
> 
> Column "b" is part of replica identity, but it is filtered, which
> caused an error on the subscriber side.
> 
> ERROR:  publisher did not send replica identity column expected by the
> logical replication target relation "public.tbl"
> CONTEXT:  processing remote data during "DELETE" for replication
> target relation "public.tbl" in transaction 724 at 2022-03-04
> 11:46:16.330892+08
> 
> Test-2: Partitioned table RI w.r.t column list.
> 2.1
> Using "create table ... partition of".
> 
> e.g.
> -- publisher --
> create table parent (a int, b int) partition by range (a);
> create publication pub for table parent(a)
> with(publish_via_partition_root=true);
> create table child partition of parent (primary key (a,b)) default;
> insert into parent values (1,1);
> 
> -- subscriber --
> create table parent (a int, b int) partition by range (a);
> create table child partition of parent default;
> create subscription sub connection 'port=5432 dbname=postgres'
> publication pub; update child set b=1 where a=1;
> alter table parent add primary key (a,b);
> 
> -- publisher --
> delete from parent;
> 
> Column "b" is part of replica identity in the child table, but it is
> filtered, which caused an error on the subscriber side.
> 
> ERROR:  publisher did not send replica identity column expected by the
> logical replication target relation "public.parent"
> CONTEXT:  processing remote data during "DELETE" for replication
> target relation "public.parent" in transaction 723 at 2022-03-04
> 15:15:39.776949+08
> 
> 2.2
> It is likely that a table to be attached also has a partition.
> 
> e.g.
> -- publisher --
> create table t1 (a int, b int) partition by range (a);
> create publication pub for table t1(b) with(publish_via_partition_root=true);
> create table t2 (a int, b int) partition by range (a);
> create table t3 (a int primary key, b int);
> alter table t2 attach partition t3 default;
> alter table t1 attach partition t2 default;
> insert into t1 values (1,1);
> 
> -- subscriber --
> create table t1 (a int, b int) partition by range (a);
> create table t2 (a int, b int) partition by range (a);
> create table t3 (a int, b int);
> alter table t2 attach partition t3 default;
> alter table t1 attach partition t2 default;
> create subscription sub connection 'port=5432 dbname=postgres' publication pub;
> update t1 set a=1 where b=1;
> alter table t1 add primary key (a);
> 
> -- publisher --
> delete from t1;
> 
> Column "a" is part of replica identity in table t3, but t3's ancestor
> t1 is published with column "a" filtered, which caused an error on the
> subscriber side.
> 
> ERROR:  publisher did not send replica identity column expected by the
> logical replication target relation "public.t1"
> CONTEXT:  processing remote data during "DELETE" for replication
> target relation "public.t1" in transaction 726 at 2022-03-04
> 14:40:29.297392+08
> 
> 3.
> Using "alter publication pub set(publish='...'); "
> 
> e.g.
> -- publisher --
> create table tbl(a int primary key, b int); create publication pub for
> table tbl(b) with(publish='insert'); insert into tbl values (1,1);
> 
> -- subscriber --
> create table tbl(a int, b int);
> create subscription sub connection 'port=5432 dbname=postgres' publication pub;
> 
> -- publisher --
> alter publication pub set(publish='insert,update');
> 
> -- subscriber --
> update tbl set a=1 where b=1;
> alter table tbl add primary key (b);
> 
> -- publisher --
> update tbl set a=2 where a=1;
> 
> Updates are replicated, and the column "a" is part of replica
> identity, but it is filtered, which caused an error on the subscriber
> side.
> 
> ERROR:  publisher did not send replica identity column expected by the
> logical replication target relation "public.tbl"
> CONTEXT:  processing remote data during "UPDATE" for replication
> target relation "public.tbl" in transaction 723 at 2022-03-04
> 11:56:33.905843+08
> 

AFAICS these issues should be resolved by the adoption of the row-filter
approach (i.e. it should fail the same way as for row filter).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: role self-revocation
Next
From: Robert Haas
Date:
Subject: Re: role self-revocation