Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id 7ce6a43a-830a-931f-d311-a5bfd444b3ae@enterprisedb.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 3/9/22 11:03, Amit Kapila wrote:
> ...
>> Hmm, yeah. That seems like a genuine problem - it should not depend on
>> the order of publications in the subscription, I guess.
>>
>> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
>> the problem is that we initialize publish_as_relid=relid before the loop
>> over publications, and then just update it. So the first iteration
>> starts with relid, but the second iteration ends with whatever value is
>> set by the first iteration (e.g. the root).
>>
>> So with the example you posted, we start with
>>
>>   publish_as_relid = relid = test_part1
>>
>> but then if the first publication is pubviaroot=true, we update it to
>> parent. And in the second iteration, we fail to find the column filter,
>> because "parent" (publish_as_relid) is not part of the pub2.
>>
>> If we do it in the other order, we leave the publish_as_relid value as
>> is (and find the filter), and then update it in the second iteration
>> (and find the column filter too).
>>
>> Now, this can be resolved by re-calculating the publish_as_relid from
>> scratch in each iteration (start with relid, then maybe update it). But
>> that's just half the story - the issue is there even without column
>> filters. Consider this example:
>>
>> create table t (a int, b int, c int) partition by range (a);
>>
>> create table t_1 partition of t for values from (1) to (10)
>>        partition by range (a);
>>
>> create table t_2 partition of t_1 for values from (1) to (10);
>>
>> create publication pub1 for table t(a)
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>> create publication pub2 for table t_1(a)
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>>
>> Now, is you change subscribe to "pub1, pub2" and "pub2, pub1", we'll end
>> up with different publish_as_relid values (t or t_1). Which seems like
>> the same ambiguity issue.
>>
> 
> I think we should fix this existing problem by always using the
> top-most table as publish_as_relid. Basically, we can check, if the
> existing publish_as_relid is an ancestor of a new rel that is going to
> replace it then we shouldn't replace it.

Right, using the topmost relid from all publications seems like the
correct solution.

> However, I think even if we
> fix the existing problem, we will still have the order problem for the
> column filter patch, and to avoid that instead of fetching column
> filters in the publication loop, we should use the final
> publish_as_relid. I think it will have another problem as well if we
> don't use final publish_as_relid which is that sometimes when we
> should not use any filter (say when pubviaroot is true and that
> publication has root partitioned table which has no filter) as per our
> rule of filters for a partitioned table, it can still use some filter
> from the non-root table.
> 

Yeah, the current behavior is just a consequence of how we determine
publish_as_relid now. If we rework that, we should first determine the
relid and then fetch the filter only for that single rel.

>>
>>> *
>>> Fetching column filter info in tablesync.c is quite expensive. It
>>> seems to be using four round-trips to get the complete info whereas
>>> for row-filter we use just one round trip. I think we should try to
>>> get both row filter and column filter info in just one round trip.
>>>
>>
>> Maybe, but I really don't think this is an issue.
>>
> 
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column
> list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
> work for partitioned tables when the partitioned table is part of one
> schema and partition table is part of another schema. (b) The handling
> of partition tables in other cases will fetch incorrect lists as it
> tries to fetch the column list of all the partitions in the hierarchy.
> 
> One of my colleagues has even tested these cases both for column
> filters and row filters and we find the behavior of row filter is okay
> whereas for column filter it uses the wrong column list. We will share
> the tests and results with you in a later email. We are trying to
> unify the column filter queries with row filter to make their behavior
> the same and will share the findings once it is done. I hope if we are
> able to achieve this that we will reduce the chances of bugs in this
> area.
> 

OK, I'll take a look at that email.

> Note: I think the first two patches for tests are not required after
> commit ceb57afd3c.
> 

Right. Will remove.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Next
From: "Daniel Westermann (DWE)"
Date:
Subject: Re: Changing "Hot Standby" to "hot standby"