Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Msg-id 3536d621-e3fc-c248-c03b-dd7bd0cfc4fc@enterprisedb.com
In response to RE: Column Filtering in Logical Replication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List pgsql-hackers

On 3/11/22 08:05, wangw.fnst@fujitsu.com wrote:
> On Fri, Mar 11, 2022 at 9:57 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
> Hi Tomas,
> Thanks for your patches.
> 
> On Mon, Mar 9, 2022 at 9:53 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> On Wed, Mar 9, 2022 at 6:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Mon, Mar 7, 2022 at 11:18 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>>> On Fri, Mar 4, 2022 at 6:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> Fetching column filter info in tablesync.c is quite expensive. It
>>>>> seems to be using four round-trips to get the complete info whereas
>>>>> for row-filter we use just one round trip. I think we should try to
>>>>> get both row filter and column filter info in just one round trip.
>>>>>
>>>>
>>>> Maybe, but I really don't think this is an issue.
>>>>
>>>
>>> I am not sure but it might matter for small tables. Leaving aside the
>>> performance issue, I think the current way will get the wrong column
>>> list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
>>> work for partitioned tables when the partitioned table is part of one
>>> schema and partition table is part of another schema. (b) The handling
>>> of partition tables in other cases will fetch incorrect lists as it
>>> tries to fetch the column list of all the partitions in the hierarchy.
>>>
>>> One of my colleagues has even tested these cases both for column
>>> filters and row filters and we find the behavior of row filter is okay
>>> whereas for column filter it uses the wrong column list. We will share
>>> the tests and results with you in a later email. We are trying to
>>> unify the column filter queries with row filter to make their behavior
>>> the same and will share the findings once it is done. I hope if we are
>>> able to achieve this that we will reduce the chances of bugs in this
>>> area.
>>>
>>
>> OK, I'll take a look at that email.
>
> I tried to get both the column filters and the row filters with a single SQL
> query, but that did not work out because I think the result is not easy to parse.
> 
> I noticed that the latest patches (20220311) use two SQL queries to get the
> column filters. I think we could use a single query instead, to reduce the
> network cost. See the SQL in the attachment.
> 

I'll take a look. But as I said before - I very much prefer SQL that is
easy to understand, and I don't think the one extra round trip is an
issue during tablesync (which is a very rare action).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


