Re: row filtering for logical replication - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: row filtering for logical replication
Date	September 24, 2021 06:09:31
Msg-id	CAA4eK1KrEFzFc42EvdNVpFRE9sWnQq1Gswpm9ewhKGy5vnrbUw@mail.gmail.com
In response to	Re: row filtering for logical replication (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: row filtering for logical replication
List	pgsql-hackers

On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> future (e.g. it might be reused for handling other conflicts).
>

Apart from the additional traffic, I am not sure how we would handle
all the conditions on the subscriber. Say the new row doesn't match:
how will the subscriber know that unless we pass the row_filter or
some additional information along with the tuple? Previously, I did
some research and shared in one of the emails above that IBM's
InfoSphere Data Replication [1] performs filtering in this way, which
also suggests that we are not off track here.
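A minimal sketch of the publisher-side decision discussed above, written in Python for brevity (the real logic would live in pgoutput, in C). It assumes the filter can be evaluated against both the old and the new version of the row, i.e. that the old row carries every column the filter references (for example with REPLICA IDENTITY FULL). `transform_update` and `Action` are hypothetical names. Note that every branch depends on `row_filter`, which is why the subscriber could not make the same choice without being sent the filter.

```python
from enum import Enum
from typing import Any, Callable, Mapping

Row = Mapping[str, Any]


class Action(Enum):
    """What the publisher sends downstream for one UPDATE (hypothetical names)."""
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    SKIP = "SKIP"


def transform_update(old_row: Row, new_row: Row,
                     row_filter: Callable[[Row], bool]) -> Action:
    """Evaluate the row filter on both row versions and pick the change to send.

    Assumes old_row contains every column the filter references.
    """
    old_match = row_filter(old_row)
    new_match = row_filter(new_row)
    if old_match and new_match:
        return Action.UPDATE  # row stays inside the replicated subset
    if old_match:
        return Action.DELETE  # row leaves the subset: remove it from the replica
    if new_match:
        return Action.INSERT  # row enters the subset: the replica has never seen it
    return Action.SKIP        # row is outside the subset before and after
```

For example, with a filter equivalent to `region = 'EU'`, an UPDATE that moves a row from US to EU becomes an INSERT on the replica, and the reverse move becomes a DELETE.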

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this was my suggestion: I thought it best to do any
expensive computation only the first time it is required. I have
shared a few cases, as in [2], where eager initialization would lead
to additional cost without any gain. Unless I am missing something, I
don't see any downside to doing it lazily.
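The lazy initialization argued for above can be sketched as follows, again in Python for brevity. `RelSyncEntry`, `row_filter` and `build_filter` are hypothetical names standing in for the per-relation cache entry that get_rel_sync_entry returns; the sketch only illustrates that the expensive build runs on first use, so an entry that never has a row checked against it never pays that cost.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
RowFilter = Callable[[Row], bool]


class RelSyncEntry:
    """Hypothetical stand-in for pgoutput's per-relation cache entry."""

    def __init__(self, relid: int,
                 build_filter: Optional[Callable[[], RowFilter]]) -> None:
        self.relid = relid
        self._build_filter = build_filter  # the expensive step, deferred
        self._filter: Optional[RowFilter] = None
        self._filter_ready = False
        self.build_count = 0  # instrumentation for the example only

    def row_filter(self, row: Row) -> bool:
        # Build the filter the first time a row of this relation is checked,
        # not when the cache entry itself is created.
        if not self._filter_ready:
            if self._build_filter is not None:
                self._filter = self._build_filter()
                self.build_count += 1
            self._filter_ready = True
        # A relation without a filter publishes every row.
        return True if self._filter is None else self._filter(row)
```

The trade-off is one flag check per call, in exchange for never building filters that go unused.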

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.