Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1KrEFzFc42EvdNVpFRE9sWnQq1Gswpm9ewhKGy5vnrbUw@mail.gmail.com
In response to Re: row filtering for logical replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> future (e.g. it might be reused for handling other conflicts).
>

Apart from the additional traffic, I am not sure how we would handle
all the conditions on subscribers. Say the new row doesn't match the
filter: how would subscribers know about this unless we pass the
row_filter or some additional information along with the tuple?
Previously, I did some research and shared in one of the emails above
that IBM's InfoSphere Data Replication [1] performs filtering in this
way, which also suggests that we won't be off here.
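
To make the publisher-side decision concrete, here is a minimal sketch of
how an update could be mapped to an insert, update, delete, or nothing,
depending on whether the old and new rows match the row filter. This is
not the patch's code: the names (Action, transform_update) and the
dict-based rows are hypothetical and only illustrate the idea discussed
above.

```python
from enum import Enum
from typing import Callable, Dict

Row = Dict[str, object]
RowFilter = Callable[[Row], bool]


class Action(Enum):
    SKIP = "skip"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


def transform_update(old_row: Row, new_row: Row, row_filter: RowFilter) -> Action:
    """Decide what to send for an UPDATE so the replica stays in sync.

    Hypothetical helper: the publisher evaluates the filter on both
    versions of the row, so the subscriber never needs the filter itself.
    """
    old_matches = row_filter(old_row)
    new_matches = row_filter(new_row)

    if old_matches and new_matches:
        return Action.UPDATE   # row stays inside the filtered set
    if old_matches:
        return Action.DELETE   # row leaves the filtered set
    if new_matches:
        return Action.INSERT   # row enters the filtered set
    return Action.SKIP         # row was never visible to the subscriber


if __name__ == "__main__":
    def expensive(row: Row) -> bool:
        # Assumed example filter, standing in for WHERE (price > 100).
        return row["price"] > 100

    print(transform_update({"price": 150}, {"price": 50}, expensive))
```

Doing the same on the subscriber would require shipping both row versions
plus the filter (or its result) downstream, which is the extra
information mentioned above.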

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this was suggested by me, and I thought it was best
to do any expensive computation the first time it is required. I have
shared a few cases, like in [2], where initializing the filter up front
would lead to additional cost without any gain. Unless I am missing
something, I don't see any downside to doing it in a delayed fashion.
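
As a rough illustration of the delayed approach, the sketch below builds
the filter only when the first row has to be checked. It is not the
pgoutput code: the class and attribute names are hypothetical, and the
cases from [2] are not reproduced here. The only assumption is that an
entry can be created without ever having a row to filter.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
RowFilter = Callable[[Row], bool]


class RelSyncEntry:
    """Hypothetical per-relation cache entry with a lazily built filter."""

    def __init__(self, relname: str, build_filter: Callable[[], RowFilter]):
        self.relname = relname
        self._build_filter = build_filter          # the expensive step, deferred
        self._row_filter: Optional[RowFilter] = None
        self.filter_initialized = False

    def row_passes(self, row: Row) -> bool:
        # Pay the initialization cost on first use only.
        if not self.filter_initialized:
            self._row_filter = self._build_filter()
            self.filter_initialized = True
        assert self._row_filter is not None
        return self._row_filter(row)


if __name__ == "__main__":
    builds = {"count": 0}

    def build() -> RowFilter:
        builds["count"] += 1                        # stands in for costly setup
        return lambda row: row["id"] > 10

    entry = RelSyncEntry("t1", build)
    print(builds["count"])                          # entry exists, nothing built yet
    entry.row_passes({"id": 42})
    entry.row_passes({"id": 5})
    print(builds["count"])                          # built once, reused afterwards
```

An entry that never sees a row to filter never runs build(), which is the
saving being argued for; the cost is one extra flag check per row.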

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
