Re: row filtering for logical replication - Mailing list pgsql-hackers

From Ajin Cherian
Subject Re: row filtering for logical replication
Date
Msg-id CAFPTHDb7bpkuc4SxaL9B5vEvF2aEi0EOERdrG+xgVeAyMJsF=Q@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
>
> Here are the new v26* patches. This is a refactoring of the row-filter
> caches to remove all the logic from the get_rel_sync_entry function
> and delay it until if/when needed in the pgoutput_row_filter function.
> This is now implemented per Amit's suggestion to move all the cache
> code [1]. It is a replacement for the v25* patches.
>
> The make check and TAP subscription tests are all OK. I have repeated
> the performance tests [2] and those results are good too.
>
> v26-0001 <--- v23 (base RF patch)
> v26-0002 <--- ExprState cache mods (refactored row filter caching)
> v26-0002 <--- ExprState cache extra debug logging (temp)
>
> Peter, I'm still reviewing this new cache mechanism. I will provide a feedback
> as soon as I integrate it as part of this recent modification.
>
> I'm attaching a new version that simply including Houzj review [1]. This is
> based on v23.
>
> There has been a discussion about which row should be used by row filter. We
> don't have a unanimous choice, so I think it is prudent to provide a way for
> the user to change it. I suggested in a previous email [2] that a publication
> option should be added. Hence, row filter can be applied to old tuple, new
> tuple, or both. This approach is simpler than using OLD/NEW references (less
> code and avoid validation such as NEW reference for DELETEs and OLD reference
> for INSERTs). I think about a reasonable default value and it seems _new_ tuple
> is a good one because (i) it is always available and (ii) user doesn't have
> to figure out that replication is broken due to a column that is not part
> of replica identity. I'm attaching a POC that implements it. I'm still
> polishing it. Add tests for multiple row filters and integrate Peter's caching
> mechanism [3] are the next steps.
>

Assuming this _new_tuple option is enabled and
1. An UPDATE, where the new_tuple satisfies the row filter, but the
old_tuple did not  (not checked). Since the row filter check passed
but the actual row never existed on the subscriber, would this patch
convert the UPDATE to an INSERT or would this UPDATE be ignored? Based
on the tests that I did, I see that it is ignored.
2. An UPDATE where the new tuple does not satisfy the row filter but
the old_tuple did. Since the new_tuple did not match the row filter,
wouldn't this row now remain divergent on the replica?

Somehow this approach of either new_tuple or old_tuple doesn't seem to
be very fruitful if the user requires that his replica is up-to-date
based on the filter condition. For that, I think you will need to
convert UPDATES to either INSERTS or DELETES if only new_tuple or
old_tuple matches the filter condition but not both matches the filter
condition.

UPDATE
old-row (match)       new-row (no match)  -> DELETE
old-row (no match)  new row (match)       -> INSERT
old-row (match)       new row (match)       -> UPDATE
old-row (no match)  new-row (no match)  -> (drop change)

regards,
Ajin Cherian
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Gather performance analysis
Next
From: Hannu Krosing
Date:
Subject: Re: The Free Space Map: Problems and Opportunities