Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Ajin Cherian |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | CAFPTHDb7bpkuc4SxaL9B5vEvF2aEi0EOERdrG+xgVeAyMJsF=Q@mail.gmail.com Whole thread Raw |
In response to | Re: row filtering for logical replication ("Euler Taveira" <euler@eulerto.com>) |
Responses |
Re: row filtering for logical replication
|
List | pgsql-hackers |
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote: > > On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote: > > Here are the new v26* patches. This is a refactoring of the row-filter > caches to remove all the logic from the get_rel_sync_entry function > and delay it until if/when needed in the pgoutput_row_filter function. > This is now implemented per Amit's suggestion to move all the cache > code [1]. It is a replacement for the v25* patches. > > The make check and TAP subscription tests are all OK. I have repeated > the performance tests [2] and those results are good too. > > v26-0001 <--- v23 (base RF patch) > v26-0002 <--- ExprState cache mods (refactored row filter caching) > v26-0002 <--- ExprState cache extra debug logging (temp) > > Peter, I'm still reviewing this new cache mechanism. I will provide a feedback > as soon as I integrate it as part of this recent modification. > > I'm attaching a new version that simply including Houzj review [1]. This is > based on v23. > > There has been a discussion about which row should be used by row filter. We > don't have a unanimous choice, so I think it is prudent to provide a way for > the user to change it. I suggested in a previous email [2] that a publication > option should be added. Hence, row filter can be applied to old tuple, new > tuple, or both. This approach is simpler than using OLD/NEW references (less > code and avoid validation such as NEW reference for DELETEs and OLD reference > for INSERTs). I think about a reasonable default value and it seems _new_ tuple > is a good one because (i) it is always available and (ii) user doesn't have > to figure out that replication is broken due to a column that is not part > of replica identity. I'm attaching a POC that implements it. I'm still > polishing it. Add tests for multiple row filters and integrate Peter's caching > mechanism [3] are the next steps. > Assuming this _new_tuple option is enabled and 1. An UPDATE, where the new_tuple satisfies the row filter, but the old_tuple did not (not checked). Since the row filter check passed but the actual row never existed on the subscriber, would this patch convert the UPDATE to an INSERT or would this UPDATE be ignored? Based on the tests that I did, I see that it is ignored. 2. An UPDATE where the new tuple does not satisfy the row filter but the old_tuple did. Since the new_tuple did not match the row filter, wouldn't this row now remain divergent on the replica? Somehow this approach of either new_tuple or old_tuple doesn't seem to be very fruitful if the user requires that his replica is up-to-date based on the filter condition. For that, I think you will need to convert UPDATES to either INSERTS or DELETES if only new_tuple or old_tuple matches the filter condition but not both matches the filter condition. UPDATE old-row (match) new-row (no match) -> DELETE old-row (no match) new row (match) -> INSERT old-row (match) new row (match) -> UPDATE old-row (no match) new-row (no match) -> (drop change) regards, Ajin Cherian Fujitsu Australia
pgsql-hackers by date: