Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | CAA4eK1JTn1NFnXbUFkLfdkwsUHKDA2MY9KiAxZsm=kgU-KgVwQ@mail.gmail.com Whole thread Raw |
In response to | Re: row filtering for logical replication (Greg Nancarrow <gregn4422@gmail.com>) |
List | pgsql-hackers |
On Mon, Dec 20, 2021 at 6:07 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think it's a concern, for such a basic example with only one row, > > > getting unpredictable (and even wrong) replication results, depending > > > upon the order of operations. > > > > > > > I am not sure how we can deduce that. The results are based on current > > and new values of row which is what I think we are expecting here. > > > > In the two simple cases presented, the publisher ends up with the same > single row (2,1) in both cases, but in one of the cases the subscriber > ends up with an extra row (1,1) that the publisher doesn't have. So, > in using a "filter", a new row has been published that the publisher > doesn't have. I'm not so sure a user would be expecting that. Not to > mention that if (1,1) is subsequently INSERTed on the publisher side, > it will result in a duplicate key error on the publisher. > Personally, I feel users need to be careful in defining publications and subscriptions, otherwise, there are various ways "duplicate key error .." kind of issues can arise. Say, you different publications which publish the same table, and then you have different subscriptions on the subscriber which subscribe to those publications. > > > Doesn't this problem result from allowing different WHERE clauses for > > > different pubactions for the same table? > > > My current thoughts are that this shouldn't be allowed, and also WHERE > > > clauses for INSERTs should, like UPDATE and DELETE, be restricted to > > > using only columns covered by the replica identity or primary key. > > > > > > > Hmm, even if we do that one could have removed the insert row filter > > by the time we are evaluating the update. So, we will get the same > > result. I think the behavior in your example is as we expect as per > > the specs defined by the patch and I don't see any problem, in this > > case, w.r.t replication results. Let us see what others think on this? > > > > Here I'm talking about the typical use-case of setting the > row-filtering WHERE clause up-front and not changing it thereafter. > I think that dynamically changing filters after INSERT/UPDATE/DELETE > operations is not the typical use-case, and IMHO it's another thing > entirely (could result in all kinds of unpredictable, random results). > Yeah, that's what I also wanted to say that but users need to carefully define publications/subscriptions, otherwise, with up-front definition also leads to unpredictable results as shared in the explanation above. I feel Hou-San's latest email [1] explains the current rules very well and maybe we should document them in some way to avoid confusion. > Personally I think it would make more sense to: > 1) Disallow different WHERE clauses on the same table, for different pubactions. > 2) If only INSERTs are being published, allow any column in the WHERE > clause, otherwise (as for UPDATE and DELETE) restrict the referenced > columns to be part of the replica identity or primary key. > We can restrict in some way like you are saying or we can even restrict such that we "disallow specifying row filters unless pubactions have all the dml operations and allow row filter to have columns that are part of replica identity or primary key". I feel it is better to provide flexibility as the current patch does and document it to make users aware of the kind of problems that can arise with the wrong usage. [1] - https://www.postgresql.org/message-id/OS0PR01MB57168F4384D50656A4FC2DC5947B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
pgsql-hackers by date: