Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: bogus: logical replication rows/cols combinations |
Date | |
Msg-id | 7fdf7ac3-912b-f450-f9c1-aa239df4ac1a@enterprisedb.com Whole thread Raw |
In response to | Re: bogus: logical replication rows/cols combinations (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: bogus: logical replication rows/cols combinations
Re: bogus: logical replication rows/cols combinations |
List | pgsql-hackers |
On 5/2/22 07:31, Amit Kapila wrote: > On Mon, May 2, 2022 at 3:27 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 4/30/22 12:11, Amit Kapila wrote: >>> On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >>>> >>>> My proposal is that if users want to define multiple publications, and >>>> their definitions conflict in a way that would behave ridiculously (== >>>> bound to cause data inconsistencies eventually), an error should be >>>> thrown. Maybe we will not be able to catch all bogus cases, but we can >>>> be prepared for the most obvious ones, and patch later when we find >>>> others. >>>> >>> >>> I agree with throwing errors for obvious/known bogus cases but do we >>> want to throw errors or restrict the combining of column lists when >>> row filters are present in all cases? See some examples [1 ] where it >>> may be valid to combine them. >>> >> >> I think there are three challenges: >> >> (a) Deciding what's an obvious bug or an unsupported case (e.g. because >> it's not clear what's the correct behavior / way to merge column lists). >> >> (b) When / where to detect the issue. >> >> (c) Making sure this does not break/prevent existing use cases. >> >> >> As I said before [1], I think the issue stems from essentially allowing >> DML to have different row filters / column lists. So we could forbid >> publications to specify WITH (publish=...) and one of the two features, >> > > I don't think this is feasible for row filters because that would mean > publishing all actions because we have a restriction that all columns > referenced in the row filter expression are part of the REPLICA > IDENTITY index. This restriction is only valid for updates/deletes, so > if we allow all pubactions then this will be imposed on inserts as > well. A similar restriction is there for column lists as well, so I > don't think we can do it there as well. Do you have some idea to > address it? > No, I haven't thought about how exactly to implement this, and I have not thought about how to deal with the replica identity issues. My thoughts were that we'd only really need this for tables with row filters and/or column lists, treating it as a cost of those features. But yeah, it seems annoying. >> or make sure subscription does not combine multiple such publications. >> > > Yeah, or don't allow to define such publications in the first place so > that different subscriptions can't combine them but I guess that might > forbid some useful cases as well where publication may not get > combined with other publications. > But how would you check that? You don't know which publications will be combined by a subscription until you create the subscription, right? >> The second option has the annoying consequence that it makes this >> useless for the "data redaction" use case I described in [2], because >> that relies on combining multiple publications. >> > > True, but as a workaround users can create different subscriptions for > different publications. > Won't that replicate duplicate data, when the row filters re not mutually exclusive? >> Furthermore, what if the publications change after the subscriptions get >> created? Will we be able to detect the error etc.? >> > > I think from that apart from 'Create Subscription', the same check > needs to be added for Alter Subscription ... Refresh, Alter > Subscription ... Enable. > > In the publication side, we need an additional check in Alter > Publication ... SET table variant. One idea is that we get all other > publications for which the corresponding relation is defined. And then > if we find anything which we don't want to allow then we can throw an > error. This will forbid some useful cases as well as mentioned above. > So, the other possibility is to expose all publications for a > walsender, and then we can find the exact set of publications where > the current publication is used with other publications and we can > check only those publications. So, if we have three walsenders > (walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2, pub3) in the system > and we are currently altering publication pub1 then we need to check > only pub3 for any conflicting conditions. Yet another simple way could > be that we don't allow to change column list via Alter Publication ... > Set variant because the other variants anyway need REFRESH publication > which we have covered. > > I think it is tricky to decide what exactly we want to forbid, so, we > may want to follow something simple like if the column list and row > filters for a table are different in the required set of publications > then we treat it as an unsupported case. I think this will prohibit > some useful cases but should probably forbid the cases we are worried > about here. > I don't have a clear idea on what the right tradeoff is :-( Maybe we're digressing a bit from the stuff Alvaro complained about initially. Arguably the existing column list behavior is surprising and would not work with reasonable use cases. So let's fix it. But maybe you're right validating row filters is a step too far. Yes, users may define strange combinations of publications, but is that really an issue we have to solve? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: