Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id CAA4eK1JXDzOu5jTCYM-TcqE2i3iiFJQoWW_qoVGgZ=vMgEHi4w@mail.gmail.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: bogus: logical replication rows/cols combinations  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 4/29/22 06:48, Amit Kapila wrote:
> > On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra
>
> I think such issues due to ALTER of the publication are somewhat
> expected, and I think users will understand they might need to resync
> the subscription or something like that.
>
> A similar example might be just changing the where condition,
>
>   create publication p for table t where (a > 10);
>
> and then
>
>   alter publication p set table t where (a > 15);
>
> If we replicated any rows with (a > 10) and (a <= 15), we'll just stop
> replicating them. But if we re-create the subscription, we end up with a
> different set of rows on the subscriber, omitting rows with (a <= 15).
>
> In principle we'd need to replicate the ALTER somehow, to delete or
> insert the rows that start/stop matching the row filter. It's a bit
> similar to not replicating DDL, perhaps.
>
> But I think the issue I've described is different, because you don't
> have to change the subscriptions at all and you'll still have the
> problem. I mean, imagine doing this:
>
> -- publisher
> create table t (a int primary key, b int);
> create publication p for table t where (a > 10) with (publish='update');
>
> -- subscriber
> create table t (a int primary key, b int);
> create subscription s connection '...' publication p;
>
> -- publisher
> insert into t select i, i from generate_series(1,20) s(i);
> update t set b = b * 10;
>
> -- subscriber
> --> has no rows in "t"
> --> recreate the subscription
> drop subscription s;
> create subscription s connection '...' publication p;
>
> --> now it has all the rows with (a>10), because tablesync ignores
> publication actions
>
>
> The reason why I find this really annoying is that it makes it almost
> impossible to setup two logical replicas that'd be "consistent", unless
> you create them at the same time (= without any writes in between). And
> it's damn difficult to think about the inconsistencies.
>

I understood your case related to the initial sync and it is with or
without rowfilter.

>
> IMHO this all stems from allowing row filters and restricting pubactions
> at the same time (notice this only used a single publication). So maybe
> the best option would be to disallow combining these two features? That
> would ensure the row filter filter is always applied to all actions in a
> consistent manner, preventing all these issues.
>
> Maybe that's not possible - maybe there are valid use cases that would
> need such combination, and you mentioned replica identity might be an
> issue
>

Yes, that is the reason we can't combine the row filters for all pubactions.

> (and maybe requiring RIF with row filters is not desirable).
>
> So maybe we should at least warn against this in the documentation?
>

Yeah, I find this as the most suitable thing to do to address your
concern. I would like to add this information to the 'Initial
Snapshot' page with some examples (both with and without a row
filter).

> >
> > True, I think to some extent we rely on users to define it sanely
> > otherwise currently also it can easily lead to even replication being
> > stuck. This can happen when the user is trying to operate on the same
> > table and define publication/subscription on multiple nodes for it.
> > See [1] where we trying to deal with such a problem.
> >
> > [1] - https://commitfest.postgresql.org/38/3610/
> >
>
> That seems to deal with a circular replication, i.e. two logical
> replication links - a bit like a multi-master. Not sure how is that
> related to the issue we're discussing here?
>

It is not directly related to what we are discussing here but I was
trying to emphasize the point that users need to define the logical
replication via pub/sub sanely otherwise they might see some weird
behaviors like that.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: failures in t/031_recovery_conflict.pl on CI
Next
From: Bharath Rupireddy
Date:
Subject: Progress report removal of temp files and temp relation files using ereport_startup_progress