Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id 45d27a8a-7c7a-88e8-a3db-c7c1d144df5e@enterprisedb.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: bogus: logical replication rows/cols combinations  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 4/29/22 06:48, Amit Kapila wrote:
> On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 4/28/22 14:26, Peter Eisentraut wrote:
>>> On 27.04.22 12:33, Amit Kapila wrote:
>>>
>>> I wonder how we handle the combination of
>>>
>>> pub1: publish=insert WHERE (foo)
>>> pub2: publish=update WHERE (bar)
>>>
>>> I think it would be incorrect if the combination is
>>>
>>> pub1, pub2: publish=insert,update WHERE (foo OR bar).
>>
>> That's a good question, actually. No, we don't combine the publications
>> like this, the row filters are kept "per action".
>>
> 
> Right, and it won't work even if try to combine in this case because
> of replica identity restrictions.
> 
>> But the exact behavior
>> turns out to be rather confusing in this case.
>>
>> (Note: This has nothing to do with column lists.)
>>
>> Consider an example similar to what Alvaro posted earlier:
>>
>>   create table uno (a int primary key, b int, c int);
>>
>>   create publication uno for table uno where (a > 0)
>>     with (publish='insert');
>>
>>   create publication dos for table uno where (a < 0)
>>     with (publish='update');
>>
>> And do this:
>>
>>   insert into uno values (1, 2, 3), (-1, 3, 4)
>>
>> which on the subscriber produces just one row, because (a<0) replicates
>> only updates:
>>
>>    a | b | c
>>   ---+---+---
>>    1 | 2 | 3
>>   (1 row)
>>
>> Now, let's update the (a<0) row.
>>
>>   update uno set a = 2 where a = -1;
>>
>> It might seem reasonable to expect the updated row (2,3,4) to appear on
>> the subscriber, but no - that's not what happens. Because we have (a<0)
>> for UPDATE, and we evaluate this on the old row (matches) and new row
>> (does not match). And pgoutput_row_filter() decides the update needs to
>> be converted to DELETE, despite the old row was not replicated at all.
>>
> 
> Right, but we don't know what previously would have happened maybe the
> user would have altered the publication action after the initial row
> is published in which case this DELETE is required as is shown in the
> example below. We can only make the decision based on the current
> tuple. For example:
> 
> create table uno (a int primary key, b int, c int);
> 
>   create publication uno for table uno where (a > 0)
>     with (publish='insert');
> 
>   create publication dos for table uno where (a < 0)
>     with (publish='insert');
> 
> -- create subscription for both these publications.
> 
> insert into uno values (1, 2, 3), (-1, 3, 4);
> 
> Alter publication dos set (publish='update');
> 
> update uno set a = 2 where a = -1;
> 
> Now, in this case, the old row was replicated and we would need a
> DELETE corresponding to it.
> 

I think such issues due to ALTER of the publication are somewhat
expected, and I think users will understand they might need to resync
the subscription or something like that.

A similar example might be just changing the where condition,

  create publication p for table t where (a > 10);

and then

  alter publication p set table t where (a > 15);

If we replicated any rows with (a > 10) and (a <= 15), we'll just stop
replicating them. But if we re-create the subscription, we end up with a
different set of rows on the subscriber, omitting rows with (a <= 15).

In principle we'd need to replicate the ALTER somehow, to delete or
insert the rows that start/stop matching the row filter. It's a bit
similar to not replicating DDL, perhaps.

But I think the issue I've described is different, because you don't
have to change the subscriptions at all and you'll still have the
problem. I mean, imagine doing this:

-- publisher
create table t (a int primary key, b int);
create publication p for table t where (a > 10) with (publish='update');

-- subscriber
create table t (a int primary key, b int);
create subscription s connection '...' publication p;

-- publisher
insert into t select i, i from generate_series(1,20) s(i);
update t set b = b * 10;

-- subscriber
--> has no rows in "t"
--> recreate the subscription
drop subscription s;
create subscription s connection '...' publication p;

--> now it has all the rows with (a>10), because tablesync ignores
publication actions


The reason why I find this really annoying is that it makes it almost
impossible to setup two logical replicas that'd be "consistent", unless
you create them at the same time (= without any writes in between). And
it's damn difficult to think about the inconsistencies.


IMHO this all stems from allowing row filters and restricting pubactions
at the same time (notice this only used a single publication). So maybe
the best option would be to disallow combining these two features? That
would ensure the row filter filter is always applied to all actions in a
consistent manner, preventing all these issues.

Maybe that's not possible - maybe there are valid use cases that would
need such combination, and you mentioned replica identity might be an
issue (and maybe requiring RIF with row filters is not desirable).

So maybe we should at least warn against this in the documentation?


>> I'm not sure if pgoutput_row_filter() can even make reasonable decisions
>> with such configuration (combination of row filters, actions ...). But
>> it sure seems confusing, because if you just inserted the updated row,
>> it would get replicated.
>>
> 
> True, but that is what the combination of publications suggests. The
> publication that publishes inserts have different criteria than
> updates, so such behavior (a particular row when inserted will be
> replicated but when it came as a result of an update it won't be
> replicated) is expected.
> 
>> Which brings me to a second problem, related to this one. Imagine you
>> create the subscription *after* inserting the two rows. In that case you
>> get this:
>>
>>    a  | b | c
>>   ----+---+---
>>     1 | 2 | 3
>>    -1 | 3 | 4
>>   (2 rows)
>>
>> because tablesync.c ignores which actions is the publication (and thus
>> the rowfilter) defined for.
>>
> 
> Yeah, this is the behavior of tablesync.c with or without rowfilter.
> It ignores publication actions. So, if you update any tuple before
> creation of subscription it will be replicated but the same update
> won't be replicated after initial sync if the publication just
> publishes 'insert'. I think we can't decide which data to copy based
> on publication actions as COPY wouldn't know if a particular row is
> due to a fresh insert or due to an update. In your example, it is
> possible that row (-1, 3, 4) would have been there due to an update.
> 

Right. Which is why I think disallowing these two features (filtering
actions and row filters) might prevent this, because it eliminates this
ambiguity. It would not matter if a row was INSERTed or UPDATEd when
evaluating the row filter.

> 
>> I think it's natural to expect that (INSERT + sync) and (sync + INSERT)
>> produce the same output on the subscriber.
>>
>>
>> I'm not sure we can actually make this perfectly sane with arbitrary
>> combinations of filters and actions. It would probably depend on whether
>> the actions are commutative, associative and stuff like that. But maybe
>> we can come up with restrictions that'd make this sane?
>>
> 
> True, I think to some extent we rely on users to define it sanely
> otherwise currently also it can easily lead to even replication being
> stuck. This can happen when the user is trying to operate on the same
> table and define publication/subscription on multiple nodes for it.
> See [1] where we trying to deal with such a problem.
> 
> [1] - https://commitfest.postgresql.org/38/3610/
> 

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how is that
related to the issue we're discussing here?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: fix cost subqueryscan wrong parallel cost
Next
From: Tom Lane
Date:
Subject: Re: failures in t/031_recovery_conflict.pl on CI