Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Msg-id CAA4eK1+m45Xyzx7AUY9TyFnB6CZ7_+_uooPb7WHSpp7UE=YmKg@mail.gmail.com
In response to Re: row filtering for logical replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Thu, Jan 20, 2022 at 7:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > >  Maybe this was meant to be "validate RF
> > > expressions" and return, perhaps, a bitmapset of all invalid columns
> > > referenced?
> >
> > Currently, we stop as soon as we find the first invalid column.
>
> That seems quite strange.  (And above you say "gather as much info as
> possible", so why stop at the first one?)
>

Because that is an error case, there doesn't seem to be any benefit
in proceeding further. However, we can build all the required
information by processing all publications (i.e., gather all the
information) and only then raise an error, if that idea appeals to
you more.
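
To make the two options concrete, here is a minimal sketch in Python. It is
not the patch's code. The function names, the publication names, and the
representation of a row filter as a plain list of column names are all
assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def first_invalid_column(filter_columns: List[str],
                         replica_identity: Set[str]) -> Optional[str]:
    """Stop at the first filter column that is not in the replica identity."""
    for col in filter_columns:
        if col not in replica_identity:
            return col  # error case: nothing further is examined
    return None


def all_invalid_columns(filters_by_publication: Dict[str, List[str]],
                        replica_identity: Set[str]) -> Dict[str, List[str]]:
    """Process every publication first, then report all invalid columns."""
    invalid: Dict[str, List[str]] = {}
    for pub, cols in filters_by_publication.items():
        bad = [c for c in cols if c not in replica_identity]
        if bad:
            invalid[pub] = bad
    return invalid


# Hypothetical data: the replica identity covers only "id".
ri = {"id"}
filters = {"pub_a": ["id", "status"], "pub_b": ["id"], "pub_c": ["region"]}
early = first_invalid_column(filters["pub_a"], ri)
collected = all_invalid_columns(filters, ri)
```

The first function mirrors the current behaviour described above; the second
mirrors the "gather everything, then give an error" alternative.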

> > >  (What is an invalid column in the first place?)
> >
> > A column that is referenced in the row filter but is not part of
> > Replica Identity.
>
> I do wonder how do these invalid columns reach the table definition in
> the first place.  Shouldn't these be detected at DDL time and prohibited
> from getting into the definition?
>

As mentioned by Peter E [1], there are two ways to deal with this: (a)
The current approach is that the user can set the replica identity
freely, and we decide later based on that what we can replicate (e.g.,
no updates). If we follow the same approach for this patch, we don't
restrict what columns are part of the row filter, but we check what
actions we can replicate based on the row filter. This is what is
currently followed in the patch. (b) Add restrictions at DDL time,
which is not as straightforward as it looks.
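
As a rough illustration of approach (a), the check can be thought of as
follows. This is a simplified Python model, not the patch's implementation.
It assumes the rule that inserts can always be replicated, while updates and
deletes require every row filter column to be covered by the replica
identity. The function name and the action strings are hypothetical.

```python
from typing import Iterable, Set


def replicable_actions(filter_columns: Iterable[str],
                       replica_identity: Iterable[str],
                       published_actions: Iterable[str]) -> Set[str]:
    """Return the published actions that can be replicated with this filter.

    Assumption: "insert" is always allowed; "update" and "delete" are allowed
    only when all filter columns are part of the replica identity.
    """
    covered = set(filter_columns) <= set(replica_identity)
    allowed = set()
    for action in published_actions:
        if action == "insert" or covered:
            allowed.add(action)
    return allowed


# Hypothetical example: the filter references "status", which is outside
# the replica identity ("id"), so only inserts remain replicable.
only_insert = replicable_actions(["status"], ["id"],
                                 ["insert", "update", "delete"])
```

Nothing is rejected when the filter or the replica identity is defined; the
decision is taken later, from whatever the current definitions are.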

For approach (b), we would need to restrict quite a few DDL commands, such
as DROP INDEX/DROP PRIMARY KEY/ALTER REPLICA IDENTITY/ATTACH PARTITION/
CREATE TABLE PARTITION OF/ALTER PUBLICATION SET (publish='update')/ALTER
PUBLICATION SET (publish_via_partition_root), etc.

We need to deal with partitioned-table cases because newly added
partitions automatically become part of a publication if any of their
ancestor tables is part of that publication. Now consider the case
where the user needs to use CREATE TABLE PARTITION OF. The problem is
that the user cannot specify the Replica Identity using an index when
creating the table, so we can't validate it at that point, and it will
lead to errors during replication if the parent table is published
with a row filter.
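
The inheritance of publication membership can be sketched like this in
Python. The table names and the parent map are hypothetical; the model only
shows why a freshly created partition is already published before anyone has
had a chance to set its replica identity.

```python
from typing import Dict, Optional, Set


def is_published(table: str,
                 parent_of: Dict[str, Optional[str]],
                 published: Set[str]) -> bool:
    """A table is published if it, or any of its ancestors, is published."""
    current: Optional[str] = table
    while current is not None:
        if current in published:
            return True
        current = parent_of.get(current)  # walk up to the parent, if any
    return False


# Hypothetical hierarchy: orders -> orders_2022 -> orders_2022_q1.
parent_of = {"orders": None,
             "orders_2022": "orders",
             "orders_2022_q1": "orders_2022"}
published = {"orders"}

# The new leaf partition is published as soon as it exists.
leaf_is_published = is_published("orders_2022_q1", parent_of, published)
```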

[1] - https://www.postgresql.org/message-id/2d6c8b74-bdef-767b-bdb6-29705985ed9c%40enterprisedb.com

-- 
With Regards,
Amit Kapila.
