Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1+keaZY_pejjt+6OWn-myhyCS=MtJ9o2qQxKrL1_1zXzQ@mail.gmail.com
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would put an additional load on write operations. I
> > > am not sure at this stage, but maybe there are other ways to extend
> > > the current infrastructure so that we build snapshots with which we
> > > can access user tables instead of only catalog tables.
> > > Such enhancements, if feasible, would be useful not only for allowing
> > > additional column access in row filters but also for other purposes,
> > > like allowing functions that access user tables. I feel we can
> > > extend this later as well, depending on the usage and requests. For
> > > the first version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas: if we bind the row filter to the
> > RI, then to use the row filter on any column 1) the user has to add
> > an unnecessary column to the index, 2) since it is now part of the
> > RI, we have to send it over the network as well, and 3) we have to
> > WAL log it anyway if it is modified, because we forced the user to
> > add the column to the RI just to filter on it. Now suppose we remove
> > that limitation and somehow make these changes orthogonal to RI,
> > i.e. if we have a row filter on some column then we WAL log it. Then
> > the only extra cost we pay is WAL logging that column, and the user
> > is not forced to add it to the index or to send it over the network.
> >
>
> I am not suggesting adding additional columns to RI just for using
> filter expressions. If most users who intend to publish delete/update
> want to use filter conditions on columns outside the replica identity,
> then we can extend this functionality later, but I am not sure that
> the only way to accomplish that is to log additional data in WAL.
>

One possibility in this regard could be to enhance the syntax to
something like Replica Identity .. Include (column_list), where the
columns in the include list won't be sent to the subscriber, but I
think it is better to postpone such enhancements to a later version.
As I suggested above, we might want to extend our infrastructure in a
way where not only this request for extra columns can be accommodated
but we are also able to allow UDFs (where user tables can be accessed)
and probably sub-queries as well.
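
To make the Include (column_list) idea a bit more concrete, here is a
rough sketch in Python. It is purely illustrative and is not PostgreSQL
code: the function name, its parameters, and the dict-based row are all
hypothetical, and it assumes the include columns are available in the
decoded old tuple. The point is only that the filter can see the extra
columns while the subscriber still receives just the replica identity.

```python
from typing import Callable, Dict, Iterable, Optional

Row = Dict[str, object]


def old_tuple_to_send(
    logged: Row,
    ri_cols: Iterable[str],
    include_cols: Iterable[str],
    row_filter: Callable[[Row], bool],
) -> Optional[Row]:
    """Return the old tuple to send, or None if the row is filtered out.

    Hypothetical model: `logged` holds the columns available at decode
    time, `ri_cols` is the replica identity, and `include_cols` are the
    extra columns requested only for filter evaluation.
    """
    ri = set(ri_cols)
    visible = ri | set(include_cols)

    # The filter may reference both RI columns and the include columns.
    filter_input = {c: v for c, v in logged.items() if c in visible}
    if not row_filter(filter_input):
        return None

    # Strip the include columns: only RI columns go over the network.
    return {c: v for c, v in filter_input.items() if c in ri}
```

For example, with ri_cols = ["id"], include_cols = ["region"], and a
filter on region, a matching row is sent with only the "id" column, and
a non-matching row is not sent at all.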

-- 
With Regards,
Amit Kapila.


