Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | CAA4eK1+keaZY_pejjt+6OWn-myhyCS=MtJ9o2qQxKrL1_1zXzQ@mail.gmail.com |
In response to | Re: row filtering for logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: row filtering for logical replication |
List | pgsql-hackers |
On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would have an additional load on write operations and I
> > > am not sure at this stage but maybe there could be other ways to
> > > extend the current infrastructure wherein we build the snapshots using
> > > which we can access the user tables instead of only catalog tables.
> > > Such enhancements if feasible would be useful not only for allowing
> > > additional column access in row filters but for other purposes like
> > > allowing access to functions that access user tables. I feel we can
> > > extend this later as well seeing the usage and requests. For the first
> > > version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas, that if we bind the row filter with
> > the RI, then if the user has to use the row filter on any column 1)
> > they have to add an unnecessary column to the index 2) Since they have
> > to add it to RI so now we will have to send it over the network as
> > well. 3). We anyway have to WAL log it if it is modified because now
> > we forced users to add some columns to RI because they wanted to use
> > the row filter on that. Now suppose we remove that limitation and we
> > somehow make these changes orthogonal to RI, i.e. if we have a row
> > filter on some column then we WAL log it, so now the only extra cost
> > we are paying is to just WAL log that column, but the user is not
> > forced to add it to index, not forced to send it over the network.
> >
>
> I am not suggesting adding additional columns to RI just for using
> filter expressions. If most users that intend to publish delete/update
> wanted to use filter conditions apart from replica identity then we
> can later extend this functionality but not sure if the only way to
> accomplish that is to log additional data in WAL.
>

One possibility in this regard could be to enhance the syntax to
Replica Identity .. Include (column_list), where the columns in the
include list won't be sent to the subscriber, but I think it is better
to postpone such enhancements to a later version. As I suggested above,
we might want to extend our infrastructure in a way that not only
satisfies this request for extra columns but also allows UDFs (which
can access user tables) and probably sub-queries as well.

--
With Regards,
Amit Kapila.
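
To make the trade-off in this exchange concrete, here is a toy Python model. It is not PostgreSQL code and not part of the patch. It compares which old-tuple columns would be WAL-logged, which would be sent to the subscriber, and which would have to be added to an index under three options raised in the thread. The function name `old_tuple_cost`, the option labels, and the example columns are all hypothetical, and the model ignores details such as when the old tuple is actually logged.

```python
def old_tuple_cost(all_cols, key_cols, filter_cols, option):
    """Toy model of old-tuple handling for UPDATE/DELETE with a row filter.

    option (labels are illustrative, not PostgreSQL syntax):
      "full"         - REPLICA IDENTITY FULL: every column is logged and sent
      "filter_in_ri" - filter columns must be part of the replica identity
      "log_only"     - filter columns are WAL-logged for decoding but
                       stripped before sending (the idea raised in the thread)
    """
    key, flt = set(key_cols), set(filter_cols)
    if option == "full":
        logged, sent, extra_index = set(all_cols), set(all_cols), set()
    elif option == "filter_in_ri":
        logged, sent, extra_index = key | flt, key | flt, flt - key
    elif option == "log_only":
        logged, sent, extra_index = key | flt, set(key), set()
    else:
        raise ValueError(f"unknown option: {option}")
    return {
        "wal_logged": sorted(logged),
        "sent": sorted(sent),
        "extra_index_cols": sorted(extra_index),
    }


if __name__ == "__main__":
    # Hypothetical table: primary key "id", row filter on "region".
    cols, key, flt = ["id", "region", "payload"], ["id"], ["region"]
    for option in ("full", "filter_in_ri", "log_only"):
        print(option, old_tuple_cost(cols, key, flt, option))
```

In this simplified model, the "log_only" option logs the same columns as "filter_in_ri" but avoids both the extra index column and the extra column on the wire, which is the cost argument Dilip makes above. Whether the additional WAL logging is acceptable is the point this message leaves open.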