Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From vignesh C
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH=DHK0iXmZuXKPnxZ3Q@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, 10 Sept 2024 at 09:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding?
> >
>
> I don't see any reason to have such an option in test_decoding,
> otherwise, we need a separate option for each publication option. I
> guess this is leftover of the previous subscriber-side approach.
>
> > I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> Right.
>
> > ---
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> Shouldn't it be subscriber-side?
>
> I have one additional comment:
> /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * If the publication is FOR ALL TABLES and include generated columns
> + * then it is treated the same as if there are no column lists (even
> + * if other publications have a list).
>   */
> - if (!pub->alltables)
> + if (!pub->alltables || !pub->pubgencolumns)
>
> Why do we treat pubgencolumns at the same level as the FOR ALL TABLES
> case? I thought that if the user has provided a column list, we only
> need to publish the specified columns even when the
> publish_generated_columns option is set.

To handle cases where the publish_generated_columns option isn't
specified for all tables in a publication, the pubgencolumns check
needs to be performed. In such cases, we must create a column list
that excludes generated columns. This process involves:
a) Retrieving all columns for the table and adding them to the column
list. b) Iterating through this column list and removing generated
columns. c) Checking if the remaining column count matches the total
number of columns. If they match, set the relation entry's column list
to NULL, so we don’t need to check columns during data replication. If
they do not match, update the column list to include only the relevant
columns, allowing pgoutput to replicate data for these specific
columns.

This step is necessary because some tables in the publication may
include generated columns.
For tables where publish_generated_columns is set, the column list
will be set to NULL, eliminating the need for a column list check
during data publication.
However, modifying the column list based on publish_generated_columns
is not required, this is addressed in the v31 patch posted by Shubham
at [1].

[1] - https://www.postgresql.org/message-id/CAHv8Rj%2BinrG6EU0rpDJxih8mmYLhCUP6ouTAmMN2RDnT9tE_Gg%40mail.gmail.com

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Why don't we consider explicit Incremental Sort?
Next
From: Alexander Lakhin
Date:
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed