Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAA4eK1+7BnXQBDHUsQpDsF4gTenzVe1k3RUxXkxEF=vKRVKajQ@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
List pgsql-hackers
On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
trackingsuch feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 
> >
> > The attached patch has the changes to support capturing generated
> > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> > ‘include_generated_columns’ option is specified, the generated column
> > information and generated column data also will be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?
>

The cost is one but the other is the user may not want the data to be
different based on volatile functions like timeofday() or the table on
subscriber won't have the column marked as generated. Now, considering
such use cases, is providing a subscription-level option a good idea
as the patch is doing? I understand that this can serve the purpose
but it could also lead to having the same behavior for all the tables
in all the publications for a subscription which may or may not be
what the user expects. This could lead to some performance overhead
(due to always sending generated columns for all the tables) for cases
where the user needs it only for a subset of tables.

I think we should consider it as a table-level option while defining
publication in some way. A few ideas could be: (a) We ask users to
explicitly mention the generated column in the columns list while
defining publication. This has a drawback such that users need to
specify the column list even when all columns need to be replicated.
(b) We can have some new syntax to indicate the same like: CREATE
PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
could be some challenges but we can at least investigate it.

Yet another idea is to keep this as a publication option
(include_generated_columns or publish_generated_columns) similar to
"publish_via_partition_root". Normally, "publish_via_partition_root"
is used when tables on either side have different partition
hierarchies which is somewhat the case here.

Thoughts?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Conflict detection and logging in logical replication
Next
From: shveta malik
Date:
Subject: Re: Collect statistics about conflicts in logical replication