Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From vignesh C
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CALDaNm2q0kgSM0Msg4CHLQviRN8EnjKoc5opZ=fYxAfXNi3Azg@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, 20 May 2024 at 13:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
trackingsuch feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 
> >
> > The attached patch has the changes to support capturing generated
> > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> > ‘include_generated_columns’ option is specified, the generated column
> > information and generated column data also will be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?

I think this will be useful mainly for the use cases where the
publisher has generated columns and the subscriber does not have
generated  columns.
In the case where both the publisher and subscriber have generated
columns, the current patch will overwrite the generated column values
based on the expression for the generated column in the subscriber.

> >
> > Usage from pgoutput plugin:
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > CREATE publication pub1 for all tables;
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > Currently it is not supported as a subscription option because table
> > sync for the generated column is not possible as copy command does not
> > support getting data for the generated column. If this feature is
> > required we can remove this limitation from the copy command and then
> > add it as a subscription option later.
> > Thoughts?
>
> I think that if we want to support an option to replicate generated
> columns, the initial tablesync should support it too. Otherwise, we
> end up filling the target columns data with NULL during the initial
> tablesync but with replicated data during the streaming changes.

+1 for supporting initial sync.
Currently copy_data = true and generate_column = true are not
supported, this limitation will be removed in one of the upcoming
patches.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: POC: GROUP BY optimization
Next
From: Mark Cave-Ayland
Date:
Subject: Re: commitfest.postgresql.org is no longer fit for purpose