Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Shubham Khanna
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHv8RjJcOsk=y+vJ3y+vXhzR9ZUzUEURvS_90hQW3MNfJ5di7A@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns
List pgsql-hackers
On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for the patch v1-0001.
>
> ======
> GENERAL
>
> G.1. Use consistent names
>
> It seems to add unnecessary complications by having different names
> for all the new options, fields and API parameters.
>
> e.g. sometimes 'include_generated_columns'
> e.g. sometimes 'publish_generated_columns'
>
> Won't it be better to just use identical names everywhere for
> everything? I don't mind which one you choose; I just felt you only
> need one name, not two. This comment overrides everything else in this
> post so whatever name you choose, make adjustments for all my other
> review comments as necessary.

I have updated the name to 'include_generated_columns' everywhere in the Patch.

> ======
>
> G.2. Is it possible to just use the existing bms?
>
> A very large part of this patch is adding more API parameters to
> delegate the 'publish_generated_columns' flag value down to when it is
> finally checked and used. e.g.
>
> The functions:
> - logicalrep_write_insert(), logicalrep_write_update(),
> logicalrep_write_delete()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_tuple
>
> The functions:
> - logicalrep_write_rel()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_attrs
>
> AFAICT in all these places the API is already passing a "Bitmapset
> *columns". I was wondering if it might be possible to modify the
> "Bitmapset *columns" BEFORE any of those functions get called so that
> the "columns" BMS either does or doesn't include generated cols (as
> appropriate according to the option).
>
> Well, it might not be so simple because there are some NULL BMS
> considerations also, but I think it would be worth investigating at
> least, because if indeed you can find some common place (somewhere
> like pgoutput_change()?) where the columns BMS can be filtered to
> remove bits for generated cols then it could mean none of those other
> patch API changes are needed at all -- then the patch would only be
> 1/2 the size.

I will analyse and reply to this in the next version.

> ======
> Commit message
>
> 1.
> Now if include_generated_columns option is specified, the generated
> column information and generated column data also will be sent.
>
> Usage from pgoutput plugin:
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> ~
>
> I think there needs to be more background information given here. This
> commit message doesn't seem to describe anything about what is the
> problem and how this patch fixes it. It just jumps straight into
> giving usages of a 'include_generated_columns' option.
>
> It also doesn't say that this is an option that was newly *introduced*
> by the patch -- it refers to it as though the reader should already
> know about it.
>
> Furthermore, your hacker's post says "Currently it is not supported as
> a subscription option because table sync for the generated column is
> not possible as copy command does not support getting data for the
> generated column. If this feature is required we can remove this
> limitation from the copy command and then add it as a subscription
> option later." IMO that all seems like the kind of information that
> ought to also be mentioned in this commit message.

I have updated the Commit message mentioning the suggested changes.

> ======
> contrib/test_decoding/sql/ddl.sql
>
> 2.
> +-- check include_generated_columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
> +DROP TABLE gencoltable;
> +
>
> 2a.
> Perhaps you should include both option values to demonstrate the
> difference in behaviour:
>
> 'include_generated_columns', '0'
> 'include_generated_columns', '1'

Added the other option values to demonstrate the difference in behaviour:

> 2b.
> I think you maybe need to include more some test combinations where
> there is and isn't a COLUMN LIST, because I am not 100% sure I
> understand the current logic/expectations for all combinations.
>
> e.g. When the generated column is in a column list but
> 'publish_generated_columns' is false then what should happen? etc.
> Also if there are any special rules then those should be mentioned in
> the commit message.

Test case is added and the same is mentioned in the documentation.

> ======
> src/backend/replication/logical/proto.c
>
> 3.
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 4. logical_rep_write_tuple:
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
> That code seems confusing. Shouldn't the logic be exactly as also in
> logicalrep_write_attrs()?
>
> e.g. Shouldn't they both look like this:
>
> if (att->attisdropped)
>   continue;
>
> if (att->attgenerated && !publish_generated_column)
>   continue;
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;

Fixed.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5.
>  static void send_relation_and_attrs(Relation relation, TransactionId xid,
>   LogicalDecodingContext *ctx,
> - Bitmapset *columns);
> + Bitmapset *columns,
> + bool publish_generated_column);
>
> Use plural. /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 6. parse_output_parameters
>
>   bool origin_option_given = false;
> + bool generate_column_option_given = false;
>
>   data->binary = false;
>   data->streaming = LOGICALREP_STREAM_OFF;
>   data->messages = false;
>   data->two_phase = false;
> + data->publish_generated_column = false;
>
> I think the 1st var should be 'include_generated_columns_option_given'
> for consistency with the name of the actual option that was given.

Updated the name to 'include_generated_columns_option_given'

> ======
> src/include/replication/logicalproto.h
>
> 7.
> (Same as a previous review comment)
>
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> ======
> src/include/replication/pgoutput.h
>
> 8.
>   bool publish_no_origin;
> + bool publish_generated_column;
>  } PGOutputData;
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Conflict Detection and Resolution
Next
From: jian he
Date:
Subject: Re: [PATCH] New [relation] option engine