Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From vignesh C
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CALDaNm3-qiNuf4Wy2E1HtG=SLBJEFGnh010Fqc8eO=Lyg5+pfQ@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Shubham Khanna <khannashubham1197@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
List pgsql-hackers
On Mon, 3 Jun 2024 at 13:03, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for the patch v1-0001.
> >
> > ======
> > GENERAL
> >
> > G.1. Use consistent names
> >
> > It seems to add unnecessary complications by having different names
> > for all the new options, fields and API parameters.
> >
> > e.g. sometimes 'include_generated_columns'
> > e.g. sometimes 'publish_generated_columns'
> >
> > Won't it be better to just use identical names everywhere for
> > everything? I don't mind which one you choose; I just felt you only
> > need one name, not two. This comment overrides everything else in this
> > post so whatever name you choose, make adjustments for all my other
> > review comments as necessary.
>
> I have updated the name to 'include_generated_columns' everywhere in the Patch.
>
> > ======
> >
> > G.2. Is it possible to just use the existing bms?
> >
> > A very large part of this patch is adding more API parameters to
> > delegate the 'publish_generated_columns' flag value down to when it is
> > finally checked and used. e.g.
> >
> > The functions:
> > - logicalrep_write_insert(), logicalrep_write_update(),
> > logicalrep_write_delete()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_tuple
> >
> > The functions:
> > - logicalrep_write_rel()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_attrs
> >
> > AFAICT in all these places the API is already passing a "Bitmapset
> > *columns". I was wondering if it might be possible to modify the
> > "Bitmapset *columns" BEFORE any of those functions get called so that
> > the "columns" BMS either does or doesn't include generated cols (as
> > appropriate according to the option).
> >
> > Well, it might not be so simple because there are some NULL BMS
> > considerations also, but I think it would be worth investigating at
> > least, because if indeed you can find some common place (somewhere
> > like pgoutput_change()?) where the columns BMS can be filtered to
> > remove bits for generated cols then it could mean none of those other
> > patch API changes are needed at all -- then the patch would only be
> > 1/2 the size.
>
> I will analyse and reply to this in the next version.
>
> > ======
> > Commit message
> >
> > 1.
> > Now if include_generated_columns option is specified, the generated
> > column information and generated column data also will be sent.
> >
> > Usage from pgoutput plugin:
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > ~
> >
> > I think there needs to be more background information given here. This
> > commit message doesn't seem to describe anything about what is the
> > problem and how this patch fixes it. It just jumps straight into
> > giving usages of a 'include_generated_columns' option.
> >
> > It also doesn't say that this is an option that was newly *introduced*
> > by the patch -- it refers to it as though the reader should already
> > know about it.
> >
> > Furthermore, your hacker's post says "Currently it is not supported as
> > a subscription option because table sync for the generated column is
> > not possible as copy command does not support getting data for the
> > generated column. If this feature is required we can remove this
> > limitation from the copy command and then add it as a subscription
> > option later." IMO that all seems like the kind of information that
> > ought to also be mentioned in this commit message.
>
> I have updated the Commit message mentioning the suggested changes.
>
> > ======
> > contrib/test_decoding/sql/ddl.sql
> >
> > 2.
> > +-- check include_generated_columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> > +DROP TABLE gencoltable;
> > +
> >
> > 2a.
> > Perhaps you should include both option values to demonstrate the
> > difference in behaviour:
> >
> > 'include_generated_columns', '0'
> > 'include_generated_columns', '1'
>
> Added the other option values to demonstrate the difference in behaviour:
>
> > 2b.
> > I think you maybe need to include more some test combinations where
> > there is and isn't a COLUMN LIST, because I am not 100% sure I
> > understand the current logic/expectations for all combinations.
> >
> > e.g. When the generated column is in a column list but
> > 'publish_generated_columns' is false then what should happen? etc.
> > Also if there are any special rules then those should be mentioned in
> > the commit message.
>
> Test case is added and the same is mentioned in the documentation.
>
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 3.
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > 4. logical_rep_write_tuple:
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > - if (!column_in_column_list(att->attnum, columns))
> > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> > + continue;
> > +
> > + if (att->attgenerated && !publish_generated_column)
> >   continue;
> > That code seems confusing. Shouldn't the logic be exactly as also in
> > logicalrep_write_attrs()?
> >
> > e.g. Shouldn't they both look like this:
> >
> > if (att->attisdropped)
> >   continue;
> >
> > if (att->attgenerated && !publish_generated_column)
> >   continue;
> >
> > if (!column_in_column_list(att->attnum, columns))
> >   continue;
>
> Fixed.
>
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 5.
> >  static void send_relation_and_attrs(Relation relation, TransactionId xid,
> >   LogicalDecodingContext *ctx,
> > - Bitmapset *columns);
> > + Bitmapset *columns,
> > + bool publish_generated_column);
> >
> > Use plural. /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > 6. parse_output_parameters
> >
> >   bool origin_option_given = false;
> > + bool generate_column_option_given = false;
> >
> >   data->binary = false;
> >   data->streaming = LOGICALREP_STREAM_OFF;
> >   data->messages = false;
> >   data->two_phase = false;
> > + data->publish_generated_column = false;
> >
> > I think the 1st var should be 'include_generated_columns_option_given'
> > for consistency with the name of the actual option that was given.
>
> Updated the name to 'include_generated_columns_option_given'
>
> > ======
> > src/include/replication/logicalproto.h
> >
> > 7.
> > (Same as a previous review comment)
> >
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > ======
> > src/include/replication/pgoutput.h
> >
> > 8.
> >   bool publish_no_origin;
> > + bool publish_generated_column;
> >  } PGOutputData;
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> The attached Patch contains the suggested changes.

Thanks for the updated patch, few comments:
1) The option name seems wrong here:
In one place include_generated_column is specified and other place
include_generated_columns is specified:

+               else if (IsSet(supported_opts,
SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+                                strcmp(defel->defname,
"include_generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts,
SUBOPT_INCLUDE_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
+                       opts->include_generated_column = defGetBoolean(defel);
+               }

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d453e224d9..e8ff752fd9 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
                COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
                                          "disable_on_error",
"enabled", "failover", "origin",
                                          "password_required",
"run_as_owner", "slot_name",
-                                         "streaming",
"synchronous_commit", "two_phase");
+                                         "streaming",
"synchronous_commit", "two_phase","include_generated_columns");

2) This small data table need not have a primary key column as it will
create an index and insertion will happen in the index too.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');

3) Please add a test case for this:
+          set to <literal>false</literal>. If the subscriber-side
column is also a
+          generated column then this option has no effect; the
replicated data will
+          be ignored and the subscriber column will be filled as
normal with the
+          subscriber-side computed or default data.

4) You can use a new style of ereport to remove the brackets around errcode
4.a)
+                       else if (!parse_bool(strVal(elem->arg),
&data->include_generated_columns))
+                               ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not
parse value \"%s\" for parameter \"%s\"",
+
strVal(elem->arg), elem->defname)));

4.b) similarly here too:
+               ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+               /*- translator: both %s are strings of the form
"option = value" */
+                                       errmsg("%s and %s are mutually
exclusive options",
+                                               "copy_data = true",
"include_generated_column = true")));

4.c) similarly here too:
+                       if (include_generated_columns_option_given)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting
or redundant options")));

5) These variable names can be changed to keep it smaller, something
like gencol or generatedcol or gencolumn, etc
+++ b/src/include/catalog/pg_subscription.h
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegeneratedcolumn; /* True if generated columns must be
published */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -157,6 +159,7 @@ typedef struct Subscription
  List    *publications; /* List of publication names to subscribe to */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Shubham Khanna
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Bertrand Drouvot
Date:
Subject: Re: relfilenode statistics