Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHut+Pvqb5sTna2bOSFSMuqKH-qoZF4odvBFvfi+_zLqHcE3AQ@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
List pgsql-hackers
Hi,

Here are some review comments for patch v5-0001.

======
GENERAL G.1

The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
maybe before they were always ignored, but now they are not?

OTOH, when 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list, right?

These kinds of points should be noted in the commit message and in the
(col-list?) documentation.

======
Commit message

General 1a.
IMO the commit message needs some background to say something like:
"Currently generated column values are not replicated because it is
assumed that the corresponding subscriber-side table will generate its
own values for those columns."

~

General 1b.
Somewhere in this commit message, you need to give all the other
special rules --- e.g. the docs says "If the subscriber-side column is
also a generated column then this option has no effect"

~~~

2.
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes. This option is
particularly useful for scenarios where applications require access to
generated column values for downstream processing or synchronization.

~

I don't think the sentence "This option is particularly useful..." is
helpful. It seems like just saying "This commit supports option XXX.
This is particularly useful if you want XXX".

~~~

3.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
the new parameter being used in it?

~~~

4.
Currently copy_data option with include_generated_columns option is
not supported. A future patch will remove this limitation.

~

Suggest to single-quote those parameter names for better readability.

~~~

5.
This commit aims to enhance the flexibility and utility of logical
replication by allowing users to include generated column information
in replication streams, paving the way for more robust data
synchronization and processing workflows.

~

IMO this paragraph can be omitted.

======
.../test_decoding/sql/decoding_into_rel.sql

6.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
+INSERT INTO gencoltable (a) VALUES (4), (5), (6);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
+DROP TABLE gencoltable;
+

6a.
I felt some additional explicit comments might help the readabilty of
the output file.

e.g.1
-- When 'include-generated=columns' = '1' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes...

e.g.2
-- When 'include-generated=columns' = '0' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_get_changes...

~~

6b.
Suggest adding one more test case (where 'include-generated=columns'
is not set) to confirm/demonstrate the default behaviour for
replicated generated cols.

======
doc/src/sgml/protocol.sgml

7.
+    <varlistentry>
+     <term><replaceable
class="parameter">include-generated-columns</replaceable></term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns.
+        The include-generated-columns option controls whether generated
+        columns should be included in the string representation of tuples
+        during logical decoding in PostgreSQL. This allows users to
+        customize the output format based on whether they want to include
+        these columns or not. The default is false.
+       </para>
+      </listitem>
+    </varlistentry>

7a.
It doesn't render properly. e.g. Should not be bold italic (probably
the class is wrong?), because none of the nearby parameters look this
way.

~

7b.
The name here should NOT have hyphens. It needs underscores same as
all other nearby protocol parameters.

~

7c.
The description seems overly verbose.

SUGGESTION
Boolean option to enable generated columns. This option controls
whether generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
doc/src/sgml/ref/create_subscription.sgml

8.
+
+       <varlistentry
id="sql-createsubscription-params-with-include-generated-column">
+        <term><literal>include_generated_column</literal>
(<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>

The parameter name should be plural (include_generated_columns).

======
src/backend/commands/subscriptioncmds.c

9.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x00010000

Should be plural COLUMNS

~~~

10.
+ else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+ strcmp(defel->defname, "include_generated_column") == 0)

The new subscription parameter should be plural ("include_generated_columns").

~~~

11.
+
+ /*
+ * Do additional checking for disallowed combination when copy_data and
+ * include_generated_column are true. COPY of generated columns is
not supported
+ * yet.
+ */
+ if (opts->copy_data && opts->include_generated_column)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: both %s are strings of the form "option = value" */
+ errmsg("%s and %s are mutually exclusive options",
+ "copy_data = true", "include_generated_column = true")));
+ }

/combination/combinations/

The parameter name should be plural in the comment and also in the
error message.

======
src/bin/psql/tab-complete.c

12.
  COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
    "disable_on_error", "enabled", "failover", "origin",
    "password_required", "run_as_owner", "slot_name",
-   "streaming", "synchronous_commit", "two_phase");
+   "streaming", "synchronous_commit", "two_phase","include_generated_columns");

The new param should be added in alphabetical order same as all the others.

======
src/include/catalog/pg_subscription.h

13.
+ bool subincludegeneratedcolumn; /* True if generated columns must be
published */
+

The field name should be plural.

~~~

14.
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

The field name should be plural.

======
src/include/replication/walreceiver.h

15.
  * prepare time */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool include_generated_column; /* publish generated columns */
  } logical;
  } proto;
 } WalRcvStreamOptions;

~

This new field name should be plural.

======
src/test/subscription/t/011_generated.pl

16.
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
+ CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION
pub2 WITH (include_generated_column = true)
+));
+ok( $stderr =~
+   qr/copy_data = true and include_generated_column = true are
mutually exclusive options/,
+ 'cannot use both include_generated_column and copy_data as true');

Isn't this mutual exclusiveness of options something that could have
been tested in the regress test suite instead of TAP tests? e.g. AFAIK
you won't require a connection to test this case.

~~~

17. Missing test?

IIUC there is a missing test scenario. You can add another subscriber
table TAB3 which *already* has generated cols (e.g. generating
different values to the publisher) so then you can verify they are NOT
overwritten, even when the 'include_generated_cols' is true.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Unexpected results from CALL and AUTOCOMMIT=off
Next
From: Andres Freund
Date:
Subject: Re: Use the same Windows image on both VS and MinGW tasks