Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHut+Pu3HGgwTwGJOauA6sgBtoMNzY70rimEMtg61Te28_M6oQ@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
List pgsql-hackers
Hi, here are my review comments for patch v7-0001.

======
1. GENERAL - \dRs+

Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
(e.g. \dRs+ mysub) the same as the other boolean parameters?

======
Commit message

2.
When 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list

~

Maybe you don't need to mention "PUBLICATION col-list" twice.

SUGGESTION
When 'include_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

~~~

2.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

2a.
(I've questioned this one in previous reviews)

What exactly is the purpose of this statement in the commit message?
Was this supposed to demonstrate the usage of the
'include_generated_columns' parameter?

~

2b.
/publication/ PUBLICATION/


~~~

3.
If the subscriber-side column is also a generated column then
thisoption has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

~

Missing space: /thisoption/this option/

======
.../expected/decoding_into_rel.out

4.
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

Why is the default value here equivalent to
'include-generated-columns' = '1' here instead of '0'? The default for
the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
false, and IMO it seems confusing for these 2 defaults to be
different. Here I think it should default to '0' *regardless* of what
the previous functionality might have done -- e.g. this is a "test
decoder" so the parameter should behave sensibly.

======
.../test_decoding/sql/decoding_into_rel.sql

NITPICK - wrong comments.

======
doc/src/sgml/protocol.sgml

5.
+    <varlistentry>
+     <term>include_generated_columns</term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns. This option controls
+        whether generated columns should be included in the string
+        representation of tuples during logical decoding in PostgreSQL.
+        The default is false.
+       </para>
+      </listitem>
+    </varlistentry>
+

Does the protocol version need to be bumped to support this new option
and should that be mentioned on this page similar to how all other
version values are mentioned?

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - some missing words/sentence.
NITPICK - some missing <literal> tags.
NITPICK - remove duplicated sentence.
NITPICK - add another <para>.

======
src/backend/commands/subscriptioncmds.c

6.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_include_generated_columns 0x00010000

Please use UPPERCASE for consistency with other macros.

======
.../libpqwalreceiver/libpqwalreceiver.c

7.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'on'");
+

IMO it makes more sense to say 'true' here instead of 'on'. It seems
like this was just cut/paste from the above code (where 'on' was
sensible).

======
src/include/catalog/pg_subscription.h

8.
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegencol; /* True if generated columns must be published */
+

Not fixed as claimed. This field name ought to be plural.

/subincludegencol/subincludegencols/

~~~

9.
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegencol; /* publish generated column data */
 } Subscription;

Not fixed as claimed. This field name ought to be plural.

/includegencol/includegencols/

======
src/test/subscription/t/031_column_list.pl

10.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 10) STORED)"
+);
+
 $node_subscriber->safe_psql('postgres',
  "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
 );

+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
+);
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 20) STORED)"
+);

IMO the test needs lots more comments to describe what it is doing:

For example, the setup deliberately has made:
* publisher-side tab2 has generated col 'b' but subscriber-side tab2
has NON-gnerated col 'b'.
* publisher-side tab3 has generated col 'b' but subscriber-side tab2
has DIFFERENT COMPUTATION generated col 'b'.

So it will be better to have comments to explain all this instead of
having to figure it out.

~~~

11.
 # data for initial sync

 $node_publisher->safe_psql('postgres',
  "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab2 (a) VALUES (1), (2), (3)");

 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub2 FOR TABLE tab2");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub3 FOR TABLE tab3");
+

# Wait for initial sync of all subscriptions
$node_subscriber->wait_for_subscription_sync;

my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
is( $result, qq(1|22
2|44
3|66), 'generated columns initial sync');

~

IMO (and for completeness) it would be better to INSERT data for all
the tables and alsot to validate that tables tab2 and tab3 has zero
rows replicated. Yes, I know there is 'copy_data=false', but it is
just easier to see all the tables instead of guessing why some are
omitted, and anyway this test case will be needed after the next patch
implements the COPY support for gen-cols.

~~~

12.
+$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');
+

Here also I think there should be explicit comments about what these
cases are testing, what results you are expecting, and why. The
comments will look something like the message parameter of those
safe_psql(...)

e.g.
# confirm generated columns ARE replicated when the subscriber-side
column is not generated

e.g.
# confirm generated columns are NOT replicated when the
subscriber-side column is also generated

======

99.
Please also see my nitpicks attachment patch for various other
cosmetic and docs problems, and apply theseif you agree:
- documentation wording/rendering
- wrong comments
- spacing
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: RFC: adding pytest as a supported test framework
Next
From: Alvaro Herrera
Date:
Subject: Re: Using LibPq in TAP tests via FFI