Thread: Pgoutput not capturing the generated columns

Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:
Hi PG Users.

We are using Debezium to capture the CDC events into Kafka.
With the decoderbufs and wal2json plugins, the connector is able to capture the generated columns in the table, but not with
the pgoutput plugin.

We tested with the following example:

CREATE TABLE employees (
   id SERIAL PRIMARY KEY,
   first_name VARCHAR(50),
   last_name VARCHAR(50),
   full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

-- Inserted a few records while the connector was running

INSERT INTO employees (first_name, last_name) VALUES ('ABC', 'XYZ');


With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in the above example. But
with pgoutput the generated column was not captured.
Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

Thanks.
Rajendra.


Re: Pgoutput not capturing the generated columns

From
"Euler Taveira"
Date:
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
> With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But with pgoutput the generated column was not captured.

wal2json materializes the generated columns before delivering the output. I
decided to materialize the generated columns in the output plugin because the
target consumers expect a complete row.

> Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

I wouldn't say limitation but a design decision.

The logical replication design computes the generated columns on the
subscriber side. It was a wise decision aimed at optimization (it doesn't
overload the publisher, which is *already* in charge of logical decoding).

Should pgoutput provide a complete row? Probably. If it is an option that
defaults to false and doesn't impact performance.

The request for features should be done in this mailing list.


--
Euler Taveira

Re: Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:

Thanks Euler,

Greatly appreciate your inputs.


> Should pgoutput provide a complete row? Probably. If it is an option that defaults to false and doesn't impact performance.


Yes, it would be great if this feature can be implemented.


> The logical replication design decides to compute the generated columns at subscriber side.


If I understand correctly, this approach involves establishing a function on the subscriber's side that emulates the operation executed to derive the generated column values.

If yes, I see one potential issue where disparities might surface between the values of generated columns on the subscriber's side and those computed within Postgres. This could happen if the generated column's value relies on the current_time function.

Please let me know how we can track the feature requests and the discussions around them.

Thanks,
Rajendra.

Re: Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:
Hi PG Hackers.

We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.

Many thanks.
Rajendra.




Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
<dangwalrajendra888@gmail.com> wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
> such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.

The attached patch has the changes to support capturing generated
column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
'include_generated_columns' option is specified, the generated column
information and generated column data will also be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE publication pub1 for all tables;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for the generated column is not possible as copy command does not
support getting data for the generated column. If this feature is
required we can remove this limitation from the copy command and then
add it as a subscription option later.
Thoughts?

Thanks and Regards,
Shubham Khanna.

Attachment

RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for creating a patch! Here are high-level comments.

1.
Please document the feature. If it is hard to describe, we should change the API.

2.
Currently, the option is implemented as streaming option. Are there any reasons
to choose the way? Another approach is to implement as slot option, like failover
and temporary.

3.
You said that the subscription option is not supported for now. Not sure, does it mean
that the logical replication feature cannot be used for generated columns? If so,
the restriction won't be acceptable. If the combination of this and initial
sync is problematic, can't we exclude them in CreateSubscription and AlterSubscription?
E.g., the create_slot option cannot be set if slot_name is NONE.

4.
Regarding the test_decoding plugin, it has already been able to decode the
generated columns. So... as the first place, is the proposed option really needed
for the plugin? Why do you include it?
If you anyway want to add the option, the default value should be on - which keeps
current behavior.

5.
Assuming that the feature becomes usable for logical replication. Not sure,
should we change the protocol version at that time? Nodes prior to PG17 may
not want to receive values for generated columns. Can we control this only by the option?

6. logicalrep_write_tuple()

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
```

Hmm, does above mean that generated columns are decoded even if they are not in
the column list? If so, why? I think such columns should not be sent.

7.

Some functions refer to data->publish_generated_column many times. Can we store
the value in a variable?

Below comments are for test_decoding part, but they may be not needed.

=====

a. pg_decode_startup()

```
+        else if (strcmp(elem->defname, "include_generated_columns") == 0)
```

Other options for test_decoding do not have underscore. It should be
"include-generated-columns".

b. pg_decode_change()

data->include_generated_columns is referred to four times in the function.
Can you store the value in a variable?


c. pg_decode_change()

```
-                                    true);
+                                    true, data->include_generated_columns );
```

Please remove the blank.
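
Comment (a) above, about the hyphenated option name, can be illustrated with a minimal sketch. It is not the test_decoding code: `DecodeOpts` and `apply_option` are hypothetical names, the accepted value spellings are deliberately reduced, and treating a missing value as "on" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of decoding options. */
typedef struct DecodeOpts
{
    bool include_generated_columns;
} DecodeOpts;

/*
 * Apply one name/value pair. Returns true if the option was recognised
 * and its value was valid, false otherwise. A missing value means "on"
 * (an assumption of this sketch).
 */
bool
apply_option(DecodeOpts *opts, const char *name, const char *value)
{
    assert(opts != NULL && name != NULL);

    /* Hyphenated spelling, like 'include-xids' and 'skip-empty-xacts'. */
    if (strcmp(name, "include-generated-columns") != 0)
        return false;

    if (value == NULL || strcmp(value, "1") == 0 || strcmp(value, "true") == 0)
        opts->include_generated_columns = true;
    else if (strcmp(value, "0") == 0 || strcmp(value, "false") == 0)
        opts->include_generated_columns = false;
    else
        return false;

    return true;
}
```

With this shape, the underscore spelling "include_generated_columns" is simply not recognised, which is the behaviour the comment asks for.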

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some review comments for the patch v1-0001.

======
GENERAL

G.1. Use consistent names

It seems to add unnecessary complications by having different names
for all the new options, fields and API parameters.

e.g. sometimes 'include_generated_columns'
e.g. sometimes 'publish_generated_columns'

Won't it be better to just use identical names everywhere for
everything? I don't mind which one you choose; I just felt you only
need one name, not two. This comment overrides everything else in this
post so whatever name you choose, make adjustments for all my other
review comments as necessary.

======

G.2. Is it possible to just use the existing bms?

A very large part of this patch is adding more API parameters to
delegate the 'publish_generated_columns' flag value down to when it is
finally checked and used. e.g.

The functions:
- logicalrep_write_insert(), logicalrep_write_update(),
logicalrep_write_delete()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_tuple

The functions:
- logicalrep_write_rel()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_attrs

AFAICT in all these places the API is already passing a "Bitmapset
*columns". I was wondering if it might be possible to modify the
"Bitmapset *columns" BEFORE any of those functions get called so that
the "columns" BMS either does or doesn't include generated cols (as
appropriate according to the option).

Well, it might not be so simple because there are some NULL BMS
considerations also, but I think it would be worth investigating at
least, because if indeed you can find some common place (somewhere
like pgoutput_change()?) where the columns BMS can be filtered to
remove bits for generated cols then it could mean none of those other
patch API changes are needed at all -- then the patch would only be
1/2 the size.
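
The idea in G.2 can be sketched in a self-contained way. This is only an illustration under stated assumptions: a plain 64-bit mask stands in for PostgreSQL's Bitmapset, and `ColumnDesc` and `filter_generated_columns` are hypothetical names that do not appear in the patch. An input mask of 0 stands in for a NULL column list, meaning "all columns".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a table column descriptor. */
typedef struct ColumnDesc
{
    bool is_dropped;
    bool is_generated;
} ColumnDesc;

/*
 * Build the set of columns to publish as a bitmask (bit i = column i).
 * columns == 0 stands in for a NULL column list, i.e. "all columns".
 */
uint64_t
filter_generated_columns(const ColumnDesc *cols, size_t ncols,
                         uint64_t columns, bool include_generated_columns)
{
    uint64_t result = 0;

    assert(ncols <= 64);

    for (size_t i = 0; i < ncols; i++)
    {
        uint64_t bit = UINT64_C(1) << i;

        if (cols[i].is_dropped)
            continue;
        if (cols[i].is_generated && !include_generated_columns)
            continue;
        if (columns != 0 && (columns & bit) == 0)
            continue;

        result |= bit;
    }

    return result;
}
```

If the filtering is done once up front like this, the writer functions could keep their existing signatures and test only set membership. Note that an empty result is ambiguous with "all columns" in this encoding, which is the kind of NULL-BMS consideration mentioned above.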

======
Commit message

1.
Now if include_generated_columns option is specified, the generated
column information and generated column data also will be sent.

Usage from pgoutput plugin:
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

~

I think there needs to be more background information given here. This
commit message doesn't seem to describe anything about what is the
problem and how this patch fixes it. It just jumps straight into
giving usages of a 'include_generated_columns' option.

It also doesn't say that this is an option that was newly *introduced*
by the patch -- it refers to it as though the reader should already
know about it.

Furthermore, your hacker's post says "Currently it is not supported as
a subscription option because table sync for the generated column is
not possible as copy command does not support getting data for the
generated column. If this feature is required we can remove this
limitation from the copy command and then add it as a subscription
option later." IMO that all seems like the kind of information that
ought to also be mentioned in this commit message.

======
contrib/test_decoding/sql/ddl.sql

2.
+-- check include_generated_columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');
+DROP TABLE gencoltable;
+

2a.
Perhaps you should include both option values to demonstrate the
difference in behaviour:

'include_generated_columns', '0'
'include_generated_columns', '1'

~

2b.
I think you maybe need to include some more test combinations where
there is and isn't a COLUMN LIST, because I am not 100% sure I
understand the current logic/expectations for all combinations.

e.g. When the generated column is in a column list but
'publish_generated_columns' is false then what should happen? etc.
Also if there are any special rules then those should be mentioned in
the commit message.

======
src/backend/replication/logical/proto.c

3.
For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

~~~

4. logicalrep_write_tuple:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

That code seems confusing. Shouldn't the logic be exactly the same as in
logicalrep_write_attrs()?

e.g. Shouldn't they both look like this:

if (att->attisdropped)
  continue;

if (att->attgenerated && !publish_generated_column)
  continue;

if (!column_in_column_list(att->attnum, columns))
  continue;
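
To make the difference between the two orderings concrete, here is a minimal sketch. It assumes a simplified `Att` struct in which `in_column_list` stands in for the result of column_in_column_list(); `v1_skips`, `suggested_skips` and `check_difference` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified column attributes. */
typedef struct Att
{
    bool attisdropped;
    bool attgenerated;
    bool in_column_list;    /* stand-in for column_in_column_list() */
} Att;

/* Skip logic as written in the v1 hunk quoted above. */
bool
v1_skips(Att att, bool publish_generated_columns)
{
    if (att.attisdropped)
        return true;
    if (!att.in_column_list && !att.attgenerated)
        return true;
    if (att.attgenerated && !publish_generated_columns)
        return true;
    return false;
}

/* Skip logic in the suggested order. */
bool
suggested_skips(Att att, bool publish_generated_columns)
{
    if (att.attisdropped)
        return true;
    if (att.attgenerated && !publish_generated_columns)
        return true;
    if (!att.in_column_list)
        return true;
    return false;
}

/* The two versions disagree for a generated column outside the column list. */
void
check_difference(void)
{
    Att gen_not_listed = { false, true, false };

    assert(!v1_skips(gen_not_listed, true));        /* v1 sends it */
    assert(suggested_skips(gen_not_listed, true));  /* suggestion skips it */
}
```

In this model that is the only input on which the two versions disagree: with the option enabled, v1 sends a generated column even when the column list excludes it.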
======
src/backend/replication/pgoutput/pgoutput.c

5.
 static void send_relation_and_attrs(Relation relation, TransactionId xid,
  LogicalDecodingContext *ctx,
- Bitmapset *columns);
+ Bitmapset *columns,
+ bool publish_generated_column);

Use plural. /publish_generated_column/publish_generated_columns/

~~~

6. parse_output_parameters

  bool origin_option_given = false;
+ bool generate_column_option_given = false;

  data->binary = false;
  data->streaming = LOGICALREP_STREAM_OFF;
  data->messages = false;
  data->two_phase = false;
+ data->publish_generated_column = false;

I think the 1st var should be 'include_generated_columns_option_given'
for consistency with the name of the actual option that was given.

======
src/include/replication/logicalproto.h

7.
(Same as a previous review comment)

For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

======
src/include/replication/pgoutput.h

8.
  bool publish_no_origin;
+ bool publish_generated_column;
 } PGOutputData;

/publish_generated_column/publish_generated_columns/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
Hi Kuroda-san,

Thanks for reviewing the patch. I have fixed some of the comments.

> 2.
> Currently, the option is implemented as streaming option. Are there any reasons
> to choose the way? Another approach is to implement as slot option, like failover
> and temporary.
I think the current approach is appropriate. The options such as
failover and temporary seem like properties of a slot and I think
decoding of generated column should not be slot specific. Also adding
a new option for slot may create an overhead.

> 3.
> You said that the subscription option is not supported for now. Not sure, does it mean
> that the logical replication feature cannot be used for generated columns? If so,
> the restriction won't be acceptable. If the combination of this and initial
> sync is problematic, can't we exclude them in CreateSubscription and AlterSubscription?
> E.g., the create_slot option cannot be set if slot_name is NONE.
Added an option 'generated_column' for CREATE SUBSCRIPTION. Currently
it allows setting the 'generated_column' option to true only if 'copy_data'
is set to false.
Also, we don't allow the user to alter the 'generated_column' option.
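
The restriction described here can be sketched as a small validation function. This is an illustration only, not the patch's code: `SubOpts` and `check_sub_opts` are hypothetical names, and the error text is invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of CREATE SUBSCRIPTION options. */
typedef struct SubOpts
{
    bool copy_data;         /* defaults to true in CREATE SUBSCRIPTION */
    bool generated_column;  /* option name used in the v2 patch */
} SubOpts;

/*
 * Returns NULL when the combination is accepted, otherwise a
 * (hypothetical) error message describing the conflict.
 */
const char *
check_sub_opts(const SubOpts *opts)
{
    assert(opts != NULL);

    if (opts->generated_column && opts->copy_data)
        return "generated_column = true requires copy_data = false";

    return NULL;
}
```

Because copy_data defaults to true, a user who only writes generated_column = true would hit this error unless copy_data = false is given explicitly.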

> 6. logicalrep_write_tuple()
>
> ```
> -        if (!column_in_column_list(att->attnum, columns))
> +        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> +            continue;
> ```
>
> Hmm, does above mean that generated columns are decoded even if they are not in
> the column list? If so, why? I think such columns should not be sent.
Fixed

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
Hi,

On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> <dangwalrajendra888@gmail.com> wrote:
> >
> > Hi PG Hackers.
> >
> > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
> > > such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
>
> The attached patch has the changes to support capturing generated
> column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> 'include_generated_columns' option is specified, the generated column
> information and generated column data will also be sent.

As Euler mentioned earlier, I think it's a design decision not to replicate
generated columns, because we don't know whether the target table on the
subscriber has the same expression, and there could be locale issues
even if it looks the same. I can see that a benefit of this proposal
would be to save the cost of computing generated column values if the user
wants the target table on the subscriber to have exactly the same data
as the publisher's one. Are there other benefits or use cases?

>
> Usage from pgoutput plugin:
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> CREATE publication pub1 for all tables;
> SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> Currently it is not supported as a subscription option because table
> sync for the generated column is not possible as copy command does not
> support getting data for the generated column. If this feature is
> required we can remove this limitation from the copy command and then
> add it as a subscription option later.
> Thoughts?

I think that if we want to support an option to replicate generated
columns, the initial tablesync should support it too. Otherwise, we
end up filling the target columns data with NULL during the initial
tablesync but with replicated data during the streaming changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Mon, 20 May 2024 at 13:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
> > > > such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > 'include_generated_columns' option is specified, the generated column
> > information and generated column data will also be sent.
>
> As Euler mentioned earlier, I think it's a design decision not to replicate
> generated columns, because we don't know whether the target table on the
> subscriber has the same expression, and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save the cost of computing generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?

I think this will be useful mainly for the use cases where the
publisher has generated columns and the subscriber does not have
generated columns.
In the case where both the publisher and subscriber have generated
columns, with the current patch the replicated values are overwritten by
values computed from the generated column expression on the subscriber.
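
A minimal sketch of the behaviour described in the two cases above, using the thread's example table (b generated from a). It is not the apply worker's code: `SubscriberColumn`, `times_three` and `value_stored_on_subscriber` are hypothetical names, and the subscriber-side expression a * 3 is chosen only to make a divergence from the publisher's a * 2 visible.

```c
#include <assert.h>
#include <stddef.h>

/* Generation expression of a subscriber-side column; NULL if not generated. */
typedef int (*GenExpr)(int a);

typedef struct SubscriberColumn
{
    GenExpr generate;
} SubscriberColumn;

/* Hypothetical subscriber expression that differs from the publisher's a * 2. */
int
times_three(int a)
{
    return a * 3;
}

/*
 * Value that ends up in column b on the subscriber for a replicated row.
 * A generated subscriber column ignores the replicated value and is
 * recomputed; a regular column stores the replicated value as-is.
 */
int
value_stored_on_subscriber(const SubscriberColumn *col, int a, int replicated_b)
{
    assert(col != NULL);

    if (col->generate != NULL)
        return col->generate(a);

    return replicated_b;
}
```

For a row with a = 2 and a replicated b = 4, a regular subscriber column stores 4, while a subscriber column generated as a * 3 stores 6 regardless of what was sent.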

> >
> > Usage from pgoutput plugin:
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > CREATE publication pub1 for all tables;
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > Currently it is not supported as a subscription option because table
> > sync for the generated column is not possible as copy command does not
> > support getting data for the generated column. If this feature is
> > required we can remove this limitation from the copy command and then
> > add it as a subscription option later.
> > Thoughts?
>
> I think that if we want to support an option to replicate generated
> columns, the initial tablesync should support it too. Otherwise, we
> end up filling the target columns data with NULL during the initial
> tablesync but with replicated data during the streaming changes.

+1 for supporting initial sync.
Currently, the combination of copy_data = true and generated_column = true
is not supported; this limitation will be removed in one of the upcoming
patches.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

AFAICT the differences between this v2-0001 patch and v1 are mostly about adding
the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
addressing any of my previous review comments for patch v1 [1]. So
these comments below are limited only to the new option code; all my
previous review comments probably still apply.

======
Commit message

1. (General)
The commit message is seriously lacking background explanation to describe:
- What is the current behaviour w.r.t. generated columns
- What is the problem with the current behaviour?
- What exactly is this patch doing to address that problem?

~

2.
New option generated_option is added in create subscription. Now if this
option is specified as 'true' during create subscription, generated
columns in the tables, present in publisher (to which this subscription is
subscribed) can also be replicated.

-

2A.
"generated_option" is not the name of the new option.

~

2B.
"create subscription" stmt should be UPPERCASE; will also be more
readable if the option name is quoted.

~

2C.
Needs more information like under what condition is this option ignored etc.

======
doc/src/sgml/ref/create_subscription.sgml

3.
+       <varlistentry id="sql-createsubscription-params-with-generated-column">
+        <term><literal>generated-column</literal> (<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>
+
+         <para>
+          This parameter can only be set true if copy_data is set to false.
+          This option works fine when a generated column (in publisher) is replicated to a
+          non-generated column (in subscriber). Else if it is replicated to a generated
+          column, it will ignore the replicated data and fill the column with computed or
+          default data.
+         </para>
+        </listitem>
+       </varlistentry>

3A.
There is a typo in the name "generated-column" because we should use
underscores (not hyphens) for the option names.

~

3B.
This is not a good option name because there is no verb so it
doesn't mean anything to set it true/false -- actually there IS a verb
"generate" but we are not saying generate = true/false, so this name
is also quite confusing.

I think "include_generated_columns" would be much better, but if
others think that name is too long then maybe "include_generated_cols"
or "include_gen_cols" or similar. Of course, whatever the final
decision is, it should be propagated the same way through all the code
comments, params, fields, etc.

~

3C.
copy_data and false should be marked up as <literal> fonts in the sgml

~

3D.

Suggest re-word this part. Don't need to explain when it "works fine".

BEFORE
This option works fine when a generated column (in publisher) is
replicated to a non-generated column (in subscriber). Else if it is
replicated to a generated column, it will ignore the replicated data
and fill the column with computed or default data.

SUGGESTION
If the subscriber-side column is also a generated column then this
option has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

======
src/backend/commands/subscriptioncmds.c

4. AlterSubscription
    SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
    SUBOPT_PASSWORD_REQUIRED |
    SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);

Hmm. Is this correct? If ALTER is not allowed (later in this patch
there is a message "toggling generated_column option is not allowed."),
then why are we even saying that SUBOPT_GENERATED_COLUMN is a
supported_opt for ALTER?

~~~

5.
+ if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("toggling generated_column option is not allowed.")));
+ }

5A.
I suspect this is not even needed if the 'supported_opt' is fixed per
the previous comment.

~

5B.
But if this message is still needed then I think it should say "ALTER
is not allowed" (not "toggling is not allowed") and also the option
name should be quoted as per the new guidelines for error messages.

======
src/backend/replication/logical/proto.c


6. logicalrep_write_tuple

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

Calling column_in_column_list() might be a more expensive operation
than checking just generated columns flag so maybe reverse the order
and check the generated columns first for a tiny performance gain.

~~

7.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

~~

8. logicalrep_write_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

ditto #6

~~

9.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

======
src/include/catalog/pg_subscription.h


10. CATALOG

+ bool subgeneratedcolumn; /* True if generated colums must be published */

/colums/columns/

======
src/test/regress/sql/publication.sql

11.
--- error: generated column "d" can't be in list
+-- ok


Maybe change "ok" to say something like "ok: generated cols can be in the list too"

======

12.
GENERAL - Missing CREATE SUBSCRIPTION test?
GENERAL - Missing ALTER SUBSCRIPTION test?

How come this patch adds a new CREATE SUBSCRIPTION option but does not
seem to include any test case for that option in either the CREATE
SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Eisentraut
Date:
On 08.05.24 09:13, Shubham Khanna wrote:
> The attached patch has the changes to support capturing generated
> column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> 'include_generated_columns' option is specified, the generated column
> information and generated column data will also be sent.

It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
into the replication output stream.

I think having an option for including stored generated columns is in 
general ok.


[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the API.

I have documented the feature.

> 4.
> Regarding the test_decoding plugin, it has already been able to decode the
> generated columns. So... as the first place, is the proposed option really needed
> for the plugin? Why do you include it?
> If you anyway want to add the option, the default value should be on - which keeps
> current behavior.

I have set the generated column option to true by default for the test_decoding
plugin, so by default we will send generated column data.

> 5.
> Assuming that the feature becomes usable for logical replication. Not sure,
> should we change the protocol version at that time? Nodes prior to PG17 may
> not want to receive values for generated columns. Can we control this only by the option?

I verified the backward compatibility test by using the generated
column option and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer to data->publish_generated_column many times. Can we store
> the value in a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =====
>
> a. pg_decode_startup()
>
> ```
> +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not have underscore. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referred to four times in the function.
> Can you store the value in a variable?
>
>
> c. pg_decode_change()
>
> ```
> -                                    true);
> +                                    true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.

Attachment

RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for updating the patch! I checked your patches briefly. Here are my comments.

01. API

Since the option for test_decoding is enabled by default, I think it should be renamed.
E.g., "skip-generated-columns" or something.

02. ddl.sql

```
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+                            data                             
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
```

We should also test the non-default case, in which the generated columns are not included in the output.
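
For illustration, here is a minimal standalone C sketch of what the two cases could look like in test_decoding-style output. It is not the plugin's code: `SketchColumn` and `format_insert` are hypothetical names, and the line produced with the option off is an assumption based on the format shown above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified column description; not a PostgreSQL type. */
typedef struct
{
    const char *name;
    const char *type;
    int         value;
    bool        is_generated;
} SketchColumn;

/*
 * Write a test_decoding-style INSERT line into buf. Generated columns are
 * appended only when include_generated is true.
 */
static void
format_insert(char *buf, size_t buflen, const char *relname,
              const SketchColumn *cols, int ncols, bool include_generated)
{
    size_t used = (size_t) snprintf(buf, buflen, "table %s: INSERT:", relname);

    for (int i = 0; i < ncols && used < buflen; i++)
    {
        if (cols[i].is_generated && !include_generated)
            continue;           /* option off: omit the generated column */

        used += (size_t) snprintf(buf + used, buflen - used, " %s[%s]:%d",
                                  cols[i].name, cols[i].type, cols[i].value);
    }
}
```

A regression test would then compare the output of the same INSERT with the option set to '1' and to '0'.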

03. ddl.sql

Not sure the new tests are in the correct place. Should we add a new file and move the tests to it?
Thoughts?

04. protocol.sgml

Please keep the format of the sgml file.

05. protocol.sgml

The option is implemented as the streaming option of pgoutput plugin, so they should be
located under "Logical Streaming Replication Parameters" section.

05. AlterSubscription

```
+                if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+                {
+                    ereport(ERROR,
+                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                             errmsg("toggling generated_column option is not allowed.")));
+                }
```

If you don't want to support the option, you can remove the SUBOPT_GENERATED_COLUMN
macro from the function. But can you clarify why you do not want to support it?
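
As a rough sketch of the alternative, the option can be rejected simply by leaving its bit out of the supported set, so no dedicated error branch is needed. The code below is standalone C modelled on the `IsSet()` check in the quoted hunk; the `SKETCH_*` bit values and `option_is_supported` are hypothetical and do not match the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitmask test with the same shape as the IsSet() check quoted above. */
#define IsSet(val, bits)  (((val) & (bits)) == (bits))

/* Hypothetical bit values for illustration; the real ones differ. */
#define SKETCH_SUBOPT_STREAMING         0x01u
#define SKETCH_SUBOPT_ORIGIN            0x02u
#define SKETCH_SUBOPT_GENERATED_COLUMN  0x04u

/* CREATE accepts the new option; ALTER simply does not list it. */
static const uint32_t sketch_create_supported =
    SKETCH_SUBOPT_STREAMING | SKETCH_SUBOPT_ORIGIN | SKETCH_SUBOPT_GENERATED_COLUMN;
static const uint32_t sketch_alter_supported =
    SKETCH_SUBOPT_STREAMING | SKETCH_SUBOPT_ORIGIN;

/* Return true when the requested option is accepted by this command. */
static bool
option_is_supported(uint32_t supported_opts, uint32_t requested_opt)
{
    return IsSet(supported_opts, requested_opt);
}
```

With this shape, an unsupported option would be reported by the generic option parser rather than by a special-case message.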

07. logicalrep_write_tuple

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
+
+        if (att->attgenerated && !publish_generated_column)
             continue;
```

I think the changes in v2 were reverted or wrongly merged.

08. test code

Can you add tests verifying that generated columns are replicated by logical replication?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 23 May 2024 at 09:19, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> > Dear Shubham,
> >
> > Thanks for creating a patch! Here are high-level comments.
>
> > 1.
> > Please document the feature. If it is hard to describe, we should change the API.
>
> I have added the feature in the document.
>
> > 4.
> > Regarding the test_decoding plugin, it has already been able to decode the
> > generated columns. So... in the first place, is the proposed option really needed
> > for the plugin? Why do you include it?
> > If you anyway want to add the option, the default value should be on - which keeps
> > current behavior.
>
> I have set the generated column option to true by default for the
> test_decoding plugin, so generated column data is sent by default.
>
> > 5.
> > Assuming that the feature becomes usable for logical replication. Not sure,
> > should we change the protocol version at that time? Nodes prior to PG17 may
> > not want to receive values for generated columns. Can we control this only by the option?
>
> I verified the backward compatibility test by using the generated
> column option and it worked fine. I think there is no need to make any
> further changes.
>
> > 7.
> >
> > Some functions refer data->publish_generated_column many times. Can we store
> > the value to a variable?
> >
> > Below comments are for test_decoding part, but they may be not needed.
> >
> > =====
> >
> > a. pg_decode_startup()
> >
> > ```
> > +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > ```
> >
> > Other options for test_decoding do not have underscore. It should be
> > "include-generated-columns".
> >
> > b. pg_decode_change()
> >
> > data->include_generated_columns is referenced four times in the function.
> > Can you store the value in a variable?
> >
> >
> > c. pg_decode_change()
> >
> > ```
> > -                                    true);
> > +                                    true, data->include_generated_columns );
> > ```
> >
> > Please remove the blank.
>
> Fixed.
> The attached v3 Patch has the changes for the same.

Few comments:
1) Since this is removed, tupdesc variable is not required anymore:
+++ b/src/backend/catalog/pg_publication.c
@@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns,
                                        errmsg("cannot use system column \"%s\" in publication column list",
                                                   colname));

-               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
-                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated column \"%s\" in publication column list",
-                                                  colname));

2) In test_decoding include_generated_columns option is used:
+               else if (strcmp(elem->defname, "include_generated_columns") == 0)
+               {
+                       if (elem->arg == NULL)
+                               continue;
+                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
+                                                               strVal(elem->arg), elem->defname)));
+               }

In the subscription code the option is named generated_column; we should
try to use the same option name in both places:
+               else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
+                                strcmp(defel->defname, "generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts, SUBOPT_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
+                       opts->generated_column = defGetBoolean(defel);
+               }
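
To illustrate why the option name matters, here is a small standalone C sketch of name-based option lookup. It only imitates the shape of the hunks above: `SketchOption`, `sketch_parse_bool` and `read_generated_option` are hypothetical, and the boolean parser accepts fewer spellings than PostgreSQL's `parse_bool()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem: an option name and optional value. */
typedef struct
{
    const char *name;
    const char *arg;            /* NULL when no value was supplied */
} SketchOption;

/* Simplified boolean parser; PostgreSQL's parse_bool() accepts more forms. */
static bool
sketch_parse_bool(const char *s, bool *result)
{
    if (strcmp(s, "1") == 0 || strcmp(s, "true") == 0 || strcmp(s, "on") == 0)
    {
        *result = true;
        return true;
    }
    if (strcmp(s, "0") == 0 || strcmp(s, "false") == 0 || strcmp(s, "off") == 0)
    {
        *result = false;
        return true;
    }
    return false;
}

/*
 * Scan the options for opt_name and store its value in *include_generated.
 * Returns false if the value cannot be parsed. A missing value leaves the
 * caller's default untouched, as in the quoted test_decoding hunk.
 */
static bool
read_generated_option(const SketchOption *opts, int nopts,
                      const char *opt_name, bool *include_generated)
{
    for (int i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, opt_name) != 0)
            continue;           /* exact match only: '-' and '_' differ */
        if (opts[i].arg == NULL)
            continue;
        if (!sketch_parse_bool(opts[i].arg, include_generated))
            return false;
    }
    return true;
}
```

Because the comparison is an exact string match, a caller passing a different spelling than the one the code looks for silently keeps the default, which is one practical argument for using a single name everywhere.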

3) Tab completion can be added for CREATE SUBSCRIPTION to include the
generated_column option.

4) There are a few whitespace issues when applying the patch; check with
git diff --check.

5) Add a few tests for the newly added option.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for the patch v3-0001.

I don't think v3 addressed any of my previous review comments for
patches v1 and v2. [1][2]

So the comments below are limited only to the new code (i.e. the v3
versus v2 differences). Meanwhile, all my previous review comments may
still apply.

======
GENERAL

The patch applied gives whitespace warnings:

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
trailing whitespace.
warning: 3 lines add whitespace errors.

======
contrib/test_decoding/test_decoding.c

1. pg_decode_change

  MemoryContext old;
+ bool include_generated_columns;
+

I'm not really convinced this variable saves any code.

======
doc/src/sgml/protocol.sgml

2.
+        <varlistentry>
+         <term><replaceable
class="parameter">include-generated-columns</replaceable></term>
+         <listitem>
+        <para>
+        The include-generated-columns option controls whether
generated columns should be included in the string representation of
tuples during logical decoding in PostgreSQL. This allows users to
customize the output format based on whether they want to include
these columns or not.
+         </para>
+         </listitem>
+         </varlistentry>

2a.
Something is not correct when this name has hyphens and all the nearby
parameter names do not. Shouldn't it be all uppercase like the other
boolean parameter?

~

2b.
Text in the SGML file should be wrapped properly.

~

2c.
IMO the comment can be more terse and it also needs to specify that it
is a boolean type, and what is the default value if not passed.

SUGGESTION

INCLUDE_GENERATED_COLUMNS [ boolean ]

If true, then generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
src/backend/replication/logical/proto.c

3. logicalrep_write_tuple

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

3a.
This code seems overcomplicated checking the same flag multiple times.

SUGGESTION
if (att->attgenerated)
{
  if (!publish_generated_column)
    continue;
}
else
{
  if (!column_in_column_list(att->attnum, columns))
    continue;
}
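
As a sanity check, the two forms can be compared exhaustively. The standalone C sketch below models each condition as a pure function over three booleans (the function names are hypothetical) and counts the input combinations on which they disagree.

```c
#include <stdbool.h>

/* The v3 form quoted above: two separate early exits. */
static bool
skip_column_v3(bool in_column_list, bool is_generated, bool publish_generated)
{
    if (!in_column_list && !is_generated)
        return true;
    if (is_generated && !publish_generated)
        return true;
    return false;
}

/* The suggested form: branch once on the generated flag. */
static bool
skip_column_suggested(bool in_column_list, bool is_generated,
                      bool publish_generated)
{
    if (is_generated)
        return !publish_generated;
    return !in_column_list;
}

/* Count the input combinations on which the two forms disagree. */
static int
count_disagreements(void)
{
    int diffs = 0;

    for (int bits = 0; bits < 8; bits++)
    {
        bool in_list = (bits & 1) != 0;
        bool gen = (bits & 2) != 0;
        bool pub = (bits & 4) != 0;

        if (skip_column_v3(in_list, gen, pub) !=
            skip_column_suggested(in_list, gen, pub))
            diffs++;
    }
    return diffs;
}
```

Under this model the two forms agree on all eight combinations, so the suggestion is a readability change only. Note that in both forms a generated column bypasses the column list entirely.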

~

3b.
The same logic occurs several times in logicalrep_write_tuple

~~~

4. logicalrep_write_attrs

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

Shouldn't these code fragments (2x in this function) look the same as
in logicalrep_write_tuple? See the above review comments.

======
src/backend/replication/pgoutput/pgoutput.c

5. maybe_send_schema

  TransactionId topxid = InvalidTransactionId;
+ bool publish_generated_column = data->publish_generated_column;

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->streaming).

~~~

6. pgoutput_change
-
+ bool publish_generated_column = data->publish_generated_column;
+

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->binary).

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
[2] My v2 review -
https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for the patch v1-0001.
>
> ======
> GENERAL
>
> G.1. Use consistent names
>
> It seems to add unnecessary complications by having different names
> for all the new options, fields and API parameters.
>
> e.g. sometimes 'include_generated_columns'
> e.g. sometimes 'publish_generated_columns'
>
> Won't it be better to just use identical names everywhere for
> everything? I don't mind which one you choose; I just felt you only
> need one name, not two. This comment overrides everything else in this
> post so whatever name you choose, make adjustments for all my other
> review comments as necessary.

I have updated the name to 'include_generated_columns' everywhere in the Patch.

> ======
>
> G.2. Is it possible to just use the existing bms?
>
> A very large part of this patch is adding more API parameters to
> delegate the 'publish_generated_columns' flag value down to when it is
> finally checked and used. e.g.
>
> The functions:
> - logicalrep_write_insert(), logicalrep_write_update(),
> logicalrep_write_delete()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_tuple
>
> The functions:
> - logicalrep_write_rel()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_attrs
>
> AFAICT in all these places the API is already passing a "Bitmapset
> *columns". I was wondering if it might be possible to modify the
> "Bitmapset *columns" BEFORE any of those functions get called so that
> the "columns" BMS either does or doesn't include generated cols (as
> appropriate according to the option).
>
> Well, it might not be so simple because there are some NULL BMS
> considerations also, but I think it would be worth investigating at
> least, because if indeed you can find some common place (somewhere
> like pgoutput_change()?) where the columns BMS can be filtered to
> remove bits for generated cols then it could mean none of those other
> patch API changes are needed at all -- then the patch would only be
> 1/2 the size.

I will analyse and reply to this in the next version.
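
One way to picture the G.2 idea is to compute the effective column set once, up front, so the write functions need no extra parameter. The following is a minimal standalone C sketch under stated assumptions: a 64-bit mask stands in for the Bitmapset, a NULL pointer stands for "no column list", and `SketchAttr` and `effective_columns` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute description; not PostgreSQL's Form_pg_attribute. */
typedef struct
{
    bool is_dropped;
    bool is_generated;
} SketchAttr;

/*
 * Build the effective column mask once, before any write function runs.
 * column_list == NULL stands for "no column list", i.e. every column.
 * Bit i represents attribute i (0-based here, at most 64 attributes).
 */
static uint64_t
effective_columns(const SketchAttr *attrs, int natts,
                  const uint64_t *column_list, bool include_generated)
{
    uint64_t mask = 0;

    for (int i = 0; i < natts; i++)
    {
        uint64_t bit = UINT64_C(1) << i;

        if (attrs[i].is_dropped)
            continue;
        if (attrs[i].is_generated && !include_generated)
            continue;
        if (column_list != NULL && (*column_list & bit) == 0)
            continue;
        mask |= bit;
    }
    return mask;
}
```

The NULL case is what makes this less simple than it looks: once generated columns have to be filtered out, "no column list" can no longer be passed down as NULL and must be expanded into an explicit set.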

> ======
> Commit message
>
> 1.
> Now if include_generated_columns option is specified, the generated
> column information and generated column data also will be sent.
>
> Usage from pgoutput plugin:
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> ~
>
> I think there needs to be more background information given here. This
> commit message doesn't seem to describe anything about what is the
> problem and how this patch fixes it. It just jumps straight into
> giving usages of a 'include_generated_columns' option.
>
> It also doesn't say that this is an option that was newly *introduced*
> by the patch -- it refers to it as though the reader should already
> know about it.
>
> Furthermore, your hacker's post says "Currently it is not supported as
> a subscription option because table sync for the generated column is
> not possible as copy command does not support getting data for the
> generated column. If this feature is required we can remove this
> limitation from the copy command and then add it as a subscription
> option later." IMO that all seems like the kind of information that
> ought to also be mentioned in this commit message.

I have updated the Commit message mentioning the suggested changes.

> ======
> contrib/test_decoding/sql/ddl.sql
>
> 2.
> +-- check include_generated_columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
> +DROP TABLE gencoltable;
> +
>
> 2a.
> Perhaps you should include both option values to demonstrate the
> difference in behaviour:
>
> 'include_generated_columns', '0'
> 'include_generated_columns', '1'

Added the other option value to demonstrate the difference in behaviour.

> 2b.
> I think you may need to include some more test combinations where
> there is and isn't a COLUMN LIST, because I am not 100% sure I
> understand the current logic/expectations for all combinations.
>
> e.g. When the generated column is in a column list but
> 'publish_generated_columns' is false then what should happen? etc.
> Also if there are any special rules then those should be mentioned in
> the commit message.

Test case is added and the same is mentioned in the documentation.

> ======
> src/backend/replication/logical/proto.c
>
> 3.
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 4. logical_rep_write_tuple:
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
> That code seems confusing. Shouldn't the logic be exactly as also in
> logicalrep_write_attrs()?
>
> e.g. Shouldn't they both look like this:
>
> if (att->attisdropped)
>   continue;
>
> if (att->attgenerated && !publish_generated_column)
>   continue;
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;

Fixed.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5.
>  static void send_relation_and_attrs(Relation relation, TransactionId xid,
>   LogicalDecodingContext *ctx,
> - Bitmapset *columns);
> + Bitmapset *columns,
> + bool publish_generated_column);
>
> Use plural. /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 6. parse_output_parameters
>
>   bool origin_option_given = false;
> + bool generate_column_option_given = false;
>
>   data->binary = false;
>   data->streaming = LOGICALREP_STREAM_OFF;
>   data->messages = false;
>   data->two_phase = false;
> + data->publish_generated_column = false;
>
> I think the 1st var should be 'include_generated_columns_option_given'
> for consistency with the name of the actual option that was given.

Updated the name to 'include_generated_columns_option_given'

> ======
> src/include/replication/logicalproto.h
>
> 7.
> (Same as a previous review comment)
>
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> ======
> src/include/replication/pgoutput.h
>
> 8.
>   bool publish_no_origin;
> + bool publish_generated_column;
>  } PGOutputData;
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.


Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, May 21, 2024 at 12:23 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> AFAICT the difference between this v2-0001 patch and v1 is mostly about adding
> the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
> addressing any of my previous review comments for patch v1. [1]. So
> these comments below are limited only to the new option code; All my
> previous review comments probably still apply.
>
> ======
> Commit message
>
> 1. (General)
> The commit message is seriously lacking background explanation to describe:
> - What is the current behaviour w.r.t. generated columns
> - What is the problem with the current behaviour?
> - What exactly is this patch doing to address that problem?

Added the information related to this inside the Patch.

> 2.
> New option generated_option is added in create subscription. Now if this
> option is specified as 'true' during create subscription, generated
> columns in the tables, present in publisher (to which this subscription is
> subscribed) can also be replicated.
>
> -
>
> 2A.
> "generated_option" is not the name of the new option.
>
> ~
>
> 2B.
> "create subscription" stmt should be UPPERCASE; will also be more
> readable if the option name is quoted.
>
> ~
>
> 2C.
> Needs more information like under what condition is this option ignored etc.

Fixed.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 3.
> +       <varlistentry id="sql-createsubscription-params-with-generated-column">
> +        <term><literal>generated-column</literal> (<type>boolean</type>)</term>
> +        <listitem>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. The default is
> +          <literal>false</literal>.
> +         </para>
> +
> +         <para>
> +          This parameter can only be set true if copy_data is set to false.
> +          This option works fine when a generated column (in
> publisher) is replicated to a
> +          non-generated column (in subscriber). Else if it is
> replicated to a generated
> +          column, it will ignore the replicated data and fill the
> column with computed or
> +          default data.
> +         </para>
> +        </listitem>
> +       </varlistentry>
>
> 3A.
> There is a typo in the name "generated-column" because we should use
> underscores (not hyphens) for the option names.
>
> ~
>
> 3B.
> This is not a good option name because there is no verb so it
> doesn't mean anything to set it true/false -- actually there IS a verb
> "generate" but we are not saying generate = true/false, so this name
> is also quite confusing.
>
> I think "include_generated_columns" would be much better, but if
> others think that name is too long then maybe "include_generated_cols"
> or "include_gen_cols" or similar. Of course, whatever the final
> decision is, it should be propagated consistently through all the code
> comments, params, fields, etc.
>
> ~
>
> 3C.
> copy_data and false should be marked up as <literal> fonts in the sgml
>
> ~
>
> 3D.
>
> Suggest re-word this part. Don't need to explain when it "works fine".
>
> BEFORE
> This option works fine when a generated column (in publisher) is
> replicated to a non-generated column (in subscriber). Else if it is
> replicated to a generated column, it will ignore the replicated data
> and fill the column with computed or default data.
>
> SUGGESTION
> If the subscriber-side column is also a generated column then this
> option has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.

Fixed.
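
The documented rule can be pictured with a tiny standalone C sketch. It assumes, purely for illustration, a subscriber column b that is defined as a * 2 when it is a generated column; `apply_column_b` is a hypothetical helper, not part of the apply worker.

```c
#include <stdbool.h>

/*
 * Decide the value stored in subscriber column b.
 * Assumption for this sketch: when b is generated on the subscriber,
 * its expression is a * 2.
 */
static int
apply_column_b(bool subscriber_col_is_generated, int a, int replicated_b)
{
    if (subscriber_col_is_generated)
        return a * 2;           /* replicated value ignored; recomputed locally */

    return replicated_b;        /* plain column: store what the publisher sent */
}
```

So the option only has a visible effect when the subscriber-side column is a regular column.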

> ======
> src/backend/commands/subscriptioncmds.c
>
> 4. AlterSubscription
>     SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
>     SUBOPT_PASSWORD_REQUIRED |
>     SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
> -   SUBOPT_ORIGIN);
> +   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);
>
> Hmm. Is this correct? If ALTER is not allowed (later in this patch
> there is a message "toggling generated_column option is not allowed.")
> then why are we even saying that SUBOPT_GENERATED_COLUMN is a
> support_opt for ALTER?

Fixed.

> 5.
> + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("toggling generated_column option is not allowed.")));
> + }
>
> 5A.
> I suspect this is not even needed if the 'supported_opt' is fixed per
> the previous comment.
>
> ~
>
> 5B.
> But if this message is still needed then I think it should say "ALTER
> is not allowed" (not "toggling is not allowed") and also the option
> name should be quoted as per the new guidelines for error messages.
>
> ======
> src/backend/replication/logical/proto.c

Fixed.

> 6. logicalrep_write_tuple
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> +
>
> Calling column_in_column_list() might be a more expensive operation
> than checking just generated columns flag so maybe reverse the order
> and check the generated columns first for a tiny performance gain.

Fixed.

> 7.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6

Fixed.

> 8. logicalrep_write_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> ditto #6

Fixed.

> 9.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6
>
> ======
> src/include/catalog/pg_subscription.h

Fixed.

> 10. CATALOG
>
> + bool subgeneratedcolumn; /* True if generated colums must be published */
>
> /colums/columns/
>
> ======
> src/test/regress/sql/publication.sql

Fixed.

> 11.
> --- error: generated column "d" can't be in list
> +-- ok
>
>
> Maybe change "ok" to say like "ok: generated cols can be in the list too"

Fixed.

> 12.
> GENERAL - Missing CREATE SUBSCRIPTION test?
> GENERAL - Missing ALTER SUBSCRIPTION test?
>
> How come this patch adds a new CREATE SUBSCRIPTION option but does not
> seem to include any test case for that option in either the CREATE
> SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

Added the test cases for the same.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 23, 2024 at 10:56 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I checked your patches briefly. Here are my comments.
>
> 01. API
>
> Since the option for test_decoding is enabled by default, I think it should be renamed.
> E.g., "skip-generated-columns" or something.

Let's keep the same name 'include_generated_columns' for both the cases.

> 02. ddl.sql
>
> ```
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> ```
>
> We should also test the non-default case, in which the generated columns are not included in the output.

Added the non-default case, in which the generated columns are not included in the output.

> 03. ddl.sql
>
> Not sure the new tests are in the correct place. Should we add a new file and move the tests to it?
> Thoughts?

Added the new tests in the 'decoding_into_rel.out' file.

> 04. protocol.sgml
>
> Please keep the format of the sgml file.

Fixed.

> 05. protocol.sgml
>
> The option is implemented as the streaming option of pgoutput plugin, so they should be
> located under "Logical Streaming Replication Parameters" section.

Fixed.

> 05. AlterSubscription
>
> ```
> +                               if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
> +                               {
> +                                       ereport(ERROR,
> +                                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                                        errmsg("toggling generated_column option is not allowed.")));
> +                               }
> ```
>
> If you don't want to support the option, you can remove the SUBOPT_GENERATED_COLUMN
> macro from the function. But can you clarify why you do not want to support it?

Fixed.

> 07. logicalrep_write_tuple
>
> ```
> -               if (!column_in_column_list(att->attnum, columns))
> +               if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> +                       continue;
> +
> +               if (att->attgenerated && !publish_generated_column)
>                         continue;
> ```
>
> I think the changes in v2 were reverted or wrongly merged.

Fixed.

> 08. test code
>
> Can you add tests verifying that generated columns are replicated by logical replication?

Added the test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 23, 2024 at 5:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 23 May 2024 at 09:19, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > > Dear Shubham,
> > >
> > > Thanks for creating a patch! Here are high-level comments.
> >
> > > 1.
> > > Please document the feature. If it is hard to describe, we should change the API.
> >
> > I have added the feature in the document.
> >
> > > 4.
> > > Regarding the test_decoding plugin, it has already been able to decode the
> > > generated columns. So... in the first place, is the proposed option really needed
> > > for the plugin? Why do you include it?
> > > If you anyway want to add the option, the default value should be on - which keeps
> > > current behavior.
> >
> > I have set the generated column option to true by default for the
> > test_decoding plugin, so generated column data is sent by default.
> >
> > > 5.
> > > Assuming that the feature becomes usable for logical replication. Not sure,
> > > should we change the protocol version at that time? Nodes prior to PG17 may
> > > not want to receive values for generated columns. Can we control this only by the option?
> >
> > I verified the backward compatibility test by using the generated
> > column option and it worked fine. I think there is no need to make any
> > further changes.
> >
> > > 7.
> > >
> > > Some functions refer data->publish_generated_column many times. Can we store
> > > the value to a variable?
> > >
> > > Below comments are for test_decoding part, but they may be not needed.
> > >
> > > =====
> > >
> > > a. pg_decode_startup()
> > >
> > > ```
> > > +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > > ```
> > >
> > > Other options for test_decoding do not have underscore. It should be
> > > "include-generated-columns".
> > >
> > > b. pg_decode_change()
> > >
> > > data->include_generated_columns is referenced four times in the function.
> > > Can you store the value in a variable?
> > >
> > >
> > > c. pg_decode_change()
> > >
> > > ```
> > > -                                    true);
> > > +                                    true, data->include_generated_columns );
> > > ```
> > >
> > > Please remove the blank.
> >
> > Fixed.
> > The attached v3 Patch has the changes for the same.
>
> Few comments:
> 1) Since this is removed, tupdesc variable is not required anymore:
> +++ b/src/backend/catalog/pg_publication.c
> @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns,
>                                         errmsg("cannot use system column \"%s\" in publication column list",
>                                                    colname));
>
> -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -                       ereport(ERROR,
> -                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -                                       errmsg("cannot use generated column \"%s\" in publication column list",
> -                                                  colname));

Fixed.

> 2) In test_decoding include_generated_columns option is used:
> +               else if (strcmp(elem->defname, "include_generated_columns") == 0)
> +               {
> +                       if (elem->arg == NULL)
> +                               continue;
> +                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
> +                                                               strVal(elem->arg), elem->defname)));
> +               }
>
> In subscription we have used generated_column, we can try to use the
> same option in both places:
> +               else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
> +                                strcmp(defel->defname,
> "generated_column") == 0)
> +               {
> +                       if (IsSet(opts->specified_opts,
> SUBOPT_GENERATED_COLUMN))
> +                               errorConflictingDefElem(defel, pstate);
> +
> +                       opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
> +                       opts->generated_column = defGetBoolean(defel);
> +               }

Will update the name to 'include_generated_columns' in the next
version of the Patch.

> 3) Tab completion can be added for create subscription to include
> generated_column option

Fixed.

> 4) There are few whitespace issues while applying the patch, check for
> git diff --check

Fixed.

> 5) Add few tests for the new option added

Added new test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, May 24, 2024 at 8:26 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for the patch v3-0001.
>
> I don't think v3 addressed any of my previous review comments for
> patches v1 and v2. [1][2]
>
> So the comments below are limited only to the new code (i.e. the v3
> versus v2 differences). Meanwhile, all my previous review comments may
> still apply.

Patch v4-0001 addresses the previous review comments for patches v1
and v2. [1][2]

> ======
> GENERAL
>
> The patch applied gives whitespace warnings:
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
> trailing whitespace.
> warning: 3 lines add whitespace errors.

Fixed.

> ======
> contrib/test_decoding/test_decoding.c
>
> 1. pg_decode_change
>
>   MemoryContext old;
> + bool include_generated_columns;
> +
>
> I'm not really convinced this variable saves any code.

Fixed.

> ======
> doc/src/sgml/protocol.sgml
>
> 2.
> +        <varlistentry>
> +         <term><replaceable
> class="parameter">include-generated-columns</replaceable></term>
> +         <listitem>
> +        <para>
> +        The include-generated-columns option controls whether
> generated columns should be included in the string representation of
> tuples during logical decoding in PostgreSQL. This allows users to
> customize the output format based on whether they want to include
> these columns or not.
> +         </para>
> +         </listitem>
> +         </varlistentry>
>
> 2a.
> Something is not correct when this name has hyphens and all the nearby
> parameter names do not. Shouldn't it be all uppercase like the other
> boolean parameter?
>
> ~
>
> 2b.
> Text in the SGML file should be wrapped properly.
>
> ~
>
> 2c.
> IMO the comment can be more terse and it also needs to specify that it
> is a boolean type, and what is the default value if not passed.
>
> SUGGESTION
>
> INCLUDE_GENERATED_COLUMNS [ boolean ]
>
> If true, then generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ======
> src/backend/replication/logical/proto.c
>
> 3. logicalrep_write_tuple
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
>
> 3a.
> This code seems overcomplicated checking the same flag multiple times.
>
> SUGGESTION
> if (att->attgenerated)
> {
>   if (!publish_generated_column)
>     continue;
> }
> else
> {
>   if (!column_in_column_list(att->attnum, columns))
>     continue;
> }
>
> ~
>
> 3b.
> The same logic occurs several times in logicalrep_write_tuple

Fixed.

> 4. logicalrep_write_attrs
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> Shouldn't these code fragments (2x in this function) look the same as
> in logicalrep_write_tuple? See the above review comments.

Fixed.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5. maybe_send_schema
>
>   TransactionId topxid = InvalidTransactionId;
> + bool publish_generated_column = data->publish_generated_column;
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->streaming).

Fixed.

> 6. pgoutput_change
> -
> + bool publish_generated_column = data->publish_generated_column;
> +
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->binary).

Fixed.

> ======
> [1] My v1 review -
> https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
> [2] My v2 review -
> https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Mon, 3 Jun 2024 at 13:03, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for the patch v1-0001.
> >
> > ======
> > GENERAL
> >
> > G.1. Use consistent names
> >
> > It seems to add unnecessary complications by having different names
> > for all the new options, fields and API parameters.
> >
> > e.g. sometimes 'include_generated_columns'
> > e.g. sometimes 'publish_generated_columns'
> >
> > Won't it be better to just use identical names everywhere for
> > everything? I don't mind which one you choose; I just felt you only
> > need one name, not two. This comment overrides everything else in this
> > post so whatever name you choose, make adjustments for all my other
> > review comments as necessary.
>
> I have updated the name to 'include_generated_columns' everywhere in the Patch.
>
> > ======
> >
> > G.2. Is it possible to just use the existing bms?
> >
> > A very large part of this patch is adding more API parameters to
> > delegate the 'publish_generated_columns' flag value down to when it is
> > finally checked and used. e.g.
> >
> > The functions:
> > - logicalrep_write_insert(), logicalrep_write_update(),
> > logicalrep_write_delete()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_tuple
> >
> > The functions:
> > - logicalrep_write_rel()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_attrs
> >
> > AFAICT in all these places the API is already passing a "Bitmapset
> > *columns". I was wondering if it might be possible to modify the
> > "Bitmapset *columns" BEFORE any of those functions get called so that
> > the "columns" BMS either does or doesn't include generated cols (as
> > appropriate according to the option).
> >
> > Well, it might not be so simple because there are some NULL BMS
> > considerations also, but I think it would be worth investigating at
> > least, because if indeed you can find some common place (somewhere
> > like pgoutput_change()?) where the columns BMS can be filtered to
> > remove bits for generated cols then it could mean none of those other
> > patch API changes are needed at all -- then the patch would only be
> > 1/2 the size.
>
> I will analyse and reply to this in the next version.
>
> > ======
> > Commit message
> >
> > 1.
> > Now if include_generated_columns option is specified, the generated
> > column information and generated column data also will be sent.
> >
> > Usage from pgoutput plugin:
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > ~
> >
> > I think there needs to be more background information given here. This
> > commit message doesn't seem to describe anything about what is the
> > problem and how this patch fixes it. It just jumps straight into
> > giving usages of a 'include_generated_columns' option.
> >
> > It also doesn't say that this is an option that was newly *introduced*
> > by the patch -- it refers to it as though the reader should already
> > know about it.
> >
> > Furthermore, your hacker's post says "Currently it is not supported as
> > a subscription option because table sync for the generated column is
> > not possible as copy command does not support getting data for the
> > generated column. If this feature is required we can remove this
> > limitation from the copy command and then add it as a subscription
> > option later." IMO that all seems like the kind of information that
> > ought to also be mentioned in this commit message.
>
> I have updated the Commit message mentioning the suggested changes.
>
> > ======
> > contrib/test_decoding/sql/ddl.sql
> >
> > 2.
> > +-- check include_generated_columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> > +DROP TABLE gencoltable;
> > +
> >
> > 2a.
> > Perhaps you should include both option values to demonstrate the
> > difference in behaviour:
> >
> > 'include_generated_columns', '0'
> > 'include_generated_columns', '1'
>
> Added the other option values to demonstrate the difference in behaviour:
>
> > 2b.
> > > I think you may need to include some more test combinations where
> > > there is and isn't a COLUMN LIST, because I am not 100% sure I
> > > understand the current logic/expectations for all combinations.
> >
> > e.g. When the generated column is in a column list but
> > 'publish_generated_columns' is false then what should happen? etc.
> > Also if there are any special rules then those should be mentioned in
> > the commit message.
>
> Test case is added and the same is mentioned in the documentation.
>
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 3.
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > > 4. logicalrep_write_tuple:
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > - if (!column_in_column_list(att->attnum, columns))
> > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> > + continue;
> > +
> > + if (att->attgenerated && !publish_generated_column)
> > >   continue;
> > >
> > > That code seems confusing. Shouldn't the logic be exactly the same as in
> > > logicalrep_write_attrs()?
> >
> > e.g. Shouldn't they both look like this:
> >
> > if (att->attisdropped)
> >   continue;
> >
> > if (att->attgenerated && !publish_generated_column)
> >   continue;
> >
> > if (!column_in_column_list(att->attnum, columns))
> >   continue;
>
> Fixed.
>
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 5.
> >  static void send_relation_and_attrs(Relation relation, TransactionId xid,
> >   LogicalDecodingContext *ctx,
> > - Bitmapset *columns);
> > + Bitmapset *columns,
> > + bool publish_generated_column);
> >
> > Use plural. /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > 6. parse_output_parameters
> >
> >   bool origin_option_given = false;
> > + bool generate_column_option_given = false;
> >
> >   data->binary = false;
> >   data->streaming = LOGICALREP_STREAM_OFF;
> >   data->messages = false;
> >   data->two_phase = false;
> > + data->publish_generated_column = false;
> >
> > I think the 1st var should be 'include_generated_columns_option_given'
> > for consistency with the name of the actual option that was given.
>
> Updated the name to 'include_generated_columns_option_given'
>
> > ======
> > src/include/replication/logicalproto.h
> >
> > 7.
> > (Same as a previous review comment)
> >
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > ======
> > src/include/replication/pgoutput.h
> >
> > 8.
> >   bool publish_no_origin;
> > + bool publish_generated_column;
> >  } PGOutputData;
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> The attached Patch contains the suggested changes.

Thanks for the updated patch. A few comments:
1) The option name is inconsistent: include_generated_column is used in
one place and include_generated_columns in another:

+               else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+                                strcmp(defel->defname, "include_generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts, SUBOPT_INCLUDE_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
+                       opts->include_generated_column = defGetBoolean(defel);
+               }

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d453e224d9..e8ff752fd9 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
                COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
                                          "disable_on_error", "enabled", "failover", "origin",
                                          "password_required", "run_as_owner", "slot_name",
-                                         "streaming", "synchronous_commit", "two_phase");
+                                         "streaming", "synchronous_commit", "two_phase","include_generated_columns");

2) This small table does not need a primary key column; the primary key
creates an index, so every insert also has to update the index.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');

3) Please add a test case for this:
+          set to <literal>false</literal>. If the subscriber-side
column is also a
+          generated column then this option has no effect; the
replicated data will
+          be ignored and the subscriber column will be filled as
normal with the
+          subscriber-side computed or default data.

4) You can use a new style of ereport to remove the brackets around errcode
4.a)
+                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
+                                                               strVal(elem->arg), elem->defname)));

4.b) similarly here too:
+               ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+               /*- translator: both %s are strings of the form "option = value" */
+                                       errmsg("%s and %s are mutually exclusive options",
+                                               "copy_data = true", "include_generated_column = true")));

4.c) similarly here too:
+                       if (include_generated_columns_option_given)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting or redundant options")));

5) These variable names can be shortened, to something like gencol,
generatedcol or gencolumn:
+++ b/src/include/catalog/pg_subscription.h
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegeneratedcolumn; /* True if generated columns must be published */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -157,6 +159,7 @@ typedef struct Subscription
  List    *publications; /* List of publication names to subscribe to */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
>
> The attached Patch contains the suggested changes.
>

Hi,

Currently, the COPY command does not work for generated columns, so
copying generated columns is not supported during the tablesync
process. So, in patch v4-0001 we added a check to allow replication of
generated columns only if 'copy_data = false'.

I am attaching patches to resolve the above issues.

v5-0001: not changed
v5-0002: Support COPY of generated column
v5-0003: Support COPY of generated column during tablesync process

Thoughts?


Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0001.

======
GENERAL G.1

The patch changes HEAD behaviour for generated cols in PUBLICATION
col-lists, right? e.g. maybe before they were always ignored, but now
they are not?

OTOH, when 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list, right?

These kinds of points should be noted in the commit message and in the
(col-list?) documentation.
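
To make the question concrete, here is a minimal sketch of the per-column
decision I have in mind. It is not taken from the patch: SketchAttr and
sketch_column_is_sent() are hypothetical names, and the order of the checks
is an assumption that follows the one suggested in the earlier v1 review
(dropped, then generated, then col-list).

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the per-attribute facts the real
 * code reads from the tuple descriptor and the publication col-list.
 */
typedef struct SketchAttr
{
    bool attisdropped;      /* column has been dropped */
    bool attgenerated;      /* column is a generated column */
    bool in_column_list;    /* true if there is no col-list, or the
                             * column is named in it */
} SketchAttr;

/* Returns true if the column would be sent to the subscriber. */
static bool
sketch_column_is_sent(const SketchAttr *att, bool include_generated_columns)
{
    if (att->attisdropped)
        return false;

    /* Assumed rule: the option wins over the col-list for generated cols. */
    if (att->attgenerated && !include_generated_columns)
        return false;

    return att->in_column_list;
}
```

With this ordering, a generated column that is named in the col-list is
still skipped when 'include_generated_columns' is false. If that is the
intended rule, it is the kind of thing the commit message and docs should
spell out.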

======
Commit message

General 1a.
IMO the commit message needs some background to say something like:
"Currently generated column values are not replicated because it is
assumed that the corresponding subscriber-side table will generate its
own values for those columns."

~

General 1b.
Somewhere in this commit message, you need to give all the other
special rules --- e.g. the docs say "If the subscriber-side column is
also a generated column then this option has no effect"

~~~

2.
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes. This option is
particularly useful for scenarios where applications require access to
generated column values for downstream processing or synchronization.

~

I don't think the sentence "This option is particularly useful..." is
helpful. It seems like just saying "This commit supports option XXX.
This is particularly useful if you want XXX".

~~~

3.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
the new parameter being used in it?

~~~

4.
Currently copy_data option with include_generated_columns option is
not supported. A future patch will remove this limitation.

~

Suggest to single-quote those parameter names for better readability.

~~~

5.
This commit aims to enhance the flexibility and utility of logical
replication by allowing users to include generated column information
in replication streams, paving the way for more robust data
synchronization and processing workflows.

~

IMO this paragraph can be omitted.

======
.../test_decoding/sql/decoding_into_rel.sql

6.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+INSERT INTO gencoltable (a) VALUES (4), (5), (6);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0');
+DROP TABLE gencoltable;
+

6a.
I felt some additional explicit comments might help the readability of
the output file.

e.g.1
-- When 'include-generated-columns' = '1' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes...

e.g.2
-- When 'include-generated-columns' = '0' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_get_changes...

~~

6b.
Suggest adding one more test case (where 'include-generated-columns'
is not set) to confirm/demonstrate the default behaviour for
replicated generated cols.

======
doc/src/sgml/protocol.sgml

7.
+    <varlistentry>
+     <term><replaceable
class="parameter">include-generated-columns</replaceable></term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns.
+        The include-generated-columns option controls whether generated
+        columns should be included in the string representation of tuples
+        during logical decoding in PostgreSQL. This allows users to
+        customize the output format based on whether they want to include
+        these columns or not. The default is false.
+       </para>
+      </listitem>
+    </varlistentry>

7a.
It doesn't render properly. e.g. Should not be bold italic (probably
the class is wrong?), because none of the nearby parameters look this
way.

~

7b.
The name here should NOT have hyphens. It needs underscores same as
all other nearby protocol parameters.

~

7c.
The description seems overly verbose.

SUGGESTION
Boolean option to enable generated columns. This option controls
whether generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
doc/src/sgml/ref/create_subscription.sgml

8.
+
+       <varlistentry
id="sql-createsubscription-params-with-include-generated-column">
+        <term><literal>include_generated_column</literal>
(<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>

The parameter name should be plural (include_generated_columns).

======
src/backend/commands/subscriptioncmds.c

9.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x00010000

Should be plural COLUMNS

~~~

10.
+ else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+ strcmp(defel->defname, "include_generated_column") == 0)

The new subscription parameter should be plural ("include_generated_columns").

~~~

11.
+
+ /*
+ * Do additional checking for disallowed combination when copy_data and
+ * include_generated_column are true. COPY of generated columns is
not supported
+ * yet.
+ */
+ if (opts->copy_data && opts->include_generated_column)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: both %s are strings of the form "option = value" */
+ errmsg("%s and %s are mutually exclusive options",
+ "copy_data = true", "include_generated_column = true")));
+ }

/combination/combinations/

The parameter name should be plural in the comment and also in the
error message.

======
src/bin/psql/tab-complete.c

12.
  COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
    "disable_on_error", "enabled", "failover", "origin",
    "password_required", "run_as_owner", "slot_name",
-   "streaming", "synchronous_commit", "two_phase");
+   "streaming", "synchronous_commit", "two_phase","include_generated_columns");

The new param should be added in alphabetical order same as all the others.
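
As a rough illustration (this is not the actual tab-complete.c code;
sketch_sub_options and sketch_is_sorted() are made-up names), the option
list with the new parameter in its alphabetical position would look like
the array below. The helper only checks that the ordering holds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Subscription options, with the new one placed between "failover" and "origin". */
static const char *const sketch_sub_options[] = {
    "binary", "connect", "copy_data", "create_slot",
    "disable_on_error", "enabled", "failover",
    "include_generated_columns", "origin",
    "password_required", "run_as_owner", "slot_name",
    "streaming", "synchronous_commit", "two_phase"
};

/* Returns true if the names are in non-decreasing strcmp() order. */
static bool
sketch_is_sorted(const char *const *names, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (strcmp(names[i - 1], names[i]) > 0)
            return false;
    }
    return true;
}
```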

======
src/include/catalog/pg_subscription.h

13.
+ bool subincludegeneratedcolumn; /* True if generated columns must be published */
+

The field name should be plural.

~~~

14.
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

The field name should be plural.

======
src/include/replication/walreceiver.h

15.
  * prepare time */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool include_generated_column; /* publish generated columns */
  } logical;
  } proto;
 } WalRcvStreamOptions;

~

This new field name should be plural.

======
src/test/subscription/t/011_generated.pl

16.
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
+ CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION pub2 WITH (include_generated_column = true)
+));
+ok( $stderr =~
+   qr/copy_data = true and include_generated_column = true are mutually exclusive options/,
+ 'cannot use both include_generated_column and copy_data as true');

Isn't this mutual exclusiveness of options something that could have
been tested in the regress test suite instead of TAP tests? e.g. AFAIK
you won't require a connection to test this case.

~~~

17. Missing test?

IIUC there is a missing test scenario. You can add another subscriber
table TAB3 which *already* has generated cols (e.g. generating
different values to the publisher) so then you can verify they are NOT
overwritten, even when 'include_generated_columns' is true.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0002.

======
GENERAL

G1.
IIUC now you are unconditionally allowing all generated columns to be copied.

I think this is assuming that the table sync code (in the next patch
0003?) is going to explicitly name all the columns it wants to copy
(so if it wants to get generated cols then it will name the generated
cols, and if it doesn't want generated cols then it won't name them).

Maybe that is OK for the logical replication tablesync case, but I am
not sure if it will be desirable to *always* copy generated columns in
other user scenarios.

e.g. I was wondering if there should be a new COPY command option
introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
then the current HEAD behaviour is unaffected unless that option is
enabled.

~~~

G2.
The current COPY command documentation [1] says "If no column list is
specified, all columns of the table except generated columns will be
copied."

But this 0002 patch has changed that documented behaviour, and so the
documentation needs to be changed as well, right?

======
Commit Message

1.
Currently COPY command do not copy generated column. With this commit
added support for COPY for generated column.

~

The grammar/cardinality is not good here. Try some tool (Grammarly or
ChatGPT, etc) to help correct it.

======
src/backend/commands/copy.c

======
src/test/regress/expected/generated.out

======
src/test/regress/sql/generated.sql

2.
I think these COPY test cases require some explicit comments to
describe what they are doing, and what are the expected results.

Currently, I have doubts about some of this test input/output

e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
explanation. Are you expecting this generated col value to be
ignored/overwritten or what?

COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
5 1
6 1
\.

e.g.2. what is the reason for this new 'missing data for column "b"'
error? Or is it some introduced quirk because "b" now cannot be
generated since there is no value for "a"? I don't know if the
expected *.out here is OK or not, so some test comments may help to
clarify it.

======
[1] https://www.postgresql.org/docs/devel/sql-copy.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0003.

======
0. Whitespace warnings when the patch was applied.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
trailing whitespace.
          has no effect; the replicated data will be ignored and the subscriber
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
trailing whitespace.
          column will be filled as normal with the subscriber-side computed or
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
trailing whitespace.
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
warning: 3 lines add whitespace errors.

======
src/backend/commands/subscriptioncmds.c

1.
- res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
+ column_count = (!include_generated_column && check_gen_col) ? 4 : (check_columnlist ? 3 : 2);
+ res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);

The 'column_count' seems out of control. Won't it be far simpler to
assign/increment the value dynamically only as required instead of the
tricky calculation at the end which is unnecessarily difficult to
understand?
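
Something along these lines is what I mean. This is only a sketch:
sketch_result_column_count() is a hypothetical helper, the meaning I give
to each result column is my reading of the patch, and it assumes that
check_gen_col is only ever set when check_columnlist is also set (otherwise
it would not give the same answer as the patch's expression).

```c
#include <stdbool.h>

/*
 * Count the result columns expected from the query, adding one per
 * optional column instead of computing the total in a nested ternary.
 */
static int
sketch_result_column_count(bool check_columnlist, bool check_gen_col,
                           bool include_generated_column)
{
    int column_count = 2;       /* assumed: schema name and table name */

    if (check_columnlist)
        column_count++;         /* assumed: the column list */

    if (check_gen_col && !include_generated_column)
        column_count++;         /* assumed: the generated-column count */

    return column_count;
}
```

In the real function the increments could sit right next to the code that
appends each extra column to the query, so the count and the query text
cannot drift apart.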

~~~

2.
+ /*
+ * If include_generated_column option is false and all the column of
the table in the
+ * publication are generated then we should throw an error.
+ */
+ if (!isnull && !include_generated_column && check_gen_col)
+ {
+ attlist = DatumGetArrayTypeP(attlistdatum);
+ gen_col_count = DatumGetInt32(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);
+
+ attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
+
+ if (attcount != 0 && attcount == gen_col_count)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use only generated column for table \"%s.%s\" in publication when generated_column option is false",
+    nspname, relname));
+ }
+

Why do you think this new logic/error is necessary?

IIUC the 'include_generated_columns' should be false to match the
existing HEAD behavior. So this scenario where your publisher-side
table *only* has generated columns is something that could already
happen, right? IOW, this introduced error could be a candidate for
another discussion/thread/patch, but is it really required for this
current patch?

======
src/backend/replication/logical/tablesync.c

3.
  lrel->remoteid,
- (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
-   "AND a.attgenerated = ''" : ""),
+ (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ (walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000 ||
+ !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),

This ternary within one big appendStringInfo seems quite complicated.
Won't it be better to split the appendStringInfo into multiple parts
so the generated-cols calculation can be done more simply?
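
One possible way to split the calculation is sketched below, assuming the version cutoffs shown in the diff. The helper name generated_column_filter is hypothetical and the function takes plain arguments instead of reading MySubscription; it is only meant to show the same condition as separate, commented steps.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): returns the extra WHERE clause
 * fragment that excludes generated columns, or "" when no filter is needed.
 * Equivalent to the ternary in the quoted diff.
 */
static const char *
generated_column_filter(int server_version, bool include_generated_columns)
{
    /* pg_attribute.attgenerated does not exist before PostgreSQL 12 */
    if (server_version < 120000)
        return "";

    /* publishers at or below the cutoff used in the diff: always filter */
    if (server_version <= 160000)
        return "AND a.attgenerated = ''";

    /* newer publishers: filter only when the option is off */
    if (!include_generated_columns)
        return "AND a.attgenerated = ''";

    return "";
}
```

The result could then be appended with its own appendStringInfo call rather than being embedded in the larger format string.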

======
src/test/subscription/t/011_generated.pl

4.
I think there should be a variety of different tablesync scenarios
(when 'include_generated_columns' is true) tested here instead of just
one, and all varieties with lots of comments to say what they are
doing, expectations etc.

a. publisher-side gen-col "a" replicating to subscriber-side NOT
gen-col "a" (ok, value gets replicated)
b. publisher-side gen-col "a" replicating to subscriber-side gen-col
(ok, but ignored)
c. publisher-side NOT gen-col "b" replicating to subscriber-side
gen-col "b" (error?)

~~

5.
+$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
+is( $result, qq(1|2
+2|4
+3|6), 'generated columns initial sync with include_generated_column = true');

Should this say "ORDER BY..." so it will not fail if the row order
happens to be something unanticipated?

======

99.
Also, see the attached file with numerous other nitpicks:
- plural param- and var-names
- typos in comments
- missing spaces
- SQL keyword should be UPPERCASE
- etc.

Please apply any/all of these if you agree with them.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> >
> > The attached Patch contains the suggested changes.
> >
>
> Hi,
>
> Currently, COPY command does not work for generated columns and
> therefore, COPY of generated column is not supported during tablesync
> process. So, in patch v4-0001 we added a check to allow replication of
> the generated column only if 'copy_data = false'.
>
> I am attaching patches to resolve the above issues.
>
> v5-0001: not changed
> v5-0002: Support COPY of generated column
> v5-0003: Support COPY of generated column during tablesync process
>

Hi Shlok, I have a question about patch v5-0003.

According to the patch 0001 docs "If the subscriber-side column is
also a generated column then this option has no effect; the replicated
data will be ignored and the subscriber column will be filled as
normal with the subscriber-side computed or default data".

Doesn't this mean it will be a waste of effort/resources to COPY any
column value where the subscriber-side column is generated since we
know that any copied value will be ignored anyway?

But I don't recall seeing any comment or logic for this kind of copy
optimisation in the patch 0003. Is this already accounted for
somewhere and I missed it, or is my understanding wrong?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok and Shubham,

Thanks for updating the patch!

I briefly checked v5-0002. IIUC, your patch allows generated columns to be
copied unconditionally. I think this behavior would affect many users, so it
will be hard to reach agreement on it.

Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is set
to off, we can keep the current specification.

Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Thanks for the updated patch, few comments:
> 1) The option name seems wrong here:
> In one place include_generated_column is specified and other place
> include_generated_columns is specified:
>
> +               else if (IsSet(supported_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> +                                strcmp(defel->defname,
> "include_generated_column") == 0)
> +               {
> +                       if (IsSet(opts->specified_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN))
> +                               errorConflictingDefElem(defel, pstate);
> +
> +                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
> +                       opts->include_generated_column = defGetBoolean(defel);
> +               }

Fixed.

> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index d453e224d9..e8ff752fd9 100644
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
>                 COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>                                           "disable_on_error",
> "enabled", "failover", "origin",
>                                           "password_required",
> "run_as_owner", "slot_name",
> -                                         "streaming",
> "synchronous_commit", "two_phase");
> +                                         "streaming",
> "synchronous_commit", "two_phase","include_generated_columns");
>
> 2) This small data table need not have a primary key column as it will
> create an index and insertion will happen in the index too.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');

Fixed.

> 3) Please add a test case for this:
> +          set to <literal>false</literal>. If the subscriber-side
> column is also a
> +          generated column then this option has no effect; the
> replicated data will
> +          be ignored and the subscriber column will be filled as
> normal with the
> +          subscriber-side computed or default data.

Added the required test case.

> 4) You can use a new style of ereport to remove the brackets around errcode
> 4.a)
> +                       else if (!parse_bool(strVal(elem->arg),
> &data->include_generated_columns))
> +                               ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
>
> 4.b) similarly here too:
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_SYNTAX_ERROR),
> +               /*- translator: both %s are strings of the form
> "option = value" */
> +                                       errmsg("%s and %s are mutually
> exclusive options",
> +                                               "copy_data = true",
> "include_generated_column = true")));
>
> 4.c) similarly here too:
> +                       if (include_generated_columns_option_given)
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                                errmsg("conflicting
> or redundant options")));

Fixed.

> 5) These variable names can be changed to keep it smaller, something
> like gencol or generatedcol or gencolumn, etc
> +++ b/src/include/catalog/pg_subscription.h
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>  #ifdef CATALOG_VARLEN /* variable-length fields start here */
>   /* Connection string to the publisher */
>   text subconninfo BKI_FORCE_NOT_NULL;
> @@ -157,6 +159,7 @@ typedef struct Subscription
>   List    *publications; /* List of publication names to subscribe to */
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;

Fixed.

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0001.
>
> ======
> GENERAL G.1
>
> The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
> maybe before they were always ignored, but now they are not?
>
> OTOH, when 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list, right?
>
> These kinds of points should be noted in the commit message and in the
> (col-list?) documentation.

Fixed.

> ======
> Commit message
>
> General 1a.
> IMO the commit message needs some background to say something like:
> "Currently generated column values are not replicated because it is
> assumed that the corresponding subscriber-side table will generate its
> own values for those columns."
>
> ~
>
> General 1b.
> Somewhere in this commit message, you need to give all the other
> special rules --- e.g. the docs says "If the subscriber-side column is
> also a generated column then this option has no effect"
>
> ~~~

Fixed.

> 2.
> This commit enables support for the 'include_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. This option is
> particularly useful for scenarios where applications require access to
> generated column values for downstream processing or synchronization.
>
> ~
>
> I don't think the sentence "This option is particularly useful..." is
> helpful. It seems like just saying "This commit supports option XXX.
> This is particularly useful if you want XXX".
>

Fixed.

>
> 3.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
> 'publication pub1;
>
> ~
>
> What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
> the new parameter being used in it?
>

Added the description for this in the Patch.

>
> 4.
> Currently copy_data option with include_generated_columns option is
> not supported. A future patch will remove this limitation.
>
> ~
>
> Suggest to single-quote those parameter names for better readability.
>

Fixed.

>
> 5.
> This commit aims to enhance the flexibility and utility of logical
> replication by allowing users to include generated column information
> in replication streams, paving the way for more robust data
> synchronization and processing workflows.
>
> ~
>
> IMO this paragraph can be omitted.

Fixed.

> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> 6.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +DROP TABLE gencoltable;
> +
>
> 6a.
> I felt some additional explicit comments might help the readability of
> the output file.
>
> e.g.1
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes...
>
> e.g.2
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_get_changes...

Added the required description for this.

> 6b.
> Suggest adding one more test case (where 'include-generated-columns'
> is not set) to confirm/demonstrate the default behaviour for
> replicated generated cols.

Added the required Test case.

> ======
> doc/src/sgml/protocol.sgml
>
> 7.
> +    <varlistentry>
> +     <term><replaceable
> class="parameter">include-generated-columns</replaceable></term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns.
> +        The include-generated-columns option controls whether generated
> +        columns should be included in the string representation of tuples
> +        during logical decoding in PostgreSQL. This allows users to
> +        customize the output format based on whether they want to include
> +        these columns or not. The default is false.
> +       </para>
> +      </listitem>
> +    </varlistentry>
>
> 7a.
> It doesn't render properly. e.g. Should not be bold italic (probably
> the class is wrong?), because none of the nearby parameters look this
> way.
>
> ~
>
> 7b.
> The name here should NOT have hyphens. It needs underscores same as
> all other nearby protocol parameters.
>
> ~
>
> 7c.
> The description seems overly verbose.
>
> SUGGESTION
> Boolean option to enable generated columns. This option controls
> whether generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 8.
> +
> +       <varlistentry
> id="sql-createsubscription-params-with-include-generated-column">
> +        <term><literal>include_generated_column</literal>
> (<type>boolean</type>)</term>
> +        <listitem>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. The default is
> +          <literal>false</literal>.
> +         </para>
>
> The parameter name should be plural (include_generated_columns).

Fixed.

> ======
> src/backend/commands/subscriptioncmds.c
>
> 9.
>  #define SUBOPT_ORIGIN 0x00008000
> +#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x00010000
>
> Should be plural COLUMNS
>
Fixed.

> 10.
> + else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> + strcmp(defel->defname, "include_generated_column") == 0)
>
> The new subscription parameter should be plural ("include_generated_columns").

Fixed.

> 11.
> +
> + /*
> + * Do additional checking for disallowed combination when copy_data and
> + * include_generated_column are true. COPY of generated columns is
> not supported
> + * yet.
> + */
> + if (opts->copy_data && opts->include_generated_column)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + /*- translator: both %s are strings of the form "option = value" */
> + errmsg("%s and %s are mutually exclusive options",
> + "copy_data = true", "include_generated_column = true")));
> + }
>
> /combination/combinations/
>
> The parameter name should be plural in the comment and also in the
> error message.

Fixed.

> ======
> src/bin/psql/tab-complete.c
>
> 12.
>   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>     "disable_on_error", "enabled", "failover", "origin",
>     "password_required", "run_as_owner", "slot_name",
> -   "streaming", "synchronous_commit", "two_phase");
> +   "streaming", "synchronous_commit", "two_phase","include_generated_columns");
>
> The new param should be added in alphabetical order same as all the others.

Fixed.

> ======
> src/include/catalog/pg_subscription.h
>
> 13.
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>
> The field name should be plural.

Fixed.

>
> 14.
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;
>
> The field name should be plural.

Fixed.

> ======
> src/include/replication/walreceiver.h
>
> 15.
>   * prepare time */
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool include_generated_column; /* publish generated columns */
>   } logical;
>   } proto;
>  } WalRcvStreamOptions;
>
> ~
>
> This new field name should be plural.

Fixed.

> ======
> src/test/subscription/t/011_generated.pl
>
> 16.
> +my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
> + CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION
> pub2 WITH (include_generated_column = true)
> +));
> +ok( $stderr =~
> +   qr/copy_data = true and include_generated_column = true are
> mutually exclusive options/,
> + 'cannot use both include_generated_column and copy_data as true');
>
> Isn't this mutual exclusiveness of options something that could have
> been tested in the regress test suite instead of TAP tests? e.g. AFAIK
> you won't require a connection to test this case.


> 17. Missing test?
>
> IIUC there is a missing test scenario. You can add another subscriber
> table TAB3 which *already* has generated cols (e.g. generating
> different values to the publisher) so then you can verify they are NOT
> overwritten, even when the 'include_generated_cols' is true.
>
> ======

Moved this test case to the Regression test.

Patch v6-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJn6EiyAitJbbvkvVV2d45fV3Wjr2VmWFugm3RsbaU%2BRg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 4 Jun 2024 at 10:21, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0002.
>
> ======
> GENERAL
>
> G1.
> IIUC now you are unconditionally allowing all generated columns to be copied.
>
> I think this is assuming that the table sync code (in the next patch
> 0003?) is going to explicitly name all the columns it wants to copy
> (so if it wants to get generated cols then it will name the generated
> cols, and if it doesn't want generated cols then it won't name them).
>
> Maybe that is OK for the logical replication tablesync case, but I am
> not sure if it will be desirable to *always* copy generated columns in
> other user scenarios.
>
> e.g. I was wondering if there should be a new COPY command option
> introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
> then the current HEAD behaviour is unaffected unless that option is
> enabled.
>
> ~~~
>
> G2.
> The current COPY command documentation [1] says "If no column list is
> specified, all columns of the table except generated columns will be
> copied."
>
> But this 0002 patch has changed that documented behaviour, and so the
> documentation needs to be changed as well, right?
>
> ======
> Commit Message
>
> 1.
> Currently COPY command do not copy generated column. With this commit
> added support for COPY for generated column.
>
> ~
>
> The grammar/cardinality is not good here. Try some tool (Grammarly or
> chatGPT, etc) to help correct it.
>
> ======
> src/backend/commands/copy.c
>
> ======
> src/test/regress/expected/generated.out
>
> ======
> src/test/regress/sql/generated.sql
>
> 2.
> I think these COPY test cases require some explicit comments to
> describe what they are doing, and what are the expected results.
>
> Currently, I have doubts about some of this test input/output
>
> e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
> explanation. Are you expecting this generated col value to be
> ignored/overwritten or what?
>
> COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
> 5 1
> 6 1
> \.
>
> e.g.2. what is the reason for this new 'missing data for column "b"'
> error? Or is it some introduced quirk because "b" now cannot be
> generated since there is no value for "a"? I don't know if the
> expected *.out here is OK or not, so some test comments may help to
> clarify it.
>
> ======
> [1] https://www.postgresql.org/docs/devel/sql-copy.html
>
Hi Peter,

I have removed the changes to the COPY command. I came up with an
approach that requires changes only in the tablesync code. We can copy
generated columns during tablesync using the syntax 'COPY (SELECT
column_name FROM table) TO STDOUT'.
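
To make the shape of that command concrete, here is a minimal sketch of assembling it from an explicit column list. This is not the code in v7-0002: build_copy_command is a hypothetical helper, it uses snprintf rather than the backend's StringInfo routines, and it assumes the identifiers passed in are already quoted.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): writes
 * "COPY (SELECT c1, c2 FROM tbl) TO STDOUT" into buf.
 * Identifiers are assumed to be quoted by the caller.
 * Returns 0 on success, -1 if ncols is 0 or buf is too small.
 */
static int
build_copy_command(char *buf, size_t buflen, const char *table,
                   const char *const *cols, size_t ncols)
{
    size_t used;
    int    n;

    if (ncols == 0)
        return -1;

    n = snprintf(buf, buflen, "COPY (SELECT ");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    /* name every column explicitly, including generated ones */
    for (size_t i = 0; i < ncols; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s%s",
                     i > 0 ? ", " : "", cols[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, " FROM %s) TO STDOUT", table);
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;

    return 0;
}
```

Because the column values come from a SELECT, generated columns can be named in the list without any change to COPY itself.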

I have attached the patches for this.
v7-0001: Not modified
v7-0002: Support replication of generated columns during initial sync.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 4 Jun 2024 at 15:01, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0003.
>
> ======
> 0. Whitespace warnings when the patch was applied.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
> trailing whitespace.
>           has no effect; the replicated data will be ignored and the subscriber
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
> trailing whitespace.
>           column will be filled as normal with the subscriber-side computed or
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
> trailing whitespace.
> (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> warning: 3 lines add whitespace errors.
>
Fixed

> ======
> src/backend/commands/subscriptioncmds.c
>
> 1.
> - res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
> + column_count = (!include_generated_column && check_gen_col) ? 4 :
> (check_columnlist ? 3 : 2);
> + res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);
>
> The 'column_count' seems out of control. Won't it be far simpler to
> assign/increment the value dynamically only as required instead of the
> tricky calculation at the end which is unnecessarily difficult to
> understand?
>
I have removed this piece of code.

> ~~~
>
> 2.
> + /*
> + * If include_generated_column option is false and all the column of
> the table in the
> + * publication are generated then we should throw an error.
> + */
> + if (!isnull && !include_generated_column && check_gen_col)
> + {
> + attlist = DatumGetArrayTypeP(attlistdatum);
> + gen_col_count = DatumGetInt32(slot_getattr(slot, 4, &isnull));
> + Assert(!isnull);
> +
> + attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
> +
> + if (attcount != 0 && attcount == gen_col_count)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use only generated column for table \"%s.%s\" in
> publication when generated_column option is false",
> +    nspname, relname));
> + }
> +
>
> Why do you think this new logic/error is necessary?
>
> IIUC the 'include_generated_columns' should be false to match the
> existing HEAD behavior. So this scenario where your publisher-side
> table *only* has generated columns is something that could already
> happen, right? IOW, this introduced error could be a candidate for
> another discussion/thread/patch, but is it really required for this
> current patch?
>
Yes, this scenario can also happen in HEAD. For this patch I have
removed this check.

> ======
> src/backend/replication/logical/tablesync.c
>
> 3.
>   lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
> -   "AND a.attgenerated = ''" : ""),
> + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> + (walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000 ||
> + !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),
>
> This ternary within one big appendStringInfo seems quite complicated.
> Won't it be better to split the appendStringInfo into multiple parts
> so the generated-cols calculation can be done more simply?
>
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 4.
> I think there should be a variety of different tablesync scenarios
> (when 'include_generated_columns' is true) tested here instead of just
> one, and all varieties with lots of comments to say what they are
> doing, expectations etc.
>
> a. publisher-side gen-col "a" replicating to subscriber-side NOT
> gen-col "a" (ok, value gets replicated)
> b. publisher-side gen-col "a" replicating to subscriber-side gen-col
> (ok, but ignored)
> c. publisher-side NOT gen-col "b" replicating to subscriber-side
> gen-col "b" (error?)
>
Added the tests

> ~~
>
> 5.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
> +is( $result, qq(1|2
> +2|4
> +3|6), 'generated columns initial sync with include_generated_column = true');
>
> Should this say "ORDER BY..." so it will not fail if the row order
> happens to be something unanticipated?
>
Fixed

> ======
>
> 99.
> Also, see the attached file with numerous other nitpicks:
> - plural param- and var-names
> - typos in comments
> - missing spaces
> - SQL keyword should be UPPERCASE
> - etc.
>
> Please apply any/all of these if you agree with them.
Fixed

Patch v7-0002 contains all the changes. Please refer to [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Wed, 5 Jun 2024 at 05:49, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > >
> > > The attached Patch contains the suggested changes.
> > >
> >
> > Hi,
> >
> > Currently, COPY command does not work for generated columns and
> > therefore, COPY of generated column is not supported during tablesync
> > process. So, in patch v4-0001 we added a check to allow replication of
> > the generated column only if 'copy_data = false'.
> >
> > I am attaching patches to resolve the above issues.
> >
> > v5-0001: not changed
> > v5-0002: Support COPY of generated column
> > v5-0003: Support COPY of generated column during tablesync process
> >
>
> Hi Shlok, I have a question about patch v5-0003.
>
> According to the patch 0001 docs "If the subscriber-side column is
> also a generated column then this option has no effect; the replicated
> data will be ignored and the subscriber column will be filled as
> normal with the subscriber-side computed or default data".
>
> Doesn't this mean it will be a waste of effort/resources to COPY any
> column value where the subscriber-side column is generated since we
> know that any copied value will be ignored anyway?
>
> But I don't recall seeing any comment or logic for this kind of copy
> optimisation in the patch 0003. Is this already accounted for
> somewhere and I missed it, or is my understanding wrong?
Your understanding is correct.
With v7-0002, if a subscriber-side column is generated, then we do not
include that column in the column list during COPY. This will address
the above issue.
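
The column selection described above can be sketched roughly as follows. The ColumnInfo struct and select_copy_columns function are hypothetical names for illustration only; the actual patch works on the subscriber's relation metadata.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column description, for illustration only */
typedef struct ColumnInfo
{
    const char *name;
    bool        generated_on_subscriber;
} ColumnInfo;

/*
 * Hypothetical helper (not from the patch): copies into 'out' the names of
 * the columns that should appear in the COPY column list, skipping any
 * column that is generated on the subscriber side. 'out' must have room
 * for ncols entries. Returns the number of names written.
 */
static size_t
select_copy_columns(const ColumnInfo *cols, size_t ncols, const char **out)
{
    size_t nout = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        /* a copied value for this column would be ignored anyway */
        if (cols[i].generated_on_subscriber)
            continue;

        out[nout++] = cols[i].name;
    }

    return nout;
}
```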

Patch v7-0002 contains all the changes. Please refer to [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 6 Jun 2024 at 08:29, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok and Shubham,
>
> Thanks for updating the patch!
>
> I briefly checked v5-0002. IIUC, your patch allows generated columns to be
> copied unconditionally. I think this behavior would affect many users, so it
> will be hard to reach agreement on it.
>
> Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is set
> to off, we can keep the current specification.
>
> Thought?
Hi Kuroda-san,

I agree that we should not allow generated columns to be copied unconditionally.
In patch v7-0002, I have used a different approach which does not
require any code changes in COPY.

Please refer to [1] for patch v7-0002.
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> The attached Patch contains the suggested changes.
>

Hi Shubham, thanks for providing a patch.
I have some comments for v6-0001.

1. create_subscription.sgml
The same sentence is repeated in both paragraphs.

+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. If the
+          subscriber-side column is also a generated column then this option
+          has no effect; the replicated data will be ignored and the subscriber
+          column will be filled as normal with the subscriber-side computed or
+          default data.
+          <literal>false</literal>.
+         </para>
+
+         <para>
+          This parameter can only be set true if
<literal>copy_data</literal> is
+          set to <literal>false</literal>. If the subscriber-side
column is also a
+          generated column then this option has no effect; the
replicated data will
+          be ignored and the subscriber column will be filled as
normal with the
+          subscriber-side computed or default data.
+         </para>

==============================
2. subscriptioncmds.c

2a. The macro name should be in uppercase. We can use a short name
like 'SUBOPT_INCLUDE_GEN_COL'. Thoughts?
+#define SUBOPT_include_generated_columns 0x00010000

2b. Update macro name accordingly
+ if (IsSet(supported_opts, SUBOPT_include_generated_columns))
+ opts->include_generated_columns = false;

2c. Update macro name accordingly
+ else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
+ strcmp(defel->defname, "include_generated_columns") == 0)
+ {
+ if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
+ errorConflictingDefElem(defel, pstate);
+
+ opts->specified_opts |= SUBOPT_include_generated_columns;
+ opts->include_generated_columns = defGetBoolean(defel);
+ }

2d. Update macro name accordingly
+   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+   SUBOPT_include_generated_columns);
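
For reference, the option bits are used as in the sketch below, written with the uppercase short name proposed in 2a. Everything here is a local stand-in rather than the real subscriptioncmds.c code: the struct, the helper name, and the IsSet definition are assumptions made so the example compiles on its own; only the two bit values come from the quoted diffs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as shown in the quoted diffs; names follow the 2a suggestion */
#define SUBOPT_ORIGIN           0x00008000
#define SUBOPT_INCLUDE_GEN_COL  0x00010000

/* Local stand-in for the IsSet macro used in the patch */
#define IsSet(val, bits)  (((val) & (bits)) == (bits))

/* Hypothetical, reduced version of the parsed-options struct */
typedef struct SubOptsSketch
{
    uint32_t specified_opts;
    bool     include_generated_columns;
} SubOptsSketch;

/*
 * Records the option value. Returns false if the option is not supported
 * by the calling command or was already specified (where the real code
 * would report a conflicting-option error).
 */
static bool
set_include_generated_columns(SubOptsSketch *opts, uint32_t supported_opts,
                              bool value)
{
    if (!IsSet(supported_opts, SUBOPT_INCLUDE_GEN_COL))
        return false;

    if (IsSet(opts->specified_opts, SUBOPT_INCLUDE_GEN_COL))
        return false;

    opts->specified_opts |= SUBOPT_INCLUDE_GEN_COL;
    opts->include_generated_columns = value;
    return true;
}
```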


==============================

3. decoding_into_rel.out

3a. In the comment, I think it should be "When 'include-generated-columns'
= '1' the generated column 'b' values will be replicated"
+-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT

3b. In the comment, I think it should be "When 'include-generated-columns'
= '0' the generated column 'b' values will not be replicated"
+-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
+                      data
+------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:4
+ table public.gencoltable: INSERT: a[integer]:5
+ table public.gencoltable: INSERT: a[integer]:6
+ COMMIT
+(5 rows)

=========================

4. Both tests here have the same name. I think we should use
different names.

+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> > Thanks for the updated patch, few comments:
> > 1) The option name seems wrong here:
> > In one place include_generated_column is specified and other place
> > include_generated_columns is specified:
> >
> > +               else if (IsSet(supported_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> > +                                strcmp(defel->defname,
> > "include_generated_column") == 0)
> > +               {
> > +                       if (IsSet(opts->specified_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN))
> > +                               errorConflictingDefElem(defel, pstate);
> > +
> > +                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
> > +                       opts->include_generated_column = defGetBoolean(defel);
> > +               }
>
> Fixed.
>
> > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> > index d453e224d9..e8ff752fd9 100644
> > --- a/src/bin/psql/tab-complete.c
> > +++ b/src/bin/psql/tab-complete.c
> > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
> >                 COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> >                                           "disable_on_error",
> > "enabled", "failover", "origin",
> >                                           "password_required",
> > "run_as_owner", "slot_name",
> > -                                         "streaming",
> > "synchronous_commit", "two_phase");
> > +                                         "streaming",
> > "synchronous_commit", "two_phase","include_generated_columns");
> >
> > 2) This small data table need not have a primary key column as it will
> > create an index and insertion will happen in the index too.
> > +-- check include-generated-columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include-generated-columns', '1');
>
> Fixed.
>
> > 3) Please add a test case for this:
> > +          set to <literal>false</literal>. If the subscriber-side
> > column is also a
> > +          generated column then this option has no effect; the
> > replicated data will
> > +          be ignored and the subscriber column will be filled as
> > normal with the
> > +          subscriber-side computed or default data.
>
> Added the required test case.
>
> > 4) You can use a new style of ereport to remove the brackets around errcode
> > 4.a)
> > +                       else if (!parse_bool(strVal(elem->arg),
> > &data->include_generated_columns))
> > +                               ereport(ERROR,
> > +
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                                                errmsg("could not
> > parse value \"%s\" for parameter \"%s\"",
> > +
> > strVal(elem->arg), elem->defname)));
> >
> > 4.b) similarly here too:
> > +               ereport(ERROR,
> > +                               (errcode(ERRCODE_SYNTAX_ERROR),
> > +               /*- translator: both %s are strings of the form
> > "option = value" */
> > +                                       errmsg("%s and %s are mutually
> > exclusive options",
> > +                                               "copy_data = true",
> > "include_generated_column = true")));
> >
> > 4.c) similarly here too:
> > +                       if (include_generated_columns_option_given)
> > +                               ereport(ERROR,
> > +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> > +                                                errmsg("conflicting
> > or redundant options")));
>
> Fixed.
>
> > 5) These variable names can be changed to keep it smaller, something
> > like gencol or generatedcol or gencolumn, etc
> > +++ b/src/include/catalog/pg_subscription.h
> > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> > BKI_SHARED_RELATION BKI_ROW
> >   * slots) in the upstream database are enabled
> >   * to be synchronized to the standbys. */
> >
> > + bool subincludegeneratedcolumn; /* True if generated columns must be
> > published */
> > +
> >  #ifdef CATALOG_VARLEN /* variable-length fields start here */
> >   /* Connection string to the publisher */
> >   text subconninfo BKI_FORCE_NOT_NULL;
> > @@ -157,6 +159,7 @@ typedef struct Subscription
> >   List    *publications; /* List of publication names to subscribe to */
> >   char    *origin; /* Only publish data originating from the
> >   * specified origin */
> > + bool includegeneratedcolumn; /* publish generated column data */
> >  } Subscription;
>
> Fixed.
>
> The attached Patch contains the suggested changes.

Few comments:
1) Here tab1 and tab2 are exactly the same tables, just check if the
table tab1 itself can be used for your tests.
@@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres',
        "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED)"
 );
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED)"
+);

2) We can document that the include_generated_columns option cannot be altered.

3) You can mention that include-generated-columns is true by default
and generated column data will be selected
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

4)  The comment seems to be wrong here, the comment says b will not be
replicated but b is being selected:
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
                            data
-------------------------------------------------------------
 BEGIN
 table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
 table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
 table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
 COMMIT
(5 rows)

5) Similarly here too the comment seems to be wrong, the comment says
b will be replicated but b is not being selected:
INSERT INTO gencoltable (a) VALUES (4), (5), (6);
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
                      data
------------------------------------------------
 BEGIN
 table public.gencoltable: INSERT: a[integer]:4
 table public.gencoltable: INSERT: a[integer]:5
 table public.gencoltable: INSERT: a[integer]:6
 COMMIT
(5 rows)

6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to
keep the name consistent:
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -72,6 +72,7 @@
 #define SUBOPT_FAILOVER                                0x00002000
 #define SUBOPT_LSN                                     0x00004000
 #define SUBOPT_ORIGIN                          0x00008000
+#define SUBOPT_include_generated_columns               0x00010000

7) The comment style seems to be inconsistent, both of them can start
in lower case
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
+
+-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated

8) This could be changed to remove the insert statements by using
pg_logical_slot_peek_changes:
-- When 'include-generated-columns' is not set
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
INSERT INTO gencoltable (a) VALUES (4), (5), (6);
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
to:
-- When 'include-generated-columns' is not set
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');

9) In the commit message the option used is wrong;
include_generated_columns should actually be
include-generated-columns:
Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
                                      'include_generated_columns','1');

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for patch v7-0001.

======
1. GENERAL - \dRs+

Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
(e.g. \dRs+ mysub) the same as the other boolean parameters?

======
Commit message

2.
When 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list

~

Maybe you don't need to mention "PUBLICATION col-list" twice.

SUGGESTION
When 'include_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

~~~

2.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

2a.
(I've questioned this one in previous reviews)

What exactly is the purpose of this statement in the commit message?
Was this supposed to demonstrate the usage of the
'include_generated_columns' parameter?

~

2b.
/publication/PUBLICATION/


~~~

3.
If the subscriber-side column is also a generated column then
thisoption has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

~

Missing space: /thisoption/this option/

======
.../expected/decoding_into_rel.out

4.
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

Why is the default value here equivalent to
'include-generated-columns' = '1' instead of '0'? The default for
the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
false, and IMO it seems confusing for these 2 defaults to be
different. Here I think it should default to '0' *regardless* of what
the previous functionality might have done -- e.g. this is a "test
decoder" so the parameter should behave sensibly.

======
.../test_decoding/sql/decoding_into_rel.sql

NITPICK - wrong comments.

======
doc/src/sgml/protocol.sgml

5.
+    <varlistentry>
+     <term>include_generated_columns</term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns. This option controls
+        whether generated columns should be included in the string
+        representation of tuples during logical decoding in PostgreSQL.
+        The default is false.
+       </para>
+      </listitem>
+    </varlistentry>
+

Does the protocol version need to be bumped to support this new option
and should that be mentioned on this page similar to how all other
version values are mentioned?

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - some missing words/sentence.
NITPICK - some missing <literal> tags.
NITPICK - remove duplicated sentence.
NITPICK - add another <para>.

======
src/backend/commands/subscriptioncmds.c

6.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_include_generated_columns 0x00010000

Please use UPPERCASE for consistency with other macros.

======
.../libpqwalreceiver/libpqwalreceiver.c

7.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'on'");
+

IMO it makes more sense to say 'true' here instead of 'on'. It seems
like this was just cut/paste from the above code (where 'on' was
sensible).

======
src/include/catalog/pg_subscription.h

8.
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegencol; /* True if generated columns must be published */
+

Not fixed as claimed. This field name ought to be plural.

/subincludegencol/subincludegencols/

~~~

9.
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegencol; /* publish generated column data */
 } Subscription;

Not fixed as claimed. This field name ought to be plural.

/includegencol/includegencols/

======
src/test/subscription/t/031_column_list.pl

10.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 10) STORED)"
+);
+
 $node_subscriber->safe_psql('postgres',
  "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
 );

+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
+);
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 20) STORED)"
+);

IMO the test needs lots more comments to describe what it is doing:

For example, the setup deliberately has made:
* publisher-side tab2 has generated col 'b' but subscriber-side tab2
has NON-generated col 'b'.
* publisher-side tab3 has generated col 'b' but subscriber-side tab3
has DIFFERENT COMPUTATION generated col 'b'.

So it will be better to have comments to explain all this instead of
having to figure it out.

~~~

11.
 # data for initial sync

 $node_publisher->safe_psql('postgres',
  "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab2 (a) VALUES (1), (2), (3)");

 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub2 FOR TABLE tab2");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub3 FOR TABLE tab3");
+

# Wait for initial sync of all subscriptions
$node_subscriber->wait_for_subscription_sync;

my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
is( $result, qq(1|22
2|44
3|66), 'generated columns initial sync');

~

IMO (and for completeness) it would be better to INSERT data for all
the tables and also to validate that tables tab2 and tab3 have zero
rows replicated. Yes, I know there is 'copy_data=false', but it is
just easier to see all the tables instead of guessing why some are
omitted, and anyway this test case will be needed after the next patch
implements the COPY support for gen-cols.

~~~

12.
+$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');
+

Here also I think there should be explicit comments about what these
cases are testing, what results you are expecting, and why. The
comments will look something like the message parameter of those
is(...) calls.

e.g.
# confirm generated columns ARE replicated when the subscriber-side
column is not generated

e.g.
# confirm generated columns are NOT replicated when the
subscriber-side column is also generated

======

99.
Please also see my nitpicks attachment patch for various other
cosmetic and docs problems, and apply these if you agree:
- documentation wording/rendering
- wrong comments
- spacing
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for patch v7-0002

======
Commit Message

NITPICKS
- rearrange paragraphs
- typo "donot"
- don't start a sentence with "And"
- etc.

Please see the attachment for my suggested commit message text updates
and take from it whatever you agree with.

======
doc/src/sgml/ref/create_subscription.sgml

1.
+          If the subscriber-side column is also a generated column
then this option
+          has no effect; the replicated data will be ignored and the subscriber
+          column will be filled as normal with the subscriber-side computed or
+          default data. And during table synchronization, the data
corresponding to
+          the generated column on subscriber-side will not be sent from the
+          publisher to the subscriber.

This text already mentions subscriber-side generated cols. IMO you
don't need to say anything at all about table synchronization --
that's just an internal code optimization, which is not something the
user needs to know about. IOW, the entire last sentence ("And
during...") should be removed.

======
src/backend/replication/logical/relation.c

2. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencol && attr->attgenerated))
  {
  entry->attrmap->attnums[i] = -1;
  continue;

~

Maybe I'm mistaken, but isn't this code for skipping checking for
"missing" subscriber-side (aka local) columns? Can't it just
unconditionally skip every attr->attgenerated -- i.e. why does it
matter if the MySubscription->includegencol was set or not?

======
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

- for (i = 0; i < rel->remoterel.natts; i++)
+ desc = RelationGetDescr(rel->localrel);
+
+ for (i = 0; i < desc->natts; i++)
  {
- attnamelist = lappend(attnamelist,
-   makeString(rel->remoterel.attnames[i]));
+ int attnum;
+ Form_pg_attribute attr = TupleDescAttr(desc, i);
+
+ if (!attr->attgenerated)
+ continue;
+
+ attnum = logicalrep_rel_att_by_name(&rel->remoterel,
+ NameStr(attr->attname));
+
+ /*
+ * Check if subscription table have a generated column with same
+ * column name as a non-generated column in the corresponding
+ * publication table.
+ */
+ if (attnum >=0 && !attgenlist[attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" is missing
replicated column: \"%s\"",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
+
+ if (attnum >= 0)
+ gencollist = lappend_int(gencollist, attnum);
  }

~

NITPICK - Use C99-style for loop variables
NITPICK - Typo in comment
NITPICK - spaces

~

3a.
I think above code should be refactored so there is only one check for
"if (attnum >= 0)" -- e.g. other condition should be nested.

~

3b.
That ERROR message says "missing replicated column", but that doesn't
seem much like what the code-comment was saying this code is about.
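
For 3a, the shape being asked for can be sketched outside PostgreSQL as follows. Everything here is illustrative: remote_att_by_name() merely stands in for logicalrep_rel_att_by_name(), the GenColCheck enum replaces the ereport()/lappend_int() side effects so the control flow can be read on its own, and remote_is_generated[] plays the role of attgenlist[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome codes standing in for the real side effects. */
typedef enum GenColCheck
{
    GENCOL_NOT_ON_PUBLISHER,    /* no remote column of that name: nothing to do */
    GENCOL_SKIP_IN_COPY,        /* generated on both sides: keep out of COPY list */
    GENCOL_CONFLICT             /* local generated, remote not generated: error */
} GenColCheck;

/* Stand-in for a lookup by column name; returns -1 when not found. */
static int
remote_att_by_name(const char *const *remote_names, int natts, const char *name)
{
    for (int i = 0; i < natts; i++)
    {
        if (strcmp(remote_names[i], name) == 0)
            return i;
    }
    return -1;
}

/* Classify one local generated column with a single "attnum >= 0" test. */
static GenColCheck
check_local_generated_column(const char *const *remote_names,
                             const bool *remote_is_generated,
                             int natts, const char *local_name)
{
    assert(remote_names != NULL);
    assert(remote_is_generated != NULL);
    assert(local_name != NULL);

    int attnum = remote_att_by_name(remote_names, natts, local_name);

    if (attnum >= 0)
    {
        /* The conflict check is nested instead of repeating the outer test. */
        if (!remote_is_generated[attnum])
            return GENCOL_CONFLICT;

        return GENCOL_SKIP_IN_COPY;
    }

    return GENCOL_NOT_ON_PUBLISHER;
}
```

With the checks nested like this, the error branch and the append branch visibly belong to the same "column exists on the publisher" case.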

~~~

4.
+ for (i = 0; i < rel->remoterel.natts; i++)
+ {
+
+ if (gencollist != NIL && j < gencollist->length &&
+ list_nth_int(gencollist, j) == i)
+ j++;
+ else
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

NITPICK - Use C99-style for loop variables
NITPICK - Unnecessary blank lines

~

IIUC the subscriber-side table and the publisher-side table do NOT
have to have all the columns in identical order for the logical
replication to work correctly. AFAIK it works fine so long as the
column names match for the replicated columns. Therefore, I am
suspicious that this new patch code seems to be imposing some new
ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
not current requirements.
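
One way to avoid any ordering assumption is to mark the columns to skip in an array indexed by remote attribute number, rather than walking a list in step with the loop. The sketch below is only an illustration of that idea under assumed names (build_copy_attnames() and skip[] are hypothetical); it is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Copy into out[] the names of the remote columns that are not marked in
 * skip[]. Because skip[] is indexed by remote attribute number, the result
 * does not depend on the order in which columns were marked, and so not on
 * the column order of the subscriber-side table either.
 *
 * out[] must have room for natts entries. Returns the number of names kept.
 */
static int
build_copy_attnames(const char *const *remote_names, const bool *skip,
                    int natts, const char **out)
{
    int nout = 0;

    assert(remote_names != NULL);
    assert(skip != NULL);
    assert(out != NULL);

    for (int i = 0; i < natts; i++)
    {
        if (!skip[i])
            out[nout++] = remote_names[i];
    }

    return nout;
}
```

The quoted loop, by contrast, advances j only when list_nth_int(gencollist, j) equals i, which works only if the list happens to be in ascending remote-column order.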

~~~

copy_table:

NITPICK - comment typo
NITPICK - comment wording

~

5.
+ int i = 0;
+ ListCell *l;
+
  appendStringInfoString(&cmd, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach(l, attnamelist)
  {
- appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(&cmd, quote_identifier(strVal(lfirst(l))));
+ if (i < attnamelist->length - 1)
  appendStringInfoString(&cmd, ", ");
+ i++;
  }

IIUC for new code like this, it is preferred to use the foreach*
macros instead of ListCell.
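
Independently of which iteration macro is used, the extra counter 'i' in the quoted hunk exists only to decide when to emit ", ". A minimal standalone sketch of a counter-free alternative is below. It uses a plain array and a fixed buffer instead of PostgreSQL's List and StringInfo, and join_column_names() is a hypothetical helper, so treat it as an illustration of the separator handling only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join n column names into buf, separated by ", ". The separator starts
 * out empty and is switched to ", " after the first name, so no position
 * counter or "is this the last element" test is needed.
 *
 * Returns 0 on success, or -1 if buf is too small (buf then holds the
 * names that did fit, still NUL-terminated).
 */
static int
join_column_names(char *buf, size_t buflen, const char *const *names, int n)
{
    size_t      used = 0;
    const char *sep = "";

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (int i = 0; i < n; i++)
    {
        size_t seplen = strlen(sep);
        size_t namelen = strlen(names[i]);

        /* Leave room for the terminating NUL. */
        if (seplen + namelen >= buflen - used)
            return -1;

        memcpy(buf + used, sep, seplen);
        memcpy(buf + used + seplen, names[i], namelen + 1);
        used += seplen + namelen;
        sep = ", ";
    }

    return 0;
}
```

Joining the names a, b and c this way yields "a, b, c"; an empty input leaves the buffer empty.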

======
src/test/regress/sql/subscription.sql

6.
--- fail - copy_data and include_generated_columns are mutually
exclusive options
-CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
PUBLICATION testpub WITH (include_generated_columns = true);
-ERROR:  copy_data = true and include_generated_columns = true are
mutually exclusive options

It is OK to delete this test now, but IMO there still need to be some
"include_generated_columns must be boolean" test cases (e.g. the same as
there were for two_phase). Actually, this should probably be done by the
0001 patch.

======
src/test/subscription/t/011_generated.pl

7.
All the PRIMARY KEY stuff may be overkill. Are primary keys really
needed for these tests?

~~~

8.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab4 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab5 (a int PRIMARY KEY, b int)"
+);
+

Maybe add comments on what is special about all these tables, so readers
don't have to read the tests later to deduce their purpose.

tab4: publisher-side generated col 'b' and 'c'  ==> subscriber-side
non-generated col 'b', and generated-col 'c'
tab5: publisher-side non-generated col 'b' --> subscriber-side
non-generated col 'b'

~~~

9.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION sub4 CONNECTION '$publisher_connstr'
PUBLICATION pub4 WITH (include_generated_columns = true)"
+ );
+

All the publications are created together, and all the subscriptions
are created together except for 'sub5'. Consider including a comment
to say why you deliberately created the 'sub5' subscription separate
from all others.

======

99.
Please also see my code nitpicks attachment patch for various other
cosmetic problems, and apply them if you agree.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Hi Shubham, thanks for providing a patch.
> I have some comments for v6-0001.
>
> 1. create_subscription.sgml
> There is repetition of the same line.
>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. If the
> +          subscriber-side column is also a generated column then this option
> +          has no effect; the replicated data will be ignored and the subscriber
> +          column will be filled as normal with the subscriber-side computed or
> +          default data.
> +          <literal>false</literal>.
> +         </para>
> +
> +         <para>
> +          This parameter can only be set true if
> <literal>copy_data</literal> is
> +          set to <literal>false</literal>. If the subscriber-side
> column is also a
> +          generated column then this option has no effect; the
> replicated data will
> +          be ignored and the subscriber column will be filled as
> normal with the
> +          subscriber-side computed or default data.
> +         </para>
>
> ==============================
> 2. subscriptioncmds.c
>
> 2a. The macro name should be in uppercase. We can use a short name
> like 'SUBOPT_INCLUDE_GEN_COL'. Thought?
> +#define SUBOPT_include_generated_columns 0x00010000
>
> 2b.Update macro name accordingly
> + if (IsSet(supported_opts, SUBOPT_include_generated_columns))
> + opts->include_generated_columns = false;
>
> 2c. Update macro name accordingly
> + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
> + strcmp(defel->defname, "include_generated_columns") == 0)
> + {
> + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
> + errorConflictingDefElem(defel, pstate);
> +
> + opts->specified_opts |= SUBOPT_include_generated_columns;
> + opts->include_generated_columns = defGetBoolean(defel);
> + }
>
> 2d. Update macro name accordingly
> +   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
> +   SUBOPT_include_generated_columns);
>
>
> ==============================
>
> 3. decoding_into_rel.out
>
> 3a. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will be replicated"
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
>
> 3b. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will not be replicated"
> +-- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +                      data
> +------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:4
> + table public.gencoltable: INSERT: a[integer]:5
> + table public.gencoltable: INSERT: a[integer]:6
> + COMMIT
> +(5 rows)
>
> =========================
>
> 4. Here names for both the tests are the same. I think we should use
> different names.
>
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Few comments:
> 1) Here tab1 and tab2 are exactly the same tables, just check if the
> table tab1 itself can be used for your tests.
> @@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres',
>         "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
>  );
> +$node_publisher->safe_psql('postgres',
> +       "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
> +);

On the subscription side the tables have different descriptions, so we
need to have different tables on the publisher side.

> 2) We can document  that the include_generate_columns option cannot be altered.
>
> 3) You can mention that include-generated-columns is true by default
> and generated column data will be selected
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> 4)  The comment seems to be wrong here, the comment says b will not be
> replicated but b is being selected:
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
>                             data
> -------------------------------------------------------------
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
>  table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
>  table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
>  COMMIT
> (5 rows)
>
> 5)  Similarly here too the comment seems to be wrong, the comment says
> b will not replicated but b is not being selected:
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>                       data
> ------------------------------------------------
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:4
>  table public.gencoltable: INSERT: a[integer]:5
>  table public.gencoltable: INSERT: a[integer]:6
>  COMMIT
> (5 rows)
>
> 6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to
> keep the name consistent:
> --- a/src/backend/commands/subscriptioncmds.c
> +++ b/src/backend/commands/subscriptioncmds.c
> @@ -72,6 +72,7 @@
>  #define SUBOPT_FAILOVER                                0x00002000
>  #define SUBOPT_LSN                                     0x00004000
>  #define SUBOPT_ORIGIN                          0x00008000
> +#define SUBOPT_include_generated_columns               0x00010000
>
> 7) The comment style seems to be inconsistent, both of them can start
> in lower case
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> +
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
>
> 8) This could be changed to remove the insert statements by using
> pg_logical_slot_peek_changes:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> to:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>
> 9) In the commit message the option used is wrong;
> include_generated_columns should actually be
> include-generated-columns:
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
>                                       'include_generated_columns','1');

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jun 17, 2024 at 1:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v7-0001.
>
> ======
> 1. GENERAL - \dRs+
>
> Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
> (e.g. \dRs+ mysub) the same as the other boolean parameters?
>
> ======
> Commit message
>
> 2.
> When 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list
>
> ~
>
> Maybe you don't need to mention "PUBLICATION col-list" twice.
>
> SUGGESTION
> When 'include_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> ~~~
>
> 2.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
> 'publication pub1;
>
> ~
>
> 2a.
> (I've questioned this one in previous reviews)
>
> What exactly is the purpose of this statement in the commit message?
> Was this supposed to demonstrate the usage of the
> 'include_generated_columns' parameter?
>
> ~
>
> 2b.
> /publication/PUBLICATION/
>
>
> ~~~
>
> 3.
> If the subscriber-side column is also a generated column then
> thisoption has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.
>
> ~
>
> Missing space: /thisoption/this option/
>
> ======
> .../expected/decoding_into_rel.out
>
> 4.
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> Why is the default value here equivalent to
> 'include-generated-columns' = '1' instead of '0'? The default for
> the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
> false, and IMO it seems confusing for these 2 defaults to be
> different. Here I think it should default to '0' *regardless* of what
> the previous functionality might have done -- e.g. this is a "test
> decoder" so the parameter should behave sensibly.
>
> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> NITPICK - wrong comments.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> +    <varlistentry>
> +     <term>include_generated_columns</term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns. This option controls
> +        whether generated columns should be included in the string
> +        representation of tuples during logical decoding in PostgreSQL.
> +        The default is false.
> +       </para>
> +      </listitem>
> +    </varlistentry>
> +
>
> Does the protocol version need to be bumped to support this new option
> and should that be mentioned on this page similar to how all other
> version values are mentioned?

I already did the backward compatibility test earlier and concluded that
a protocol bump is not needed.

> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - some missing words/sentence.
> NITPICK - some missing <literal> tags.
> NITPICK - remove duplicated sentence.
> NITPICK - add another <para>.
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 6.
>  #define SUBOPT_ORIGIN 0x00008000
> +#define SUBOPT_include_generated_columns 0x00010000
>
> Please use UPPERCASE for consistency with other macros.
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 7.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'on'");
> +
>
> IMO it makes more sense to say 'true' here instead of 'on'. It seems
> like this was just cut/paste from the above code (where 'on' was
> sensible).
>
> ======
> src/include/catalog/pg_subscription.h
>
> 8.
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegencol; /* True if generated columns must be published */
> +
>
> Not fixed as claimed. This field name ought to be plural.
>
> /subincludegencol/subincludegencols/
>
> ~~~
>
> 9.
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegencol; /* publish generated column data */
>  } Subscription;
>
> Not fixed as claimed. This field name ought to be plural.
>
> /includegencol/includegencols/
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 10.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 10) STORED)"
> +);
> +
>  $node_subscriber->safe_psql('postgres',
>   "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
>  );
>
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
> +);
> +
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 20) STORED)"
> +);
>
> IMO the test needs lots more comments to describe what it is doing:
>
> For example, the setup deliberately has made:
> * publisher-side tab2 has generated col 'b' but subscriber-side tab2
> has NON-generated col 'b'.
> * publisher-side tab3 has generated col 'b' but subscriber-side tab3
> has DIFFERENT COMPUTATION generated col 'b'.
>
> So it will be better to have comments to explain all this instead of
> having to figure it out.
>
> ~~~
>
> 11.
>  # data for initial sync
>
>  $node_publisher->safe_psql('postgres',
>   "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO tab2 (a) VALUES (1), (2), (3)");
>
>  $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub2 FOR TABLE tab2");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub3 FOR TABLE tab3");
> +
>
> # Wait for initial sync of all subscriptions
> $node_subscriber->wait_for_subscription_sync;
>
> my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
> is( $result, qq(1|22
> 2|44
> 3|66), 'generated columns initial sync');
>
> ~
>
> IMO (and for completeness) it would be better to INSERT data for all
> the tables and also to validate that tables tab2 and tab3 have zero
> rows replicated. Yes, I know there is 'copy_data=false', but it is
> just easier to see all the tables instead of guessing why some are
> omitted, and anyway this test case will be needed after the next patch
> implements the COPY support for gen-cols.
>
> ~~~
>
> 12.
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub2');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');
> +
>
> Here also I think there should be explicit comments about what these
> cases are testing, what results you are expecting, and why. The
> comments will look something like the message parameter of those
> safe_psql(...)
>
> e.g.
> # confirm generated columns ARE replicated when the subscriber-side
> column is not generated
>
> e.g.
> # confirm generated columns are NOT replicated when the
> subscriber-side column is also generated
>
> ======
>
> 99.
> Please also see my nitpicks attachment patch for various other
> cosmetic and docs problems, and apply these if you agree:
> - documentation wording/rendering
> - wrong comments
> - spacing
> - etc.

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Eisentraut
Date:
On 19.06.24 13:22, Shubham Khanna wrote:
> All the comments are handled.
> 
> The attached Patch contains all the suggested changes.

Please also take a look at the proposed patch for virtual generated 
columns [0] and consider how that would affect your patch.  I think your 
feature can only replicate *stored* generated columns.  So perhaps the 
documentation and terminology in your patch should reflect that.


[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org




Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for v8-0001.

======
Commit message.

1.
It seems like the patch name was accidentally omitted, so it became a
mess when it defaulted to the 1st paragraph of the commit message.

======
contrib/test_decoding/test_decoding.c

2.
+ data->include_generated_columns = true;

I previously posted a comment [1, #4] that this should default to
false; IMO it is unintuitive for the test_decoding to have an
*opposite* default behaviour compared to CREATE SUBSCRIPTION.

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - remove the inconsistent blank line in SGML

======
src/backend/commands/subscriptioncmds.c

3.
+#define SUBOPT_include_generated_columns 0x00010000

I previously posted a comment [1, #6] that this should be UPPERCASE,
but it is not yet fixed.
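
To illustrate why the macro name and value matter, here is a minimal stand-alone sketch of the bit-flag pattern. The two values are the ones shown in the patch hunk; the name SUBOPT_INCLUDE_GENERATED_COLUMNS and the helper subopt_is_set() are only assumptions for illustration, not names taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as shown in the patch hunk; each option owns exactly one bit. */
#define SUBOPT_ORIGIN                    0x00008000
/* Hypothetical UPPERCASE spelling of the new option's macro. */
#define SUBOPT_INCLUDE_GENERATED_COLUMNS 0x00010000

/*
 * Hypothetical helper: report whether a single option bit is present
 * in the set of options the user specified.
 */
bool
subopt_is_set(uint32_t specified_opts, uint32_t opt)
{
    /* The option being tested must be exactly one bit. */
    assert(opt != 0 && (opt & (opt - 1)) == 0);

    return (specified_opts & opt) != 0;
}
```

Because the new value is the next unused bit after SUBOPT_ORIGIN, it can be OR-ed into the same bitmask without disturbing the existing options.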

======
src/bin/psql/describe.c

NITPICK - move and reword the bogus comment

~

4.
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+ ", subincludegencols AS \"%s\"\n",
+ gettext_noop("include_generated_columns"));

4a.
For consistency with every other parameter, that column title should
be written in words "Include generated columns" (not
"include_generated_columns").

~

4b.
IMO this new column belongs with the other subscription parameter
columns (e.g. put it ahead of the "Conninfo" column).

======
src/test/subscription/t/011_generated.pl

NITPICK - fixed a comment

5.
IMO, it would be better for readability if all the matching CREATE
TABLE for publisher and subscriber are kept together, instead of the
current code which is creating all publisher tables and then creating
all subscriber tables.

~~~

6.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
...
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

6a.
These SELECT all need ORDER BY to protect against the SELECT *
returning rows in some unexpected order.

~

6b.
IMO there should be more comments here to explain how you can tell the
column was NOT replicated. E.g. it is because the result value of 'b'
is the subscriber-side computed value (which you made deliberately
different to the publisher-side computed value).
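
The reasoning can be spelled out with the tab3 expressions from the test (publisher: a + 10, subscriber: a + 20). The sketch below is plain C, not test code, and the function names are made up for illustration.

```c
#include <assert.h>

/* Generation expression of tab3.b on the publisher, per the test setup. */
int
pub_tab3_b(int a)
{
    return a + 10;
}

/* Generation expression of tab3.b on the subscriber, per the test setup. */
int
sub_tab3_b(int a)
{
    return a + 20;
}

/*
 * Hypothetical check: returns 1 when the value observed on the subscriber
 * is the locally computed one, i.e. the publisher's value was NOT applied.
 * This only proves anything because the two expressions differ.
 */
int
value_was_computed_locally(int a, int observed_b)
{
    assert(pub_tab3_b(a) != sub_tab3_b(a));

    return observed_b == sub_tab3_b(a);
}
```

For a = 4 and a = 5 the subscriber shows 24 and 25; had the publisher's values been applied, it would show 14 and 15 instead.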

======

99.
Please also refer to the attached nitpicks top-up patch for minor
cosmetic stuff.

======
[1] https://www.postgresql.org/message-id/CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb%2BV%3Dp2YHL8Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 19.06.24 13:22, Shubham Khanna wrote:
> > All the comments are handled.
> >
> > The attached Patch contains all the suggested changes.
>
> Please also take a look at the proposed patch for virtual generated
> columns [0] and consider how that would affect your patch.  I think your
> feature can only replicate *stored* generated columns.  So perhaps the
> documentation and terminology in your patch should reflect that.

This patch is unable to manage virtual generated columns because it
stores NULL values for them. Along with the documentation, the generated
initial sync command should also be changed to sync data
exclusively for stored generated columns, omitting virtual ones. I
suggest treating these changes as a separate patch (0003) for future
merging or a separate commit, depending on the order of patch
acceptance.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 18 Jun 2024 at 10:57, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v7-0002
>
> ======
> Commit Message
>
> NITPICKS
> - rearrange paragraphs
> - typo "donot"
> - don't start a sentence with "And"
> - etc.
>
> Please see the attachment for my suggested commit message text updates
> and take from it whatever you agree with.
>
Fixed

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 1.
> +          If the subscriber-side column is also a generated column
> then this option
> +          has no effect; the replicated data will be ignored and the subscriber
> +          column will be filled as normal with the subscriber-side computed or
> +          default data. And during table synchronization, the data
> corresponding to
> +          the generated column on subscriber-side will not be sent from the
> +          publisher to the subscriber.
>
> This text already mentions subscriber-side generated cols. IMO you
> don't need to say anything at all about table synchronization --
> that's just an internal code optimization, which is not something the
> user needs to know about. IOW, the entire last sentence ("And
> during...") should be removed.
>
Fixed

> ======
> src/backend/replication/logical/relation.c
>
> 2. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencol && attr->attgenerated))
>   {
>   entry->attrmap->attnums[i] = -1;
>   continue;
>
> ~
>
> Maybe I'm mistaken, but isn't this code for skipping checking for
> "missing" subscriber-side (aka local) columns? Can't it just
> unconditionally skip every attr->attgenerated -- i.e. why does it
> matter if the MySubscription->includegencol was set or not?
>
When 'include_generated_columns' is 'true', the column list in
remoterel will have an entry for generated columns.
So, in this case, if we skip every attr->attgenerated, we will get a
missing column error.

> ======
> src/backend/replication/logical/tablesync.c
>
> 3. make_copy_attnamelist
>
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> +
> + for (i = 0; i < desc->natts; i++)
>   {
> - attnamelist = lappend(attnamelist,
> -   makeString(rel->remoterel.attnames[i]));
> + int attnum;
> + Form_pg_attribute attr = TupleDescAttr(desc, i);
> +
> + if (!attr->attgenerated)
> + continue;
> +
> + attnum = logicalrep_rel_att_by_name(&rel->remoterel,
> + NameStr(attr->attname));
> +
> + /*
> + * Check if subscription table have a generated column with same
> + * column name as a non-generated column in the corresponding
> + * publication table.
> + */
> + if (attnum >=0 && !attgenlist[attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" is missing
> replicated column: \"%s\"",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
> +
> + if (attnum >= 0)
> + gencollist = lappend_int(gencollist, attnum);
>   }
>
> ~
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Typo in comment
> NITPICK - spaces
>
> ~
>
> 3a.
> I think above code should be refactored so there is only one check for
> "if (attnum >= 0)" -- e.g. other condition should be nested.
>
> ~
>
> 3b.
> That ERROR message says "missing replicated column", but that doesn't
> seem much like what the code-comment was saying this code is about.
>
Fixed

> ~~~
>
> 4.
> + for (i = 0; i < rel->remoterel.natts; i++)
> + {
> +
> + if (gencollist != NIL && j < gencollist->length &&
> + list_nth_int(gencollist, j) == i)
> + j++;
> + else
> + attnamelist = lappend(attnamelist,
> +   makeString(rel->remoterel.attnames[i]));
> + }
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Unnecessary blank lines
>
> ~
>
> IIUC the subscriber-side table and the publisher-side table do NOT
> have to have all the columns in identical order for the logical
> replication to work correctly. AFAIK it works fine so long as the
> column names match for the replicated columns. Therefore, I am
> suspicious that this new patch code seems to be imposing some new
> ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
> not current requirements.
>
> ~~~
>
> copy_table:
>
> NITPICK - comment typo
> NITPICK - comment wording
>
Fixed

> ~
>
> 5.
> + int i = 0;
> + ListCell *l;
> +
>   appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach(l, attnamelist)
>   {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(lfirst(l))));
> + if (i < attnamelist->length - 1)
>   appendStringInfoString(&cmd, ", ");
> + i++;
>   }
> IIUC for new code like this, it is preferred to use the foreach*
> macros instead of ListCell.
>
Fixed

> ======
> src/test/regress/sql/subscription.sql
>
> 6.
> --- fail - copy_data and include_generated_columns are mutually
> exclusive options
> -CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
> PUBLICATION testpub WITH (include_generated_columns = true);
> -ERROR:  copy_data = true and include_generated_columns = true are
> mutually exclusive options
>
> It is OK to delete this test now, but IMO there still need to be some
> "include_generated_columns must be boolean" test cases (e.g. same as
> there was two_phase). Actually, this should probably be done by the
> 0001 patch.
>
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 7.
> All the PRIMARY KEY stuff may be overkill. Are primary keys really
> needed for these tests?
>
Fixed

> ~~~
>
> 8.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab4 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab5 (a int PRIMARY KEY, b int)"
> +);
> +
>
> Maybe add comments on what is special about all these tables, so we don't
> have to read the tests later to deduce their purpose.
>
> tab4: publisher-side generated col 'b' and 'c'  ==> subscriber-side
> non-generated col 'b', and generated-col 'c'
> tab5: publisher-side non-generated col 'b' --> subscriber-side
> non-generated col 'b'
>
Fixed

> ~~~
>
> 9.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE SUBSCRIPTION sub4 CONNECTION '$publisher_connstr'
> PUBLICATION pub4 WITH (include_generated_columns = true)"
> + );
> +
>
> All the publications are created together, and all the subscriptions
> are created together except for 'sub5'. Consider including a comment
> to say why you deliberately created the 'sub5' subscription separate
> from all others.
>
Fixed

> ======
>
> 99.
> Please also see my code nitpicks attachment patch for various other
> cosmetic problems, and apply them if you agree.
>
Applied the changes

I have fixed the comments and attached the patches. I have also
attached the v9-0003 patch, which resolves the issue raised by
Vignesh in [1]. I have also updated the documentation accordingly.
v9-0001 - Not Modified
v9-0002 - Support replication of generated columns during initial sync.
v9-0003 - Fix behaviour of tablesync for Virtual Generated Columns.

[1]: https://www.postgresql.org/message-id/CALDaNm3Ufg872XqgPvBVzXHvUVenu-8%2BGz2dyEuKq3CN0UxfKw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 20 Jun 2024 at 12:52, vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 19.06.24 13:22, Shubham Khanna wrote:
> > > All the comments are handled.
> > >
> > > The attached Patch contains all the suggested changes.
> >
> > Please also take a look at the proposed patch for virtual generated
> > columns [0] and consider how that would affect your patch.  I think your
> > feature can only replicate *stored* generated columns.  So perhaps the
> > documentation and terminology in your patch should reflect that.
>
> This patch is unable to manage virtual generated columns because it
> stores NULL values for them. Along with documentation the initial sync
> command being generated also should be changed to sync data
> exclusively for stored generated columns, omitting virtual ones. I
> suggest treating these changes as a separate patch(0003) for future
> merging or a separate commit, depending on the order of patch
> acceptance.
>

I have addressed the issue, updated the documentation accordingly,
and created a new 0003 patch.
Please refer to the v9-0003 patch in [1].

[1]: https://www.postgresql.org/message-id/CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S%2BjRzzzXo2RavNWw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.

======
src/test/subscription/t/011_generated.pl

1.
Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
don't think so.

~~~

2.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
for readability and to avoid any surprises.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for patch v9-0002.

======
src/backend/replication/logical/relation.c

1. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencols && attr->attgenerated))

You replied to my question from the previous review [1, #2] as follows:
When 'include_generated_columns' is 'true', the column list in
remoterel will have an entry for generated columns. So, in this case,
if we skip every attr->attgenerated, we will get a missing column
error.

~

TBH, the reason seems very subtle to me. Perhaps that
"(!MySubscription->includegencols && attr->attgenerated))" condition
should be coded as a separate "if", so then you can include a comment
similar to your answer, to explain it.

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

NITPICK - punctuation in function comment
NITPICK - add/reword some more comments
NITPICK - rearrange comments to be closer to the code they are commenting

~

2. make_copy_attnamelist.

+ /*
+ * Construct column list for COPY.
+ */
+ for (int i = 0; i < rel->remoterel.natts; i++)
+ {
+ if(!gencollist[i])
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

IIUC this assumes that the attribute number (aka column order)
is the same on the subscriber side (e.g. gencollist idx) and on the
publisher side (e.g. remoterel.attnames[i]). AFAIK logical
replication does not require the column order to match like that,
therefore I am suspicious this new logic is accidentally imposing new
unwanted assumptions/restrictions. I had asked the same question
before [1, #4] about this code, but there was no reply.

Ideally, there would be more test cases for when the columns
(including the generated ones) are all in different orders on the
pub/sub tables.
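
To make the concern concrete, here is a minimal stand-alone sketch of matching columns by name rather than by position. It is not PostgreSQL code: find_col_by_name() and collect_copy_cols() are hypothetical helpers, and plain C string arrays stand in for the remote and local relation descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: index of 'name' in names[], or -1 if absent. */
int
find_col_by_name(const char *const *names, int nnames, const char *name)
{
    assert(names != NULL && name != NULL);

    for (int i = 0; i < nnames; i++)
    {
        if (strcmp(names[i], name) == 0)
            return i;
    }
    return -1;
}

/*
 * Hypothetical helper: collect the remote column names to COPY, skipping
 * any remote column whose same-named local column is generated.  The
 * lookup is by name, so the two tables may order their columns differently.
 * Remote columns with no local match are kept here; a real implementation
 * would have to decide how to report them.  Returns the number of names
 * written to out[], which must have room for nremote entries.
 */
int
collect_copy_cols(const char *const *remote_names, int nremote,
                  const char *const *local_names,
                  const bool *local_generated, int nlocal,
                  const char **out)
{
    int nout = 0;

    for (int r = 0; r < nremote; r++)
    {
        int l = find_col_by_name(local_names, nlocal, remote_names[r]);

        if (l >= 0 && local_generated[l])
            continue;           /* subscriber computes this column itself */
        out[nout++] = remote_names[r];
    }
    return nout;
}
```

With remote columns (a, b, c) and local columns (c, a, b) where only the local c is generated, the result is (a, b), which an index-based comparison of the two arrays would get wrong.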

~~~

3. General - varnames.

It would help with understanding if the 'attgenlist' variables in all
these functions are re-named to make it very clear that this is
referring to the *remote* (publisher-side) table genlist, not the
subscriber table genlist.

~~~

4.
+ int i = 0;
+
  appendStringInfoString(&cmd, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(ListCell, l, attnamelist)
  {
- appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(&cmd, quote_identifier(strVal(l)));
+ if (i < attnamelist->length - 1)
  appendStringInfoString(&cmd, ", ");
+ i++;
  }

4a.
I think the purpose of the new macros is to avoid using ListCell, and
also 'l' is an unhelpful variable name. Shouldn't this code be more
like:
foreach_node(String, att_name, attnamelist)

~

4b.
The code can be far simpler if you just put the comma (", ") always
up-front except the *first* iteration, so you can avoid checking the
list length every time. For example:

if (i++)
  appendStringInfoString(&cmd, ", ");
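
As a fuller illustration of that idiom, here is a stand-alone sketch that builds the start of the COPY command from a plain array of names. It uses snprintf on a fixed buffer instead of StringInfo and omits quote_identifier; append_str() and build_copy_select() are made-up names for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst, truncating if needed. */
void
append_str(char *dst, size_t dstlen, const char *src)
{
    size_t used = strlen(dst);

    assert(used < dstlen);
    snprintf(dst + used, dstlen - used, "%s", src);
}

/*
 * Hypothetical helper: write "COPY (SELECT col1, col2, ..." into dst.
 * The separator is emitted before every name except the first, so the
 * loop never needs to know the list length.
 */
void
build_copy_select(char *dst, size_t dstlen,
                  const char *const *names, int nnames)
{
    int i = 0;

    assert(dstlen > 0);
    dst[0] = '\0';
    append_str(dst, dstlen, "COPY (SELECT ");

    for (int n = 0; n < nnames; n++)
    {
        /* i++ evaluates to 0 only on the first pass: no leading comma. */
        if (i++)
            append_str(dst, dstlen, ", ");
        append_str(dst, dstlen, names[n]);
    }
}
```

Calling build_copy_select() with the names a, b and c leaves "COPY (SELECT a, b, c" in the buffer.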

======
src/test/subscription/t/011_generated.pl

5. General.

Hmm. This patch 0002 included many formatting changes to tables tab2
and tab3 and subscriptions sub2 and sub3 but they do not belong here.
The proper formatting for those needs to be done back in patch 0001
where they were introduced. Patch 0002 should just concentrate only on
the new stuff for patch 0002.

~

6. CREATE TABLES would be better in pairs

IMO it will be better if the matching CREATE TABLE for pub and sub are
kept together, instead of separating them by doing all pub then all
sub. I previously made the same comment for patch 0001, so maybe it
will be addressed next time...

~

7. SELECT *

+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");

It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
*", for readability and so there are no surprises.

======

99.
Please also refer to my attached nitpicks diff for numerous cosmetic
changes, and apply if you agree.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok.

FYI, my patch describing the current PG17 behaviour of logical
replication of generated columns was recently pushed [1].

Note that this will have some impact on your patch set. e.g. You are
changing the current replication behaviour, so the "Generated Columns"
section note will now need to be modified by your patches.

======
[1] https://github.com/postgres/postgres/commit/7a089f6e6a14ca3a5dc8822c393c6620279968b9

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, Here are some review comments for patch v9-0003

======
Commit Message

/fix/fixes/

======
1.
General. Is tablesync enough?

I don't understand why the patch is only concerned with tablesync.
Does it make sense to prevent VIRTUAL column replication during
tablesync, if you aren't also going to prevent VIRTUAL columns from
normal logical replication (e.g. when copy_data = false)? Or is this
already handled somewhere?

~~~

2.
General. Missing test.

Add some test cases to verify behaviour is different for STORED versus
VIRTUAL generated columns

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - consider rearranging as shown in my nitpicks diff
NITPICK - use <literal> sgml markup for STORED

======
src/backend/replication/logical/tablesync.c

3.
- if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
- walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
- !MySubscription->includegencols)
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) < 170000)
+ {
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
  appendStringInfo(&cmd, " AND a.attgenerated = ''");
+ }
+ else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000)
+ {
+ if(MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
+ else
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+ }

IMO this logic is too tricky to remain uncommented -- please add some comments.
Also, it seems somewhat complex. I think you can achieve the same more simply:

SUGGESTION

if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
{
  bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000
    && MySubscription->includegencols;
  if (gencols_allowed)
  {
    /* Replication of generated cols is supported, but not VIRTUAL cols. */
    appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
  }
  else
  {
    /* Replication of generated cols is not supported. */
    appendStringInfo(&cmd, " AND a.attgenerated = ''");
  }
}
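
As a standalone illustration of the control flow suggested above, here is a minimal sketch that models the decision outside of PostgreSQL. The function name gencol_filter and its plain int/bool parameters are hypothetical stand-ins for walrcv_server_version(LogRepWorkerWalRcvConn) and MySubscription->includegencols; this is not the patch code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the tablesync.c logic: return the extra WHERE
 * clause to append for a given publisher version and subscription option.
 */
const char *
gencol_filter(int server_version, bool include_gencols)
{
    /* pg_attribute.attgenerated does not exist before PG12: no filter. */
    if (server_version < 120000)
        return "";

    /* Generated cols are fetched only from PG17+ and only when requested. */
    if (server_version >= 170000 && include_gencols)
        return " AND a.attgenerated != 'v'";

    /* Otherwise generated cols are excluded from the column list. */
    return " AND a.attgenerated = ''";
}
```

Written this way, each of the three outcomes has exactly one return statement, which makes it easy to attach one comment per case.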

======

99.
Please refer also to my attached nitpick diffs and apply those if you agree.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok.

FYI, there is some other documentation page that mentions generated
column replication messages.

This page [1] says:
"Next, the following message part appears for each column included in
the publication (except generated columns):"

But, with the introduction of your new feature, I think that the
"except generated columns" wording is not strictly correct anymore.
IOW that docs page needs updating but AFAICT your patches have not
addressed this yet.

======
[1] https://www.postgresql.org/docs/17/protocol-logicalrep-message-formats.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jun 20, 2024 at 9:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for v8-0001.
>
> ======
> Commit message.
>
> 1.
> It seems like the patch name was accidentally omitted, so it became a
> mess when it defaulted to the 1st paragraph of the commit message.
>
> ======
> contrib/test_decoding/test_decoding.c
>
> 2.
> + data->include_generated_columns = true;
>
> I previously posted a comment [1, #4] that this should default to
> false; IMO it is unintuitive for the test_decoding to have an
> *opposite* default behaviour compared to CREATE SUBSCRIPTION.
>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - remove the inconsistent blank line in SGML
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 3.
> +#define SUBOPT_include_generated_columns 0x00010000
>
> I previously posted a comment [1, #6] that this should be UPPERCASE,
> but it is not yet fixed.
>
> ======
> src/bin/psql/describe.c
>
> NITPICK - move and reword the bogus comment
>
> ~
>
> 4.
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> + ", subincludegencols AS \"%s\"\n",
> + gettext_noop("include_generated_columns"));
>
> 4a.
> For consistency with every other parameter, that column title should
> be written in words "Include generated columns" (not
> "include_generated_columns").
>
> ~
>
> 4b.
> IMO this new column belongs with the other subscription parameter
> columns (e.g. put it ahead of the "Conninfo" column).
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - fixed a comment
>
> 5.
> IMO, it would be better for readability if all the matching CREATE
> TABLE for publisher and subscriber are kept together, instead of the
> current code which is creating all publisher tables and then creating
> all subscriber tables.
>
> ~~~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> ...
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> 6a.
> These SELECT all need ORDER BY to protect against the SELECT *
> returning rows in some unexpected order.
>
> ~
>
> 6b.
> IMO there should be more comments here to explain how you can tell the
> column was NOT replicated. E.g. it is because the result value of 'b'
> is the subscriber-side computed value (which you made deliberately
> different to the publisher-side computed value).
>
> ======
>
> 99.
> Please also refer to the attached nitpicks top-up patch for minor
> cosmetic stuff.

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Jun 21, 2024 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1.
> Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
> don't think so.
>
> ~~~
>
> 2.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
> for readability and to avoid any surprises.

Both the comments are handled.

Patch v9-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2B6kwOGmn5MsRaTmciJDxtvNsyszMoPXV62OGPGzkxrCg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Hi Shubham,

Thanks for sharing the new patch! You shared it as v9, but it should be v10, right?
Also, since there was no commitfest entry, I registered one [1]. You can rename the
title as needed. Currently CFbot says OK.

Anyway, below are my comments.

01. General
Your patch contains unnecessary changes. Please remove all of them. E.g., 

```
                          " s.subpublications,\n");
-
```
And
```
         appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n"
-                             " s.subenabled,\n");
+                            " s.subenabled,\n");
```

02. General
Again, please run the pgindent/pgperltidy.

03. test_decoding
Previously I suggested that the default value of include_generated_columns
should be true, so you modified it that way. However, Peter expressed the opposite
opinion [3] and you simply revised it accordingly. I think either way might be okay, but
at least you must clarify the reason why you preferred to set the default to false and
changed it accordingly.

04. decoding_into_rel.sql
According to the comment atop this file, this test should insert results into a table.
But the added case does not, so we should put it in another place, i.e., create another
file.

05. decoding_into_rel.sql
```
+-- when 'include-generated-columns' is not set
```
Can you clarify the expected behavior as a comment?

06. getSubscriptions
```
+    else
+        appendPQExpBufferStr(query,
+                        " false AS subincludegencols,\n");
```
I think the comma is not needed.
Also, this error means that you did not test pg_dump against instances prior to PG16.
Please verify whether we can dump subscriptions and restore them accordingly.

[1]: https://commitfest.postgresql.org/48/5068/
[2]:
https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some patch v9-0001 comments.

I saw Kuroda-san has already posted comments for this patch so there
may be some duplication here.

======
GENERAL

1.
The later patches 0002 etc are checking to support only STORED
gencols. But, doesn't that restriction belong in this patch 0001 so
VIRTUAL columns are not decoded etc in the first place... (??)

~~~

2.
The "Generated Columns" docs mentioned in my previous review comment
[2] should be modified by this 0001 patch.

~~~

3.
I think the "Message Format" page mentioned in my previous review
comment [3] should be modified by this 0001 patch.

======
Commit message

4.
The patch name is still broken as previously mentioned [1, #1]

======
doc/src/sgml/protocol.sgml

5.
Should these docs be referring to STORED generated columns, instead of
just generated columns?

======
doc/src/sgml/ref/create_subscription.sgml

6.
Should these docs be referring to STORED generated columns, instead of
just generated columns?

======
src/bin/pg_dump/pg_dump.c

getSubscriptions:
NITPICK - tabs
NITPICK - patch removed a blank line it should not be touching
NITPICK - patch altered indents it should not be touching
NITPICK - a missing blank line that was previously present

7.
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols,\n");

There is an unwanted comma here.

~

dumpSubscription
NITPICK - patch altered indents it should not be touching

======
src/bin/pg_dump/pg_dump.h

NITPICK - unnecessary blank line

======
src/bin/psql/describe.c

describeSubscriptions
NITPICK - bad indentation

8.
In my previous review [1, #4b] I suggested this new column should be
in a different order (e.g. adjacent to the other ones ahead of
'Conninfo'), but this is not yet addressed.

======
src/test/subscription/t/011_generated.pl

NITPICK - missing space in comment
NITPICK - misleading "because" wording in the comment

======

99.
See also my attached nitpicks diff, for cosmetic issues. Please apply
whatever you agree with.

======
[1] My v8-0001 review -
https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for patch v9-0002.
>
> ======
> src/backend/replication/logical/relation.c
>
> 1. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencols && attr->attgenerated))
>
> You replied to my question from the previous review [1, #2] as follows:
> In case 'include_generated_columns' is 'true'. column list in
> remoterel will have an entry for generated columns. So, in this case
> if we skip every attr->attgenerated, we will get a missing column
> error.
>
> ~
>
> TBH, the reason seems very subtle to me. Perhaps that
> "(!MySubscription->includegencols && attr->attgenerated))" condition
> should be coded as a separate "if", so then you can include a comment
> similar to your answer, to explain it.
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> NITPICK - punctuation in function comment
> NITPICK - add/reword some more comments
> NITPICK - rearrange comments to be closer to the code they are commenting
Applied the changes

> ~
>
> 2. make_copy_attnamelist.
>
> + /*
> + * Construct column list for COPY.
> + */
> + for (int i = 0; i < rel->remoterel.natts; i++)
> + {
> + if(!gencollist[i])
> + attnamelist = lappend(attnamelist,
> +   makeString(rel->remoterel.attnames[i]));
> + }
>
> IIUC isn't this assuming that the attribute number (aka column order)
> is the same on the subscriber side (e.g. gencollist idx) and on the
> publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> replication does not require that the ordering match like that,
> therefore I am suspicious this new logic is accidentally imposing new
> unwanted assumptions/restrictions. I had asked the same question
> before [1-#4] about this code, but there was no reply.
>
> Ideally, there would be more test cases for when the columns
> (including the generated ones) are all in different orders on the
> pub/sub tables.
'gencollist' is set according to the remoterel
+           gencollist[attnum] = true;
where attnum is the attribute number of the corresponding column on remote rel.

I have also added the tests to confirm the behaviour
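
To illustrate why indexing the flag array by the remote attribute number avoids any dependence on column order, here is a minimal standalone sketch. The Column type, the function mark_remote_gencols, and the name matching with strcmp are hypothetical simplifications for this example, not the patch code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified column descriptor for the subscriber table. */
typedef struct Column
{
    const char *name;
    bool        generated;
} Column;

/*
 * For each generated column of the local (subscriber) table, find the
 * remote (publisher) column with the same name and flag that remote
 * position. skip[] must have nremote entries, all initially false.
 */
void
mark_remote_gencols(const Column *local, int nlocal,
                    const char *const *remote_names, int nremote,
                    bool *skip)
{
    for (int i = 0; i < nlocal; i++)
    {
        if (!local[i].generated)
            continue;

        for (int attnum = 0; attnum < nremote; attnum++)
        {
            if (strcmp(local[i].name, remote_names[attnum]) == 0)
            {
                skip[attnum] = true;    /* indexed by remote position */
                break;
            }
        }
    }
}
```

Because the flag is stored at the remote position, a later loop over the remote column names can consult skip[i] directly, whatever the column order is on the subscriber.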

> ~~~
>
> 3. General - varnames.
>
> It would help with understanding if the 'attgenlist' variables in all
> these functions are re-named to make it very clear that this is
> referring to the *remote* (publisher-side) table genlist, not the
> subscriber table genlist.
Fixed

> ~~~
>
> 4.
> + int i = 0;
> +
>   appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(ListCell, l, attnamelist)
>   {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> + if (i < attnamelist->length - 1)
>   appendStringInfoString(&cmd, ", ");
> + i++;
>   }
>
> 4a.
> I think the purpose of the new macros is to avoid using ListCell, and
> also 'l' is an unhelpful variable name. Shouldn't this code be more
> like:
> foreach_node(String, att_name, attnamelist)
>
> ~
>
> 4b.
> The code can be far simpler if you just put the comma (", ") always
> up-front except the *first* iteration, so you can avoid checking the
> list length every time. For example:
>
> if (i++)
>   appendStringInfoString(&cmd, ", ");
Fixed
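
For illustration only, the comma-first idea from 4b can be sketched in standalone C as below. The helper join_column_names and the NULL-terminated array are hypothetical; the real code appends to a StringInfo while iterating a List.

```c
#include <string.h>

/*
 * Join names with ", " by emitting the separator before every element
 * except the first, so the list length is never consulted.
 * 'names' is a NULL-terminated array; output is truncated to fit 'out'.
 */
void
join_column_names(const char *const *names, char *out, size_t outsz)
{
    int i = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    for (const char *const *p = names; *p != NULL; p++)
    {
        /* i is 0 only on the first iteration, so no leading comma. */
        if (i++)
            strncat(out, ", ", outsz - strlen(out) - 1);
        strncat(out, *p, outsz - strlen(out) - 1);
    }
}
```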

> ======
> src/test/subscription/t/011_generated.pl
>
> 5. General.
>
> Hmm. This patch 0002 included many formatting changes to tables tab2
> and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> The proper formatting for those needs to be done back in patch 0001
> where they were introduced. Patch 0002 should just concentrate only on
> the new stuff for patch 0002.
Fixed

> ~
>
> 6. CREATE TABLES would be better in pairs
>
> IMO it will be better if the matching CREATE TABLE for pub and sub are
> kept together, instead of separating them by doing all pub then all
> sub. I previously made the same comment for patch 0001, so maybe it
> will be addressed next time...
Fixed

> ~
>
> 7. SELECT *
>
> +$result =
> +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
>
> It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> *", for readability and so there are no surprises.
Fixed

> ======
>
> 99.
> Please also refer to my attached nitpicks diff for numerous cosmetic
> changes, and apply if you agree.
Applied the changes.

> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

I have attached a v10 patch to address the comments:
v10-0001 - Not Modified
v10-0002 - Support replication of generated columns during initial sync.
v10-0003 - Fix behaviour for Virtual Generated Columns.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 21 Jun 2024 at 12:51, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are some review comments for patch v9-0003
>
> ======
> Commit Message
>
> /fix/fixes/
Fixed

> ======
> 1.
> General. Is tablesync enough?
>
> I don't understand why the patch is only concerned with tablesync.
> Does it make sense to prevent VIRTUAL column replication during
> tablesync, if you aren't also going to prevent VIRTUAL columns from
> normal logical replication (e.g. when copy_data = false)? Or is this
> already handled somewhere?
I checked the behaviour during incremental changes. I saw during
decoding 'null' values are present for Virtual Generated Columns. I
made the relevant changes to not support replication of Virtual
generated columns.

> ~~~
>
> 2.
> General. Missing test.
>
> Add some test cases to verify behaviour is different for STORED versus
> VIRTUAL generated columns
I have not added the tests as it would give an error in cfbot.
I have added a TODO note for the same. This can be done once the
VIRTUAL generated columns patch is committed.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - consider rearranging as shown in my nitpicks diff
> NITPICK - use <literal> sgml markup for STORED
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> 3.
> - if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> - walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
> - !MySubscription->includegencols)
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) < 170000)
> + {
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
>   appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
> + else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000)
> + {
> + if(MySubscription->includegencols)
> + appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
> + else
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
>
> IMO this logic is too tricky to remain uncommented -- please add some comments.
> Also, it seems somewhat complex. I think you can achieve the same more simply:
>
> SUGGESTION
>
> if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
> {
>   bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000
>     && MySubscription->includegencols;
>   if (gencols_allowed)
>   {
>     /* Replication of generated cols is supported, but not VIRTUAL cols. */
>     appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
>   }
>   else
>   {
>     /* Replication of generated cols is not supported. */
>     appendStringInfo(&cmd, " AND a.attgenerated = ''");
>   }
> }
Fixed

> ======
>
> 99.
> Please refer also to my attached nitpick diffs and apply those if you agree.
Applied the changes.

I have attached the updated patch v10 here in [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUMCk6cCbw0vVZWo8FRd6ae9CmKG%3DgKP-9Q67jLn8HqtQ%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some review comments for the patch v10-0002.

======
Commit Message

1.
Note that we don't copy columns when the subscriber-side column is also
generated. Those will be filled as normal with the subscriber-side computed or
default data.

~

Now this patch also introduced some errors etc, so I think that patch
comment should be written differently to explicitly spell out
behaviour of every combination, something like the below:

Summary

when (include_generated_columns = true)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: The
publisher generated column value is copied.

* publisher generated column => subscriber generated column: The
publisher generated column value is not copied. The subscriber
generated column will be filled with the subscriber-side computed or
default data.

when (include_generated_columns = false)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: This
will replicate nothing. The publisher generated column is not replicated.
The subscriber column will be filled with the subscriber-side default
data.

* publisher generated column => subscriber generated column:  This
will replicate nothing. The publisher generated column is not replicated.
The subscriber generated column will be filled with the
subscriber-side computed or default data.
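
The eight combinations above can also be written down as a small decision function. This is only a sketch of the summary as I have described it; the enum and the function name gencol_outcome are hypothetical and do not exist in the patch.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes listed in the summary. */
typedef enum GenColOutcome
{
    OUTCOME_NORMAL_REPLICATION, /* not-generated => not-generated */
    OUTCOME_ERROR,              /* not-generated => generated */
    OUTCOME_PUB_VALUE_COPIED,   /* publisher generated value is copied */
    OUTCOME_SUB_COMPUTED,       /* subscriber computes its own value */
    OUTCOME_SUB_DEFAULT         /* nothing replicated; subscriber default */
} GenColOutcome;

GenColOutcome
gencol_outcome(bool include_generated_columns,
               bool pub_generated, bool sub_generated)
{
    /* The option has no effect when the publisher column is not generated. */
    if (!pub_generated)
        return sub_generated ? OUTCOME_ERROR : OUTCOME_NORMAL_REPLICATION;

    /* A generated subscriber column is always filled on the subscriber. */
    if (sub_generated)
        return OUTCOME_SUB_COMPUTED;

    return include_generated_columns ? OUTCOME_PUB_VALUE_COPIED
                                     : OUTCOME_SUB_DEFAULT;
}
```

Seen this way, the option changes the result in only one of the four column pairings: publisher generated to subscriber not-generated.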

======
src/backend/replication/logical/relation.c

2.
logicalrep_rel_open:

I tested some of the "missing column" logic, and got the following results:

Scenario A:
PUB
test_pub=# create table t2(a int, b int);
test_pub=# create publication pub2 for table t2;
SUB
test_sub=# create table t2(a int, b int generated always as (a*2) stored);
test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub2 with (include_generated_columns = false);
Result:
ERROR:  logical replication target relation "public.t2" is missing
replicated column: "b"

~

Scenario B:
PUB/SUB identical to above, but subscription sub2 created "with
(include_generated_columns = true);"
Result:
ERROR:  logical replication target relation "public.t2" has a
generated column "b" but corresponding column on source relation is
not a generated column

~~~

2a. Question

Why should we get 2 different error messages for what is essentially
the same problem according to whether the 'include_generated_columns'
is false or true? Isn't the 2nd error message the more correct and
useful one for scenarios like this involving generated columns?

Thoughts?

~

2b. Missing tests?

I also noticed there seems to be no TAP test for the current "missing
replicated column" message. IMO there should be a new test introduced
for this because the loop involves too much bms logic to go
untested...

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
NITPICK - minor comment tweak
NITPICK - add some spaces after "if" code

3.
Should you pfree the gencollist at the bottom of this function when
you no longer need it, for tidiness?

~~~

4.
 static void
-fetch_remote_table_info(char *nspname, char *relname,
+fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
  LogicalRepRelation *lrel, List **qual)
 {
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
  Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
- Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
+ Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
  Oid qualRow[] = {TEXTOID};
  bool isnull;
+ bool    *remotegenlist_res;

IMO the names 'remotegenlist' and 'remotegenlist_res' should be
swapped the other way around, because it is the function parameter
that is the "result", whereas the 'remotegenlist_res' is just the
local working var for it.

~~~

5. fetch_remote_table_info

Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
places, I think it will be better to assign this to a 'server_version'
variable to be used everywhere instead of having multiple function
calls.

~~~

6.
  "SELECT a.attnum,"
  "       a.attname,"
  "       a.atttypid,"
- "       a.attnum = ANY(i.indkey)"
+ "       a.attnum = ANY(i.indkey),"
+ " a.attgenerated != ''"
  "  FROM pg_catalog.pg_attribute a"
  "  LEFT JOIN pg_catalog.pg_index i"
  "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
  " WHERE a.attnum > 0::pg_catalog.int2"
- "   AND NOT a.attisdropped %s"
+ "   AND NOT a.attisdropped", lrel->remoteid);
+
+ if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+

If the server version is < PG12 then AFAIK there was no such thing as
"a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
part also be guarded by some version checking condition like in the
WHERE? Otherwise won't it cause an ERROR for old servers?
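
Something along these lines is what I have in mind, shown as a standalone sketch rather than real tablesync.c code. The helper name build_attr_select is hypothetical, and substituting a constant "false" for old servers is my assumption about how to keep the result at five columns.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the column-list part of the attribute query.
 * Only reference a.attgenerated when the publisher is PG12 or later; for
 * older servers emit a constant (an assumption made for this sketch).
 */
void
build_attr_select(int server_version, char *out, size_t outsz)
{
    snprintf(out, outsz,
             "SELECT a.attnum, a.attname, a.atttypid,"
             " a.attnum = ANY(i.indkey), %s",
             server_version >= 120000 ? "a.attgenerated != ''" : "false");
}
```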

~~~

7.
  /*
- * For non-tables and tables with row filters, we need to do COPY
- * (SELECT ...), but we can't just do SELECT * because we need to not
- * copy generated columns. For tables with any row filters, build a
- * SELECT query with OR'ed row filters for COPY.
+ * For non-tables and tables with row filters and when
+ * 'include_generated_columns' is specified as 'true', we need to do
+ * COPY (SELECT ...), as normal COPY of generated column is not
+ * supported. For tables with any row filters, build a SELECT query
+ * with OR'ed row filters for COPY.
  */

NITPICK. I felt this was not quite right. AFAIK the reasons for using
this COPY (SELECT ...) syntax are different for row-filters and
generated-columns. Anyway, I updated the comment slightly in my
nitpicks attachment. Please have a look at it to see if you agree with
the suggestions. Maybe I am wrong.

~~~

8.
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(String, att_name, attnamelist)

I'm not 100% sure, but isn't foreach_node the macro to use here,
rather than foreach_ptr?

======
src/test/subscription/t/011_generated.pl

9.
Please discuss with Shubham how to make all the tab1, tab2, tab3,
tab4, tab5, tab6 comments use the same kind of style/wording.
Currently, the patches 0001 and 0002 test comments are a bit
inconsistent.

~~~

10.
Related to above -- now that patch 0002 supports copy_data=true I
don't see why we need to test generated columns *both* for
copy_data=false and also for copy_data=true. IOW, is it really
necessary to have so many tables/tests? For example, I am thinking
some of those tests from patch 0001 can be re-used or just removed now
that copy_data=true works.

~~~

NITPICK - minor comment tweak

~~~

11.
For tab4 and tab6 I saw the initial sync and normal replication data
tests are all merged together, but I had expected to see the initial
sync and normal replication data tests separated so it would be
consistent with the earlier tab1, tab2, tab3 tests.

======

99.
Also, I have attached a nitpicks diff for some of the cosmetic review
comments mentioned above. Please apply whatever of these that you
agree with.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

Thanks for updating patches! Below are my comments, maybe only for 0002.

01. General

IIUC, we have not discussed why ALTER SUBSCRIPTION ... SET include_generated_columns
is prohibited. Previously, that seemed okay because there were exclusive options. But now,
such restrictions are gone. Do you have a reason in mind, or has it just not been considered
yet?

02. General

According to the doc, we allow altering a column to a non-generated one with the ALTER
TABLE ... ALTER COLUMN ... DROP EXPRESSION command. I am not sure what should happen
when the command is executed on the subscriber while copying the data. Should
we continue the copy or restart? What do you think?

03. Test code

IIUC, REFRESH PUBLICATION can also trigger table synchronization. Can you add
a test for that?

04. Test code (maybe for 0001)

Please test the combination with the ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION command.

05. logicalrep_rel_open

```
+            /*
+             * In case 'include_generated_columns' is 'false', we should skip the
+             * check of missing attrs for generated columns.
+             * In case 'include_generated_columns' is 'true', we should check if
+             * corresponding column for the generated column in publication column
+             * list is present in the subscription table.
+             */
+            if (!MySubscription->includegencols && attr->attgenerated)
+            {
+                entry->attrmap->attnums[i] = -1;
+                continue;
+            }
```

This comment is not very clear to me, because here we do not skip anything.
Can you clarify the reason why attnums[i] is set to -1 and how will it be used?

06. make_copy_attnamelist

```
+    gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
```

I think this array is too large. Can we reduce the size to (desc->natts * sizeof(bool))?
Also, it should be freed when no longer needed.

07. make_copy_attnamelist

```
+    /* Loop to handle subscription table generated columns. */
+    for (int i = 0; i < desc->natts; i++)
```

IIUC, the loop is needed to find generated columns on the subscriber side, right?
Can you clarify that in the comment?

08. copy_table

```
+    /*
+     * Regular table with no row filter and 'include_generated_columns'
+     * specified as 'false' during creation of subscription.
+     */
```

I think this comment is not correct. After patching, every tablesync command becomes
like COPY (SELECT ...) if include_generated_columns is set to true. Is that right?
Can we restrict this to only when the table has generated columns?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok. Here are my review comments for patch v10-0003

======
General.

1.
The patch has lots of conditions like:
if (att->attgenerated && (att->attgenerated !=
ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
 continue;

IMO these are hard to read. Although more verbose, please consider if
all those (for the sake of readability) would be better re-written
like below:

if (att->attgenerated)
{
  if (!include_generated_columns)
    continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
    continue;
}
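
To show that the expanded form keeps the same behaviour, here is a small standalone sketch with both variants side by side. The function names skip_compact and skip_expanded are hypothetical, the macro is redefined locally with the value PostgreSQL uses ('s'), and the 'v' value for VIRTUAL is taken from the tablesync filter discussed in this thread.

```c
#include <stdbool.h>

/* Redefined locally for this sketch; PostgreSQL defines it as 's'. */
#define ATTRIBUTE_GENERATED_STORED 's'

/* Compact form, as the conditions are currently written in the patch. */
bool
skip_compact(char attgenerated, bool include_generated_columns)
{
    return attgenerated &&
        (attgenerated != ATTRIBUTE_GENERATED_STORED ||
         !include_generated_columns);
}

/* Expanded form: one reason for skipping per test. */
bool
skip_expanded(char attgenerated, bool include_generated_columns)
{
    if (attgenerated)
    {
        /* Generated columns were not requested at all. */
        if (!include_generated_columns)
            return true;

        /* Only STORED generated columns are supported. */
        if (attgenerated != ATTRIBUTE_GENERATED_STORED)
            return true;
    }
    return false;
}
```

Both functions return the same answer for every input; the second one just gives each reason its own line, which leaves room for a comment on each.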

======
contrib/test_decoding/test_decoding.c

tuple_to_stringinfo:

NITPICK - refactored the code and comments a bit here to make it easier
NITPICK - there is no need to mention "virtual". Instead, say we only
support STORED

======
src/backend/catalog/pg_publication.c

publication_translate_columns:

NITPICK - introduced variable 'att' to simplify this code

~

2.
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot use virtual generated column \"%s\" in publication
column list",
+    colname));

Is it better to avoid referring to "virtual" at all? Instead, consider
rearranging the wording to say something like "generated column \"%s\"
is not STORED so cannot be used in a publication column list"

~~~

pg_get_publication_tables:

NITPICK - split the condition code for readability

======
src/backend/replication/logical/relation.c

3. logicalrep_rel_open

+ if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

Isn't this missing some code to say "entry->attrmap->attnums[i] =
-1;", same as all the other nearby code is doing?

~~~

4.
I felt all the "generated column" logic should be kept together, so
this new condition should be combined with the other generated column
condition, like:

if (attr->attgenerated)
{
  /* comment... */
  if (!MySubscription->includegencols)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }

  /* comment... */
  if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }
}

======
src/backend/replication/logical/tablesync.c

5.
+ if (gencols_allowed)
+ {
+ /* Replication of generated cols is supported, but not VIRTUAL cols. */
+ appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
+ }

Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
of the hardwired 'v'? (Maybe add another TODO comment to revisit
this).

Alternatively, consider if it is more future-proof to rearrange so it
just says what *is* supported instead of what *isn't* supported:
e.g. "AND a.attgenerated IN ('', 's')"
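
As a rough sketch of the allow-list alternative, assuming plain `snprintf` in place of `appendStringInfo` and a hypothetical helper name `append_gencols_filter`, the filter could be built from a named constant instead of a hardwired character:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for ATTRIBUTE_GENERATED_STORED; defined here so the sketch compiles alone. */
#define ATTR_GENERATED_STORED 's'

/*
 * Append a filter naming the attgenerated values that ARE supported.
 * buf must already hold a NUL-terminated query shorter than buflen.
 * Returns 0 on success, -1 if the filter does not fit.
 */
int
append_gencols_filter(char *buf, size_t buflen, int gencols_allowed)
{
    size_t used = strlen(buf);
    size_t room;
    int n;

    assert(used < buflen);
    room = buflen - used;

    if (gencols_allowed)
        n = snprintf(buf + used, room,
                     " AND a.attgenerated IN ('', '%c')",
                     ATTR_GENERATED_STORED);
    else
        n = snprintf(buf + used, room, " AND a.attgenerated = ''");

    return (n >= 0 && (size_t) n < room) ? 0 : -1;
}
```

With an allow-list, a future attgenerated value is excluded by default rather than replicated by accident.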

======
src/test/subscription/t/011_generated.pl

NITPICK - some comments are missing the word "stored"
NITPICK - sometimes "col" should be plural "cols"
NITPICK - typo "GNERATED"

======

6.
In a previous review [1, comment #3] I mentioned that there should be
some docs updates on the "Logical Replication Message Formats" section
53.9. So, I expected patch 0001 would make some changes and then patch
0003 would have to update it again to say something about "STORED".
But all that is missing from the v10* patches.

======

99.
See also my nitpicks diff which is a top-up patch addressing all the
nitpick comments mentioned above. Please apply all of these that you
agree with.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are some review comments for patch v9-0002.
> >
> > ======
> > src/backend/replication/logical/relation.c
> >
> > 1. logicalrep_rel_open
> >
> > - if (attr->attisdropped)
> > + if (attr->attisdropped ||
> > + (!MySubscription->includegencols && attr->attgenerated))
> >
> > You replied to my question from the previous review [1, #2] as follows:
> > In case 'include_generated_columns' is 'true'. column list in
> > remoterel will have an entry for generated columns. So, in this case
> > if we skip every attr->attgenerated, we will get a missing column
> > error.
> >
> > ~
> >
> > TBH, the reason seems very subtle to me. Perhaps that
> > "(!MySubscription->includegencols && attr->attgenerated))" condition
> > should be coded as a separate "if", so then you can include a comment
> > similar to your answer, to explain it.
> Fixed
>
> > ======
> > src/backend/replication/logical/tablesync.c
> >
> > make_copy_attnamelist:
> >
> > NITPICK - punctuation in function comment
> > NITPICK - add/reword some more comments
> > NITPICK - rearrange comments to be closer to the code they are commenting
> Applied the changes
>
> > ~
> >
> > 2. make_copy_attnamelist.
> >
> > + /*
> > + * Construct column list for COPY.
> > + */
> > + for (int i = 0; i < rel->remoterel.natts; i++)
> > + {
> > + if(!gencollist[i])
> > + attnamelist = lappend(attnamelist,
> > +   makeString(rel->remoterel.attnames[i]));
> > + }
> >
> > IIUC isn't this assuming that the attribute number (aka column order)
> > is the same on the subscriber side (e.g. gencollist idx) and on the
> > publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> > replication does not require the ordering to match like that,
> > therefore I am suspicious this new logic is accidentally imposing new
> > unwanted assumptions/restrictions. I had asked the same question
> > before [1-#4] about this code, but there was no reply.
> >
> > Ideally, there would be more test cases for when the columns
> > (including the generated ones) are all in different orders on the
> > pub/sub tables.
> 'gencollist' is set according to the remoterel
> +           gencollist[attnum] = true;
> where attnum is the attribute number of the corresponding column on remote rel.
>
> I have also added the tests to confirm the behaviour
>
> > ~~~
> >
> > 3. General - varnames.
> >
> > It would help with understanding if the 'attgenlist' variables in all
> > these functions are re-named to make it very clear that this is
> > referring to the *remote* (publisher-side) table genlist, not the
> > subscriber table genlist.
> Fixed
>
> > ~~~
> >
> > 4.
> > + int i = 0;
> > +
> >   appendStringInfoString(&cmd, "COPY (SELECT ");
> > - for (int i = 0; i < lrel.natts; i++)
> > + foreach_ptr(ListCell, l, attnamelist)
> >   {
> > - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> > - if (i < lrel.natts - 1)
> > + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> > + if (i < attnamelist->length - 1)
> >   appendStringInfoString(&cmd, ", ");
> > + i++;
> >   }
> >
> > 4a.
> > I think the purpose of the new macros is to avoid using ListCell, and
> > also 'l' is an unhelpful variable name. Shouldn't this code be more
> > like:
> > foreach_node(String, att_name, attnamelist)
> >
> > ~
> >
> > 4b.
> > The code can be far simpler if you just put the comma (", ") always
> > up-front except the *first* iteration, so you can avoid checking the
> > list length every time. For example:
> >
> > if (i++)
> >   appendStringInfoString(&cmd, ", ");
> Fixed
>
> > ======
> > src/test/subscription/t/011_generated.pl
> >
> > 5. General.
> >
> > Hmm. This patch 0002 included many formatting changes to tables tab2
> > and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> > The proper formatting for those needs to be done back in patch 0001
> > where they were introduced. Patch 0002 should just concentrate only on
> > the new stuff for patch 0002.
> Fixed
>
> > ~
> >
> > 6. CREATE TABLES would be better in pairs
> >
> > IMO it will be better if the matching CREATE TABLE for pub and sub are
> > kept together, instead of separating them by doing all pub then all
> > sub. I previously made the same comment for patch 0001, so maybe it
> > will be addressed next time...
> Fixed
>
> > ~
> >
> > 7. SELECT *
> >
> > +$result =
> > +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
> >
> > It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> > *", for readability and so there are no surprises.
> Fixed
>
> > ======
> >
> > 99.
> > Please also refer to my attached nitpicks diff for numerous cosmetic
> > changes, and apply if you agree.
> Applied the changes.
>
> > ======
> > [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com
>
> I have attached a v10 patch to address the comments:
> v10-0001 - Not Modified
> v10-0002 - Support replication of generated columns during initial sync.
> v10-0003 - Fix behaviour for Virtual Generated Columns.
>
> Thanks and Regards,
> Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Sun, Jun 23, 2024 at 10:28 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Hi Shubham,
>
> Thanks for sharing new patch! You shared as v9, but it should be v10, right?
> Also, since there was no commitfest entry, I registered one [1]. You can rename the
> title as needed. Currently CFbot says OK.
>
> Anyway, below are my comments.
>
> 01. General
> Your patch contains unnecessary changes. Please remove all of them. E.g.,
>
> ```
>                                                  " s.subpublications,\n");
> -
> ```
> And
> ```
>                 appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n"
> -                                                        " s.subenabled,\n");
> +                                                       " s.subenabled,\n");
> ```
>
> 02. General
> Again, please run the pgindent/pgperltidy.
>
> 03. test_decoding
> Previously I suggested that the default value of include_generated_columns
> should be true, so you modified it like that. However, Peter suggested the opposite
> [3] and you just revised accordingly. I think either way might be okay, but
> at least you must clarify the reason why you preferred to set the default to false and
> changed accordingly.

I have set the default value to true for test_decoding. The reason is
that, even before this feature was implemented, generated columns
were already being output by test_decoding.

> 04. decoding_into_rel.sql
> According to the comment atop this file, this test should insert result to a table.
> But added case does not - we should put them at another place. I.e., create another
> file.
>
> 05. decoding_into_rel.sql
> ```
> +-- when 'include-generated-columns' is not set
> ```
> Can you clarify the expected behavior as a comment?
>
> 06. getSubscriptions
> ```
> +       else
> +               appendPQExpBufferStr(query,
> +                                               " false AS subincludegencols,\n");
> ```
> I think the comma is not needed.
> Also, this error means that you did not test pg_dump against instances prior to PG16.
> Please verify whether we can dump subscriptions and restore them accordingly.
>
> [1]: https://commitfest.postgresql.org/48/5068/
> [2]: https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com

All the comments are handled.

The attached patches contain all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jun 24, 2024 at 8:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some patch v9-0001 comments.
>
> I saw Kuroda-san has already posted comments for this patch so there
> may be some duplication here.
>
> ======
> GENERAL
>
> 1.
> The later patches 0002 etc are checking to support only STORED
> gencols. But, doesn't that restriction belong in this patch 0001 so
> VIRTUAL columns are not decoded etc in the first place... (??)
>
> ~~~
>
> 2.
> The "Generated Columns" docs mentioned in my previous review comment
> [2] should be modified by this 0001 patch.
>
> ~~~
>
> 3.
> I think the "Message Format" page mentioned in my previous review
> comment [3] should be modified by this 0001 patch.
>
> ======
> Commit message
>
> 4.
> The patch name is still broken as previously mentioned [1, #1]
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> Should these docs refer to STORED generated columns, instead of
> just generated columns?
>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 6.
> Should these docs refer to STORED generated columns, instead of
> just generated columns?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> getSubscriptions:
> NITPICK - tabs
> NITPICK - patch removed a blank line it should not be touching
> NITPICK - patch altered indents it should not be touching
> NITPICK - a missing blank line that was previously present
>
> 7.
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols,\n");
>
> There is an unwanted comma here.
>
> ~
>
> dumpSubscription
> NITPICK - patch altered indents it should not be touching
>
> ======
> src/bin/pg_dump/pg_dump.h
>
> NITPICK - unnecessary blank line
>
> ======
> src/bin/psql/describe.c
>
> describeSubscriptions
> NITPICK - bad indentation
>
> 8.
> In my previous review [1, #4b] I suggested this new column should be
> in a different order (e.g. adjacent to the other ones ahead of
> 'Conninfo'), but this is not yet addressed.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - missing space in comment
> NITPICK - misleading "because" wording in the comment
>
> ======
>
> 99.
> See also my attached nitpicks diff, for cosmetic issues. Please apply
> whatever you agree with.
>
> ======
> [1] My v8-0001 review -
> https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

All the comments are handled.

I have attached the updated patch v11 here in [1]. See [1] for the
changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJpS_XDkR6OrsmMZtCBZNxeYoCdENhC0%3Dbe0rLmNvhiQw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
>All the comments are handled.
>
> The attached Patches contain all the suggested changes.

The v11-0003 patch was failing to apply, so here are the updated
patches.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some patch v11-0001 comments.

(BTW, I had difficulty reviewing this because something seemed strange
with the changes this patch made to the test_decoding tests).

======
General

1. Patch name

Patch name does not need to quote 'logical replication'

~

2. test_decoding tests

Multiple test_decoding tests were failing for me. There is something
very suspicious about the unexplained changes the patch made to the
expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all
those changes in my nitpicks top-up to get everything working. Please
re-confirm that all the test_decoding tests are OK!

======
Commit Message

3.
Since you are including the example usage for test_decoding, I think
it's better to include the example usage of CREATE SUBSCRIPTION also.

======
contrib/test_decoding/expected/binary.out

4.
 SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
- ?column?
-----------
- init
-(1 row)
-
+ERROR:  replication slot "regression_slot" already exists

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

======
.../expected/decoding_into_rel.out

5.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
- ?column?
-----------
- stop
-(1 row)
-

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

======
.../test_decoding/sql/decoding_into_rel.sql

6.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
+SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');

Huh, Why does this patch change this code at all? I REVERTED this
change. See my nitpicks diff.

======
.../test_decoding/sql/generated_columns.sql

(see my nitpicks replacement file for this test)

7.
+-- test that we can insert the result of a 'include_generated_columns'
+-- into the tables created. That's really not a good idea in practical terms,
+-- but provides a nice test.

NITPICK - I didn't understand the point of this comment.  I updated
the comment according to my understanding.

~

NITPICK - The comment "when 'include-generated-columns' is not set
then columns will not be replicated" is the opposite of what the
result is. I changed this comment.

NITPICK - modified and unified wording of some of the other comments

NITPICK - changed some blank lines

======
contrib/test_decoding/test_decoding.c

8.
+ else if (strcmp(elem->defname, "include-generated-columns") == 0)
+ {
+ if (elem->arg == NULL)
+ data->include_generated_columns = true;

Is there any way to test that "elem->arg == NULL" in the
generated.sql? OTOH, if it is not possible to get here then is the
code even needed?

======
doc/src/sgml/ddl.sgml

9.
      <para>
-      Generated columns are skipped for logical replication and cannot be
-      specified in a <command>CREATE PUBLICATION</command> column list.
+      'include_generated_columns' option controls whether generated columns
+      should be included in the string representation of tuples during
+      logical decoding in PostgreSQL. The default is <literal>true</literal>.
      </para>

NITPICK - Use proper markup instead of single quotes for the parameter.

NITPICK - I think this can be reworded slightly to provide a
cross-reference to the CREATE SUBSCRIPTION parameter for more details
(which means then we can avoid repeating details like the default
value here). PSA my nitpicks diff for an example of how I thought docs
should look.

======
doc/src/sgml/protocol.sgml

10.
+        The default is true.

No, it isn't. AFAIK you made the default behaviour true only for
'test_decoding', but the default for CREATE SUBSCRIPTION remains
*false* because that is the existing PG17 behaviour. And the default
for the 'include_generated_columns' in the protocol is *also* false to
match the CREATE SUBSCRIPTION default.

e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'"
when options->proto.logical.include_generated_columns
e.g. worker.c says: options->proto.logical.include_generated_columns =
MySubscription->includegencols;
e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false;

(This confirmed my previous review expectation that using different
default behaviours for test_decoding and pgoutput would surely lead to
confusion)
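
To make the point in comment 10 concrete, here is a minimal standalone sketch (not the libpqwalreceiver.c code) of how an option that is only sent when true ends up defaulting to false on the receiving side. The function names, the use of `snprintf`, and the `proto_version` value are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a pgoutput-style option list into buf.
 * include_generated_columns is appended only when it was requested.
 * Returns 0 on success, -1 if buf is too small.
 */
int
build_pgoutput_options(char *buf, size_t buflen,
                       const char *pubnames, bool include_generated_columns)
{
    int n;

    assert(buf != NULL && pubnames != NULL);
    n = snprintf(buf, buflen,
                 "proto_version '4', publication_names '%s'%s",
                 pubnames,
                 include_generated_columns
                 ? ", include_generated_columns 'true'" : "");
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}

/* Receiving side: the option counts as false unless it was explicitly sent. */
bool
options_request_gencols(const char *options)
{
    return strstr(options, "include_generated_columns 'true'") != NULL;
}
```

Under these assumptions, documentation that says "the default is true" would contradict what the receiver actually does when the option is absent.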

~~~

11.
-     <para>
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
-     </para>
-

AFAIK you cannot just remove this entire paragraph because I thought
it was still relevant to talking about "... the following message
part". But, if you don't want to explain and cross-reference about
'include_generated_columns' then maybe it is OK just to remove the
"(except generated columns)" part?

======
src/test/subscription/t/011_generated.pl

NITPICK - comment typo /tab2/tab3/
NITPICK - remove some blank lines

~~~

12.
# the column was NOT replicated (the result value of 'b' is the
subscriber-side computed value)

NITPICK - I think this comment is wrong for the tab2 test because here
col 'b' IS replicated. I have added much more substantial test case
comments in the attached nitpicks diff. PSA.

======
src/test/subscription/t/031_column_list.pl

13.
NITPICK - IMO there is a missing word "list" in the comment. This bug
existed already on HEAD but since this patch is modifying this comment
I think we can also fix this in passing.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jun 27, 2024 at 2:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some patch v11-0001 comments.
>
> (BTW, I had difficulty reviewing this because something seemed strange
> with the changes this patch made to the test_decoding tests).
>
> ======
> General
>
> 1. Patch name
>
> Patch name does not need to quote 'logical replication'
>
> ~
>
> 2. test_decoding tests
>
> Multiple test_decoding tests were failing for me. There is something
> very suspicious about the unexplained changes the patch made to the
> expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all
> those changes in my nitpicks top-up to get everything working. Please
> re-confirm that all the test_decoding tests are OK!
>
> ======
> Commit Message
>
> 3.
> Since you are including the example usage for test_decoding, I think
> it's better to include the example usage of CREATE SUBSCRIPTION also.
>
> ======
> contrib/test_decoding/expected/binary.out
>
> 4.
>  SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
> - ?column?
> -----------
> - init
> -(1 row)
> -
> +ERROR:  replication slot "regression_slot" already exists
>
> Huh? Why is this unrelated expected output changed by this patch?
>
> The test_decoding test fails for me unless I REVERT this change!! See
> my nitpicks diff.
>
> ======
> .../expected/decoding_into_rel.out
>
> 5.
> -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
> - ?column?
> -----------
> - stop
> -(1 row)
> -
>
> Huh? Why is this unrelated expected output changed by this patch?
>
> The test_decoding test fails for me unless I REVERT this change!! See
> my nitpicks diff.
>
> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> 6.
> -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
> +SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
>
> Huh, Why does this patch change this code at all? I REVERTED this
> change. See my nitpicks diff.
>
> ======
> .../test_decoding/sql/generated_columns.sql
>
> (see my nitpicks replacement file for this test)
>
> 7.
> +-- test that we can insert the result of a 'include_generated_columns'
> +-- into the tables created. That's really not a good idea in practical terms,
> +-- but provides a nice test.
>
> NITPICK - I didn't understand the point of this comment.  I updated
> the comment according to my understanding.
>
> ~
>
> NITPICK - The comment "when 'include-generated-columns' is not set
> then columns will not be replicated" is the opposite of what the
> result is. I changed this comment.
>
> NITPICK - modified and unified wording of some of the other comments
>
> NITPICK - changed some blank lines
>
> ======
> contrib/test_decoding/test_decoding.c
>
> 8.
> + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> + {
> + if (elem->arg == NULL)
> + data->include_generated_columns = true;
>
> Is there any way to test that "elem->arg == NULL" in the
> generated.sql? OTOH, if it is not possible to get here then is the
> code even needed?
>

So far I could not find a case where the 'include_generated_columns'
option is given without a value, but I was hesitant to remove this
code because the other options handled nearby follow the same
pattern. Thoughts?

> ======
> doc/src/sgml/ddl.sgml
>
> 9.
>       <para>
> -      Generated columns are skipped for logical replication and cannot be
> -      specified in a <command>CREATE PUBLICATION</command> column list.
> +      'include_generated_columns' option controls whether generated columns
> +      should be included in the string representation of tuples during
> +      logical decoding in PostgreSQL. The default is <literal>true</literal>.
>       </para>
>
> NITPICK - Use proper markup instead of single quotes for the parameter.
>
> NITPICK - I think this can be reworded slightly to provide a
> cross-reference to the CREATE SUBSCRIPTION parameter for more details
> (which means then we can avoid repeating details like the default
> value here). PSA my nitpicks diff for an example of how I thought docs
> should look.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 10.
> +        The default is true.
>
> No, it isn't. AFAIK you made the default behaviour true only for
> 'test_decoding', but the default for CREATE SUBSCRIPTION remains
> *false* because that is the existing PG17 behaviour. And the default
> for the 'include_generated_columns' in the protocol is *also* false to
> match the CREATE SUBSCRIPTION default.
>
> e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'"
> when options->proto.logical.include_generated_columns
> e.g. worker.c says: options->proto.logical.include_generated_columns =
> MySubscription->includegencols;
> e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false;
>
> (This confirmed my previous review expectation that using different
> default behaviours for test_decoding and pgoutput would surely lead to
> confusion)
>
> ~~~
>
> 11.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> AFAIK you cannot just remove this entire paragraph because I thought
> it was still relevant to talking about "... the following message
> part". But, if you don't want to explain and cross-reference about
> 'include_generated_columns' then maybe it is OK just to remove the
> "(except generated columns)" part?
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - comment typo /tab2/tab3/
> NITPICK - remove some blank lines
>
> ~~~
>
> 12.
> # the column was NOT replicated (the result value of 'b' is the
> subscriber-side computed value)
>
> NITPICK - I think this comment is wrong for the tab2 test because here
> col 'b' IS replicated. I have added much more substantial test case
> comments in the attached nitpicks diff. PSA.
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 13.
> NITPICK - IMO there is a missing word "list" in the comment. This bug
> existed already on HEAD but since this patch is modifying this comment
> I think we can also fix this in passing.

All the comments are handled.

The attached patches contain all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
>...
> > 8.
> > + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> > + {
> > + if (elem->arg == NULL)
> > + data->include_generated_columns = true;
> >
> > Is there any way to test that "elem->arg == NULL" in the
> > generated.sql? OTOH, if it is not possible to get here then is the
> > code even needed?
> >
>
> Currently I could not find a case where the
> 'include_generated_columns' option is not specifying any value, but  I
> was hesitant to remove this from here as the other options mentioned
> follow the same rules. Thoughts?
>

If you do manage to find a scenario for this then I think a test for
it would be good. But, I agree that the code seems OK because now I
see it is the same pattern as similar nearby code.
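
For readers unfamiliar with that pattern, a minimal standalone sketch follows. It assumes a simplified `Option` struct in place of `DefElem` and a reduced boolean parser in place of the server's own; `parse_bool_option` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a DefElem: arg == NULL means no value was given. */
typedef struct Option
{
    const char *name;
    const char *arg;
} Option;

/*
 * Interpret a boolean option. A bare option (no value) means true.
 * Returns false if the value cannot be parsed; *value is then unchanged.
 */
bool
parse_bool_option(const Option *opt, bool *value)
{
    assert(opt != NULL && value != NULL);

    if (opt->arg == NULL)
    {
        *value = true;          /* the "elem->arg == NULL" branch */
        return true;
    }
    if (strcmp(opt->arg, "true") == 0 || strcmp(opt->arg, "on") == 0 ||
        strcmp(opt->arg, "1") == 0)
    {
        *value = true;
        return true;
    }
    if (strcmp(opt->arg, "false") == 0 || strcmp(opt->arg, "off") == 0 ||
        strcmp(opt->arg, "0") == 0)
    {
        *value = false;
        return true;
    }
    return false;               /* the caller would report an error here */
}
```

A test for the bare-option branch would need some way to pass the option name without any value, which is the scenario that has not been found yet.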

~~~

Thanks for the updated patch. Here are some review comments for patch v13-0001.

======
.../expected/generated_columns.out

nitpicks (see generated_columns.sql)

======
.../test_decoding/sql/generated_columns.sql

nitpick - use plural /column/columns/
nitpick - use consistent wording in the comments
nitpick - IMO it is better to INSERT different values for each of the tests

======
doc/src/sgml/protocol.sgml

nitpick - I noticed that none of the other boolean parameters on this
page mention a default, so maybe here we should do the same and
omit that information.

~~~

1.
-     <para>
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
-     </para>
-

In a previous review [1 comment #11] I wrote that you can't just
remove this paragraph because AFAIK it is still meaningful. A minimal
change might be to just remove the "(except generated columns)" part.
Alternatively, you could give a more detailed explanation mentioning
the include_generated_columns protocol parameter.

I provided some updated text for this paragraph in my NITPICKS top-up
patch, Please have a look at that for ideas.

======
src/backend/commands/subscriptioncmds.c

It looks like pgindent needs to be run on this file.

======
src/include/catalog/pg_subscription.h

nitpick - comment /publish/Publish/ for consistency

======
src/include/replication/walreceiver.h

nitpick - comment /publish/Publish/ for consistency

======
src/test/regress/expected/subscription.out

nitpicks - (see subscription.sql)

======
src/test/regress/sql/subscription.sql

nitpick - combine the invalid option combinations test with all the
others (no special comment needed)
nitpick - rename subscription as 'regress_testsub2' same as all its peers.

======
src/test/subscription/t/011_generated.pl

nitpick - add/remove blank lines

======
src/test/subscription/t/031_column_list.pl

nitpick - rewording for a comment. This issue was not strictly caused
by this patch, but since you are modifying the same comment we can fix
this in passing.

======
99.
Please also see the attached top-up patch for all those nitpicks
identified above.

======
[1] v11-0001 review
https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

As you can see, most of my recent review comments for patch 0001 are
only cosmetic nitpicks. But, there is still one long-unanswered design
question from a month ago [1, #G.2]

A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
is related to the introduction and passing everywhere of new
'include_generated_columns' function parameters. These same functions
are also always passing "Bitmapset *columns" representing the
publication column list.

My question was about whether we can't make use of the existing BMS
parameter instead of introducing all the new API parameters.

The idea might go something like this:

* If 'include_generated_columns' option is specified true and if no
column list was already specified then perhaps the relentry->columns
can be used for a "dummy" column list that has everything including
all the generated columns.

* By doing this:
 -- you may be able to avoid passing the extra
'include_generated_columns' everywhere
 -- you may be able to avoid checking for generated columns deeper in
the code (since it is already checked up-front when building the
column list BMS)

~~

I'm not saying this design idea is guaranteed to work, but it might be
worth considering, because if it does work then there is potential to
make the current 0001 patch significantly shorter.
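
To show roughly what I mean, here is a standalone sketch that uses a 64-bit mask in place of a real Bitmapset. The names `ColumnDesc`, `build_effective_columns` and `column_is_published` are hypothetical, and unlike the real code (where no column list means "all columns") the sketch always materializes a list, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified column description; attgenerated is '\0' for normal columns. */
typedef struct ColumnDesc
{
    bool is_dropped;
    char attgenerated;
} ColumnDesc;

/*
 * Decide up-front which columns are sent; bit i set means column i is sent.
 * An explicit column list (non-zero mask) is used as-is. Otherwise build a
 * "dummy" list that includes generated columns only when the option is set.
 */
uint64_t
build_effective_columns(const ColumnDesc *cols, size_t ncols,
                        uint64_t explicit_list,
                        bool include_generated_columns)
{
    uint64_t mask = 0;

    assert(cols != NULL && ncols <= 64);
    if (explicit_list != 0)
        return explicit_list;

    for (size_t i = 0; i < ncols; i++)
    {
        if (cols[i].is_dropped)
            continue;
        if (cols[i].attgenerated != '\0' && !include_generated_columns)
            continue;
        mask |= UINT64_C(1) << i;
    }
    return mask;
}

/* Deeper code only consults the mask; it needs no extra bool parameter. */
bool
column_is_published(uint64_t columns, size_t i)
{
    return i < 64 && ((columns >> i) & 1) != 0;
}
```

The generated-column decision is made once when the list is built, so the functions that write tuples would only need the existing columns argument.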

======
[1] https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 25 Jun 2024 at 11:56, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for the patch v10-0002.
>
> ======
> Commit Message
>
> 1.
> Note that we don't copy columns when the subscriber-side column is also
> generated. Those will be filled as normal with the subscriber-side computed or
> default data.
>
> ~
>
> Now this patch also introduced some errors etc, so I think that patch
> comment should be written differently to explicitly spell out
> behaviour of every combination, something like the below:
>
> Summary
>
> when (include_generated_columns = true)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: The
> publisher generated column value is copied.
>
> * publisher generated column => subscriber generated column: The
> publisher generated column value is not copied. The subscriber
> generated column will be filled with the subscriber-side computed or
> default data.
>
> when (include_generated_columns = false)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: This
> will replicate nothing. The publisher generated column is not replicated.
> The subscriber column will be filled with the subscriber-side default
> data.
>
> * publisher generated column => subscriber generated column:  This
> will replicate nothing. The publisher generated column is not replicated.
> The subscriber generated column will be filled with the
> subscriber-side computed or default data.
Modified
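
The eight combinations listed above can be condensed into one small decision function. This is only a sketch of the summary as written in the review comment, not code from the patch; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ReplOutcome
{
    REPL_NORMAL,                /* ordinary logical replication */
    REPL_ERROR,                 /* subscriber column is generated, publisher's is not */
    REPL_COPY_PUBLISHER_VALUE,  /* publisher generated value is copied */
    REPL_SUBSCRIBER_COMPUTED,   /* subscriber computes its own generated value */
    REPL_SUBSCRIBER_DEFAULT     /* nothing replicated; subscriber default is used */
} ReplOutcome;

/* Map (option, publisher column kind, subscriber column kind) to an outcome. */
ReplOutcome
gencol_outcome(bool include_generated_columns, bool pub_generated,
               bool sub_generated)
{
    if (!pub_generated)
        return sub_generated ? REPL_ERROR : REPL_NORMAL;
    if (sub_generated)
        return REPL_SUBSCRIBER_COMPUTED;
    return include_generated_columns ? REPL_COPY_PUBLISHER_VALUE
                                     : REPL_SUBSCRIBER_DEFAULT;
}

/* Printable name for an outcome, e.g. for test diagnostics. */
const char *
outcome_name(ReplOutcome outcome)
{
    switch (outcome)
    {
        case REPL_NORMAL:               return "normal";
        case REPL_ERROR:                return "error";
        case REPL_COPY_PUBLISHER_VALUE: return "copy publisher value";
        case REPL_SUBSCRIBER_COMPUTED:  return "subscriber computed";
        case REPL_SUBSCRIBER_DEFAULT:   return "subscriber default";
    }
    assert(!"unknown outcome");
    return "unknown";
}
```

Written this way, the option only changes the result for one pairing: a publisher generated column feeding a subscriber column that is not generated.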

> ======
> src/backend/replication/logical/relation.c
>
> 2.
> logicalrep_rel_open:
>
> I tested some of the "missing column" logic, and got the following results:
>
> Scenario A:
> PUB
> test_pub=# create table t2(a int, b int);
> test_pub=# create publication pub2 for table t2;
> SUB
> test_sub=# create table t2(a int, b int generated always as (a*2) stored);
> test_sub=# create subscription sub2 connection 'dbname=test_pub'
> publication pub2 with (include_generated_columns = false);
> Result:
> ERROR:  logical replication target relation "public.t2" is missing
> replicated column: "b"
>
> ~
>
> Scenario B:
> PUB/SUB identical to above, but subscription sub2 created "with
> (include_generated_columns = true);"
> Result:
> ERROR:  logical replication target relation "public.t2" has a
> generated column "b" but corresponding column on source relation is
> not a generated column
>
> ~~~
>
> 2a. Question
>
> Why should we get 2 different error messages for what is essentially
> the same problem according to whether the 'include_generated_columns'
> is false or true? Isn't the 2nd error message the more correct and
> useful one for scenarios like this involving generated columns?
>
> Thoughts?
Modified the code to give the same error in both cases.

> ~
>
> 2b. Missing tests?
>
> I also noticed there seems to be no TAP test for the current "missing
> replicated column" message. IMO there should be a new test introduced
> for this because the loop involved too much bms logic to go
> untested...
Added the tests in 004_sync.pl.

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> NITPICK - minor comment tweak
> NITPICK - add some spaces after "if" code
Applied the changes

> 3.
> Should you pfree the gencollist at the bottom of this function when
> you no longer need it, for tidiness?
Fixed

> ~~~
>
> 4.
>  static void
> -fetch_remote_table_info(char *nspname, char *relname,
> +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
>   LogicalRepRelation *lrel, List **qual)
>  {
>   WalRcvExecResult *res;
>   StringInfoData cmd;
>   TupleTableSlot *slot;
>   Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
> - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
> + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
>   Oid qualRow[] = {TEXTOID};
>   bool isnull;
> + bool    *remotegenlist_res;
>
> IMO the names 'remotegenlist' and 'remotegenlist_res' should be
> swapped the other way around, because it is the function parameter
> that is the "result", whereas the 'remotegenlist_res' is just the
> local working var for it.
Fixed

> ~~~
>
> 5. fetch_remote_table_info
>
> Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
> places, I think it will be better to assign this to a 'server_version'
> variable to be used everywhere instead of having multiple function
> calls.
Fixed

> ~~~
>
> 6.
>   "SELECT a.attnum,"
>   "       a.attname,"
>   "       a.atttypid,"
> - "       a.attnum = ANY(i.indkey)"
> + "       a.attnum = ANY(i.indkey),"
> + " a.attgenerated != ''"
>   "  FROM pg_catalog.pg_attribute a"
>   "  LEFT JOIN pg_catalog.pg_index i"
>   "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>   " WHERE a.attnum > 0::pg_catalog.int2"
> - "   AND NOT a.attisdropped %s"
> + "   AND NOT a.attisdropped", lrel->remoteid);
> +
> + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> + walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
> + !MySubscription->includegencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> +
>
> If the server version is < PG12 then AFAIK there was no such thing as
> "a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
> part also be guarded by some version checking condition like in the
> WHERE? Otherwise won't it cause an ERROR for old servers?
Fixed

> ~~~
>
> 7.
>   /*
> - * For non-tables and tables with row filters, we need to do COPY
> - * (SELECT ...), but we can't just do SELECT * because we need to not
> - * copy generated columns. For tables with any row filters, build a
> - * SELECT query with OR'ed row filters for COPY.
> + * For non-tables and tables with row filters and when
> + * 'include_generated_columns' is specified as 'true', we need to do
> + * COPY (SELECT ...), as normal COPY of generated column is not
> + * supported. For tables with any row filters, build a SELECT query
> + * with OR'ed row filters for COPY.
>   */
>
> NITPICK. I felt this was not quite right. AFAIK the reasons for using
> this COPY (SELECT ...) syntax are different for row-filters and
> generated-columns. Anyway, I updated the comment slightly in my
> nitpicks attachment. Please have a look at it to see if you agree with
> the suggestions. Maybe I am wrong.
Fixed

> ~~~
>
> 8.
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(String, att_name, attnamelist)
>
> I'm not 100% sure, but isn't foreach_node the macro to use here,
> rather than foreach_ptr?
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 9.
> Please discuss with Shubham how to make all the tab1, tab2, tab3,
> tab4, tab5, tab6 comments use the same kind of style/wording.
> Currently, the patches 0001 and 0002 test comments are a bit
> inconsistent.
Fixed

> ~~~
>
> 10.
> Related to above -- now that patch 0002 supports copy_data=true I
> don't see why we need to test generated columns *both* for
> copy_data=false and also for copy_data=true. IOW, is it really
> necessary to have so many tables/tests? For example, I am thinking
> some of those tests from patch 0001 can be re-used or just removed now
> that copy_data=true works.
Fixed

> ~~~
>
> NITPICK - minor comment tweak
Fixed

> ~~~
>
> 11.
> For tab4 and tab6 I saw the initial sync and normal replication data
> tests are all merged together, but I had expected to see the initial
> sync and normal replication data tests separated so it would be
> consistent with the earlier tab1, tab2, tab3 tests.
Fixed

> ======
>
> 99.
> Also, I have attached a nitpicks diff for some of the cosmetic review
> comments mentioned above. Please apply whatever of these that you
> agree with.
Applied the relevant changes

I have attached a v14 to fix the comments.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 25 Jun 2024 at 18:49, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok,
>
> Thanks for updating patches! Below are my comments, maybe only for 0002.
>
> 01. General
>
> IIUC, we have not discussed why ALTER SUBSCRIPTION ... SET include_generated_columns
> is prohibited. Previously, it seemed okay because there were exclusive options. But now,
> such restrictions are gone. Do you have a reason in mind? Or is it just not considered
> yet?
We do not support using ALTER SUBSCRIPTION to change
'include_generated_columns'. Suppose the user initially has a logical
replication setup where the publisher has table t1 with columns
(c1 int, c2 int generated always as (c1*2)), the subscriber has table t1
with columns (c1 int, c2 int), and 'include_generated_columns' is true.
If we then use 'ALTER SUBSCRIPTION' to set 'include_generated_columns' to
false, the rows replicated before the change will have data for c2 on the
subscriber, but rows replicated after it will not. This may be
inconsistent behaviour.


> 02. General
>
> According to the doc, we allow altering a column to a non-generated one with the ALTER
> TABLE ... ALTER COLUMN ... DROP EXPRESSION command. I am not sure what should happen
> when the command is executed on the subscriber while copying the data. Should
> we continue the copy or restart? What do you think?
The COPY of data happens in a single transaction, so if we execute the
'ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION' command, it will
take effect only after the whole COPY command has finished. So I think it
will not create any issue.

> 03. Test code
>
> IIUC, REFRESH PUBLICATION can also lead to table synchronization. Can you add
> a test for that?
Added

> 04. Test code (maybe for 0001)
>
> Please test the combination with the ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION command.
Added

> 05. logicalrep_rel_open
>
> ```
> +            /*
> +             * In case 'include_generated_columns' is 'false', we should skip the
> +             * check of missing attrs for generated columns.
> +             * In case 'include_generated_columns' is 'true', we should check if
> +             * corresponding column for the generated column in publication column
> +             * list is present in the subscription table.
> +             */
> +            if (!MySubscription->includegencols && attr->attgenerated)
> +            {
> +                entry->attrmap->attnums[i] = -1;
> +                continue;
> +            }
> ```
>
> This comment is not very clear to me, because here we do not skip anything.
> Can you clarify the reason why attnums[i] is set to -1 and how will it be used?
This part of the code is removed to address some comments.

> 06. make_copy_attnamelist
>
> ```
> +    gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
> ```
>
> I think this array is too large. Can we reduce the size to (desc->natts * sizeof(bool))?
> Also, it should be freed.
I have renamed 'gencollist' to 'localgenlist' to make the name more
consistent. Also, the size should be (rel->remoterel.natts * sizeof(bool))
because I record whether a column is generated with
'localgenlist[attnum] = true;', where 'attnum' is the corresponding
attribute number on the publisher side.
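
To illustrate the sizing and indexing described above, here is a minimal standalone sketch. It is not the patch code: the function name, the plain string arrays, and the name-matching loop are assumptions made for the example, and calloc/free stand in for palloc0/pfree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the array built in make_copy_attnamelist().
 * Returns a zeroed array with one flag per *remote* column; entry i is true
 * when the local column matching remote column i is a generated column.
 * The caller owns the result and must free() it.
 */
bool *
make_localgenlist(const char *const *remote_names, int remote_natts,
                  const char *const *local_names, const bool *local_generated,
                  int local_natts)
{
    bool *localgenlist;

    assert(remote_natts > 0 && local_natts >= 0);

    /* Sized by the remote column count, not by MaxTupleAttributeNumber. */
    localgenlist = calloc((size_t) remote_natts, sizeof(bool));
    if (localgenlist == NULL)
        return NULL;

    for (int i = 0; i < local_natts; i++)
    {
        if (!local_generated[i])
            continue;

        /* Find the remote column with the same name, if there is one. */
        for (int remote_attnum = 0; remote_attnum < remote_natts; remote_attnum++)
        {
            if (strcmp(local_names[i], remote_names[remote_attnum]) == 0)
            {
                localgenlist[remote_attnum] = true;
                break;
            }
        }
    }

    return localgenlist;
}
```

Because the flags are looked up by the publisher-side position, the array only needs as many entries as the remote relation has columns.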

> 07. make_copy_attnamelist
>
> ```
> +    /* Loop to handle subscription table generated columns. */
> +    for (int i = 0; i < desc->natts; i++)
> ```
>
> IIUC, the loop is needed to find generated columns on the subscriber side, right?
> Can you clarify as comment?
Fixed

> 08. copy_table
>
> ```
> +    /*
> +     * Regular table with no row filter and 'include_generated_columns'
> +     * specified as 'false' during creation of subscription.
> +     */
> ```
>
> I think this comment is not correct. After patching, all tablesync commands become
> like COPY (SELECT ...) if include_generated_columns is set to true. Is that right?
> Can we restrict this to only when the table has generated columns?
Fixed

Please refer to v14 patch for the changes [1].


[1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Wed, 26 Jun 2024 at 08:06, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok. Here are my review comments for patch v10-0003
>
> ======
> General.
>
> 1.
> The patch has lots of conditions like:
> if (att->attgenerated && (att->attgenerated !=
> ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
>  continue;
>
> IMO these are hard to read. Although more verbose, please consider if
> all those (for the sake of readability) would be better re-written
> like below :
>
> if (att->attgenerated)
> {
>   if (!include_generated_columns)
>     continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>     continue;
> }
Fixed

> ======
> contrib/test_decoding/test_decoding.c
>
> tuple_to_stringinfo:
>
> NITPICK - refactored the code and comments a bit here to make it easier
> NITPICK - there is no need to mention "virtual". Instead, say we only
> support STORED
Fixed

> ======
> src/backend/catalog/pg_publication.c
>
> publication_translate_columns:
>
> NITPICK - introduced variable 'att' to simplify this code
Fixed

> ~
>
> 2.
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> + errmsg("cannot use virtual generated column \"%s\" in publication
> column list",
> +    colname));
>
> Is it better to avoid referring to "virtual" at all? Instead, consider
> rearranging the wording to say something like "generated column \"%s\"
> is not STORED so cannot be used in a publication column list"
Fixed

> ~~~
>
> pg_get_publication_tables:
>
> NITPICK - split the condition code for readability
Fixed

> ======
> src/backend/replication/logical/relation.c
>
> 3. logicalrep_rel_open
>
> + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
> + continue;
> +
>
> Isn't this missing some code to say "entry->attrmap->attnums[i] =
> -1;", same as all the other nearby code is doing?
Fixed

> ~~~
>
> 4.
> I felt all the "generated column" logic should be kept together, so
> this new condition should be combined with the other generated column
> condition, like:
>
> if (attr->attgenerated)
> {
>   /* comment... */
>   if (!MySubscription->includegencols)
>   {
>     entry->attrmap->attnums[i] = -1;
>     continue;
>   }
>
>   /* comment... */
>   if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   {
>     entry->attrmap->attnums[i] = -1;
>     continue;
>   }
> }
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> 5.
> + if (gencols_allowed)
> + {
> + /* Replication of generated cols is supported, but not VIRTUAL cols. */
> + appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
> + }
>
> Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
> of the hardwired 'v'? (Maybe add another TODO comment to revisit
> this).
>
> Alternatively, consider if it is more future-proof to rearrange so it
> just says what *is* supported instead of what *isn't* supported:
> e.g. "AND a.attgenerated IN ('', 's')"
I feel we should use ATTRIBUTE_GENERATED_VIRTUAL macro. Added a TODO.
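
As a rough sketch of what using the macro could look like (standalone, not the patch code): the helper name is hypothetical, snprintf stands in for appendStringInfo, and the macro is defined locally on the assumption that it expands to the same 'v' that is hardwired in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed value, matching the hardwired 'v' in the quoted hunk. */
#define ATTRIBUTE_GENERATED_VIRTUAL 'v'

/*
 * Hypothetical helper: writes the WHERE-clause fragment that filters out
 * virtual generated columns into buf, and returns the number of characters
 * written (excluding the terminating NUL), as snprintf does.
 */
int
append_gencol_filter(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    /* TODO: revisit if virtual generated columns become replicable. */
    return snprintf(buf, buflen, " AND a.attgenerated != '%c'",
                    ATTRIBUTE_GENERATED_VIRTUAL);
}
```

The query text is unchanged; only the source of the character differs, so a later change to the macro would be picked up automatically.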

> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - some comments are missing the word "stored"
> NITPICK - sometimes "col" should be plural "cols"
> NITPICK - typo "GNERATED"
Added the relevant changes.

> ======
>
> 6.
> In a previous review [1, comment #3] I mentioned that there should be
> some docs updates on the "Logical Replication Message Formats" section
> 53.9. So, I expected patch 0001 would make some changes and then patch
> 0003 would have to update it again to say something about "STORED".
> But all that is missing from the v10* patches.
>
> ======
Will fix in an upcoming version.

>
> 99.
> See also my nitpicks diff which is a top-up patch addressing all the
> nitpick comments mentioned above. Please apply all of these that you
> agree with.
Applied the relevant changes.

Please refer to the v14 patch for the changes [1].


[1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are my review comments for v14-0002.

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

nitpick - remove excessive parentheses in palloc0 call.

nitpick - Code is fine AFAICT except it's not immediately obvious
localgenlist is indexed by the *remote* attribute number. So I renamed
'attrnum' variable in my nitpicks diff. OTOH, if you think no change
is necessary, that is OK too (in that case maybe add a comment).

~~~

1. fetch_remote_table_info

+ if ((server_version >= 120000 && server_version <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

Should this say < 180000 instead of <= 160000?
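
To make the boundary concrete, here is a small standalone sketch (not the patch code). The function name is hypothetical, version numbers follow the server_version_num convention (120000 = v12, 180000 = v18), and it assumes the feature first appears in v18. The early return for pre-v12 servers is also an assumption, based on the earlier remark in this thread that attgenerated does not exist there.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the tablesync query should append
 * " AND a.attgenerated = ''" to exclude generated columns on the publisher.
 */
bool
exclude_generated_columns(int server_version, bool includegencols)
{
    assert(server_version > 0);

    /* Assumption: before v12 there is no attgenerated column to filter on. */
    if (server_version < 120000)
        return false;

    /* v12 up to (but not including) v18 cannot publish generated columns. */
    if (server_version < 180000)
        return true;

    /* v18 and later: exclude them only when the option is off. */
    return !includegencols;
}
```

With "<= 160000" as the upper bound, a publisher reporting 160004 or 170000 would fall outside the range, which is why "< 180000" looks like the safer test.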

~~~

copy_table:

nitpick - uppercase in comment

nitpick - missing space after "if"

~~~

2. copy_table

+ attnamelist = make_copy_attnamelist(relmapentry, remotegenlist);
+
  /* Start copy on the publisher. */
  initStringInfo(&cmd);

- /* Regular table with no row filter */
- if (lrel.relkind == RELKIND_RELATION && qual == NIL)
+ /* check if remote column list has generated columns */
+ if(MySubscription->includegencols)
+ {
+ for (int i = 0; i < relmapentry->remoterel.natts; i++)
+ {
+ if(remotegenlist[i])
+ {
+ remote_has_gencol = true;
+ break;
+ }
+ }
+ }
+

There is some subtle logic going on here:

For example, the comment here says "Check if the remote column list
has generated columns", and it then proceeds to iterate the remote
attributes checking the remotegenlist[i]. But the remotegenlist[] was
returned from a prior call to make_copy_attnamelist() and according to
the make_copy_attnamelist logic, it is NOT returning all remote
generated-cols in that list. Specifically, it is stripping some of
them -- "Do not include generated columns of the subscription table in
the [remotegenlist] column list.".

So, actually this loop seems to be only finding cases (setting
remote_has_gencol = true) where the remote column is generated but the
matching local column is *not* generated. Maybe this was the intended
logic all along but then certainly the comment should be improved to
describe it better.

~~~

3.
+ /*
+ * Regular table with no row filter and 'include_generated_columns'
+ * specified as 'false' during creation of subscription.
+ */
+ if (lrel.relkind == RELKIND_RELATION && qual == NIL && !remote_has_gencol)

nitpick - This comment also needs improving. For example, just because
remote_has_gencol is false, it does not follow that
'include_generated_columns' was specified as 'false' -- maybe the
parameter was 'true' but the table just had no generated columns
anyway... I've modified the comment already in my nitpicks diff, but
probably you can improve on that.

~

nitpick - "else" comment is modified slightly too. Please see the nitpicks diff.

~

4.
In hindsight, I felt your variable 'remote_has_gencol' was not
well-named because it is not for saying the remote table has a
generated column -- it is saying the remote table has a generated
column **that we have to copy**. So, rather it should be named
something like 'gencol_copy_needed' (but I didn't change this name in
the nitpick diffs...)
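
A standalone sketch of the check under that name, following the reading in comment #2 (a remote generated column matters only when the matching local column is not generated). This is an illustration, not the patch code: the 'localgenlist' parameter and the plain bool arrays indexed by remote column position are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when at least one remote generated
 * column has to be copied, i.e. the subscription asked for generated
 * columns and the matching local column is an ordinary column.
 * Both arrays are indexed by remote column position.
 */
bool
gencol_copy_needed(bool includegencols, const bool *remotegenlist,
                   const bool *localgenlist, int remote_natts)
{
    assert(remote_natts >= 0);

    if (!includegencols)
        return false;

    for (int i = 0; i < remote_natts; i++)
    {
        if (remotegenlist[i] && !localgenlist[i])
            return true;
    }

    return false;
}
```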

======
src/test/subscription/t/004_sync.pl

nitpick - changes to comment style to make the test case separations
much more obvious
nitpick - minor comment wording tweaks

5.
Here, you are confirming we get an ERROR when replicating from a
non-generated column to a generated column. But I think your patch
also added exactly that same test scenario in the 011_generated (as
the sub5 test). So, maybe this one here should be removed?

======
src/test/subscription/t/011_generated.pl

nitpick - comment wrapping at 80 chars
nitpick - add/remove blank lines for readability
nitpick - typo /subsriber/subscriber/
nitpick - prior to the ALTER test, tab6 is unsubscribed. So add
another test to verify its initial data
nitpick - sometimes the msg 'add a new table to existing publication'
is misplaced
nitpick - the tests for tab6 and tab5 were in the opposite of the
expected order, so swapped them.

======
99.
Please see also the attached diff which implements all the nitpicks
described in this post.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jul 2, 2024 at 9:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> >...
> > > 8.
> > > + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> > > + {
> > > + if (elem->arg == NULL)
> > > + data->include_generated_columns = true;
> > >
> > > Is there any way to test that "elem->arg == NULL" in the
> > > generated.sql? OTOH, if it is not possible to get here then is the
> > > code even needed?
> > >
> >
> > Currently I could not find a case where the
> > 'include_generated_columns' option is not specifying any value, but  I
> > was hesitant to remove this from here as the other options mentioned
> > follow the same rules. Thoughts?
> >
>
> If you do manage to find a scenario for this then I think a test for
> it would be good. But, I agree that the code seems OK because now I
> see it is the same pattern as similar nearby code.
>
> ~~~
>
> Thanks for the updated patch. Here are some review comments for patch v13-0001.
>
> ======
> .../expected/generated_columns.out
>
> nitpicks (see generated_columns.sql)
>
> ======
> .../test_decoding/sql/generated_columns.sql
>
> nitpick - use plural /column/columns/
> nitpick - use consistent wording in the comments
> nitpick - IMO it is better to INSERT different values for each of the tests
>
> ======
> doc/src/sgml/protocol.sgml
>
> nitpick - I noticed that none of the other boolean parameters on this
> page mention a default, so maybe here we should do the same and
> omit that information.
>
> ~~~
>
> 1.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> In a previous review [1 comment #11] I wrote that you can't just
> remove this paragraph because AFAIK it is still meaningful. A minimal
> change might be to just remove the "(except generated columns)" part.
> Alternatively, you could give a more detailed explanation mentioning
> the include_generated_columns protocol parameter.
>
> I provided some updated text for this paragraph in my NITPICKS top-up
> patch, Please have a look at that for ideas.
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> It looks like pgindent needs to be run on this file.
>
> ======
> src/include/catalog/pg_subscription.h
>
> nitpick - comment /publish/Publish/ for consistency
>
> ======
> src/include/replication/walreceiver.h
>
> nitpick - comment /publish/Publish/ for consistency
>
> ======
> src/test/regress/expected/subscription.out
>
> nitpicks - (see subscription.sql)
>
> ======
> src/test/regress/sql/subscription.sql
>
> nitpick - combine the invalid option combinations test with all the
> others (no special comment needed)
> nitpick - rename subscription as 'regress_testsub2' same as all its peers.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> nitpick - add/remove blank lines
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> nitpick - rewording for a comment. This issue was not strictly caused
> by this patch, but since you are modifying the same comment we can fix
> this in passing.
>
> ======
> 99.
> Please also see the attached top-up patch for all those nitpicks
> identified above.
>
> ======
> [1] v11-0001 review
> https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

All the comments are handled.

The attached patches contain all the suggested changes. Here, v15-0001
is modified to fix the comments, v15-0002 is not modified, and v15-0003
is modified according to the changes in the v15-0001 patch.
Thanks to Shlok Kyal for modifying the v15-0003 patch.


Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jul 2, 2024 at 10:59 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham,
>
> As you can see, most of my recent review comments for patch 0001 are
> only cosmetic nitpicks. But, there is still one long-unanswered design
> question from a month ago [1, #G.2]
>
> A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
> is related to the introduction and passing everywhere of new
> 'include_generated_columns' function parameters. These same functions
> are also always passing "BitMapSet *columns" representing the
> publication column list.
>
> My question was about whether we can't make use of the existing BMS
> parameter instead of introducing all the new API parameters.
>
> The idea might go something like this:
>
> * If 'include_generated_columns' option is specified true and if no
> column list was already specified then perhaps the relentry->columns
> can be used for a "dummy" column list that has everything including
> all the generated columns.
>
> * By doing this:
>  -- you may be able to avoid passing the extra
> 'include_generated_columns' everywhere
>  -- you may be able to avoid checking for generated columns deeper in
> the code (since it is already checked up-front when building the
> column list BMS)
>
> ~~
>
> I'm not saying this design idea is guaranteed to work, but it might be
> worth considering, because if it does work then there is potential to
> make the current 0001 patch significantly shorter.
>
> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

I have fixed this issue in the latest Patches.

Please refer to the updated v15 patches in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2B%3Dhn--ALJQvzzu7meX3LuO3tJKppDS7eO1BGvNFYBAbg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are review comments for v15-0001

======
doc/src/sgml/ddl.sgml

nitpick - there was a comma (,) which should be a period (.)

======
.../libpqwalreceiver/libpqwalreceiver.c

1.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'true'");
+

Should now say >= 180000

======
src/backend/replication/pgoutput/pgoutput.c

nitpick - comment wording for RelationSyncEntry.collist.

~~

2.
pgoutput_column_list_init:

I found the current logic to be quite confusing. I assume the code is
working OK, because AFAIK there are plenty of tests and they are all
passing, but the logic seems somewhat repetitive and there are also no
comments to explain it adding to my confusion.

IIUC, PRIOR TO THIS PATCH:

BMS field 'columns' represented the "columns of the column list" or it
was NULL if there was no publication column list (and it was also NULL
if the column list contained every column).

IIUC NOW, WITH THIS PATCH:

The BMS field 'columns' meaning is changed slightly to be something
like "columns to be replicated" or NULL if all columns are to be
replicated. This is almost the same thing except we are now handling
the generated columns up-front, so generated columns will or won't
appear in the BMS according to the "include_generated_columns"
parameter. See how this is all a bit subtle which is why copious new
comments are required to explain it...

So, although the test result evidence suggests this is working OK, I
have many questions/issues about it. Here are some to start with:

2a. It needs a lot more (summary and detailed) comments explaining the
logic now that the meaning is slightly different.

2b. What is the story with the FOR ALL TABLES case now? Previously,
there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
comment still says so. But now you've tacked on a 2nd pass of
iterations to build the BMS outside of the "if (!pub->alltables)"
check. Is that OK?

2c. The following logic seemed unexpected:
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts &&
+ data->include_generated_columns)
  {
  bms_free(cols);
  cols = NULL;

I had thought the above code would look different -- more like:
if (att->attgenerated && !data->include_generated_columns)
  continue;

nliveatts++;
...

2d. Was so much duplicated code necessary? It feels like the whole
"Get the number of live attributes." and assignment of cols to NULL
might be made common to both code paths.

2e. I'm beginning to question the pros/cons of the new BMS logic; I
had suggested trying this way (processing the generated columns
up-front in the BMS 'columns' list) to reduce patch code and simplify
all the subsequent API delegation of "include_generated_columns"
everywhere like it was in v14-0001. Indeed, that part was a success
and the patch is now smaller. But I don't like much that we've traded
reduced code overall for increased confusing code in that BMS
function. If all this BMS code can be refactored and commented to be
easier to understand then maybe all will be well, but if it can't then
maybe this BMS change was a bridge too far. I haven't given up on it
just yet, but I wonder what was your opinion about it, and do other
people have thoughts about whether this was the good direction to
take?
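
For what it is worth, the shape suggested in 2c can be sketched standalone like this. It is an illustration under assumptions, not the patch code: 'AttrSketch' and both function names are hypothetical, and a plain struct array stands in for the tuple descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal stand-in for the per-attribute catalog data. */
typedef struct AttrSketch
{
    bool attisdropped;
    bool attgenerated;
} AttrSketch;

/*
 * Count the attributes that can be replicated: dropped columns never count,
 * and generated columns count only when include_generated_columns is true.
 */
int
count_replicated_atts(const AttrSketch *atts, int natts,
                      bool include_generated_columns)
{
    int nliveatts = 0;

    assert(natts >= 0);

    for (int i = 0; i < natts; i++)
    {
        if (atts[i].attisdropped)
            continue;

        if (atts[i].attgenerated && !include_generated_columns)
            continue;

        nliveatts++;
    }

    return nliveatts;
}

/* A column list covering every replicated attribute can be treated as NULL. */
bool
column_list_is_redundant(int ncols_in_list, int nliveatts)
{
    return ncols_in_list == nliveatts;
}
```

With the generated-column decision folded into the count, the "all columns" comparison would not need a separate include_generated_columns test.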

======
src/bin/pg_dump/pg_dump.c

3.
+ if (fout->remoteVersion >= 170000)
+ appendPQExpBufferStr(query,
+ " s.subincludegencols\n");
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols\n");

Should now say >= 180000

======
src/bin/psql/describe.c

4.
+ /* include_generated_columns is only supported in v18 and higher */
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+   ", subincludegencols AS \"%s\"\n",
+   gettext_noop("Include generated columns"));
+

Should now say >= 180000

======
src/include/catalog/pg_subscription.h

nitpick - let's make the comment the same as in WalRcvStreamOptions

======
src/include/replication/logicalproto.h

nitpick - extern for logicalrep_write_update should be unchanged by this patch

======
src/test/regress/sql/subscription.sql

nitpick - the comment "include_generated_columns and copy_data = true
are mutually exclusive" is not necessary because this all falls under
the existing comment "fail - invalid option combinations"

nitpick - let's explicitly put "copy_data = true" in the CREATE
SUBSCRIPTION to make it more obvious

======
99. Please also refer to the attached 'diffs' patch which implements
all of my nitpicks issues mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, Here are some review comments for patch v15-0003.

======
src/backend/catalog/pg_publication.c

1. publication_translate_columns

The function comment says:
 * Translate a list of column names to an array of attribute numbers
 * and a Bitmapset with them; verify that each attribute is appropriate
 * to have in a publication column list (no system or generated attributes,
 * no duplicates).  Additional checks with replica identity are done later;
 * see pub_collist_contains_invalid_column.

That part about "[no] generated attributes" seems to have gone stale
-- e.g. not quite correct anymore. Should it say no VIRTUAL generated
attributes?

======
src/backend/replication/logical/proto.c

2. logicalrep_write_tuple and logicalrep_write_attrs

I thought all the code fragments like this:

+ if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

don't need to be in the code anymore, because of the BitMapSet (BMS)
processing done to make the "column list" for publication where
disallowed generated cols should already be excluded from the BMS,
right?

So shouldn't all these be detected by the following statement:
if (!column_in_column_list(att->attnum, columns))
  continue;

======
src/backend/replication/logical/tablesync.c

3.
+ if(server_version >= 120000)
+ {
+ bool gencols_allowed = server_version >= 170000 &&
MySubscription->includegencols;
+
+ if (gencols_allowed)
+ {

Should say server_version >= 180000, instead of 170000

======
src/backend/replication/pgoutput/pgoutput.c

4. send_relation_and_attrs

(this is a similar comment for #2 above)

IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001,
processing the generated columns up-front, is that there is no need to
check them again in code like this.

They should be discovered anyway in the subsequent check:
/* Skip this attribute if it's not present in the column list */
if (columns != NULL && !bms_is_member(att->attnum, columns))
  continue;
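
A standalone sketch of that idea, for illustration only: a 64-bit mask stands in for the Bitmapset, attribute numbers are 1-based as in the catalog, and both function names are hypothetical. Unlike the real code, where a NULL column list means "all columns", this sketch always builds an explicit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build the "columns to replicate" set once, up-front. attgenerated[i]
 * describes attribute number i + 1. Generated columns are left out of the
 * set unless include_generated_columns is true.
 */
uint64_t
build_columns_mask(const bool *attgenerated, int natts,
                   bool include_generated_columns)
{
    uint64_t mask = 0;

    assert(natts >= 0 && natts < 64);

    for (int attnum = 1; attnum <= natts; attnum++)
    {
        if (attgenerated[attnum - 1] && !include_generated_columns)
            continue;

        mask |= UINT64_C(1) << attnum;
    }

    return mask;
}

/* The only check needed later: is this attribute in the set? */
bool
column_in_mask(int attnum, uint64_t mask)
{
    assert(attnum >= 1 && attnum < 64);

    return ((mask >> attnum) & 1) != 0;
}
```

Once the set is built this way, a caller that tests membership has already excluded the disallowed generated columns, so a second attgenerated test adds nothing.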

======
src/test/subscription/t/011_generated.pl

5.
AFAICT there are still multiple comments (e.g. for the "TEST tab<n>"
comments) where it still says "generated" instead of "stored
generated". I did not make a "nitpicks" diff for these because those
comments are inherited from the prior patch 0002 which still has
outstanding review comments on it too. Please just search/replace
them.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 5 Jul 2024 at 13:47, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for v14-0002.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> 2. copy_table
>
> + attnamelist = make_copy_attnamelist(relmapentry, remotegenlist);
> +
>   /* Start copy on the publisher. */
>   initStringInfo(&cmd);
>
> - /* Regular table with no row filter */
> - if (lrel.relkind == RELKIND_RELATION && qual == NIL)
> + /* check if remote column list has generated columns */
> + if(MySubscription->includegencols)
> + {
> + for (int i = 0; i < relmapentry->remoterel.natts; i++)
> + {
> + if(remotegenlist[i])
> + {
> + remote_has_gencol = true;
> + break;
> + }
> + }
> + }
> +
>
> There is some subtle logic going on here:
>
> For example, the comment here says "Check if the remote column list
> has generated columns", and it then proceeds to iterate the remote
> attributes checking the remotegenlist[i]. But the remotegenlist[] was
> returned from a prior call to make_copy_attnamelist() and according to
> the make_copy_attnamelist logic, it is NOT returning all remote
> generated-cols in that list. Specifically, it is stripping some of
> them -- "Do not include generated columns of the subscription table in
> the [remotegenlist] column list.".
>
> So, actually this loop seems to be only finding cases (setting
> remote_has_gen = true) where the remote column is generated but the
> match local column is *not* generated. Maybe this was the intended
> logic all along but then certainly the comment should be improved to
> describe it better.

'remotegenlist' is actually constructed in the function
'fetch_remote_table_info', and it has an entry for every column in the
column list specifying whether that column is generated or not.
In the function 'make_copy_attnamelist' we do not modify the list.
So, I think the current comment is sufficient. Thoughts?

> ======
> src/test/subscription/t/004_sync.pl
>
> nitpick - changes to comment style to make the test case separations
> much more obvious
> nitpick - minor comment wording tweaks
>
> 5.
> Here, you are confirming we get an ERROR when replicating from a
> non-generated column to a generated column. But I think your patch
> also added exactly that same test scenario in the 011_generated (as
> the sub5 test). So, maybe this one here should be removed?

For 004_sync.pl, it is tested when 'include_generated_columns' is not
specified, whereas for the test in 011_generated
'include_generated_columns = true' is specified.
I thought we should have a test for both cases to check that the error
message format is the same for both. Thoughts?

I have attached the patches. I have addressed the rest of the
comments and added the changes in v16-0002. I have not modified the
v16-0001 patch.


Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, Here are some review comments for patch v15-0003.
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1. publication_translate_columns
>
> The function comment says:
>  * Translate a list of column names to an array of attribute numbers
>  * and a Bitmapset with them; verify that each attribute is appropriate
>  * to have in a publication column list (no system or generated attributes,
>  * no duplicates).  Additional checks with replica identity are done later;
>  * see pub_collist_contains_invalid_column.
>
> That part about "[no] generated attributes" seems to have gone stale
> -- e.g. not quite correct anymore. Should it say no VIRTUAL generated
> attributes?
Yes, it should say VIRTUAL generated attributes. I have modified it.

> ======
> src/backend/replication/logical/proto.c
>
> 2. logicalrep_write_tuple and logicalrep_write_attrs
>
> I thought all the code fragments like this:
>
> + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> + continue;
> +
>
> don't need to be in the code anymore, because of the BitMapSet (BMS)
> processing done to make the "column list" for publication where
> disallowed generated cols should already be excluded from the BMS,
> right?
>
> So shouldn't all these be detected by the following statement:
> if (!column_in_column_list(att->attnum, columns))
>   continue;
The current BMS logic does not handle virtual generated columns.
There can be cases where we do not want a virtual generated column but
it would be present in the BMS.
To address this I have added the above logic, similar to the checks
of 'attr->attisdropped'.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 4. send_relation_and_attrs
>
> (this is a similar comment for #2 above)
>
> IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001 of
> processing the generated columns up-front is that there is no need to check
> them again in code like this.
>
> They should be discovered anyway in the subsequent check:
> /* Skip this attribute if it's not present in the column list */
> if (columns != NULL && !bms_is_member(att->attnum, columns))
>   continue;
Same explanation as above.

I have addressed all the comments in the v16-0003 patch. Please refer to [1].
[1]: https://www.postgresql.org/message-id/CANhcyEXw%3DBFFVUqohWES9EPkdq-ZMC5QRBVQqQPzrO%3DQ7uzFQw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, Here are my review comments for v16-0002

======
src/backend/replication/logical/tablesync.c

1. fetch_remote_table_info

+ if ((server_version >= 120000 && server_version < 180000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

I felt this condition was a bit complicated. It needs a comment to
explain that "attgenerated" has been supported only since PG12 and
'include_generated_columns' is supported only since PG18. The more
I look at this, the more I think this is a bug. For example, what happens
if the server is *before* PG12 and 'include_generated_columns' is false;
won't it then try to build SQL using the "attgenerated" column which will
cause an ERROR on the server?

IIRC this condition is already written properly in your patch 0003.
So, most of that 0003 condition refactoring should be done here in
patch 0002 instead.
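
To make the concern concrete, here is a minimal standalone sketch (not
the patch code) of how the decision could be structured so that
"attgenerated" is never referenced for a pre-PG12 server. The helper
name and its parameters are hypothetical; the version boundaries are
just the ones mentioned above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should the remote query append
 * " AND a.attgenerated = ''" ?
 *
 * Assumptions taken from the review comment:
 * - pg_attribute.attgenerated exists only on servers >= PG12.
 * - include_generated_columns is supported only on servers >= PG18.
 */
static bool
should_filter_generated(int server_version, bool include_gencols)
{
	/* Before PG12 the column does not exist, so never reference it. */
	if (server_version < 120000)
		return false;

	/* PG12..PG17 cannot publish generated columns: always filter. */
	if (server_version < 180000)
		return true;

	/* PG18+: filter only when the subscription did not ask for them. */
	return !include_gencols;
}
```

With this shape, the pre-PG12 case returns false regardless of the
parameter value, which is the case the original condition gets wrong.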

~~~

2. copy_table

> > So, actually this loop seems to be only finding cases (setting
> > remote_has_gen = true) where the remote column is generated but the
> > match local column is *not* generated. Maybe this was the intended
> > logic all along but then certainly the comment should be improved to
> > describe it better.
>
> 'remotegenlist' is actually constructed in function 'fetch_remote_table_info'
> and it has an entry for every column in the column list specifying
> whether a column is
> generated or not.
> In the function 'make_copy_attnamelist' we are not modifying the list.
> So, I think the current comment would be sufficient. Thoughts?

Yes, I was mistaken in thinking the list is "modified". OTOH, I still
feel the existing comment ("Check if remote column list has any
generated column") is misleading because the remote table might have
generated cols but we are not even interested in them if the
equivalent subscriber column is also generated. Please see the nitpicks
diff for my suggestion on how to update this comment.
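
As an illustration only, the intent described above could be sketched
like this. This is not the tablesync.c code: the helper name and the two
parallel arrays (indexed by the same column position) are assumptions
made for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper. Returns true if at least one remote generated
 * column maps to a local column that is NOT generated, i.e. a column
 * whose value really has to be copied from the publisher.
 */
static bool
remote_gencol_needs_copy(const bool *remote_is_gen,
						 const bool *local_is_gen,
						 int natts)
{
	for (int i = 0; i < natts; i++)
	{
		/* Generated on both sides: the subscriber computes its own value. */
		if (remote_is_gen[i] && !local_is_gen[i])
			return true;
	}
	return false;
}
```

A comment along the lines of "check if any remote generated column maps
to a non-generated local column" would describe this loop more
accurately than "check if remote column list has generated columns".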

~~~

nitpick - add space after "if"

======
src/test/subscription/t/004_sync.pl

> > 5.
> > Here, you are confirming we get an ERROR when replicating from a
> > non-generated column to a generated column. But I think your patch
> > also added exactly that same test scenario in the 011_generated (as
> > the sub5 test). So, maybe this one here should be removed?
>
> For 004_sync.pl, it is tested when 'include_generated_columns' is not
> specified. Whereas for the test in 011_generated
> 'include_generated_columns = true' is specified.
> I thought we should have a test for both cases to test if the error
> message format is the same for both cases. Thoughts?

3.
Sorry, I missed that there was a parameter flag difference. Anyway,
since the code-path to reach this error is the same regardless of the
'include_generated_columns' parameter value IMO having too many tests
might be overkill. YMMV.

Anyway, whether you decide to keep both test cases or not, I think all
testing related to generated column replication belongs in the new
011_generated.pl TAP file -- not here in 004_sync.pl.

======
src/test/subscription/t/011_generated.pl

4. Untested scenarios for "missing col"?

I have seen (in 004_sync.pl) missing column test cases for:
- publisher not-generated col ==> subscriber missing column

Maybe I am mistaken, but I don't recall seeing any test cases for:
- publisher generated-col ==> subscriber missing col

Unless they are already done somewhere, I think this scenario should
be in 011_generated.pl. Furthermore, maybe it needs to be tested for
both include_generated_columns = true / false, because if the
parameter is false it should be OK, but if the parameter is true it
should give ERROR.

~~~

5.
-# publisher-side tab3 has generated col 'b' but subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.
+# tab3:
+# publisher-side tab3 has generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.

I think this change is only improving a comment that was introduced by
patch 0001. This all belongs back in patch 0001, then patch 0002 has
nothing to do here.

======
99.
Please also refer to the attached diffs patch which implements any
nitpicks mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, here are my review comments for v16-0003.

======
src/backend/replication/logical/proto.c


On Mon, Jul 8, 2024 at 10:04 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > 2. logicalrep_write_tuple and logicalrep_write_attrs
> >
> > I thought all the code fragments like this:
> >
> > + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> > + continue;
> > +
> >
> > don't need to be in the code anymore, because of the BitMapSet (BMS)
> > processing done to make the "column list" for publication where
> > disallowed generated cols should already be excluded from the BMS,
> > right?
> >
> > So shouldn't all these be detected by the following statement:
> > if (!column_in_column_list(att->attnum, columns))
> >   continue;
> The current BMS logic does not handle the Virtual Generated Columns.
> There can be cases where we do not want a virtual generated column but
> it would be present in BMS.
> To address this I have added the above logic. I have added this logic
> similar to the checks of 'attr->attisdropped'.
>

Hmm. I thought the BMS idea of patch 0001 is to discover what columns
should be replicated up-front. If they should not be replicated (e.g.
virtual generated columns cannot be) then they should never be in the
BMS.

So what you said ("There can be cases where we do not want a virtual
generated column but it would be present in BMS") should not be
happening. If that is happening then it sounds more like a bug in the
new BMS logic of pgoutput_column_list_init() function. In other words,
if what you say is true, then it seems like the current extra
conditions you have in patch 0004 are just a band-aid to cover a
problem of the BMS logic of patch 0001. Am I mistaken?

> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 4. send_relation_and_attrs
> >
> > (this is a similar comment for #2 above)
> >
> > IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001 of
> > processing the generated columns up-front is that there is no need to check
> > them again in code like this.
> >
> > They should be discovered anyway in the subsequent check:
> > /* Skip this attribute if it's not present in the column list */
> > if (columns != NULL && !bms_is_member(att->attnum, columns))
> >   continue;
> Same explanation as above.

As above.

======
src/test/subscription/t/011_generated.pl

I'm not sure if you needed to say "STORED" generated cols for the
subscriber-side columns but anyway, whatever is done needs to be done
consistently. FYI, below you did *not* say STORED for subscriber-side
generated cols, but in other comments for subscriber-side generated
columns, you did say STORED.

# tab3:
# publisher-side tab3 has STORED generated col 'b' but
# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.

~

# tab4:
# publisher-side tab4 has STORED generated cols 'b' and 'c' but
# subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
# where columns on publisher/subscriber are in a different order

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok, I was thinking some more about the suggested new
BitMapSet (BMS) idea of patch 0001 that changes the 'columns' meaning
to include generated cols also where necessary.

I feel it is a bit risky to change lots of code without being 100%
confident it will still be in the final push. It's also going to make
the reviewing job harder if stuff gets added and then later removed.

IMO it might be better to revert all the patches (mostly 0001, but
also parts of subsequent patches) to their pre-BMS-change ~v14* state.
Then all the BMS "improvement" can be kept isolated in a new patch
0004.

Some more reasons to split this off into a separate patch are:

* The BMS change is essentially a redesign/cleanup of the code but is
nothing to do with the actual *functionality* of the new "generated
columns" feature.

* Apart from the BMS change I think the rest of the patches are nearly
stable now. So it might be good to get it all finished so the BMS
change can be tackled separately.

* By isolating the BMS change, then we will be able to see exactly
what is the code cost/benefit (e.g. removal of redundant code versus
adding new logic) which is part of the judgement to decide whether to
do it this way or not.

* By isolating the BMS change, then it makes it convenient for testing
before/after in case there are any performance concerns

* By isolating the BMS change, if some unexpected obstacle is
encountered that makes it unfeasible then we can just throw away patch
0004 and everything else (patches 0001,0002,0003) will still be good
to go.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 8, 2024 at 10:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are review comments for v15-0001
>
> ======
> doc/src/sgml/ddl.sgml
>
> nitpick - there was a comma (,) which should be a period (.)
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 1.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'true'");
> +
>
> Should now say >= 180000
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> nitpick - comment wording for RelationSyncEntry.collist.
>
> ~~
>
> 2.
> pgoutput_column_list_init:
>
> I found the current logic to be quite confusing. I assume the code is
> working OK, because AFAIK there are plenty of tests and they are all
> passing, but the logic seems somewhat repetitive and there are also no
> comments to explain it adding to my confusion.
>
> IIUC, PRIOR TO THIS PATCH:
>
> BMS field 'columns' represented the "columns of the column list" or it
> was NULL if there was no publication column list (and it was also NULL
> if the column list contained every column).
>
> IIUC NOW, WITH THIS PATCH:
>
> The BMS field 'columns' meaning is changed slightly to be something
> like "columns to be replicated" or NULL if all columns are to be
> replicated. This is almost the same thing except we are now handling
> the generated columns up-front, so generated columns will or won't
> appear in the BMS according to the "include_generated_columns"
> parameter. See how this is all a bit subtle which is why copious new
> comments are required to explain it...
>
> So, although the test result evidence suggests this is working OK, I
> have many questions/issues about it. Here are some to start with:
>
> 2a. It needs a lot more (summary and detailed) comments explaining the
> logic now that the meaning is slightly different.
>
> 2b. What is the story with the FOR ALL TABLES case now? Previously,
> there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
> comment still says so. But now you've tacked on a 2nd pass of
> iterations to build the BMS outside of the "if (!pub->alltables)"
> check. Is that OK?
>
> 2c. The following logic seemed unexpected:
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts &&
> + data->include_generated_columns)
>   {
>   bms_free(cols);
>   cols = NULL;
> `
> I had thought the above code would look different -- more like:
> if (att->attgenerated && !data->include_generated_columns)
>   continue;
>
> nliveatts++;
> ...
>
> 2d. Was so much duplicated code necessary? It feels like the whole
> "Get the number of live attributes." and assignment of cols to NULL
> might be made common to both code paths.
>
> 2e. I'm beginning to question the pros/cons of the new BMS logic; I
> had suggested trying this way (processing the generated columns
> up-front in the BMS 'columns' list) to reduce patch code and simplify
> all the subsequent API delegation of "include_generated_columns"
> everywhere like it was in v14-0001. Indeed, that part was a success
> and the patch is now smaller. But I don't like much that we've traded
> reduced code overall for increased confusing code in that BMS
> function. If all this BMS code can be refactored and commented to be
> easier to understand then maybe all will be well, but if it can't then
> maybe this BMS change was a bridge too far. I haven't given up on it
> just yet, but I wonder what was your opinion about it, and do other
> people have thoughts about whether this was the good direction to
> take?

I have created a separate patch (v17-0004) for this idea. I will address
this comment in the next version of the patches.

> ======
> src/bin/pg_dump/pg_dump.c
>
> 3.
> + if (fout->remoteVersion >= 170000)
> + appendPQExpBufferStr(query,
> + " s.subincludegencols\n");
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols\n");
>
> Should now say >= 180000
>
> ======
> src/bin/psql/describe.c
>
> 4.
> + /* include_generated_columns is only supported in v18 and higher */
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> +   ", subincludegencols AS \"%s\"\n",
> +   gettext_noop("Include generated columns"));
> +
>
> Should now say >= 180000
>
> ======
> src/include/catalog/pg_subscription.h
>
> nitpick - let's make the comment the same as in WalRcvStreamOptions
>
> ======
> src/include/replication/logicalproto.h
>
> nitpick - extern for logicalrep_write_update should be unchanged by this patch
>
> ======
> src/test/regress/sql/subscription.sql
>
> nitpick - the comment "include_generated_columns and copy_data = true
> are mutually exclusive" is not necessary because this all falls under
> the existing comment "fail - invalid option combinations"
>
> nitpick - let's explicitly put "copy_data = true" in the CREATE
> SUBSCRIPTION to make it more obvious
>
> ======
> 99. Please also refer to the attached 'diffs' patch which implements
> all of my nitpicks issues mentioned above.

The attached patches contain all the suggested changes. Here, v17-0001
is modified to fix the comments; v17-0002 and v17-0003 are modified
according to the changes in the v17-0001 patch; and the v17-0004 patch
contains the changes related to the Bitmapset (BMS) idea that changes
the 'columns' meaning to include generated cols also where necessary.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Jul 10, 2024 at 4:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham/Shlok, I was thinking some more about the suggested new
> BitMapSet (BMS) idea of patch 0001 that changes the 'columns' meaning
> to include generated cols also where necessary.
>
> I feel it is a bit risky to change lots of code without being 100%
> confident it will still be in the final push. It's also going to make
> the reviewing job harder if stuff gets added and then later removed.
>
> IMO it might be better to revert all the patches (mostly 0001, but
> also parts of subsequent patches) to their pre-BMS-change ~v14* state.
> Then all the BMS "improvement" can be kept isolated in a new patch
> 0004.
>
> Some more reasons to split this off into a separate patch are:
>
> * The BMS change is essentially a redesign/cleanup of the code but is
> nothing to do with the actual *functionality* of the new "generated
> columns" feature.
>
> * Apart from the BMS change I think the rest of the patches are nearly
> stable now. So it might be good to get it all finished so the BMS
> change can be tackled separately.
>
> * By isolating the BMS change, then we will be able to see exactly
> what is the code cost/benefit (e.g. removal of redundant code versus
> adding new logic) which is part of the judgement to decide whether to
> do it this way or not.
>
> * By isolating the BMS change, then it makes it convenient for testing
> before/after in case there are any performance concerns
>
> * By isolating the BMS change, if some unexpected obstacle is
> encountered that makes it unfeasible then we can just throw away patch
> 0004 and everything else (patches 0001,0002,0003) will still be good
> to go.

As suggested, I have created a separate patch for the Bitmapset (BMS)
idea of patch 0001 that changes the 'columns' meaning to include
generated cols also where necessary.
Please refer to the updated v17 patches in [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJ0gAUd62PvBRXCPYy2oTNZWEY-Qe8cBNzQaJPVMZCeGA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham.

Thanks for separating the new BMS 'columns' modification.

Here are my review comments for the latest patch v17-0001.

======

1. src/backend/replication/pgoutput/pgoutput.c

  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;

With this latest 0001 there is now no change to the original
interpretation of RelationSyncEntry BMS 'columns'. So, I think this
field comment should remain unchanged; i.e. it should be the same as
the current HEAD comment.

======
src/test/subscription/t/011_generated.pl

nitpick - comment changes for 'tab2' and 'tab3' to make them more consistent.

======
99.
Please refer to the attached diff patch which implements any nitpicks
described above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments about patch v17-0003

======
1.
Missing a docs change?

Previously, (v16-0002) the patch included a change to
doc/src/sgml/protocol.sgml like below to say STORED generated instead
of just generated.

        <para>
-        Boolean option to enable generated columns. This option controls
-        whether generated columns should be included in the string
-        representation of tuples during logical decoding in PostgreSQL.
+        Boolean option to enable <literal>STORED</literal> generated columns.
+        This option controls whether <literal>STORED</literal> generated columns
+        should be included in the string representation of tuples during logical
+        decoding in PostgreSQL.
        </para>

Why is that v16 change no longer present in patch v17-0003?

======
src/backend/catalog/pg_publication.c

2.
Previously, (v16-0003) this patch included a change to clarify the
kind of generated cols that are allowed in a column list.

  * Translate a list of column names to an array of attribute numbers
  * and a Bitmapset with them; verify that each attribute is appropriate
- * to have in a publication column list (no system or generated attributes,
- * no duplicates).  Additional checks with replica identity are done later;
- * see pub_collist_contains_invalid_column.
+ * to have in a publication column list (no system or virtual generated
+ * attributes, no duplicates). Additional checks with replica identity
+ * are done later; see pub_collist_contains_invalid_column.

Why is that v16 change no longer present in patch v17-0003?

======
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

- if (!attr->attgenerated)
+ if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

IIUC this logic is checking to make sure the subscriber-side table
column was not a generated column (because we don't replicate on top
of generated columns). So, does the distinction of STORED/VIRTUAL
really matter here?

~~~

fetch_remote_table_info:
nitpick - Should not change any spaces unrelated to the patch

======

send_relation_and_attrs:

- if (att->attgenerated && !include_generated_columns)
+ if (att->attgenerated && (att->attgenerated != ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
  continue;

nitpick - It seems over-complicated. Conditions can be split so the
code fragment looks the same as in other places in this patch.
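
To show that the split form is equivalent, here is a small standalone
sketch. It is not the pgoutput.c code: the two functions return whether
the attribute would be skipped, and the 's' constant is an assumed
stand-in for ATTRIBUTE_GENERATED_STORED.

```c
#include <stdbool.h>

#define ATTR_GEN_STORED 's'	/* assumed stand-in for ATTRIBUTE_GENERATED_STORED */

/* The combined condition, as written in the hunk above. */
static bool
skip_combined(char attgenerated, bool include_generated_columns)
{
	return attgenerated &&
		(attgenerated != ATTR_GEN_STORED || !include_generated_columns);
}

/* The same logic split into separate checks, one reason per test. */
static bool
skip_split(char attgenerated, bool include_generated_columns)
{
	if (attgenerated)
	{
		/* Generated columns were not requested at all. */
		if (!include_generated_columns)
			return true;

		/* Only STORED generated columns can be sent. */
		if (attgenerated != ATTR_GEN_STORED)
			return true;
	}
	return false;
}
```

Both functions agree for every combination of inputs, so the split form
only changes readability, not behaviour.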

======
99.
Please see the attached diffs patch that implements any nitpicks
mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, I had a quick look at the patch v17-0004 which is the split-off
new BMS logic.

IIUC this 0004 is currently undergoing some refactoring and
cleaning-up, so I won't comment much about it except to give the
following observation below.

======
src/backend/replication/logical/proto.c

I did not expect to see any code fragments that are still checking
generated columns like below:

logicalrep_write_tuple:

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;
~

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

~~~

logicalrep_write_attrs:

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

~
if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;
~~~


AFAIK, now checking support of generated columns will be done when the
BMS 'columns' is assigned, so the continuation code will be handled
like this:

if (!column_in_column_list(att->attnum, columns))
  continue;

======

BTW there is a subtle but significant difference in this 0004 patch.
IOW, we are introducing a difference between the list of published
columns VERSUS a publication column list. So please make sure that all
code comments are adjusted appropriately so they are not misleading by
calling these "column lists" still.

BEFORE: BMS 'columns'  means "columns of the column list" or NULL if
there was no publication column list
AFTER: BMS 'columns' means "columns to be replicated" or NULL if all
columns are to be replicated
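
For illustration, the "AFTER" meaning can be sketched as below. This is
a simplified standalone model, not the pgoutput.c code: a uint64_t
bitmask stands in for the Bitmapset, the Attr struct and function names
are hypothetical, the 's'/'v' values are assumed stand-ins for the
attgenerated kinds, and the "NULL means all columns" shortcut is left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for the pg_attribute.attgenerated values. */
#define GEN_NONE	'\0'
#define GEN_STORED	's'
#define GEN_VIRTUAL	'v'

typedef struct
{
	int		attnum;			/* 1-based; must be < 64 in this sketch */
	bool	attisdropped;
	char	attgenerated;
} Attr;

/* Decide once, up-front, which columns are to be replicated. */
static uint64_t
build_replicated_columns(const Attr *atts, int natts, bool include_gencols)
{
	uint64_t	cols = 0;

	for (int i = 0; i < natts; i++)
	{
		if (atts[i].attisdropped)
			continue;

		/* Virtual generated columns can never be replicated. */
		if (atts[i].attgenerated == GEN_VIRTUAL)
			continue;

		/* Stored generated columns only when the option asks for them. */
		if (atts[i].attgenerated == GEN_STORED && !include_gencols)
			continue;

		cols |= UINT64_C(1) << atts[i].attnum;
	}
	return cols;
}

/* The only check the tuple-writing code would then need. */
static bool
column_is_replicated(uint64_t cols, int attnum)
{
	return (cols & (UINT64_C(1) << attnum)) != 0;
}
```

If the set is built this way, a virtual generated column can never be a
member, so no extra attgenerated test is needed when writing tuples.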

======
Kind Regards,
Peter Smith.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 9 Jul 2024 at 07:14, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, Here are my review comments for v16-0002
>
> ======
> src/test/subscription/t/004_sync.pl
>
> > > 5.
> > > Here, you are confirming we get an ERROR when replicating from a
> > > non-generated column to a generated column. But I think your patch
> > > also added exactly that same test scenario in the 011_generated (as
> > > the sub5 test). So, maybe this one here should be removed?
> >
> > > For 004_sync.pl, it is tested when 'include_generated_columns' is not
> > specified. Whereas for the test in 011_generated
> > 'include_generated_columns = true' is specified.
> > I thought we should have a test for both cases to test if the error
> > message format is the same for both cases. Thoughts?
>
> 3.
> Sorry, I missed that there was a parameter flag difference. Anyway,
> since the code-path to reach this error is the same regardless of the
> 'include_generated_columns' parameter value IMO having too many tests
> might be overkill. YMMV.
>
> Anyway, whether you decide to keep both test cases or not, I think all
> testing related to generated column replication belongs in the new
> 011_generated.pl TAP file -- not here in 004_sync.pl
I have removed the test.

> ======
> src/test/subscription/t/011_generated.pl
>
> 4. Untested scenarios for "missing col"?
>
> I have seen (in 004_sync.pl) missing column test cases for:
> - publisher not-generated col ==> subscriber missing column
>
> Maybe I am mistaken, but I don't recall seeing any test cases for:
> - publisher generated-col ==> subscriber missing col
>
> Unless they are already done somewhere, I think this scenario should
> be in 011_generated.pl. Furthermore, maybe it needs to be tested for
> both include_generated_columns = true / false, because if the
> parameter is false it should be OK, but if the parameter is true it
> should give ERROR.
I have added the tests in 011_generated.pl.

I have also addressed the remaining comments. Please find the updated
v18 patches attached.

v18-0001 - Rebased the patch on HEAD
v18-0002 - Addressed the comments
v18-0003 - Addressed the comments
v18-0004 - Rebased the patch

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 9 Jul 2024 at 09:53, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, here are my review comments for v16-0003.
>
> ======
> src/backend/replication/logical/proto.c
>
>
> On Mon, Jul 8, 2024 at 10:04 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > >
> > > 2. logicalrep_write_tuple and logicalrep_write_attrs
> > >
> > > I thought all the code fragments like this:
> > >
> > > + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> > > + continue;
> > > +
> > >
> > > don't need to be in the code anymore, because of the BitMapSet (BMS)
> > > processing done to make the "column list" for publication where
> > > disallowed generated cols should already be excluded from the BMS,
> > > right?
> > >
> > > So shouldn't all these be detected by the following statement:
> > > if (!column_in_column_list(att->attnum, columns))
> > >   continue;
> > > The current BMS logic does not handle the Virtual Generated Columns.
> > There can be cases where we do not want a virtual generated column but
> > it would be present in BMS.
> > To address this I have added the above logic. I have added this logic
> > similar to the checks of 'attr->attisdropped'.
> >
>
> Hmm. I thought the BMS idea of patch 0001 is to discover what columns
> should be replicated up-front. If they should not be replicated (e.g.
> virtual generated columns cannot be) then they should never be in the
> BMS.
>
> So what you said ("There can be cases where we do not want a virtual
> generated column but it would be present in BMS") should not be
> happening. If that is happening then it sounds more like a bug in the
> new BMS logic of pgoutput_column_list_init() function. In other words,
> if what you say is true, then it seems like the current extra
> conditions you have in patch 0004 are just a band-aid to cover a
> problem of the BMS logic of patch 0001. Am I mistaken?
>
We have created a 0004 patch to use the BMS approach. It will be
addressed in the future 0004 patch.

> > > ======
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 4. send_relation_and_attrs
> > >
> > > (this is a similar comment for #2 above)
> > >
> > > > IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001 of
> > > > processing the generated columns up-front is that there is no need to check
> > > > them again in code like this.
> > >
> > > They should be discovered anyway in the subsequent check:
> > > /* Skip this attribute if it's not present in the column list */
> > > if (columns != NULL && !bms_is_member(att->attnum, columns))
> > >   continue;
> > Same explanation as above.
>
> As above.
>
We have created a 0004 patch to use the BMS approach. This will be
addressed in a future version of the 0004 patch.

> ======
> src/test/subscription/t/011_generated.pl
>
> I'm not sure if you needed to say "STORED" generated cols for the
> subscriber-side columns but anyway, whatever is done needs to be done
> consistently. FYI, below you did *not* say STORED for subscriber-side
> generated cols, but in other comments for subscriber-side generated
> columns, you did say STORED.
>
> # tab3:
> # publisher-side tab3 has STORED generated col 'b' but
> # subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.
>
> ~
>
> # tab4:
> # publisher-side tab4 has STORED generated cols 'b' and 'c' but
> # subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
> # where columns on publisher/subscriber are in a different order
>
Fixed

Please find the updated v18-0003 patch at [1].

[1]: https://www.postgresql.org/message-id/CANhcyEW3LVJpRPScz6VBa%3DZipEMV7b-u76PDEALNcNDFURCYMA%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Mon, 15 Jul 2024 at 08:08, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments about patch v17-0003

I have addressed the comments in v18-0003 patch [1].

[1]: https://www.postgresql.org/message-id/CANhcyEW3LVJpRPScz6VBa%3DZipEMV7b-u76PDEALNcNDFURCYMA%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Jul 12, 2024 at 12:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham.
>
> Thanks for separating the new BMS 'columns' modification.
>
> Here are my review comments for the latest patch v17-0001.
>
> ======
>
> 1. src/backend/replication/pgoutput/pgoutput.c
>
>   /*
>   * Columns included in the publication, or NULL if all columns are
>   * included implicitly.  Note that the attnums in this bitmap are not
> + * publication and include_generated_columns option: other reasons should
> + * be checked at user side.  Note that the attnums in this bitmap are not
>   * shifted by FirstLowInvalidHeapAttributeNumber.
>   */
>   Bitmapset  *columns;
> With this latest 0001 there is now no change to the original
> interpretation of RelationSyncEntry BMS 'columns'. So, I think this
> field comment should remain unchanged; i.e. it should be the same as
> the current HEAD comment.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> nitpick - comment changes for 'tab2' and 'tab3' to make them more consistent.
>
> ======
> 99.
> Please refer to the attached diff patch which implements any nitpicks
> described above.

The attached Patches contain all the suggested changes.

v19-0001 - Addressed the comments.
v19-0002 - Rebased the Patch.
v19-0003 - Rebased the Patch.
v19-0004 - Addressed all the comments related to Bitmapset (BMS).

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 15, 2024 at 11:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, I had a quick look at the patch v17-0004 which is the split-off
> new BMS logic.
>
> IIUC this 0004 is currently undergoing some refactoring and
> cleaning-up, so I won't comment much about it except to give the
> following observation below.
>
> ======
> src/backend/replication/logical/proto.c.
>
> I did not expect to see any code fragments that are still checking
> generated columns like below:
>
> logicalrep_write_tuple:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~~~
>
> logicalrep_write_attrs:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~
> if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~~~
>
>
> AFAIK, now checking support of generated columns will be done when the
> BMS 'columns' is assigned, so the continuation code will be handled
> like this:
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> ======
>
> BTW there is a subtle but significant difference in this 0004 patch.
> IOW, we are introducing a difference between the list of published
> columns VERSUS a publication column list. So please make sure that all
> code comments are adjusted appropriately so they are not misleading by
> calling these "column lists" still.
>
> BEFORE: BMS 'columns'  means "columns of the column list" or NULL if
> there was no publication column list
> AFTER: BMS 'columns' means "columns to be replicated" or NULL if all
> columns are to be replicated

I have addressed all the comments in v19-0004 patch.
Please refer to the updated v19-0004 patch at [1].
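
As a rough illustration of the "decide up-front" idea discussed in the review above, here is a minimal, self-contained C sketch. It is not code from any of the patches: ToyAttr, ToyColumnSet, toy_columns_to_replicate() and toy_count_sent_columns() are hypothetical stand-ins for the real attribute and Bitmapset machinery, and the way the column list and include_generated_columns interact here is an assumption made only for this example. Unlike the real 'columns' field, the toy set is always built explicitly and does not model the "NULL means all columns" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef enum { GEN_NONE, GEN_STORED, GEN_VIRTUAL } GenKind;

typedef struct
{
    int     attnum;     /* 1-based column number */
    bool    dropped;    /* column was dropped */
    GenKind generated;  /* kind of generated column, if any */
} ToyAttr;

/* One bit per attnum; a toy replacement for a Bitmapset (attnum < 64). */
typedef uint64_t ToyColumnSet;

/*
 * Decide once, up-front, which columns are replicated.  'column_list' is the
 * publication column list as a bit set, or 0 when no list was given
 * (assumption: a stored generated column needs the option to be true).
 */
ToyColumnSet
toy_columns_to_replicate(const ToyAttr *atts, size_t natts,
                         ToyColumnSet column_list,
                         bool include_generated_columns)
{
    ToyColumnSet result = 0;

    for (size_t i = 0; i < natts; i++)
    {
        const ToyAttr *att = &atts[i];

        assert(att->attnum > 0 && att->attnum < 64);
        ToyColumnSet bit = (ToyColumnSet) 1 << att->attnum;

        if (att->dropped)
            continue;
        if (att->generated == GEN_VIRTUAL)
            continue;           /* never replicated */
        if (att->generated == GEN_STORED && !include_generated_columns)
            continue;
        if (column_list != 0 && (column_list & bit) == 0)
            continue;           /* not in the publication column list */

        result |= bit;
    }
    return result;
}

/* Later loops need only the membership test, with no attgenerated checks. */
size_t
toy_count_sent_columns(const ToyAttr *atts, size_t natts, ToyColumnSet columns)
{
    size_t nsent = 0;

    for (size_t i = 0; i < natts; i++)
    {
        if ((columns & ((ToyColumnSet) 1 << atts[i].attnum)) == 0)
            continue;
        nsent++;
    }
    return nsent;
}
```

The design point is that every "should this column be replicated?" rule lives in one place, so the per-tuple loops need nothing beyond the membership test.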

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are my review comments for patch v19-0001.

======
src/backend/replication/pgoutput/pgoutput.c

1.
  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;
You replied [1] "The attached Patches contain all the suggested
changes." but as I previously commented [2, #1], since there is no
change to the interpretation of the 'columns' BMS caused by this
patch, then I expected this comment would be unchanged (i.e. same as
HEAD code). But this fix was missed in v19-0001.

OTOH, if you do think there was a reason to change the comment then
the above is still not good because "are not publication and
include_generated_columns option" wording doesn't make sense.

======
src/test/subscription/t/011_generated.pl

Observation -- I added (in nitpicks diffs) some more comments for
'tab1' (to make all comments consistent with the new tests added). But
when I was doing that I observed that tab1 and tab3 test scenarios are
very similar. It seems only the subscription parameter is not
specified (so 'include_generated_cols' default will be tested). IIRC
the default for that parameter is "false", so tab1 is not really
testing that properly -- e.g. I thought maybe to test the default
parameter it's better the subscriber-side 'b' should be not-generated?
But doing that would make 'tab1' the same as 'tab2'. Anyway, something
seems amiss -- it seems either something is not tested or is duplicate
tested. Please revisit what the tab1 test intention was and make sure
we are doing the right thing for it...

======
99.
The attached nitpicks diff patch has some tweaked comments.

======
[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com


Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for v19-0002

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
nitpick - tweak function comment
nitpick - tweak other comments

~~~

fetch_remote_table_info:
nitpick - add space after "if"
nitpick - removed a comment because logic is self-evident from the variable name

======
src/test/subscription/t/004_sync.pl

1.
This new test is not related to generated columns. IIRC, this is just
some test that we discovered missing during review of this thread. As
such, I think this change can be posted/patched separately from this
thread.

======
src/test/subscription/t/011_generated.pl

nitpick - change some comment wording to be more consistent with patch 0001.

======
99.
Please see the nitpicks diff attachment which implements any nitpicks
mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for patch v19-0003

======
src/backend/catalog/pg_publication.c

1.
/*
 * Translate a list of column names to an array of attribute numbers
 * and a Bitmapset with them; verify that each attribute is appropriate
 * to have in a publication column list (no system or generated attributes,
 * no duplicates).  Additional checks with replica identity are done later;
 * see pub_collist_contains_invalid_column.
 *
 * Note that the attribute numbers are *not* offset by
 * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this
 * is okay.
 */
static void
publication_translate_columns(Relation targetrel, List *columns,
  int *natts, AttrNumber **attrs)

~

I thought the above comment ought to change: /or generated
attributes/or virtual generated attributes/

IIRC this was already addressed back in v16, but somehow that fix has
been lost (???).
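
To make the intent of that header comment concrete, here is a small C sketch of the checks it describes, with "generated" narrowed to "virtual generated" as suggested above. It is only an illustration: toy_check_column_list() and its types are hypothetical, the 64-column limit is an artifact of the toy bit set, and it assumes system columns are the ones with non-positive attribute numbers. It is not the body of publication_translate_columns().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column classification, for illustration only. */
typedef enum
{
    COL_REGULAR,
    COL_STORED_GENERATED,
    COL_VIRTUAL_GENERATED
} ToyColKind;

typedef enum
{
    COLLIST_OK,
    COLLIST_SYSTEM_COLUMN,
    COLLIST_VIRTUAL_GENERATED,
    COLLIST_DUPLICATE
} ToyColListStatus;

/*
 * Check a publication column list given as parallel arrays of attribute
 * numbers and column kinds.  Returns the first problem found, if any.
 */
ToyColListStatus
toy_check_column_list(const int *attnums, const ToyColKind *kinds, size_t n)
{
    uint64_t seen = 0;          /* bit per attnum already listed */

    for (size_t i = 0; i < n; i++)
    {
        if (attnums[i] <= 0)
            return COLLIST_SYSTEM_COLUMN;

        assert(attnums[i] < 64);    /* toy limit of the bit set */

        if (kinds[i] == COL_VIRTUAL_GENERATED)
            return COLLIST_VIRTUAL_GENERATED;

        uint64_t bit = (uint64_t) 1 << attnums[i];

        if (seen & bit)
            return COLLIST_DUPLICATE;
        seen |= bit;
    }
    return COLLIST_OK;
}
```

Note that a stored generated column passes this check, which is the behaviour the reworded comment is meant to describe.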

======
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:
nitpick - missing end space in this comment /* TODO: use
ATTRIBUTE_GENERATED_VIRTUAL*/

======

2.
(in patch v19-0001)
+# tab3:
+# publisher-side tab3 has generated col 'b'.
+# subscriber-side tab3 has generated col 'b', using a different computation.

(here, in patch v19-0003)
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# publisher-side tab3 has stored generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'.

It has become difficult to review these TAP tests, particularly when
different patches are modifying the same comment. e.g. I post
suggestions to modify comments for patch 0001. Those get addressed OK,
only to vanish in subsequent patches, as has happened in the above
example.

Really this patch 0003 was only supposed to add the word "stored", not
revert the entire comment to something from an earlier version. Please
take care that all comment changes are carried forward correctly from
one patch to the next.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for v19-0002
> ======
> src/test/subscription/t/004_sync.pl
>
> 1.
> This new test is not related to generated columns. IIRC, this is just
> some test that we discovered missing during review of this thread. As
> such, I think this change can be posted/patched separately from this
> thread.
>
I have removed the test from this thread.

I have also addressed the remaining comments for v19-0002 patch.
Please find the latest patches.

v20-0001 - not modified
v20-0002 - Addressed the comments
v20-0003 - Addressed the comments
v20-0004 - Not modified

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 19 Jul 2024 at 04:59, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for patch v19-0003
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1.
> /*
>  * Translate a list of column names to an array of attribute numbers
>  * and a Bitmapset with them; verify that each attribute is appropriate
>  * to have in a publication column list (no system or generated attributes,
>  * no duplicates).  Additional checks with replica identity are done later;
>  * see pub_collist_contains_invalid_column.
>  *
>  * Note that the attribute numbers are *not* offset by
>  * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this
>  * is okay.
>  */
> static void
> publication_translate_columns(Relation targetrel, List *columns,
>   int *natts, AttrNumber **attrs)
>
> ~
>
> I thought the above comment ought to change: /or generated
> attributes/or virtual generated attributes/
>
> IIRC this was already addressed back in v16, but somehow that fix has
> been lost (???).
Modified the comment

> ======
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
> nitpick - missing end space in this comment /* TODO: use
> ATTRIBUTE_GENERATED_VIRTUAL*/
>
Fixed

> ======
>
> 2.
> (in patch v19-0001)
> +# tab3:
> +# publisher-side tab3 has generated col 'b'.
> +# subscriber-side tab3 has generated col 'b', using a different computation.
>
> (here, in patch v19-0003)
>  # tab3:
> -# publisher-side tab3 has generated col 'b'.
> -# subscriber-side tab3 has generated col 'b', using a different computation.
> +# publisher-side tab3 has stored generated col 'b' but
> +# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'.
>
> It has become difficult to review these TAP tests, particularly when
> different patches are modifying the same comment. e.g. I post
> suggestions to modify comments for patch 0001. Those get addressed OK,
> only to vanish in subsequent patches like has happened in the above
> example.
>
> Really this patch 0003 was only supposed to add the word "stored", not
> revert the entire comment to something from an earlier version. Please
> take care that all comment changes are carried forward correctly from
> one patch to the next.
Fixed

I have addressed the comment in v20-0003 patch. Please refer to [1].

[1]: https://www.postgresql.org/message-id/CANhcyEUzUurrX38HGvG30gV92YDz6WmnnwNRYMVY4tiga-8KZg%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are some review comments for v19-0002
> > ======
> > src/test/subscription/t/004_sync.pl
> >
> > 1.
> > This new test is not related to generated columns. IIRC, this is just
> > some test that we discovered missing during review of this thread. As
> > such, I think this change can be posted/patched separately from this
> > thread.
> >
> I have removed the test for this thread.
>
> I have also addressed the remaining comments for v19-0002 patch.

Hi, I have no more review comments for patch v20-0002 at this time.

I saw that the above test was removed from this thread as suggested,
but I could not find that any new thread was started to propose this
valuable missing test.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jul 18, 2024 at 10:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are my review comments for patch v19-0001.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 1.
>   /*
>   * Columns included in the publication, or NULL if all columns are
>   * included implicitly.  Note that the attnums in this bitmap are not
> + * publication and include_generated_columns option: other reasons should
> + * be checked at user side.  Note that the attnums in this bitmap are not
>   * shifted by FirstLowInvalidHeapAttributeNumber.
>   */
>   Bitmapset  *columns;
> You replied [1] "The attached Patches contain all the suggested
> changes." but as I previously commented [2, #1], since there is no
> change to the interpretation of the 'columns' BMS caused by this
> patch, then I expected this comment would be unchanged (i.e. same as
> HEAD code). But this fix was missed in v19-0001.
>
> OTOH, if you do think there was a reason to change the comment then
> the above is still not good because "are not publication and
> include_generated_columns option" wording doesn't make sense.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> Observation -- I added (in nitpicks diffs) some more comments for
> 'tab1' (to make all comments consistent with the new tests added). But
> when I was doing that I observed that tab1 and tab3 test scenarios are
> very similar. It seems only the subscription parameter is not
> specified (so 'include_generated_cols' default will be tested). IIRC
> the default for that parameter is "false", so tab1 is not really
> testing that properly -- e.g. I thought maybe to test the default
> parameter it's better the subscriber-side 'b' should be not-generated?
> But doing that would make 'tab1' the same as 'tab2'. Anyway, something
> seems amiss -- it seems either something is not tested or is duplicate
> tested. Please revisit what the tab1 test intention was and make sure
> we are doing the right thing for it...
>
> ======
> 99.
> The attached nitpicks diff patch has some tweaked comments.
>
> ======
> [1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com

The attached Patches contain all the suggested changes.

v21-0001 - Addressed the comments.
v21-0002 - Added the TAP Tests for 011_generated.pl file and modified
the patch accordingly.
v21-0003 - Added the TAP Tests for 011_generated.pl file and modified
the patch accordingly.
v21-0004 - Rebased the Patch.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Thanks for the patch updates.

Here are my review comments for v21-0001.

I think this patch is mostly OK now except there are still some
comments about the TAP test.

======
Commit Message

0.
Using Create Subscription:
CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION
pub1 WITH (include_generated_columns = true, copy_data = false)

If you are going to give an example, I think a gen-to-nogen example
would be a better choice. That's because the gen-to-gen (as you have
here) is not going to replicate anything due to the subscriber-side
column being generated.

======
src/test/subscription/t/011_generated.pl

1.
+$node_subscriber2->safe_psql('postgres',
+ "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)"
+);

The subscriber2 node was intended only for all the tables where we
need include_generated_columns to be true. Mostly that is the
combination tests. (tab_gen_to_nogen, tab_nogen_to_gen, etc) OTOH,
table 'tab1' already existed. I don't think we need to bother
subscribing to tab1 from subscriber2 because every combination is
already covered by the combination tests. Let's leave this one alone.


~~~

2.
Huh? Where is the "tab_nogen_to_gen" combination test that I sent to
you off-list?

~~~

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED, a int, b int)"
+);

Maybe you can test 'tab_order' on both subscription nodes but I think
it is overkill. IMO it is enough to test it on subscription2.

~~~

4.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a * 22) STORED)"
+);

Ditto above. Maybe you can test 'tab_alter' on both subscription nodes
but I think it is overkill. IMO it is enough to test it on
subscription2.

~~~

5.
Don't forget to add initial data for the missing nogen_to_gen table/test.

~~~

6.
 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen, tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order");
+
 $node_subscriber->safe_psql('postgres',
  "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1"
 );

It is not a bad idea to reduce the number of publications as you have
done, but IMO jamming all the tables into 1 publication is too much
because it makes it less understandable instead of simpler.

How about this:
- leave the 'pub1' just for 'tab1'.
- have a 'pub_combo' for publishing all the gen_to_nogen,
nogen_to_gen etc combination tests.
- and a 'pub_misc' for any other misc tables like tab_order.

~~~

7.
+#####################
 # Wait for initial sync of all subscriptions
+#####################

I think you should write a note here that you have deliberately set
copy_data = false because COPY and include_generated_columns are not
allowed at the same time for patch 0001. And that is why all expected
results on subscriber2 will be empty. Also, say this limitation will
be changed in patch 0002.

~~~

(I didn't yet check 011_generated.pl file results beyond this point...
I'll wait for v22-0001 to review further)

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 29, 2024 at 12:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Thanks for the patch updates.
>
> Here are my review comments for v21-0001.
>
> I think this patch is mostly OK now except there are still some
> comments about the TAP test.
>
> ======
> Commit Message
>
> 0.
> Using Create Subscription:
> CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION
> pub1 WITH (include_generated_columns = true, copy_data = false)"
>
> If you are going to give an example, I think a gen-to-nogen example
> would be a better choice. That's because the gen-to-gen (as you have
> here) is not going to replicate anything due to the subscriber-side
> column being generated.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1.
> +$node_subscriber2->safe_psql('postgres',
> + "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
> +);
>
> The subscriber2 node was intended only for all the tables where we
> need include_generated_columns to be true. Mostly that is the
> combination tests. (tab_gen_to_nogen, tab_nogen_to_gen, etc) OTOH,
> table 'tab1' already existed. I don't think we need to bother
> subscribing to tab1 from subscriber2 because every combination is
> already covered by the combination tests. Let's leave this one alone.
>
>
> ~~~
>
> 2.
> Huh? Where is the "tab_nogen_to_gen" combination test that I sent to
> you off-list?
>
> ~~~
>
> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED,
> a int, b int)"
> +);
>
> Maybe you can test 'tab_order' on both subscription nodes but I think
> it is overkill. IMO it is enough to test it on subscription2.
>
> ~~~
>
> 4.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a
> * 22) STORED)"
> +);
>
> Ditto above. Maybe you can test 'tab_alter' on both subscription nodes
> but I think it is overkill. IMO it is enough to test it on
> subscription2.
>
> ~~~
>
> 5.
> Don't forget to add initial data for the missing nogen_to_gen table/test.
>
> ~~~
>
> 6.
>  $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen,
> tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order");
> +
>  $node_subscriber->safe_psql('postgres',
>   "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1"
>  );
>
> It is not a bad idea to reduce the number of publications as you have
> done, but IMO jamming all the tables into 1 publication is too much
> because it makes it less understandable instead of simpler.
>
> How about this:
> - leave the 'pub1' just for 'tab1'.
> - have a 'pub_combo' for publishing all the gen_to_nogen,
> nogen_to_gen etc combination tests.
> - and a 'pub_misc' for any other misc tables like tab_order.
>
> ~~~
>
> 7.
> +#####################
>  # Wait for initial sync of all subscriptions
> +#####################
>
> I think you should write a note here that you have deliberately set
> copy_data = false because COPY and include_generated_columns are not
> allowed at the same time for patch 0001. And that is why all expected
> results on subscriber2 will be empty. Also, say this limitation will
> be changed in patch 0002.
>
> ~~~
>
> (I didn't yet check 011_generated.pl file results beyond this point...
> I'll wait for v22-0001 to review further)

The attached Patches contain all the suggested changes.

v22-0001 - Addressed the comments.
v22-0002 - Rebased the Patch.
v22-0003 - Rebased the Patch.
v22-0004 - Rebased the Patch.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, Here are my review comments for patch v22-0001

All comments now are only for the TAP test.

======
src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case
"gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table
because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test
(previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for nogen-to-gen combination, and
after experimenting with this I am getting a bit suspicious.

Currently, it seems that if a COPY is attempted then the error would
be like this:
2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
0001 allows replication to happen. And the generated value of the
subscriber "b" takes precedence.

I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting.  That is why it is
so important to include *every* test combination in these TAP tests --
because unless we know how it works today, we won't know if we are
accidentally breaking the current behaviour with the other (0002,
0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
to make sure the (new?) patch errors make sense and don't overstep by
giving ERRORs when they should not.

~~~~

Also, many other smaller issues/changes were done:

~~~

Creating tables:

nitpick - rearranged to keep all combo test SQLs in a consistent order
throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen

nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.

nitpick - tweaked some CREATE TABLE comments for consistency.

nitpick - in the v22 patch many of the generated col 'b' use different
computations for every test. It makes it unnecessarily difficult to
read/review the expected results. So, I've made them all the same. Now
computation is "a * 2" on the publisher side, and "a * 22" on the
subscriber side.
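
A tiny worked example of that convention, as a C sketch. The function names are hypothetical, and the model only encodes what these reviews report for the combo tests: when the publisher's stored value is sent (include_generated_columns = true) and the subscriber column 'b' is a regular column, the publisher's value is applied; when the subscriber column 'b' is itself generated, the subscriber's own computation takes precedence.

```c
#include <assert.h>
#include <stdbool.h>

/* Test-file convention: publisher computes a * 2, subscriber computes a * 22. */
static int toy_publisher_b(int a)  { return a * 2; }
static int toy_subscriber_b(int a) { return a * 22; }

/*
 * Value of subscriber column 'b' after a row with key 'a' is applied,
 * assuming the publisher's generated value was included in the change.
 */
int
toy_applied_b(int a, bool subscriber_b_is_generated)
{
    if (subscriber_b_is_generated)
        return toy_subscriber_b(a);   /* subscriber's computation wins */
    return toy_publisher_b(a);        /* publisher's value is stored as-is */
}

/* Usage: the values one would expect to see in the results for a = 3. */
void
toy_self_check(void)
{
    assert(toy_applied_b(3, false) == 6);   /* gen-to-nogen */
    assert(toy_applied_b(3, true) == 66);   /* gen-to-gen, nogen-to-gen */
}
```

Because the two multipliers differ, a glance at the subscriber's 'b' value shows which side produced it.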

~~~

Creating Publications and Subscriptions:


nitpick - added comment for all the CREATE PUBLICATION

nitpick - added comment for all the CREATE SUBSCRIPTION

nitpick - I moved the note about copy_data = false to where all the
node_subscriber2 subscriptions are created. Also, don't explicitly
refer to "patch 000" in the comment, because that will not make any
sense after getting pushed.

nitpick - I changed many subscriber names to consistently use "sub1"
or "sub2" within the name (this is the visual cue of which
node_subscriber<n> they are on). e.g.
/regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:

nitpick - not sure if it is possible to do the initial data tests for
"nogen_to_gen" in the normal place. For now, it is just replaced by a
comment.
NOTE - Maybe this should be refactored later to put all the initial
data checks in one place. I'll think about this point more in the next
review.

~~~

nitpick - Changed cleanup to drop subscriptions before publications.

nitpick - remove the unnecessary blank line at the end.

======

Please see the attached diffs patch (apply it atop patch 0001) which
includes all the nitpick changes mentioned above.

~~

BTW, For a quicker turnaround and less churning please consider just
posting the v23-0001 by itself instead of waiting to rebase all the
subsequent patches. When 0001 settles down some more then rebase the
others.

~~

Also, please run the indentation tool over this code ASAP.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Aug 1, 2024 at 2:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are my review comments for patch v22-0001
>
> All comments now are only for the TAP test.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1. I added all new code for the missing combination test case
> "gen-to-missing". See nitpicks diff.
> - create a separate publication for this "tab_gen_to_missing" table
> because the test gives subscription errors.
> - for the initial data
> - for the replicated data
>
> ~~~
>
> 2. I added sub1 and sub2 subscriptions for every combo test
> (previously some were absent). See nitpicks diff.
>
> ~~~
>
> 3. There was a missing test case for nogen-to-gen combination, and
> after experimenting with this I am getting a bit suspicious.
>
> Currently, it seems that if a COPY is attempted then the error would
> be like this:
> 2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
> 2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
> be used in COPY.
>
> OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
> 0001 allows replication to happen. And the generated value of the
> subscriber "b" takes precedence.
>
> I have included these tests in the nitpicks diff of patch 0001.
>
> Those results weren't exactly what I was expecting.  That is why it is
> so important to include *every* test combination in these TAP tests --
> because unless we know how it works today, we won't know if we are
> accidentally breaking the current behaviour with the other (0002,
> 0003) patches.
>
> Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
> to make sure the (new?) patch errors make sense and don't overstep by
> giving ERRORs when they should not.
>
> ~~~~
>
> Also, many other smaller issues/changes were done:
>
> ~~~
>
> Creating tables:
>
> nitpick - rearranged to keep all combo test SQLs in a consistent order
> throughout this file
> 1/ gen-to-gen
> 2/ gen-to-nogen
> 3/ gen-to-missing
> 4/ missing-to-gen
> 5/ nogen-to-gen
>
> nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.
>
> nitpick - tweaked some CREATE TABLE comments for consistency.
>
> nitpick - in the v22 patch many of the generated col 'b' use different
> computations for every test. It makes it unnecessarily difficult to
> read/review the expected results. So, I've made them all the same. Now
> computation is "a * 2" on the publisher side, and "a * 22" on the
> subscriber side.
>
> ~~~
>
> Creating Publications and Subscriptions:
>
>
> nitpick - added comment for all the CREATE PUBLICATION
>
> nitpick - added comment for all the CREATE SUBSCRIPTION
>
> nitpick - I moved the note about copy_data = false to where all the
> node_subscriber2 subscriptions are created. Also, don't explicitly
> refer to "patch 000" in the comment, because that will not make any
> sense after getting pushed.
>
> nitpick - I changed many subscriber names to consistently use "sub1"
> or "sub2" within the name (this is the visual cue of which
> node_subscriber<n> they are on). e.g.
> /regress_sub_combo2/regress_sub2_combo/
>
> ~~~
>
> Initial Sync tests:
>
> nitpick - not sure if it is possible to do the initial data tests for
> "nogen_to_gen" in the normal place. For now, it is just replaced by a
> comment.
> NOTE - Maybe this should be refactored later to put all the initial
> data checks in one place. I'll think about this point more in the next
> review.
>
> ~~~
>
> nitpick - Changed cleanup to drop subscriptions before publications.
>
> nitpick - remove the unnecessary blank line at the end.
>
> ======
>
> Please see the attached diffs patch (apply it atop patch 0001) which
> includes all the nitpick changes mentioned above.
>
> ~~
>
> BTW, For a quicker turnaround and less churning please consider just
> posting the v23-0001 by itself instead of waiting to rebase all the
> subsequent patches. When 0001 settles down some more then rebase the
> others.
>
> ~~
>
> Also, please run the indentation tool over this code ASAP.
>
I have addressed all the comments. The attached patch (v23-0001) contains
all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham.

Here are some more review comments for the v23-0001.

======
src/test/subscription/t/011_generated.pl

nitpick - renamed /regress_pub/regress_pub_tab1/ and
/regress_sub1/regress_sub1_tab1/
nitpick - typo /inital data/initial data/
nitpick - typo /snode_subscriber2/node_subscriber2/
nitpick - tweak the combo initial sync comments and messages
nitpick - /#Cleanup/# cleanup/
nitpick - tweak all the combo normal replication comments
nitpick - removed blank line at the end

~~~

1. Refactor tab_gen_to_missing initial sync tests.

I moved the tab_gen_to_missing initial sync for node_subscriber2 to be
back where all the other initial sync tests are done.
See the nitpicks patch file.

~~~

2. Refactor tab_nogen_to_gen initial sync tests

I moved all the tab_nogen_to_gen initial sync tests back to where the
other initial sync tests are done.
See the nitpicks patch file.

~~~

3. Added another test case:

Because the (current PG17) nogen-to-gen initial sync test case (with
copy_data=true) gives an ERROR, I have added another combination to
cover normal replication (e.g. using copy_data=false).
See the nitpicks patch file.

(This has exposed an inconsistency which IMO might be a PG17 bug. I
have included TAP test comments about this, and plan to post a
separate thread for it later).

~

4. GUC

Moving and adding more CREATE SUBSCRIPTION exceeded some default GUCs,
so extra configuration was needed.
See the nitpick patch file.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Writing many new test case combinations has exposed a possible bug in
patch 0001.

In my previous post [1] there was questionable behaviour when
replicating from a normal (not generated) column on the publisher side
to a generated column on the subscriber side. Initially, I thought the
test might have exposed a possible PG17 bug, but now I think it has
really found a bug in patch 0001.

~~~

Previously (PG17) this would fail consistently both during COPY and
during normal replication. Now, patch 0001 has changed this behaviour
-- it is not always failing anymore.

The patch should not be impacting this existing behaviour. It only
introduces a new 'include_generated_columns', but since the publisher
side is not a generated column I do not expect there should be any
difference in behaviour for this test case. IMO the TAP test expected
results should be corrected for this scenario, and the bug fixed.

Below is an example demonstrating PG17 behaviour.

======


Publisher:
----------

(notice column "b" is not generated)

test_pub=# CREATE TABLE tab_nogen_to_gen (a int, b int);
CREATE TABLE
test_pub=# INSERT INTO tab_nogen_to_gen VALUES (1,101),(2,102);
INSERT 0 2
test_pub=# CREATE PUBLICATION pub1 for TABLE tab_nogen_to_gen;
CREATE PUBLICATION
test_pub=#

Subscriber:
-----------

(notice corresponding column "b" is generated)

test_sub=# CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED
ALWAYS AS (a * 22) STORED);
CREATE TABLE
test_sub=#

Try to create a subscription. Notice we get the error: ERROR:  logical
replication target relation "public.tab_nogen_to_gen" is missing
replicated column: "b"

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
PUBLICATION pub1;
2024-08-05 13:16:40.043 AEST [20957] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-05 13:16:40.105 AEST [29258] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-05 13:16:40.117 AEST [29260] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-05 13:16:40.172 AEST [29260] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:16:40.173 AEST [20039] LOG:  background worker "logical
replication tablesync worker" (PID 29260) exited with exit code 1
2024-08-05 13:16:45.187 AEST [29400] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-05 13:16:45.285 AEST [29400] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:16:45.286 AEST [20039] LOG:  background worker "logical
replication tablesync worker" (PID 29400) exited with exit code 1
...

Create the subscription again, but this time with copy_data = false

test_sub=# CREATE SUBSCRIPTION sub1_nocopy CONNECTION
'dbname=test_pub' PUBLICATION pub1 WITH (copy_data = false);
2024-08-05 13:22:57.719 AEST [20957] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1_nocopy" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-05 13:22:57.765 AEST [7012] LOG:  logical
replication apply worker for subscription "sub1_nocopy" has started

test_sub=#

~~~

Then insert data from the publisher to see what happens for normal replication.

test_pub=#
test_pub=# INSERT INTO tab_nogen_to_gen VALUES (3,103),(4,104);
INSERT 0 2

~~~

Notice the subscriber gets the same error as before: ERROR:  logical
replication target relation "public.tab_nogen_to_gen" is missing
replicated column: "b"

2024-08-05 13:25:14.897 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 10957) exited with exit code 1
2024-08-05 13:25:19.933 AEST [11095] LOG:  logical replication apply
worker for subscription "sub1_nocopy" has started
2024-08-05 13:25:19.966 AEST [11095] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:25:19.966 AEST [11095] CONTEXT:  processing remote data
for replication origin "pg_16390" during message type "INSERT" in
transaction 742, finished at 0/1967BB0
2024-08-05 13:25:19.968 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 11095) exited with exit code 1
2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
worker for subscription "sub1_nocopy" has started
2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
for replication origin "pg_16390" during message type "INSERT" in
transaction 742, finished at 0/1967BB0
2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 11225) exited with exit code 1
...
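
One way to read the "missing replicated column" message above is as a
name-matching step on the subscriber that does not accept local
generated (or dropped) columns as valid targets. The C sketch below is
a simplified, self-contained model of that idea, not the PostgreSQL
source; the LocalCol type and first_missing_remote_col() are
hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one column of the subscriber's table. */
typedef struct LocalCol
{
	const char *name;
	bool		is_dropped;
	bool		is_generated;
} LocalCol;

/*
 * Return the index of the first remote (published) column that has no
 * usable local target, or -1 if every remote column can be mapped.
 *
 * Assumption of this sketch: local columns that are dropped or generated
 * are skipped when matching by name, so a remote column whose only
 * namesake is a generated column counts as "missing".
 */
static int
first_missing_remote_col(const char *const *remote, size_t nremote,
						 const LocalCol *local, size_t nlocal)
{
	for (size_t r = 0; r < nremote; r++)
	{
		bool		matched = false;

		for (size_t l = 0; l < nlocal; l++)
		{
			if (local[l].is_dropped || local[l].is_generated)
				continue;		/* not a valid replication target */
			if (strcmp(remote[r], local[l].name) == 0)
			{
				matched = true;
				break;
			}
		}
		if (!matched)
			return (int) r;
	}
	return -1;
}
```

With remote columns (a, b) and a local table where "b" is generated,
this model reports index 1 (column "b") as missing, which lines up with
the error in the logs above. With a plain local "b" it reports -1.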

======
[1] https://www.postgresql.org/message-id/CAHut%2BPvtT8fKOfvVYr4vANx_fr92vedas%2BZRbQxvMC097rks6w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 5, 2024 at 8:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham.
>
> Here are some more review comments for the v23-0001.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> nitpick - renamed /regress_pub/regress_pub_tab1/ and
> /regress_sub1/regress_sub1_tab1/
> nitpick - typo /inital data/initial data/
> nitpick - typo /snode_subscriber2/node_subscriber2/
> nitpick - tweak the combo initial sync comments and messages
> nitpick - /#Cleanup/# cleanup/
> nitpick - tweak all the combo normal replication comments
> nitpick - removed blank line at the end
>
> ~~~
>
> 1. Refactor tab_gen_to_missing initial sync tests.
>
> I moved the tab_gen_to_missing initial sync for node_subscriber2 to be
> back where all the other initial sync tests are done.
> See the nitpicks patch file.
>
> ~~~
>
> 2. Refactor tab_nogen_to_gen initial sync tests
>
> I moved all the tab_nogen_to_gen initial sync tests back to where the
> other initial sync tests are done.
> See the nitpicks patch file.
>
> ~~~
>
> 3. Added another test case:
>
> Because the (current PG17) nogen-to-gen initial sync test case (with
> copy_data=true) gives an ERROR, I have added another combination to
> cover normal replication (e.g. using copy_data=false).
> See the nitpicks patch file.
>
> (This has exposed an inconsistency which IMO might be a PG17 bug. I
> have included TAP test comments about this, and plan to post a
> separate thread for it later).
>
> ~
>
> 4. GUC
>
> Moving and adding more CREATE SUBSCRIPTION exceeded some default GUCs,
> so extra configuration was needed.
> See the nitpick patch file.
>

I have addressed all the comments. The attached patch (v24-0001) contains
all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 5, 2024 at 9:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Writing many new test case combinations has exposed a possible bug in
> patch 0001.
>
> In my previous post [1] there was questionable behaviour when
> replicating from a normal (not generated) column on the publisher side
> to a generated column on the subscriber side. Initially, I thought the
> test might have exposed a possible PG17 bug, but now I think it has
> really found a bug in patch 0001.
>
> ~~~
>
> Previously (PG17) this would fail consistently both during COPY and
> during normal replication. Now, patch 0001 has changed this behaviour
> -- it is not always failing anymore.
>
> The patch should not be impacting this existing behaviour. It only
> introduces a new 'include_generated_columns', but since the publisher
> side is not a generated column I do not expect there should be any
> difference in behaviour for this test case. IMO the TAP test expected
> results should be corrected for this scenario, and the bug fixed.
>
> Below is an example demonstrating PG17 behaviour.
>
> ======
>
>
> Publisher:
> ----------
>
> (notice column "b" is not generated)
>
> test_pub=# CREATE TABLE tab_nogen_to_gen (a int, b int);
> CREATE TABLE
> test_pub=# INSERT INTO tab_nogen_to_gen VALUES (1,101),(2,102);
> INSERT 0 2
> test_pub=# CREATE PUBLICATION pub1 for TABLE tab_nogen_to_gen;
> CREATE PUBLICATION
> test_pub=#
>
> Subscriber:
> -----------
>
> (notice corresponding column "b" is generated)
>
> test_sub=# CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED
> ALWAYS AS (a * 22) STORED);
> CREATE TABLE
> test_sub=#
>
> Try to create a subscription. Notice we get the error: ERROR:  logical
> replication target relation "public.tab_nogen_to_gen" is missing
> replicated column: "b"
>
> test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
> PUBLICATION pub1;
> 2024-08-05 13:16:40.043 AEST [20957] WARNING:  subscriptions created
> by regression test cases should have names starting with "regress_"
> WARNING:  subscriptions created by regression test cases should have
> names starting with "regress_"
> NOTICE:  created replication slot "sub1" on publisher
> CREATE SUBSCRIPTION
> test_sub=# 2024-08-05 13:16:40.105 AEST [29258] LOG:  logical
> replication apply worker for subscription "sub1" has started
> 2024-08-05 13:16:40.117 AEST [29260] LOG:  logical replication table
> synchronization worker for subscription "sub1", table
> "tab_nogen_to_gen" has started
> 2024-08-05 13:16:40.172 AEST [29260] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:16:40.173 AEST [20039] LOG:  background worker "logical
> replication tablesync worker" (PID 29260) exited with exit code 1
> 2024-08-05 13:16:45.187 AEST [29400] LOG:  logical replication table
> synchronization worker for subscription "sub1", table
> "tab_nogen_to_gen" has started
> 2024-08-05 13:16:45.285 AEST [29400] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:16:45.286 AEST [20039] LOG:  background worker "logical
> replication tablesync worker" (PID 29400) exited with exit code 1
> ...
>
> Create the subscription again, but this time with copy_data = false
>
> test_sub=# CREATE SUBSCRIPTION sub1_nocopy CONNECTION
> 'dbname=test_pub' PUBLICATION pub1 WITH (copy_data = false);
> 2024-08-05 13:22:57.719 AEST [20957] WARNING:  subscriptions created
> by regression test cases should have names starting with "regress_"
> WARNING:  subscriptions created by regression test cases should have
> names starting with "regress_"
> NOTICE:  created replication slot "sub1_nocopy" on publisher
> CREATE SUBSCRIPTION
> test_sub=# 2024-08-05 13:22:57.765 AEST [7012] LOG:  logical
> replication apply worker for subscription "sub1_nocopy" has started
>
> test_sub=#
>
> ~~~
>
> Then insert data from the publisher to see what happens for normal replication.
>
> test_pub=#
> test_pub=# INSERT INTO tab_nogen_to_gen VALUES (3,103),(4,104);
> INSERT 0 2
>
> ~~~
>
> Notice the subscriber gets the same error as before: ERROR:  logical
> replication target relation "public.tab_nogen_to_gen" is missing
> replicated column: "b"
>
> 2024-08-05 13:25:14.897 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 10957) exited with exit code 1
> 2024-08-05 13:25:19.933 AEST [11095] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:19.966 AEST [11095] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:19.966 AEST [11095] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:19.968 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11095) exited with exit code 1
> 2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11225) exited with exit code 1
>
This is expected behaviour. The error message here has been improved.
The error is consistent, and it is handled in the 0002 patch.
Below are the logs for the same:
2024-08-07 10:47:45.977 IST [29756] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-07 10:47:46.116 IST [29756] ERROR:  logical replication target
relation "public.tab_nogen_to_gen" has a generated column "b" but
corresponding column on source relation is not a generated column
0002 Patch needs to be applied to get rid of this error.

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

Here are my review comments for patch v24-0001

I think the TAP tests have incorrect expected results for the nogen-to-gen case.

Whereas the HEAD code will cause "ERROR" for this test scenario, patch
0001 does not. IMO the behaviour should be unchanged for this scenario
which has no generated column on the publisher side. So it seems this
is a bug in patch 0001.

FYI, I have included "FIXME" comments in the attached top-up diff
patch to show which test cases I think are expecting wrong results.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Aug 7, 2024 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham,
>
> Here are my review comments for patch v24-0001
>
> I think the TAP tests have incorrect expected results for the nogen-to-gen case.
>
> Whereas the HEAD code will cause "ERROR" for this test scenario, patch
> 0001 does not. IMO the behaviour should be unchanged for this scenario
> which has no generated column on the publisher side. So it seems this
> is a bug in patch 0001.
>
> FYI, I have included "FIXME" comments in the attached top-up diff
> patch to show which test cases I think are expecting wrong results.
>

Fixed all the comments. The attached Patch(v25-0001) contains all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

I think the v25-0001 patch only half-fixes the problems reported in my
v24-0001 review.

~

Background (from the commit message):
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes.

~

The broken TAP test scenario in question is replicating from a
"not-generated" column to a "generated" column. As the generated
column is not on the publishing side, IMO the
'include_generated_columns' option should have zero effect here.

In other words, I expect the TAP test for the 'include_generated_columns
= true' case to also fail, as I already wrote yesterday:

+# FIXME
+# Since there is no generated column on the publishing side this should give
+# the same result as the previous test. -- e.g. something like:
+# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is missing
+# replicated column: "b"

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 8 Aug 2024 at 10:53, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham,
> >
> > Here are my review comments for patch v24-0001
> >
> > I think the TAP tests have incorrect expected results for the nogen-to-gen case.
> >
> > Whereas the HEAD code will cause "ERROR" for this test scenario, patch
> > 0001 does not. IMO the behaviour should be unchanged for this scenario
> > which has no generated column on the publisher side. So it seems this
> > is a bug in patch 0001.
> >
> > FYI, I have included "FIXME" comments in the attached top-up diff
> > patch to show which test cases I think are expecting wrong results.
> >
>
> Fixed all the comments. The attached Patch(v25-0001) contains all the changes.

Few comments:
1) Can we add one test with replica identity full to show that the
generated column is included in the case of an update operation with
test_decoding?

2) At the end of the file generated_columns.sql a newline is missing:
+-- when 'include-generated-columns' = '0' the generated column 'b' values will not be replicated
+INSERT INTO gencoltable (a) VALUES (7), (8), (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0');
+
+DROP TABLE gencoltable;
+
+SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
\ No newline at end of file

3)
3.a) This can be changed:
+-- when 'include-generated-columns' is not set the generated column 'b' values will be replicated
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);

to:
-- By default, 'include-generated-columns' is enabled, so the values
for the generated column 'b' will be replicated even if it is not
explicitly specified.

3.b) This can be changed:
-- when 'include-generated-columns' = '1' the generated column 'b'
values will be replicated
to:
-- when 'include-generated-columns' is enabled, the values of the
generated column 'b' will be replicated.

3.c) This can be changed:
-- when 'include-generated-columns' = '0' the generated column 'b'
values will not be replicated
to:
-- when 'include-generated-columns' is disabled, the values of the
generated column 'b' will not be replicated.

4) I did not see any test for dump; can we add one test for this?

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Hi, here are some review comments for v19-0002
> > > ======
> > > src/test/subscription/t/004_sync.pl
> > >
> > > 1.
> > > This new test is not related to generated columns. IIRC, this is just
> > > some test that we discovered missing during review of this thread. As
> > > such, I think this change can be posted/patched separately from this
> > > thread.
> > >
> > I have removed the test for this thread.
> >
> > I have also addressed the remaining comments for v19-0002 patch.
>
> Hi, I have no more review comments for patch v20-0002 at this time.
>
> I saw that the above test was removed from this thread as suggested,
> but I could not find that any new thread was started to propose this
> valuable missing test.
>

I still did not find any new thread for adding the missing test case,
so I started one myself [1].

======
[1] https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hQGUHrzxeCSzANXSMKOvYiLX-EjdyNw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 12:43 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham,
> >
> > I think the v25-0001 patch only half-fixes the problems reported in my
> > v24-0001 review.
> >
> > ~
> >
> > Background (from the commit message):
> > This commit enables support for the 'include_generated_columns' option
> > in logical replication, allowing the transmission of generated column
> > information and data alongside regular table changes.
> >
> > ~
> >
> > The broken TAP test scenario in question is replicating from a
> > "not-generated" column to a "generated" column. As the generated
> > column is not on the publishing side, IMO the
> > 'include_generated_columns' option should have zero effect here.
> >
> > In other words, I expect this TAP test for 'include_generated_columns
> > = true' case should also be failing, as I wrote already yesterday:
> >
> > +# FIXME
> > +# Since there is no generated column on the publishing side this should give
> > +# the same result as the previous test. -- e.g. something like:
> > +# ERROR:  logical replication target relation
> > "public.tab_nogen_to_gen" is missing
> > +# replicated column: "b"
>
> I have fixed the given comments. The attached v26-0001 Patch contains
> the required changes.

Few comments:
1) There's no need to pass include_generated_columns in this case; we
can retrieve it from ctx->data instead:
@@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
 static void
 send_relation_and_attrs(Relation relation, TransactionId xid,
                                                LogicalDecodingContext *ctx,
-                                               Bitmapset *columns)
+                                               Bitmapset *columns, bool include_generated_columns)
 {
        TupleDesc       desc = RelationGetDescr(relation);
        int                     i;
@@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation, TransactionId xid,

2) Commit message:
If the subscriber-side column is also a generated column then this option
has no effect; the replicated data will be ignored and the subscriber
column will be filled as normal with the subscriber-side computed or
default data.

An error will occur in this case, so the message should be updated accordingly.

3) The current test is structured as follows:
a) Create all required tables
b) Insert data into tables
c) Create publications
d) Create subscriptions
e) Perform inserts and verify
This approach can make reviewing and maintenance somewhat challenging.

Instead, could you modify it to:
a) Create the required table for a single test
b) Insert data for this test
c) Create the publication for this test
d) Create the subscriptions for this test
e) Perform inserts and verify
f) Clean up

4) We can maintain the test as a separate 0002 patch, as it may need a
few rounds of review and final adjustments. Once it's fully completed,
we can merge it back in.

5) Once we create and drop publication/subscriptions for individual
tests, we won't need such extensive configuration; we should be able
to run them with default values:
+$node_publisher->append_conf(
+       'postgresql.conf',
+       "max_wal_senders = 20
+        max_replication_slots = 20");
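
For comment 1, here is a minimal sketch of the idea, assuming the
option is already stored in the plugin's private state that hangs off
the decoding context. The types below (MockPluginData,
MockDecodingContext, MockAttr) and the functions should_send_attr() and
count_sent_attrs() are hypothetical stand-ins so that the example
compiles on its own; they are not the pgoutput definitions, and the
column-list (Bitmapset) filtering is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the plugin's private option state. */
typedef struct MockPluginData
{
	bool		include_generated_columns;
} MockPluginData;

/* Hypothetical stand-in for the decoding context. */
typedef struct MockDecodingContext
{
	void	   *output_plugin_private;
} MockDecodingContext;

/* Hypothetical stand-in for one attribute of the relation. */
typedef struct MockAttr
{
	const char *name;
	bool		is_dropped;
	bool		is_generated;
} MockAttr;

/*
 * Decide whether an attribute is sent. The option is read from the
 * context's private data, so no extra function parameter is needed.
 */
static bool
should_send_attr(const MockDecodingContext *ctx, const MockAttr *att)
{
	const MockPluginData *data =
		(const MockPluginData *) ctx->output_plugin_private;

	if (att->is_dropped)
		return false;
	if (att->is_generated && !data->include_generated_columns)
		return false;
	return true;
}

/* Count the attributes that would be sent for a relation. */
static size_t
count_sent_attrs(const MockDecodingContext *ctx,
				 const MockAttr *atts, size_t natts)
{
	size_t		nsent = 0;

	for (size_t i = 0; i < natts; i++)
	{
		if (should_send_attr(ctx, &atts[i]))
			nsent++;
	}
	return nsent;
}
```

In this model a non-generated attribute is sent whatever the option
says, and the function signature stays the same as before the patch.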

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are my review comments for the TAP tests patch v27-0002

======
Commit message

Tap tests for 'include-generated-columns'

~

But, it's more than that-- these are the TAP tests for all
combinations of replication related to generated columns. i.e. both
with and without 'include_generated_columns' option enabled.

======
src/test/subscription/t/011_generated.pl

I was mistaken, thinking that the v27-0002 had already been refactored
according to Vignesh's last review but it is not done yet, so I am not
going to post detailed review comments until the restructuring is
completed.

~

OTOH, there are some problems I felt have crept into v26-0001 (TAP
test is same as v27-0002), so maybe try to also take care of them (see
below) in v28-0002.

In no particular order:

* I felt it is almost useless now to have the "combo"
("regress_pub_combo") publication. It used to have many tables when
you first created it but with every version posted it is publishing
less and less so now there are only 2 tables in it. Better to have a
specific publication for each table now and forget about "combos"

* The "TEST tab_gen_to_gen initial sync" seems to be not even checking
the table data. Why not? e.g. Even if you expect no data, you should
test for it.

* The "TEST tab_gen_to_gen replication" seems to be not even checking
the table data. Why not?

* Multiple XXX comments like "... it needs more study to determine if
the above result was actually correct, or a PG17 bug..." should be
removed. AFAIK we should well understand the expected results for all
combinations by now.

* The "TEST tab_order replication" is now getting an error saying
<missing replicated column: "c">. That may now be the correct
error for this situation, but in that case I think the test is
no longer testing what it was intended to test (i.e. that column
order does not matter...). Probably the table definition needs
adjusting to make sure we are testing what we want to test, and not
just making some random scenario "PASS".

* The test "# TEST tab_alter" expected empty result also seems
unhelpful. It might be related to the previous bullet.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 19, 2024 at 11:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are my review comments for v27-0001.
>
> ======
> contrib/test_decoding/expected/generated_columns.out
> contrib/test_decoding/sql/generated_columns.sql
>
> +-- By default, 'include-generated-columns' is enabled, so the values
> for the generated column 'b' will be replicated even if it is not
> explicitly specified.
>
> nit - The "default" is only like this for "test_decoding" (e.g., the
> CREATE SUBSCRIPTION option is the opposite), so let's make the comment
> clearer about that.
> nit - Use sentence case in the comments.

I have addressed all the comments in the v28-0001 patch. Please refer
to the updated v28-0001 patch in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 19, 2024 at 12:40 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are my review comments for the TAP tests patch v27-0002
>
> ======
> Commit message
>
> Tap tests for 'include-generated-columns'
>
> ~
>
> But, it's more than that-- these are the TAP tests for all
> combinations of replication related to generated columns. i.e. both
> with and without 'include_generated_columns' option enabled.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> I was mistaken, thinking that the v27-0002 had already been refactored
> according to Vignesh's last review but it is not done yet, so I am not
> going to post detailed review comments until the restructuring is
> completed.
>
> ~
>
> OTOH, there are some problems I felt have crept into v26-0001 (TAP
> test is same as v27-0002), so maybe try to also take care of them (see
> below) in v28-0002.
>
> In no particular order:
>
> * I felt it is almost useless now to have the "combo"
> ("regress_pub_combo") publication. It used to have many tables when
> you first created it but with every version posted it is publishing
> less and less so now there are only 2 tables in it. Better to have a
> specific publication for each table now and forget about "combos"
>
> * The "TEST tab_gen_to_gen initial sync" seems to be not even checking
> the table data. Why not? e.g. Even if you expect no data, you should
> test for it.
>
> * The "TEST tab_gen_to_gen replication" seems to be not even checking
> the table data. Why not?
>
> * Multiple XXX comments like "... it needs more study to determine if
> the above result was actually correct, or a PG17 bug..." should be
> removed. AFAIK we should well understand the expected results for all
> combinations by now.
>
> * The "TEST tab_order replication" is now getting an error saying
> <missing replicated column: "c">. That may now be the correct
> error for this situation, but in that case I think the test is
> no longer testing what it was intended to test (i.e. that column
> order does not matter...). Probably the table definition needs
> adjusting to make sure we are testing what we want to test, and not
> just making some random scenario "PASS".
>
> * The test "# TEST tab_alter" expected empty result also seems
> unhelpful. It might be related to the previous bullet.

I have addressed all the comments in the v28-0002 patch. Please refer
to the updated v28-0002 patch in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 22 Aug 2024 at 10:22, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Fri, Aug 16, 2024 at 2:47 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Thu, Aug 8, 2024 at 12:43 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > Hi Shubham,
> > > >
> > > > I think the v25-0001 patch only half-fixes the problems reported in my
> > > > v24-0001 review.
> > > >
> > > > ~
> > > >
> > > > Background (from the commit message):
> > > > This commit enables support for the 'include_generated_columns' option
> > > > in logical replication, allowing the transmission of generated column
> > > > information and data alongside regular table changes.
> > > >
> > > > ~
> > > >
> > > > The broken TAP test scenario in question is replicating from a
> > > > "not-generated" column to a "generated" column. As the generated
> > > > column is not on the publishing side, IMO the
> > > > 'include_generated_columns' option should have zero effect here.
> > > >
> > > > In other words, I expect this TAP test for 'include_generated_columns
> > > > = true' case should also be failing, as I wrote already yesterday:
> > > >
> > > > +# FIXME
> > > > +# Since there is no generated column on the publishing side this should give
> > > > +# the same result as the previous test. -- e.g. something like:
> > > > +# ERROR:  logical replication target relation
> > > > "public.tab_nogen_to_gen" is missing
> > > > +# replicated column: "b"
> > >
> > > I have fixed the given comments. The attached v26-0001 Patch contains
> > > the required changes.
> >
> > Few comments:
> > 1) There's no need to pass include_generated_columns in this case; we
> > can retrieve it from ctx->data instead:
> > @@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
> >  static void
> >  send_relation_and_attrs(Relation relation, TransactionId xid,
> >                                                 LogicalDecodingContext *ctx,
> > -                                               Bitmapset *columns)
> > +                                               Bitmapset *columns,
> > bool include_generated_columns)
> >  {
> >         TupleDesc       desc = RelationGetDescr(relation);
> >         int                     i;
> > @@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation,
> > TransactionId xid,
> >
> > 2) Commit message:
> > If the subscriber-side column is also a generated column then this option
> > has no effect; the replicated data will be ignored and the subscriber
> > column will be filled as normal with the subscriber-side computed or
> > default data.
> >
> > An error will occur in this case, so the message should be updated accordingly.
> >
> > 3) The current test is structured as follows: a) Create all required
> > tables b) Insert data into tables c) Create publications d) Create
> > subscriptions e) Perform inserts and verify
> > This approach can make reviewing and maintenance somewhat challenging.
> >
> > Instead, could you modify it to: a) Create the required table for a
> > single test b) Insert data for this test c) Create the publication for
> > this test d) Create the subscriptions for this test e) Perform inserts
> > and verify f) Clean up
> >
> > 4) We can maintain the test as a separate 0002 patch, as it may need a
> > few rounds of review and final adjustments. Once it's fully completed,
> > we can merge it back in.
> >
> > 5) Once we create and drop publication/subscriptions for individual
> > tests, we won't need such extensive configuration; we should be able
> > to run them with default values:
> > +$node_publisher->append_conf(
> > +       'postgresql.conf',
> > +       "max_wal_senders = 20
> > +        max_replication_slots = 20");
>
> Fixed all the given comments. The attached patches contain the
> suggested changes.

Few comments:
1) This has already been covered by the first existing test case, so
maybe this can be removed:
# =============================================================================
# Testcase start: Subscriber table with a generated column (b) on the
# subscriber, where column (b) is not present on the publisher.

This existing test:
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab1 (a) VALUES (1), (2), (3);
CREATE PUBLICATION pub1 FOR ALL TABLES;
));

$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int);
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1;
));

2) Can we have this test verified with include_generated_columns =
true too, as is done for the others:
my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';

$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab1 (a) VALUES (1), (2), (3);
CREATE PUBLICATION pub1 FOR ALL TABLES;
));

$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int);
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1;
));

3) There is a typo in this comment:
3.a)  # Testcase start: Publisher table with a generated column (b)
and subscriber
# table a with regular column (b).

It should be:
# Testcase start: Publisher table with a generated column (b) and subscriber
# table with a regular column (b).

3.b) similarly here too:
# Testcase end: Publisher table with a generated column (b) and subscriber
# table a with regular column (b).

3.c) The comments are not consistent, sometimes mentioned as
column(b) and sometimes as column (b). We can keep it consistent.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
> > > > tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > 'include_generated_columns' option is specified, the generated column
> > information and generated column data will also be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?
>

The cost is one but the other is the user may not want the data to be
different based on volatile functions like timeofday() or the table on
subscriber won't have the column marked as generated. Now, considering
such use cases, is providing a subscription-level option a good idea
as the patch is doing? I understand that this can serve the purpose
but it could also lead to having the same behavior for all the tables
in all the publications for a subscription which may or may not be
what the user expects. This could lead to some performance overhead
(due to always sending generated columns for all the tables) for cases
where the user needs it only for a subset of tables.

I think we should consider it as a table-level option while defining
publication in some way. A few ideas could be: (a) We ask users to
explicitly mention the generated column in the columns list while
defining publication. This has a drawback such that users need to
specify the column list even when all columns need to be replicated.
(b) We can have some new syntax to indicate the same like: CREATE
PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
could be some challenges but we can at least investigate it.

Yet another idea is to keep this as a publication option
(include_generated_columns or publish_generated_columns) similar to
"publish_via_partition_root". Normally, "publish_via_partition_root"
is used when tables on either side have different partition
hierarchies which is somewhat the case here.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > > <dangwalrajendra888@gmail.com> wrote:
> > > >
> > > > Hi PG Hackers.
> > > >
> > > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
> > > > tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> > >
> > > The attached patch has the changes to support capturing generated
> > > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > > 'include_generated_columns' option is specified, the generated column
> > > information and generated column data will also be sent.
> >
> > As Euler mentioned earlier, I think it's a decision not to replicate
> > generated columns because we don't know the target table on the
> > subscriber has the same expression and there could be locale issues
> > even if it looks the same. I can see that a benefit of this proposal
> > would be to save cost to compute generated column values if the user
> > wants the target table on the subscriber to have exactly the same data
> > as the publisher's one. Are there other benefits or use cases?
> >
>
> The cost is one but the other is the user may not want the data to be
> different based on volatile functions like timeofday()

Shouldn't the generation expression be immutable?

> or the table on
> subscriber won't have the column marked as generated.

Yeah, it would be another use case.

>  Now, considering
> such use cases, is providing a subscription-level option a good idea
> as the patch is doing? I understand that this can serve the purpose
> but it could also lead to having the same behavior for all the tables
> in all the publications for a subscription which may or may not be
> what the user expects. This could lead to some performance overhead
> (due to always sending generated columns for all the tables) for cases
> where the user needs it only for a subset of tables.

Yeah, it's a downside and I think it's less flexible. For example, if
users want to send both tables with generated columns and tables
without generated columns, they would have to create at least two
subscriptions. Also, they would have to include a different set of
tables to two publications.

>
> I think we should consider it as a table-level option while defining
> publication in some way. A few ideas could be: (a) We ask users to
> explicitly mention the generated column in the columns list while
> defining publication. This has a drawback such that users need to
> specify the column list even when all columns need to be replicated.
> (b) We can have some new syntax to indicate the same like: CREATE
> PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> could be some challenges but we can at least investigate it.

I think we can create a publication for a single table, so what we can
do with this feature can be done also by the idea you described below.

> Yet another idea is to keep this as a publication option
> (include_generated_columns or publish_generated_columns) similar to
> "publish_via_partition_root". Normally, "publish_via_partition_root"
> is used when tables on either side have different partition
> hierarchies which is somewhat the case here.

It sounds more useful to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>

Yes, I missed that point.

> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

Right, apart from that I am not aware of other use cases. If Euler or
Rajendra have any others, I would request them to share.

> >  Now, considering
> > such use cases, is providing a subscription-level option a good idea
> > as the patch is doing? I understand that this can serve the purpose
> > but it could also lead to having the same behavior for all the tables
> > in all the publications for a subscription which may or may not be
> > what the user expects. This could lead to some performance overhead
> > (due to always sending generated columns for all the tables) for cases
> > where the user needs it only for a subset of tables.
>
> Yeah, it's a downside and I think it's less flexible. For example, if
> users want to send both tables with generated columns and tables
> without generated columns, they would have to create at least two
> subscriptions.
>

Agreed and that would consume more resources.

> Also, they would have to include a different set of
> tables to two publications.
>
> >
> > I think we should consider it as a table-level option while defining
> > publication in some way. A few ideas could be: (a) We ask users to
> > explicitly mention the generated column in the columns list while
> > defining publication. This has a drawback such that users need to
> > specify the column list even when all columns need to be replicated.
> > (b) We can have some new syntax to indicate the same like: CREATE
> > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > could be some challenges but we can at least investigate it.
>
> I think we can create a publication for a single table, so what we can
> do with this feature can be done also by the idea you described below.
>
> > Yet another idea is to keep this as a publication option
> > (include_generated_columns or publish_generated_columns) similar to
> > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > is used when tables on either side have different partition
> > hierarchies which is somewhat the case here.
>
> It sounds more useful to me.
>

Fair enough. Let's see if anyone else has any preference among the
proposed methods or can think of a better way.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > generated columns because we don't know the target table on the
> > > > > subscriber has the same expression and there could be locale issues
> > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > would be to save cost to compute generated column values if the user
> > > > > wants the target table on the subscriber to have exactly the same data
> > > > > as the publisher's one. Are there other benefits or use cases?
> > > > >
> > > >
> > > > The cost is one but the other is the user may not want the data to be
> > > > different based on volatile functions like timeofday()
> > >
> > > Shouldn't the generation expression be immutable?
> > >
> >
> > Yes, I missed that point.
> >
> > > > or the table on
> > > > subscriber won't have the column marked as generated.
> > >
> > > Yeah, it would be another use case.
> > >
> >
> > Right, apart from that I am not aware of other use cases. If they
> > have, I would request Euler or Rajendra to share any other use case.
> >
> > > >  Now, considering
> > > > such use cases, is providing a subscription-level option a good idea
> > > > as the patch is doing? I understand that this can serve the purpose
> > > > but it could also lead to having the same behavior for all the tables
> > > > in all the publications for a subscription which may or may not be
> > > > what the user expects. This could lead to some performance overhead
> > > > (due to always sending generated columns for all the tables) for cases
> > > > where the user needs it only for a subset of tables.
> > >
> > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > users want to send both tables with generated columns and tables
> > > without generated columns, they would have to create at least two
> > > subscriptions.
> > >
> >
> > Agreed and that would consume more resources.
> >
> > > Also, they would have to include a different set of
> > > tables to two publications.
> > >
> > > >
> > > > I think we should consider it as a table-level option while defining
> > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > explicitly mention the generated column in the columns list while
> > > > defining publication. This has a drawback such that users need to
> > > > specify the column list even when all columns need to be replicated.
> > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > could be some challenges but we can at least investigate it.
> > >
> > > I think we can create a publication for a single table, so what we can
> > > do with this feature can be done also by the idea you described below.
> > >
> > > > Yet another idea is to keep this as a publication option
> > > > (include_generated_columns or publish_generated_columns) similar to
> > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > is used when tables on either side have different partition
> > > > hierarchies which is somewhat the case here.
> > >
> > > It sounds more useful to me.
> > >
> >
> > Fair enough. Let's see if anyone else has any preference among the
> > proposed methods or can think of a better way.
>
> I have fixed the current issue. I have added the option
> 'publish_generated_columns' to the publisher side and created the new
> test cases accordingly.
> The attached patches contain the desired changes.
>

Thank you for updating the patches. I have some comments:

Do we really need to add this option to test_decoding? I think it
would be good if this improves the test coverage. Otherwise, I'm not
sure we need this part. If we want to add it, I think it would be
better to have it in a separate patch.

---
+         <para>
+          If the publisher-side column is also a generated column
then this option
+          has no effect; the publisher column will be filled as normal with the
+          publisher-side computed or default data.
+         </para>

I don't understand this description. Why does this option have no
effect if the publisher-side column is a generated column?

---
+         <para>
+         This parameter can only be set <literal>true</literal> if
<literal>copy_data</literal> is
+         set to <literal>false</literal>.
+         </para>

If I understand this patch correctly, it doesn't disallow setting
copy_data to true when the publish_generated_columns option is
specified. But do we want to disallow it? I think it would be more
useful and understandable if we allowed using both
publish_generated_columns (publisher option) and copy_data (subscriber
option) at the same time.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
>
> Thank you for updating the patches. I have some comments:
>
> Do we really need to add this option to test_decoding?
>

I don't see any reason to have such an option in test_decoding;
otherwise, we would need a separate option for each publication option. I
guess this is a leftover from the previous subscriber-side approach.

> I think it
> would be good if this improves the test coverage. Otherwise, I'm not
> sure we need this part. If we want to add it, I think it would be
> better to have it in a separate patch.
>

Right.

> ---
> +         <para>
> +          If the publisher-side column is also a generated column
> then this option
> +          has no effect; the publisher column will be filled as normal with the
> +          publisher-side computed or default data.
> +         </para>
>
> I don't understand this description. Why does this option have no
> effect if the publisher-side column is a generated column?
>

Shouldn't it be subscriber-side?

I have one additional comment:
/*
- * If the publication is FOR ALL TABLES then it is treated the same as
- * if there are no column lists (even if other publications have a
- * list).
+ * If the publication is FOR ALL TABLES and include generated columns
+ * then it is treated the same as if there are no column lists (even
+ * if other publications have a list).
  */
- if (!pub->alltables)
+ if (!pub->alltables || !pub->pubgencolumns)

Why do we treat pubgencolumns at the same level as the FOR ALL TABLES
case? I thought that if the user has provided a column list, we only
need to publish the specified columns even when the
publish_generated_columns option is set.
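
To make that expectation concrete, here is a minimal sketch of the
column selection I have in mind, where an explicit column list always
wins over publish_generated_columns. It is a self-contained model, not
pgoutput code: ColumnMask, effective_columns() and the plain integer
bitmask are hypothetical stand-ins for the Bitmapset-based logic in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit i set means "column i is published". */
typedef uint32_t ColumnMask;

/*
 * Columns to publish for one table in one publication (illustrative only).
 *
 * - With an explicit column list, publish exactly the listed columns,
 *   regardless of publish_generated_columns.
 * - Without a list, publish all columns, dropping the generated ones
 *   unless publish_generated_columns is true.
 */
ColumnMask
effective_columns(bool has_column_list, ColumnMask column_list,
                  ColumnMask all_columns, ColumnMask generated_columns,
                  bool publish_generated_columns)
{
    /* A column list can only name columns that exist in the table. */
    assert((column_list & ~all_columns) == 0);

    if (has_column_list)
        return column_list;

    if (publish_generated_columns)
        return all_columns;

    return all_columns & ~generated_columns;
}
```

With this shape, FOR ALL TABLES and publish_generated_columns stay
independent inputs, so neither needs to be folded into the other's
condition.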

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
IIUC, previously there was a subscriber side option
'include_generated_columns', but now since v30* there is a publisher
side option 'publish_generated_columns'.

Fair enough, but in the v30* patches I can still see remnants of the
old name 'include_generated_columns' all over the place:
- in the commit message
- in the code (struct field names, param names etc)
- in the comments
- in the docs

If the decision is to call the new PUBLICATION option
'publish_generated_columns', then can't we please use that one name
*everywhere* -- e.g. replace all cases where any old name is still
lurking?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some more review comments for patch v30-0001.

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+          If the publisher-side column is also a generated column
then this option
+          has no effect; the publisher column will be filled as normal with the
+          publisher-side computed or default data.
+         </para>

It should say "subscriber-side"; not "publisher-side". The same was
already reported by Sawada-San [1].

~~~

2.
+         <para>
+         This parameter can only be set <literal>true</literal> if
<literal>copy_data</literal> is
+         set to <literal>false</literal>.
+         </para>

IMO this limitation should be addressed by patch 0001 like it was
already done in the previous patches (e.g. v22-0002). I think
Sawada-san suggested the same [1].

Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
mentioned like this without any reference to the SUBSCRIPTION seems
like a cut/paste error from the previous implementation.

======
src/backend/catalog/pg_publication.c

3. pub_collist_validate
- if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
- ereport(ERROR,
- errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("cannot use generated column \"%s\" in publication column list",
-    colname));
-

Instead of just removing this ERROR entirely here, I thought it would
be more user-friendly to give a WARNING if the PUBLICATION's explicit
column list includes generated cols when the option
"publish_generated_columns" is false. This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (like the current patch does) is likely going to give
someone unexpected results/grief.

======
src/backend/replication/logical/proto.c

4. logicalrep_write_tuple, and logicalrep_write_attrs:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Why aren't you also checking the new PUBLICATION option here and
skipping all gencols if the "publish_generated_columns" option is
false? Or is the BMS of pgoutput_column_list_init handling this case?
Maybe there should be an Assert for this?
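
To illustrate the kind of check I mean, here is a minimal standalone
sketch. It is only a model under stated assumptions: AttrModel and
should_send_attribute() are hypothetical names, and attgenerated is
simplified to a bool (in the real catalog it is a char).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pg_attribute fields used here. */
typedef struct AttrModel
{
    bool attisdropped;   /* column has been dropped */
    bool attgenerated;   /* simplified: true for a generated column */
} AttrModel;

/*
 * Decide whether one attribute should be written to the output stream.
 * Dropped columns are never sent; generated columns are sent only when
 * the publication option allows it.
 */
bool
should_send_attribute(const AttrModel *att, bool publish_generated_columns)
{
    assert(att != NULL);

    if (att->attisdropped)
        return false;

    if (att->attgenerated && !publish_generated_columns)
        return false;

    return true;
}
```

If the column bitmap already guarantees this, the same condition could
instead be turned into an assertion at the point where each attribute is
written.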

======
src/backend/replication/pgoutput/pgoutput.c

5. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Same question as #4.

~~~

6. prepare_all_columns_bms and pgoutput_column_list_init

+ if (att->attgenerated && !pub->pubgencolumns)
+ cols = bms_del_member(cols, i + 1);

IIUC, the algorithm seems overly tricky filling the BMS with all
columns, before straight away conditionally removing the generated
columns. Can't it be refactored to assign all the correct columns
up-front, to avoid calling bms_del_member()?
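
Something like the following is what I had in mind. This is a rough
standalone sketch, not a patch: ColModel and build_all_columns_mask()
are hypothetical, and a uint64_t bitmask stands in for the Bitmapset.
The "i + 1" mirrors the attribute numbering in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the attribute fields consulted here. */
typedef struct ColModel
{
    bool attisdropped;
    bool attgenerated;   /* simplified to a bool for this sketch */
} ColModel;

/*
 * Build the set of publishable columns in a single pass: a column is
 * added only if it qualifies, so nothing has to be removed afterwards.
 * Bit (i + 1) represents the attribute at array index i.
 */
uint64_t
build_all_columns_mask(const ColModel *cols, int natts,
                       bool publish_generated_columns)
{
    uint64_t mask = 0;

    /* Keep (i + 1) within the 64 bits of the model bitmask. */
    assert(natts >= 0 && natts < 63);

    for (int i = 0; i < natts; i++)
    {
        if (cols[i].attisdropped)
            continue;

        if (cols[i].attgenerated && !publish_generated_columns)
            continue;

        mask |= UINT64_C(1) << (i + 1);
    }

    return mask;
}
```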

======
src/bin/pg_dump/pg_dump.c

7. getPublications

IIUC, there is lots of missing SQL code here (for all older versions)
that should be saying "false AS pubgencolumns".
e.g. compare the SQL with how "false AS pubviaroot" is used.

======
src/bin/pg_dump/t/002_pg_dump.pl

8. Missing tests?

I expected to see a pg_dump test for this new PUBLICATION option.

======
src/test/regress/sql/publication.sql

9. Missing tests?

How about adding another test case that checks this new option must be
"Boolean"?

~~~

10. Missing tests?

--- error: generated column "d" can't be in list
+-- ok: generated columns can be in the list too
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

(see my earlier comment #3)

IMO there should be another test case for a WARNING here if the user
attempts to include generated column 'd' in an explicit PUBLICATION
column list while the "publish_generated_columns" option is false.

======
[1]  https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

Here are my general comments about the v30-0002 TAP test patch.

======

1.
As mentioned in a previous post [1] there are still several references
to the old 'include_generated_columns' option remaining in this patch.
They need replacing.

~~~

2.
+# Furthermore, all combinations are tested for publish_generated_columns=false
+# (see subscription sub1 of database 'postgres'), and
+# publish_generated_columns=true (see subscription sub2 of database
+# 'test_igc_true').

Those 'see subscription' notes and 'test_igc_true' are from the old
implementation. Those need fixing. BTW, 'test_pgc_true' is a better
name for the database now that the option name is changed.

In the previous implementation, the TAP test environment was:
- a common publication pub, on the 'postgres' database
- a subscription sub1 with option include_generated_columns=false, on
the 'postgres' database
- a subscription sub2 with option include_generated_columns=true, on
the 'test_igc_true' database

Now it is like:
- a publication pub1, on the 'postgres' database, with option
publish_generated_columns=false
- a publication pub2, on the 'postgres' database, with option
publish_generated_columns=true
- a subscription sub1, on the 'postgres' database for publication pub1
- a subscription sub2, on the 'test_pgc_true' database for publication pub2

It would be good to document the above convention because knowing how
the naming/numbering works makes it a lot easier to read the
subsequent test cases. Of course, it is really important to
name/number everything consistently otherwise these tests become hard
to follow.  AFAICT it is mostly OK, but the generated -> generated
publication should be called 'regress_pub2_gen_to_gen'

~~~

3.
+# Create table.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a *
2) STORED);
+ INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
+));
+
+# Create publication with publish_generated_columns=false.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = false)"
+);
+
+# Create table and subscription with copy_data=true.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int);
+ CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION
'$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH
(copy_data = true);
+));
+
+# Create publication with publish_generated_columns=true.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = true)"
+);
+

The code can be restructured to be simpler. Both publications are
always created on the 'postgres' database at the publisher node, so
let's just create them at the same time as creating the publisher
table. It also makes readability much better e.g.

# Create table, and publications
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = false);
CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = true);
));

AFAICT this same simplification can be repeated multiple times in this TAP file.

~~

Similarly, it would be neater to combine the DROP PUBLICATION commands together too.

~~~

4.
Hopefully, 'copy_data' support for generated columns can be implemented
again soon for subscriptions, and then the initial sync tests here can
be properly implemented instead of the placeholders currently in patch
0002.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPuDJToG%3DV-ogTi9_6fnhhn2S0%2BsVRGPynhcf9mEh0Q%3DLA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Because this feature is now being implemented as a PUBLICATION option,
there is another scenario that might need consideration; I am thinking
about where the same table is published by multiple PUBLICATIONS (with
different option settings) that are subscribed by a single
SUBSCRIPTION.

e.g.1
-----
CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
-----

e.g.2
-----
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
-----

Do you know if this case is supported? If yes, then which publication
option value wins?

The CREATE SUBSCRIPTION docs [1] only say "Subscriptions having
several publications in which the same table has been published with
different column lists are not supported."

Perhaps the user is supposed to deduce that the example above would
work OK if table 't1' has no generated cols. OTOH, if it did have
generated cols then the PUBLICATION column lists must be different and
therefore it is "not supported" (??).

I have not tried this to see what happens, but even if it behaves as
expected, there should probably be some comments/docs/tests for this
scenario to clarify it for the user.

Notice that "publish_via_partition_root" has a similar conundrum, but
in that case, the behaviour is documented in the CREATE PUBLICATION
docs [2]. So, maybe "publish_generated_columns" should be documented
a bit like that.

======
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] https://www.postgresql.org/docs/devel/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 10 Sept 2024 at 09:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding?
> >
>
> I don't see any reason to have such an option in test_decoding,
> otherwise, we need a separate option for each publication option. I
> guess this is leftover of the previous subscriber-side approach.
>
> > I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> Right.
>
> > ---
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> Shouldn't it be subscriber-side?
>
> I have one additional comment:
> /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * If the publication is FOR ALL TABLES and include generated columns
> + * then it is treated the same as if there are no column lists (even
> + * if other publications have a list).
>   */
> - if (!pub->alltables)
> + if (!pub->alltables || !pub->pubgencolumns)
>
> Why do we treat pubgencolumns at the same level as the FOR ALL TABLES
> case? I thought that if the user has provided a column list, we only
> need to publish the specified columns even when the
> publish_generated_columns option is set.

To handle cases where the publish_generated_columns option isn't
specified for all tables in a publication, the pubgencolumns check
needs to be performed. In such cases, we must create a column list
that excludes generated columns. This process involves:
a) Retrieving all columns for the table and adding them to the column
   list.
b) Iterating through this column list and removing generated columns.
c) Checking if the remaining column count matches the total number of
   columns. If they match, set the relation entry's column list to
   NULL, so we don't need to check columns during data replication. If
   they do not match, update the column list to include only the
   relevant columns, allowing pgoutput to replicate data for these
   specific columns.
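
Steps a) to c) above can be sketched roughly as follows. This is only a
Python model of the described logic, not the pgoutput C code; the
function name build_column_list and its arguments are hypothetical, and
None stands in for the NULL column list.

```python
def build_column_list(columns, generated):
    """Model of steps a) to c); hypothetical helper, not pgoutput code.

    columns   -- all column names of the table, in order
    generated -- set of column names that are generated columns
    Returns None (modelling a NULL column list) when every column is
    published, otherwise the list of columns to publish.
    """
    # a) Retrieve all columns of the table and add them to the list.
    column_list = list(columns)

    # b) Iterate through the list and remove generated columns.
    column_list = [col for col in column_list if col not in generated]

    # c) If nothing was removed, no column list is needed at all.
    if len(column_list) == len(columns):
        return None
    return column_list
```

For a table (a, b) where b is generated this yields ['a']; for a table
without generated columns it yields None, so no per-column check would
be needed during replication.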

This step is necessary because some tables in the publication may
include generated columns. For tables where publish_generated_columns
is set, the column list will be set to NULL, eliminating the need for a
column list check during data publication.
However, modifying the column list based on publish_generated_columns
is not required; this is addressed in the v31 patch posted by Shubham
at [1].

[1] - https://www.postgresql.org/message-id/CAHv8Rj%2BinrG6EU0rpDJxih8mmYLhCUP6ouTAmMN2RDnT9tE_Gg%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 13, 2024 at 9:34 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > >
> > > > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > > > generated columns because we don't know the target table on the
> > > > > > > subscriber has the same expression and there could be locale issues
> > > > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > > > would be to save cost to compute generated column values if the user
> > > > > > > wants the target table on the subscriber to have exactly the same data
> > > > > > > as the publisher's one. Are there other benefits or use cases?
> > > > > > >
> > > > > >
> > > > > > The cost is one but the other is the user may not want the data to be
> > > > > > different based on volatile functions like timeofday()
> > > > >
> > > > > Shouldn't the generation expression be immutable?
> > > > >
> > > >
> > > > Yes, I missed that point.
> > > >
> > > > > > or the table on
> > > > > > subscriber won't have the column marked as generated.
> > > > >
> > > > > Yeah, it would be another use case.
> > > > >
> > > >
> > > > Right, apart from that I am not aware of other use cases. If they
> > > > have, I would request Euler or Rajendra to share any other use case.
> > > >
> > > > > >  Now, considering
> > > > > > such use cases, is providing a subscription-level option a good idea
> > > > > > as the patch is doing? I understand that this can serve the purpose
> > > > > > but it could also lead to having the same behavior for all the tables
> > > > > > in all the publications for a subscription which may or may not be
> > > > > > what the user expects. This could lead to some performance overhead
> > > > > > (due to always sending generated columns for all the tables) for cases
> > > > > > where the user needs it only for a subset of tables.
> > > > >
> > > > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > > > users want to send both tables with generated columns and tables
> > > > > without generated columns, they would have to create at least two
> > > > > subscriptions.
> > > > >
> > > >
> > > > Agreed and that would consume more resources.
> > > >
> > > > > Also, they would have to include a different set of
> > > > > tables to two publications.
> > > > >
> > > > > >
> > > > > > I think we should consider it as a table-level option while defining
> > > > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > > > explicitly mention the generated column in the columns list while
> > > > > > defining publication. This has a drawback such that users need to
> > > > > > specify the column list even when all columns need to be replicated.
> > > > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > > > could be some challenges but we can at least investigate it.
> > > > >
> > > > > I think we can create a publication for a single table, so what we can
> > > > > do with this feature can be done also by the idea you described below.
> > > > >
> > > > > > Yet another idea is to keep this as a publication option
> > > > > > (include_generated_columns or publish_generated_columns) similar to
> > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > > is used when tables on either side have different partitions
> > > > > > hierarchies which is somewhat the case here.
> > > > >
> > > > > It sounds more useful to me.
> > > > >
> > > >
> > > > Fair enough. Let's see if anyone else has any preference among the
> > > > proposed methods or can think of a better way.
> > >
> > > I have fixed the current issue. I have added the option
> > > 'publish_generated_columns' to the publisher side and created the new
> > > test cases accordingly.
> > > The attached patches contain the desired changes.
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding? I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> I have removed the option from the test_decoding file.
>
> > ---
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> The documentation was incorrect. Currently, replicating from a
> publisher table with a generated column to a subscriber table with a
> generated column will result in an error. This has now been updated.
>
> > ---
> > +         <para>
> > +         This parameter can only be set <literal>true</literal> if
> > <literal>copy_data</literal> is
> > +         set to <literal>false</literal>.
> > +         </para>
> >
> > If I understand this patch correctly, it doesn't disallow to set
> > copy_data to true when the publish_generated_columns option is
> > specified. But do we want to disallow it? I think it would be more
> > useful and understandable if we allow to use both
> > publish_generated_columns (publisher option) and copy_data (subscriber
> > option) at the same time.
> >
>
> Support for tablesync with generated columns was not included in the
> initial patch, and this was reflected in the documentation. The
> functionality for syncing generated column data has been introduced
> with the 0002 patch.
>

Since nothing was said otherwise, I assumed my v30-0001 comments were
addressed in v31, but the new code seems to have quite a few of my
suggested changes missing. If you haven't addressed my review comments
for patch 0001 yet, please say so. OTOH, please give reasons for any
rejected comments.

> The attached v31 patches contain the changes for the same. I won't be
> posting the test patch for now. I will share it once this patch has
> been stabilized.

How can the patch become "stabilized" without associated tests to
verify the behaviour is not broken? e.g. I can write a stable function
that says 2+2=5.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Wed, Sep 11, 2024 at 10:30 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Because this feature is now being implemented as a PUBLICATION option,
> there is another scenario that might need consideration; I am thinking
> about where the same table is published by multiple PUBLICATIONS (with
> different option settings) that are subscribed by a single
> SUBSCRIPTION.
>
> e.g.1
> -----
> CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> e.g.2
> -----
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> Do you know if this case is supported? If yes, then which publication
> option value wins?

I would expect these option values to be combined with OR. That is, we
publish changes of the generated columns if at least one publication
sets publish_generated_columns to true. It seems to me that we treat
multiple row filters in the same way.
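
As a minimal sketch of that expectation (a Python model only; the
function name and the dict of publications are hypothetical, and this
is not what the patch implements):

```python
def effective_publish_generated_columns(publications):
    """OR together the option across all subscribed publications.

    publications -- mapping of publication name to its
                    publish_generated_columns setting (hypothetical model)
    """
    # Generated columns are published if at least one publication asks for it.
    return any(publications.values())
```

With {"pub1": True, "pub2": False} the result would be True, i.e. the
generated columns of t1 would be sent.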

>
> The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> several publications in which the same table has been published with
> different column lists are not supported."
>
> Perhaps the user is supposed to deduce that the example above would
> work OK if table 't1' has no generated cols. OTOH, if it did have
> generated cols then the PUBLICATION column lists must be different and
> therefore it is "not supported" (??).

With the patch, how should this feature work when users specify a
generated column in the column list and set publish_generated_columns =
false in the first place? Raise an error (as we do today)? Or always
send NULL?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Sep 17, 2024 at 7:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 10:30 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Because this feature is now being implemented as a PUBLICATION option,
> > there is another scenario that might need consideration; I am thinking
> > about where the same table is published by multiple PUBLICATIONS (with
> > different option settings) that are subscribed by a single
> > SUBSCRIPTION.
> >
> > e.g.1
> > -----
> > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > -----
> >
> > e.g.2
> > -----
> > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > -----
> >
> > Do you know if this case is supported? If yes, then which publication
> > option value wins?
>
> I would expect these option values are processed with OR. That is, we
> publish changes of the generated columns if at least one publication
> sets publish_generated_columns to true. It seems to me that we treat
> multiple row filters in the same way.
>

I thought that the option "publish_generated_columns" is more related
to "column lists" than "row filters".

Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.

Then:
PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
is equivalent to
PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);

And
PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
is equivalent to
PUBLICATION pub2 FOR TABLE t1(a,b,c);

So, I would expect this to fail because the SUBSCRIPTION docs say
"Subscriptions having several publications in which the same table has
been published with different column lists are not supported."

~~

Here's another example:
PUBLICATION pub3 FOR TABLE t1(a,b);
PUBLICATION pub4 FOR TABLE t1(c);

Won't it be strange (e.g. difficult to explain) that the pub1 and pub2
table column lists are allowed to be combined in one subscription, but
pub3 and pub4 in one subscription are not supported due to the
different column lists?
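
To make the comparison concrete, here is a small Python model of the
"option is equivalent to a column list" view. It is only a sketch of
the interpretation argued above; effective_columns and can_combine are
hypothetical names, and the real check lives in the server, not here.

```python
def effective_columns(columns, generated, column_list=None,
                      publish_generated_columns=False):
    """Return the set of columns one publication would send for a table."""
    if column_list is not None:
        # An explicit column list is taken as-is in this sketch.
        return frozenset(column_list)
    if publish_generated_columns:
        return frozenset(columns)
    return frozenset(col for col in columns if col not in generated)


def can_combine(*effective_lists):
    """True only when all publications agree on the table's columns."""
    return len(set(effective_lists)) <= 1


t1_columns = ["a", "b", "c", "gen1", "gen2"]
t1_generated = {"gen1", "gen2"}

pub1 = effective_columns(t1_columns, t1_generated, publish_generated_columns=True)
pub2 = effective_columns(t1_columns, t1_generated, publish_generated_columns=False)
pub3 = effective_columns(t1_columns, t1_generated, column_list=["a", "b"])
pub4 = effective_columns(t1_columns, t1_generated, column_list=["c"])
```

Under this model can_combine(pub1, pub2) and can_combine(pub3, pub4)
are both False, so the two cases are rejected for the same reason.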

> >
> > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > several publications in which the same table has been published with
> > different column lists are not supported."
> >
> > Perhaps the user is supposed to deduce that the example above would
> > work OK if table 't1' has no generated cols. OTOH, if it did have
> > generated cols then the PUBLICATION column lists must be different and
> > therefore it is "not supported" (??).
>
> With the patch, how should this feature work when users specify a
> generated column to the column list and set publish_generated_column =
> false, in the first place? raise an error (as we do today)? or always
> send NULL?

For this scenario, I suggested (see [1] #3) that the code could give a
WARNING. As I wrote up-thread: This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (which the current patch does) is likely going to give
someone unexpected results/grief.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I thought that the option "publish_generated_columns" is more related
> to "column lists" than "row filters".
>
> Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
>

> And
> PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> is equivalent to
> PUBLICATION pub2 FOR TABLE t1(a,b,c);

This makes sense to me as it preserves the current behavior.

> Then:
> PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> is equivalent to
> PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);

This also makes sense. It would also include future generated columns.

> So, I would expect this to fail because the SUBSCRIPTION docs say
> "Subscriptions having several publications in which the same table has
> been published with different column lists are not supported."

So I agree that it would raise an error if users subscribe to both
pub1 and pub2.

And looking back at your examples,

> > > e.g.1
> > > -----
> > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > -----
> > >
> > > e.g.2
> > > -----
> > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > -----

Both examples would not be supported.

> > >
> > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > several publications in which the same table has been published with
> > > different column lists are not supported."
> > >
> > > Perhaps the user is supposed to deduce that the example above would
> > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > generated cols then the PUBLICATION column lists must be different and
> > > therefore it is "not supported" (??).
> >
> > With the patch, how should this feature work when users specify a
> > generated column to the column list and set publish_generated_column =
> > false, in the first place? raise an error (as we do today)? or always
> > send NULL?
>
> For this scenario, I suggested (see [1] #3) that the code could give a
> WARNING. As I wrote up-thread: This combination doesn't seem
> like something a user would do intentionally, so just silently
> ignoring it (which the current patch does) is likely going to give
> someone unexpected results/grief.

Does it give a WARNING and then publish the specified generated column
data (even if publish_generated_columns = false)? If so, it would mean
that specifying a generated column in the column list publishes its
data regardless of the publish_generated_columns parameter value.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I thought that the option "publish_generated_columns" is more related
> > to "column lists" than "row filters".
> >
> > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> >
>
> > And
> > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > is equivalent to
> > PUBLICATION pub2 FOR TABLE t1(a,b,c);
>
> This makes sense to me as it preserves the current behavior.
>
> > Then:
> > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > is equivalent to
> > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
>
> This also makes sense. It would also include future generated columns.
>
> > So, I would expect this to fail because the SUBSCRIPTION docs say
> > "Subscriptions having several publications in which the same table has
> > been published with different column lists are not supported."
>
> So I agree that it would raise an error if users subscribe to both
> pub1 and pub2.
>
> And looking back at your examples,
>
> > > > e.g.1
> > > > -----
> > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > -----
> > > >
> > > > e.g.2
> > > > -----
> > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > -----
>
> Both examples would not be supported.
>
> > > >
> > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > several publications in which the same table has been published with
> > > > different column lists are not supported."
> > > >
> > > > Perhaps the user is supposed to deduce that the example above would
> > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > generated cols then the PUBLICATION column lists must be different and
> > > > therefore it is "not supported" (??).
> > >
> > > With the patch, how should this feature work when users specify a
> > > generated column to the column list and set publish_generated_column =
> > > false, in the first place? raise an error (as we do today)? or always
> > > send NULL?
> >
> > For this scenario, I suggested (see [1] #3) that the code could give a
> > WARNING. As I wrote up-thread: This combination doesn't seem
> > like something a user would do intentionally, so just silently
> > ignoring it (which the current patch does) is likely going to give
> > someone unexpected results/grief.
>
> It gives a WARNING, and then publishes the specified generated column
> data (even if publish_generated_column = false)? If so, it would mean
> that specifying the generated column to the column list means to
> publish its data regardless of the publish_generated_column parameter
> value.
>

No. I only meant that it can give a WARNING to tell the user "Hey,
there is a conflict here because you said publish_generated_columns =
false, but you also specified gencols in the column list".

But it is always the option "publish_generated_columns" that determines
the final publishing behaviour. So if it says
publish_generated_columns=false then it would NOT publish generated
columns even if there are gencols in the column list. I think this
makes sense because when there is no column list specified then that
implicitly means "all columns" and the table might have some gencols,
but still 'publish_generated_columns' is what determines the
behaviour.
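
A minimal Python sketch of this "option wins, but warn on conflict"
rule follows. The function name and arguments are hypothetical and the
warning text is illustrative only; it models the proposal, not the
patch.

```python
import warnings


def columns_to_publish_option_wins(columns, generated, column_list,
                                   publish_generated_columns):
    """The option alone decides whether generated columns are published.

    column_list -- explicit column list, or None meaning "all columns"
    """
    requested = list(columns) if column_list is None else list(column_list)
    if publish_generated_columns:
        return requested

    conflicting = [col for col in requested if col in generated]
    if column_list is not None and conflicting:
        # The user named gencols explicitly but the option says not to
        # publish them: tell the user instead of silently ignoring it.
        warnings.warn(
            "publish_generated_columns = false, but the column list "
            "contains generated columns: " + ", ".join(conflicting)
        )
    return [col for col in requested if col not in generated]
```

For columns (a, b, gen1) with column list (a, gen1) and the option set
to false, this returns ['a'] and emits one warning.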

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>
> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

While speaking with one of the decoding output plugin users, I learned
that this feature will be useful when replicating data to a
non-postgres database using the plugin output, especially when the
other database doesn't have a generated column concept.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Sep 17, 2024 at 12:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > I thought that the option "publish_generated_columns" is more related
> > > to "column lists" than "row filters".
> > >
> > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> > >
> >
> > > And
> > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > is equivalent to
> > > PUBLICATION pub2 FOR TABLE t1(a,b,c);
> >
> > This makes sense to me as it preserves the current behavior.
> >
> > > Then:
> > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > is equivalent to
> > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
> >
> > This also makes sense. It would also include future generated columns.
> >
> > > So, I would expect this to fail because the SUBSCRIPTION docs say
> > > "Subscriptions having several publications in which the same table has
> > > been published with different column lists are not supported."
> >
> > So I agree that it would raise an error if users subscribe to both
> > pub1 and pub2.
> >
> > And looking back at your examples,
> >
> > > > > e.g.1
> > > > > -----
> > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > -----
> > > > >
> > > > > e.g.2
> > > > > -----
> > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > -----
> >
> > Both examples would not be supported.
> >
> > > > >
> > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > > several publications in which the same table has been published with
> > > > > different column lists are not supported."
> > > > >
> > > > > Perhaps the user is supposed to deduce that the example above would
> > > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > > generated cols then the PUBLICATION column lists must be different and
> > > > > therefore it is "not supported" (??).
> > > >
> > > > With the patch, how should this feature work when users specify a
> > > > generated column to the column list and set publish_generated_column =
> > > > false, in the first place? raise an error (as we do today)? or always
> > > > send NULL?
> > >
> > > For this scenario, I suggested (see [1] #3) that the code could give a
> > > WARNING. As I wrote up-thread: This combination doesn't seem
> > > like something a user would do intentionally, so just silently
> > > ignoring it (which the current patch does) is likely going to give
> > > someone unexpected results/grief.
> >
> > It gives a WARNING, and then publishes the specified generated column
> > data (even if publish_generated_column = false)?


I think that the column list should take priority and we should
publish the generated column if it is mentioned in the column list,
irrespective of the option.

> > If so, it would mean
> > that specifying the generated column to the column list means to
> > publish its data regardless of the publish_generated_column parameter
> > value.
> >
>
> No. I meant only it can give the WARNING to tell the user user  "Hey,
> there is a conflict here because you said publish_generated_column=
> false, but you also specified gencols in the column list".
>

Users can use a publication like "create publication pub1 for table
t1(c1, c2), t2;" where they want t1's generated column to be published
but not t2's. They can specify the generated column name in the
column list of t1 in that case even though the rest of the tables
won't publish generated columns.
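
This "column list takes priority" rule could be modelled as below. It
is a Python sketch with hypothetical names, and it assumes for
illustration that c2 is the generated column of t1 and that t2 has a
generated column g.

```python
def columns_to_publish_list_wins(columns, generated, column_list,
                                 publish_generated_columns):
    """An explicit column list is published as written; the option only
    matters for tables that have no column list."""
    if column_list is not None:
        return list(column_list)
    return [col for col in columns
            if publish_generated_columns or col not in generated]


# t1(c1, c2): c2 is assumed to be generated and is named in the list.
t1_published = columns_to_publish_list_wins(
    ["c1", "c2", "c3"], {"c2"}, ["c1", "c2"], False)

# t2 has no column list, so its assumed generated column g is skipped.
t2_published = columns_to_publish_list_wins(
    ["x", "g"], {"g"}, None, False)
```

Here t1_published is ['c1', 'c2'] and t2_published is ['x'].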

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 12:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > I thought that the option "publish_generated_columns" is more related
> > > > to "column lists" than "row filters".
> > > >
> > > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> > > >
> > >
> > > > And
> > > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > is equivalent to
> > > > PUBLICATION pub2 FOR TABLE t1(a,b,c);
> > >
> > > This makes sense to me as it preserves the current behavior.
> > >
> > > > Then:
> > > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > is equivalent to
> > > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
> > >
> > > This also makes sense. It would also include future generated columns.
> > >
> > > > So, I would expect this to fail because the SUBSCRIPTION docs say
> > > > "Subscriptions having several publications in which the same table has
> > > > been published with different column lists are not supported."
> > >
> > > So I agree that it would raise an error if users subscribe to both
> > > pub1 and pub2.
> > >
> > > And looking back at your examples,
> > >
> > > > > > e.g.1
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > > > > >
> > > > > > e.g.2
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > >
> > > Both examples would not be supported.
> > >
> > > > > >
> > > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > > > several publications in which the same table has been published with
> > > > > > different column lists are not supported."
> > > > > >
> > > > > > Perhaps the user is supposed to deduce that the example above would
> > > > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > > > generated cols then the PUBLICATION column lists must be different and
> > > > > > therefore it is "not supported" (??).
> > > > >
> > > > > With the patch, how should this feature work when users specify a
> > > > > > generated column to the column list and set publish_generated_columns =
> > > > > false, in the first place? raise an error (as we do today)? or always
> > > > > send NULL?
> > > >
> > > > For this scenario, I suggested (see [1] #3) that the code could give a
> > > > WARNING. As I wrote up-thread: This combination doesn't seem
> > > > like something a user would do intentionally, so just silently
> > > > ignoring it (which the current patch does) is likely going to give
> > > > someone unexpected results/grief.
> > >
> > > It gives a WARNING, and then publishes the specified generated column
> > > > data (even if publish_generated_columns = false)?
>
>
> I think that the column list should take priority and we should
> publish the generated column if it is mentioned in it, irrespective of
> the option.

Agreed.

>
> > > If so, it would mean
> > > that specifying the generated column to the column list means to
> > > publish its data regardless of the publish_generated_columns parameter
> > > value.
> > >
> >
> > No. I meant only that it can give a WARNING to tell the user "Hey,
> > there is a conflict here because you said publish_generated_columns=
> > false, but you also specified gencols in the column list".
> >
>
> Users can use a publication like "create publication pub1 for table
> t1(c1, c2), t2;" where they want t1's generated column to be published
> but not for t2. They can specify the generated column name in the
> column list of t1 in that case even though the rest of the tables
> won't publish generated columns.

Agreed.

I think that users can use the publish_generated_columns option when
they want to publish all generated columns, instead of specifying all
the columns in the column list. It's another advantage of this option
that it will also include the future generated columns.

Given that we publish the generated columns if they are mentioned in
the column list, can we separate the patch into two if it helps
reviews? One is to allow logical replication to publish generated
columns if they are explicitly mentioned in the column list. The
second patch is to introduce the publish_generated_columns option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
...
> > I think that the column list should take priority and we should
> > publish the generated column if it is mentioned in it, irrespective of
> > the option.
>
> Agreed.
>
> >
...
> >
> > Users can use a publication like "create publication pub1 for table
> > t1(c1, c2), t2;" where they want t1's generated column to be published
> > but not for t2. They can specify the generated column name in the
> > column list of t1 in that case even though the rest of the tables
> > won't publish generated columns.
>
> Agreed.
>
> I think that users can use the publish_generated_columns option when
> they want to publish all generated columns, instead of specifying all
> the columns in the column list. It's another advantage of this option
> that it will also include the future generated columns.
>

OK. Let me give some examples below to help understand this idea.

Please correct me if these are incorrect.

======

Assuming these tables:

t1(a,b,gen1,gen2)
t2(c,d,gen1,gen2)

Examples, when publish_generated_columns=false:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
(publish_generated_columns=false)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

~~

Examples, when publish_generated_columns=true:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
(publish_generated_columns=true)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

======
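
The examples above can be summarised as a small model. This is only a
minimal Python sketch of the precedence rule being proposed in this thread
(an explicit column list wins, otherwise the parameter decides). The
function name and signature are made up for illustration; this is not the
patch's code.

```python
def published_columns(regular, generated, column_list=None,
                      publish_generated_columns=False):
    """Columns published for one table under the rule discussed here.

    regular / generated: names of the table's ordinary and generated columns.
    column_list: the explicit column list for the table, or None if absent.
    """
    if column_list is not None:
        # An explicit column list takes priority over the parameter.
        return list(column_list)
    if publish_generated_columns:
        # No column list: the parameter adds every generated column.
        return list(regular) + list(generated)
    # No column list and parameter off: today's behaviour.
    return list(regular)


t1 = (["a", "b"], ["gen1", "gen2"])
t2 = (["c", "d"], ["gen1", "gen2"])

# FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=false)
print(published_columns(*t1, column_list=["a", "b", "gen2"]))  # ['a', 'b', 'gen2']
print(published_columns(*t2))                                  # ['c', 'd']

# FOR t1, t2(gen1) WITH (publish_generated_columns=true)
print(published_columns(*t1, publish_generated_columns=True))
print(published_columns(*t2, column_list=["gen1"],
                        publish_generated_columns=True))       # ['gen1']
```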

The idea LGTM, although now the parameter name
('publish_generated_columns') seems a bit misleading since sometimes
generated columns get published "irrespective of the option".

So, I think the original parameter name 'include_generated_columns'
might be better here because IMO "include" seems more like "add them
if they are not already specified", which is exactly what this idea is
doing.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Users can use a publication like "create publication pub1 for table
> > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > but not for t2. They can specify the generated column name in the
> > > column list of t1 in that case even though the rest of the tables
> > > won't publish generated columns.
> >
> > Agreed.
> >
> > I think that users can use the publish_generated_columns option when
> > they want to publish all generated columns, instead of specifying all
> > the columns in the column list. It's another advantage of this option
> > that it will also include the future generated columns.
> >
>
> OK. Let me give some examples below to help understand this idea.
>
> Please correct me if these are incorrect.
>
> Examples, when publish_generated_columns=true:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=true)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes gen1 (e.g. what column list says)
>

These two could be controversial because one could expect that if
"publish_generated_columns=true" then publish generated columns
irrespective of whether they are mentioned in column_list. I am of the
opinion that column_list should take priority and the results should be as
mentioned by you but let us see if anyone thinks otherwise.

>
> ======
>
> The idea LGTM, although now the parameter name
> ('publish_generated_columns') seems a bit misleading since sometimes
> generated columns get published "irrespective of the option".
>
> So, I think the original parameter name 'include_generated_columns'
> might be better here because IMO "include" seems more like "add them
> if they are not already specified", which is exactly what this idea is
> doing.
>

I still prefer 'publish_generated_columns' because it matches with
other publication option names. One can also deduce from
'include_generated_columns' that all the generated columns are added even
when some of them are specified in column_list.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Sep 19, 2024 at 10:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> Given that we publish the generated columns if they are mentioned in
> the column list, can we separate the patch into two if it helps
> reviews? One is to allow logical replication to publish generated
> columns if they are explicitly mentioned in the column list. The
> second patch is to introduce the publish_generated_columns option.
>

It sounds like a reasonable idea to me but I haven't looked at the
feasibility of the same. So, if it is possible without much effort, we
should split the patch as per your suggestion.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 17, 2024 at 1:14 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v31-0001 (for the docs only)
>
> There may be some overlap here with some comments already made for
> v30-0001 which are not yet addressed in v31-0001.
>
> ======
> Commit message
>
> 1.
> When introducing the 'publish_generated_columns' parameter, you must
> also say this is a PUBLICATION parameter.
>
> ~~~
>
> 2.
> With this enhancement, users can now include the 'include_generated_columns'
> option when querying logical replication slots using either the pgoutput
> plugin or the test_decoding plugin. This option, when set to 'true' or '1',
> instructs the replication system to include generated column information
> and data in the replication stream.
>
> ~
>
> The above is stale information because it still refers to the old name
> 'include_generated_columns', and to test_decoding which was already
> removed in this patch.
>
> ======
> doc/src/sgml/ddl.sgml
>
> 3.
> +      Generated columns may be skipped during logical replication
> according to the
> +      <command>CREATE PUBLICATION</command> option
> +      <link linkend="sql-createpublication-params-with-include-generated-columns">
> +      <literal>publish_generated_columns</literal></link>.
>
> 3a.
> nit - The linkend is based on the old name instead of the new name.
>
> 3b.
> nit - Better to call this a parameter instead of an option because
> that is what the CREATE PUBLICATION docs call it.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 4.
> +    <varlistentry>
> +     <term>publish_generated_columns</term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns. This option controls
> +        whether generated columns should be included in the string
> +        representation of tuples during logical decoding in PostgreSQL.
> +       </para>
> +      </listitem>
> +    </varlistentry>
> +
>
> Is this even needed anymore? Now that the implementation is using a
> PUBLICATION parameter, isn't everything determined just by that
> parameter? I don't see the reason why a protocol change is needed
> anymore. And, if there is no protocol change needed, then this
> documentation change is also not needed.
>
> ~~~~
>
> 5.
>       <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> +      Next, the following message parts appear for each column included in
> +      the publication (generated columns are excluded unless the parameter
> +      <link linkend="protocol-logical-replication-params">
> +      <literal>publish_generated_columns</literal></link> specifies otherwise):
>       </para>
>
> Like the previous comment above, I think everything is now determined
> by the PUBLICATION parameter. So maybe this should just be referring
> to that instead.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 6.
> +       <varlistentry
> id="sql-createpublication-params-with-include-generated-columns">
> +        <term><literal>publish_generated_columns</literal>
> (<type>boolean</type>)</term>
> +        <listitem>
>
> nit - the ID is based on the old parameter name.
>
> ~
>
> 7.
> +         <para>
> +          This option is only available for replicating generated
> column data from the publisher
> +          to a regular, non-generated column in the subscriber.
> +         </para>
>
> IMO remove this paragraph. I really don't think you should be
> mentioning the subscriber here at all. AFAIK this parameter is only
> for determining if the generated column will be published or not. What
> happens at the other end (e.g. logic whether it gets ignored or not by
> the subscriber) is more like a matrix of behaviours that could be
> documented in the "Logical Replication" section. But not here.
>
> (I removed this in my nitpicks attachment)
>
> ~~~
>
> 8.
> +         <para>
> +         This parameter can only be set <literal>true</literal> if
> <literal>copy_data</literal> is
> +         set to <literal>false</literal>.
> +         </para>
>
> IMO remove this paragraph too. The user can create a PUBLICATION
> before a SUBSCRIPTION even exists so to say it "can only be set..." is
> not correct. Sure, your patch 0001 does not support the COPY of
> generated columns but if you want to document that then it should be
> documented in the CREATE SUBSCRIPTION docs. But not here.
>
> (I removed this in my nitpicks attachment)
>
> TBH, it would be better if patches 0001 and 0002 were merged then you
> can avoid all this. IIUC they were only separate in the first place
> because 2 different people wrote them. It is not making reviews easier
> with them split.
>
> ======
>
> Please see the attachment which implements some of the nits above.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to the updated v32-0001 patch in [1].

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 17, 2024 at 3:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Review comments for v31-0001.
>
> (I tried to give only new comments, but there might be some overlap
> with comments I previously made for v30-0001)
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1.
> +
> + if (publish_generated_columns_given)
> + {
> + values[Anum_pg_publication_pubgencolumns - 1] =
> BoolGetDatum(publish_generated_columns);
> + replaces[Anum_pg_publication_pubgencolumns - 1] = true;
> + }
>
> nit - unnecessary whitespace above here.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 2. prepare_all_columns_bms
>
> + /* Iterate the cols until generated columns are found. */
> + cols = bms_add_member(cols, i + 1);
>
> How does the comment relate to the statement that follows it?
>
> ~~~
>
> 3.
> + * Skip generated column if pubgencolumns option was not
> + * specified.
>
> nit - /pubgencolumns option/publish_generated_columns parameter/
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 4.
> getPublications:
>
> nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)
>
> ======
> src/bin/pg_dump/pg_dump.h
>
> 5.
> + bool pubgencolumns;
>  } PublicationInfo;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ======
> src/bin/psql/describe.c
>
> 6.
>   bool has_pubviaroot;
> + bool has_pubgencol;
>
> nit - /has_pubgencol/has_pubgencols/ (plural consistency)
>
> ======
> src/include/catalog/pg_publication.h
>
> 7.
> + /* true if generated columns data should be published */
> + bool pubgencolumns;
>  } FormData_pg_publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ~~~
>
> 8.
> + bool pubgencolumns;
>   PublicationActions pubactions;
>  } Publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ======
> src/test/regress/sql/publication.sql
>
> 9.
> +-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
> +\dRp+ pub1
> +
> +CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
> +\dRp+ pub2
>
> 9a.
> nit - Use lowercase for the parameters.
>
> ~
>
> 9b.
> nit - Fix the comment to say what the test is actually doing:
> "Test the publication 'publish_generated_columns' parameter enabled or disabled"
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 10.
> Later I think you should add another test here to cover the scenario
> that I was discussing with Sawada-San -- e.g. when there are 2
> publications for the same table subscribed by just 1 subscription but
> having different values of the 'publish_generated_columns' for the
> publications.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to the updated v32-0001 patch in [1].

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com
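
For the scenario in comment 10 (one subscription, two publications of the
same table with different 'publish_generated_columns' values), here is a
minimal Python sketch of the check discussed up-thread. It assumes the
parameter is treated as an implicit column list, as Peter suggested. The
helper names are hypothetical. Whether the "no generated columns" case is
really accepted is Peter's reading and is not confirmed in this thread.

```python
def effective_columns(regular, generated, publish_generated_columns):
    """Implicit column list of a publication that has no explicit list."""
    cols = list(regular)
    if publish_generated_columns:
        cols += list(generated)
    return frozenset(cols)


def check_subscription(regular, generated, pub_settings):
    """pub_settings: one publish_generated_columns value per publication
    (at least one) that publishes this table to the same subscription."""
    distinct = {effective_columns(regular, generated, s) for s in pub_settings}
    if len(distinct) > 1:
        # Mirrors the documented restriction on differing column lists.
        raise ValueError("same table published with different column lists")
    return sorted(next(iter(distinct)))


# e.g.1: pub1 (true) and pub2 (false) on a table with generated columns.
try:
    check_subscription(["a", "b", "c"], ["gen1", "gen2"], [True, False])
except ValueError as err:
    print("not supported:", err)

# Same publications, but the table has no generated columns at all.
print(check_subscription(["a", "b", "c"], [], [True, False]))  # ['a', 'b', 'c']
```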

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Sep 18, 2024 at 8:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v31-0002.
>
> ======
>
> 1. General.
>
> IMO patches 0001 and 0002 should be merged when next posted. IIUC the
> reason for the split was only because there were 2 different authors
> but that seems to be not relevant anymore.
>
> ======
> Commit message
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true, we need to copy using the syntax:
> 'COPY (SELECT column_name FROM table_name) TO STDOUT'.
>
> ~
>
> 2a.
> Should clarify that 'copy_data' is a SUBSCRIPTION parameter.
>
> 2b.
> Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 3.
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
>
> Each time I review this code I am tricked into thinking it is wrong to
> use rel->remoterel.natts here for the localgenlist. AFAICT the code is
> actually fine because you do not store *all* the subscriber gencols in
> 'localgenlist' -- you only store those with matching names on the
> publisher table. It might be good if you could add an explanatory
> comment about that to prevent any future doubts.
>
> ~~~
>
> 4.
> + if (!remotegenlist[remote_attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" has a
> generated column \"%s\" "
> + "but corresponding column on source relation is not a generated column",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
>
> This error message has lots of good information. OTOH, I think when
> copy_data=false the error would report the subscriber column just as
> "missing", which is maybe less helpful. Perhaps that other
> copy_data=false "missing" case can be improved to share the same error
> message that you have here.
>

This comment is still open. Will fix this in the next set of patches.
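
As a side note on the commit message quoted in comment 2, the two COPY
forms it describes can be sketched as follows. This is a Python
illustration only. The function names and the simplified quoting are my
assumptions, and it is not the tablesync.c code.

```python
def quote_ident(name):
    """Simplified identifier quoting: always quote, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def initial_sync_copy_command(schema, table, columns, has_generated):
    """Build the command the initial sync would send to the publisher."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    col_sql = ", ".join(quote_ident(c) for c in columns)
    if has_generated:
        # Plain COPY does not copy generated columns, so the commit
        # message says to go through a SELECT instead.
        return f"COPY (SELECT {col_sql} FROM {target}) TO STDOUT"
    return f"COPY {target} ({col_sql}) TO STDOUT"


print(initial_sync_copy_command("public", "t1", ["a", "b"], False))
print(initial_sync_copy_command("public", "t1", ["a", "b", "gen1"], True))
```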

> ~~~
>
> fetch_remote_table_info:
>
> 5.
> IIUC, this logic needs to be more sophisticated to handle the case
> that was being discussed earlier with Sawada-san [1]. e.g. when the
> same table has gencols but there are multiple subscribed publications
> where the 'publish_generated_columns' parameter differs.
>
> Also, you'll need test cases for this scenario, because it is too
> difficult to judge correctness just by visual inspection of the code.
>
> ~~~~
>
> 6.
> nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
> readability, and initialize it to 'false' to make it easy to use
> later.
>
> ~~~
>
> 7.
> - * Get column lists for each relation.
> + * Get column lists for each relation and check if any of the publication
> + * has generated column option.
>
> and
>
> + /* Check if any of the publication has generated column option */
> + if (server_version >= 180000)
>
> nit - tweak the comments to name the publication parameter properly.
>
> ~~~
>
> 8.
> foreach(lc, MySubscription->publications)
> {
> if (foreach_current_index(lc) > 0)
> appendStringInfoString(&pub_names, ", ");
> appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
> }
>
> I know this is existing code, but shouldn't all this be done by using
> the purpose-built function 'get_publications_str'
>
> ~~~
>
> 9.
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not fetch gencolumns information from publication list: %s",
> +    pub_names.data));
>
> and
>
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("failed to fetch tuple for gencols from publication list: %s",
> +    pub_names.data));
>
> nit - /gencolumns information/generated column publication
> information/ to make the errmsg more human-readable
>
> ~~~
>
> 10.
> + bool gencols_allowed = server_version >= 180000 && hasgencolpub;
> +
> + if (!gencols_allowed)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
>
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> ======
>
> Please refer to the attachment which implements some of the nits
> mentioned above.
>
> ======
> [1] https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com
>

I have addressed the comments in the v32-0002 patch. Please refer to
the updated v32-0002 patch in [1].

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com
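
On comment 10, the equivalence can be checked exhaustively with a small
Python sketch. It assumes, per comments 6 and 7, that the flag is
initialised to false and is only ever set inside the
"server_version >= 180000" branch. The function below is a hypothetical
stand-in for that logic, not the patch itself.

```python
from itertools import product


def has_pub_with_pubgencols(server_version, any_pub_has_option):
    flag = False  # initialised to false, as comment 6 suggests
    if server_version >= 180000:
        # Only servers that know the parameter can report it.
        flag = any_pub_has_option
    return flag


for server_version, any_pub in product([170000, 180000, 190000],
                                       [False, True]):
    flag = has_pub_with_pubgencols(server_version, any_pub)
    gencols_allowed = server_version >= 180000 and flag
    # The extra version test never changes the result.
    assert gencols_allowed == flag

print("gencols_allowed is equivalent to has_pub_with_pubgencols")
```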

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Thu, Sep 19, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_columns option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should be as
> mentioned by you but let us see if anyone thinks otherwise.

I agree with Amit. We also publish t2's future generated column in the
first example and t1's future generated columns in the second example.
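
To illustrate what I mean by future generated columns, here is a minimal
Python sketch. The Table class and the column 'gen3' (imagined as added
later with ALTER TABLE) are hypothetical. The point is only that a table
without a column list is resolved against its current definition, while an
explicit column list stays fixed.

```python
class Table:
    def __init__(self, regular, generated):
        self.regular = list(regular)
        self.generated = list(generated)


def published_now(table, column_list, publish_generated_columns):
    """Columns published for the table as it is defined right now."""
    if column_list is not None:
        return list(column_list)          # fixed by the column list
    cols = list(table.regular)
    if publish_generated_columns:
        cols += table.generated           # follows the table definition
    return cols


t1 = Table(["a", "b"], ["gen1", "gen2"])
t2 = Table(["c", "d"], ["gen1", "gen2"])

# First example: FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=true)
# A generated column gen3 is added to both tables afterwards.
t1.generated.append("gen3")
t2.generated.append("gen3")

print(published_now(t1, ["a", "b", "gen2"], True))  # ['a', 'b', 'gen2']
print(published_now(t2, None, True))  # ['c', 'd', 'gen1', 'gen2', 'gen3']
```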

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_columns option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should be as
> mentioned by you but let us see if anyone thinks otherwise.
>
> >
> > ======
> >
> > The idea LGTM, although now the parameter name
> > ('publish_generated_columns') seems a bit misleading since sometimes
> > generated columns get published "irrespective of the option".
> >
> > So, I think the original parameter name 'include_generated_columns'
> > might be better here because IMO "include" seems more like "add them
> > if they are not already specified", which is exactly what this idea is
> > doing.
> >
>
> I still prefer 'publish_generated_columns' because it matches with
> other publication option names. One can also deduce from
> 'include_generated_columns' that all the generated columns are added even
> when some of them are specified in column_list.
>

Fair point. Anyway, to avoid surprises it will be important for the
precedence rules to be documented clearly (probably with some
examples).

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Sat, Sep 21, 2024 at 3:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > OK. Let me give some examples below to help understand this idea.
> > >
> > > Please correct me if these are incorrect.
> > >
> > > Examples, when publish_generated_columns=true:
> > >
> > > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > > (publish_generated_columns=true)
> > > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > > t2 -> publishes c, d + ALSO gen1, gen2
> > >
> > > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > > t1 -> publishes a, b + ALSO gen1, gen2
> > > t2 -> publishes gen1 (e.g. what column list says)
> > >
> >
> > These two could be controversial because one could expect that if
> > "publish_generated_columns=true" then publish generated columns
> > irrespective of whether they are mentioned in column_list. I am of the
> > opinion that column_list should take priority and the results should be as
> > mentioned by you but let us see if anyone thinks otherwise.
>
> I agree with Amit. We also publish t2's future generated column in the
> first example and t1's future generated columns in the second example.
>

Right, it would be good to have at least one test that shows future
generated columns also get published wherever applicable (like where
column_list is not given and publish_generated_columns is true).

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, Sep 23, 2024 at 4:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Users can use a publication like "create publication pub1 for table
> > > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > > but not for t2. They can specify the generated column name in the
> > > > > column list of t1 in that case even though the rest of the tables
> > > > > won't publish generated columns.
> > > >
> > > > Agreed.
> > > >
> > > > I think that users can use the publish_generated_columns option when
> > > > they want to publish all generated columns, instead of specifying all
> > > > the columns in the column list. It's another advantage of this option
> > > > that it will also include the future generated columns.
> > > >
> > >
> > > OK. Let me give some examples below to help understand this idea.
> > >
> > > Please correct me if these are incorrect.
> > >
> > > Examples, when publish_generated_columns=true:
> > >
> > > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > > (publish_generated_columns=true)
> > > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > > t2 -> publishes c, d + ALSO gen1, gen2
> > >
> > > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > > t1 -> publishes a, b + ALSO gen1, gen2
> > > t2 -> publishes gen1 (e.g. what column list says)
> > >
> >
> > These two could be controversial because one could expect that if
> > "publish_generated_columns=true" then publish generated columns
> > irrespective of whether they are mentioned in column_list. I am of the
> > opinion that column_list should take priority and the results should be as
> > mentioned by you but let us see if anyone thinks otherwise.
> >
> > >
> > > ======
> > >
> > > The idea LGTM, although now the parameter name
> > > ('publish_generated_columns') seems a bit misleading since sometimes
> > > generated columns get published "irrespective of the option".
> > >
> > > So, I think the original parameter name 'include_generated_columns'
> > > might be better here because IMO "include" seems more like "add them
> > > if they are not already specified", which is exactly what this idea is
> > > doing.
> > >
> >
> > I still prefer 'publish_generated_columns' because it matches with
> > other publication option names. One can also deduce from
> > 'include_generated_columns' that add all the generated columns even
> > when some of them are specified in column_list.
> >
>
> Fair point. Anyway, to avoid surprises it will be important for the
> precedence rules to be documented clearly (probably with some
> examples),
>

Yeah, one or two examples would be good, but we can have a separate
doc patch that has clearly mentioned all the rules.
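
As a rough illustration of the precedence rule discussed above (this is
not code from the patch), the decision can be modelled in C as below.
The Column struct and column_is_published() are hypothetical names made
up for this sketch; the only behaviour assumed is what the thread
describes: an explicit column list wins, and the option only matters
for tables published without a column list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one table column; not a PostgreSQL struct. */
typedef struct Column
{
    const char *name;
    bool        generated;
} Column;

/*
 * Decide whether a column would be published.
 *
 * collist == NULL means the publication has no column list for the
 * table. When a column list exists it takes priority, so the
 * publish_generated_columns flag is not consulted at all.
 */
static bool
column_is_published(const Column *col,
                    const char *const *collist, size_t ncollist,
                    bool publish_generated_columns)
{
    assert(col != NULL);

    if (collist != NULL)
    {
        /* Column list present: publish exactly what it names. */
        for (size_t i = 0; i < ncollist; i++)
        {
            if (strcmp(collist[i], col->name) == 0)
                return true;
        }
        return false;
    }

    /* No column list: generated columns depend on the option. */
    return !col->generated || publish_generated_columns;
}
```

With a column list of (a, b, gen2), gen2 is published even when the
option is false and gen1 is skipped even when the option is true, which
matches the examples given by Peter.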

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 12 Sept 2024 at 11:01, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Because this feature is now being implemented as a PUBLICATION option,
> there is another scenario that might need consideration; I am thinking
> about where the same table is published by multiple PUBLICATIONS (with
> different option settings) that are subscribed by a single
> SUBSCRIPTION.
>
> e.g.1
> -----
> CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> e.g.2
> -----
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> Do you know if this case is supported? If yes, then which publication
> option value wins?

I have verified the various scenarios discussed here and the patch
works as expected:
Test presetup:
-- publisher
CREATE TABLE t1 (a int PRIMARY KEY, b int, c int, gen1 int GENERATED
ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2)
STORED);
-- Subscriber
CREATE TABLE t1 (a int PRIMARY KEY, b int, c int, d int, e int);

Test1: Subscriber will have only non-generated columns a,b,c
replicated from publisher:
create publication pub1 for all tables with (
publish_generated_columns = false);
INSERT INTO t1 (a,b,c) VALUES (1,1,1);

--Subscriber will have only non-generated columns a,b,c replicated
from publisher:
subscriber=# select * from t1;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 |   |
(1 row)

Test2: Subscriber will have the generated column values replicated in
addition to columns a,b,c from publisher:
create publication pub1 for all tables with ( publish_generated_columns = true);
INSERT INTO t1 (a,b,c) VALUES (1,1,1);

-- Subscriber will have the generated column values in addition to a,b,c:
subscriber=# select * from t1;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | 2 | 2
(1 row)

Test3: A subscription cannot subscribe to publications of the same table
that set publish_generated_columns to both true and false
-- publisher
create publication pub1 for all tables with (publish_generated_columns = false);
create publication pub2 for all tables with (publish_generated_columns = true);

-- subscriber
subscriber=# create subscription sub1 connection 'dbname=postgres
host=localhost port=5432' publication pub1,pub2;
ERROR:  cannot use different column lists for table "public.t1" in
different publications
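
The error above suggests that the differing option values end up as
differing column sets for the same table. A minimal C sketch of that
idea follows, assuming each publication's columns can be represented as
a bitmask. published_columns() and column_sets_conflict() are
hypothetical helpers for illustration only and are not functions from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit i set in all_cols means attribute i exists in the table; bit i
 * set in generated_cols means attribute i is a generated column.
 * Returns the set of columns a publication without a column list
 * would send.
 */
static uint32_t
published_columns(uint32_t all_cols, uint32_t generated_cols,
                  bool publish_generated_columns)
{
    if (publish_generated_columns)
        return all_cols;
    return all_cols & ~generated_cols;
}

/*
 * Two publications of the same table conflict when they would send
 * different column sets to one subscription.
 */
static bool
column_sets_conflict(uint32_t cols_pub1, uint32_t cols_pub2)
{
    return cols_pub1 != cols_pub2;
}
```

For t1(a, b, c, gen1, gen2), pub1 with the option false yields only
a, b, c while pub2 with the option true yields all five columns, so the
two sets differ and the subscription is rejected.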

Test4a: Warning thrown when a generated column is specified in column
list along with publish_generated_columns as false
-- publisher
postgres=# create publication pub1 for table t1(a,b,gen1) with (
publish_generated_columns = false);
WARNING:  specified generated column "gen1" in publication column list
for publication with publish_generated_columns as false
CREATE PUBLICATION

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 04:16, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> ...
> > > I think that the column list should take priority and we should
> > > publish the generated column if it is mentioned in the column list,
> > > irrespective of the option.
> >
> > Agreed.
> >
> > >
> ...
> > >
> > > Users can use a publication like "create publication pub1 for table
> > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > but not for t2. They can specify the generated column name in the
> > > column list of t1 in that case even though the rest of the tables
> > > won't publish generated columns.
> >
> > Agreed.
> >
> > I think that users can use the publish_generated_columns option when
> > they want to publish all generated columns, instead of specifying all
> > the columns in the column list. It's another advantage of this option
> > that it will also include the future generated columns.
> >
>
> OK. Let me give some examples below to help understand this idea.
>
> Please correct me if these are incorrect.
>
> ======
>
> Assuming these tables:
>
> t1(a,b,gen1,gen2)
> t2(c,d,gen1,gen2)
>
> Examples, when publish_generated_columns=false:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=false)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> ~~
>
> Examples, when publish_generated_columns=true:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=true)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> ======
>
> The idea LGTM, although now the parameter name
> ('publish_generated_columns') seems a bit misleading since sometimes
> generated columns get published "irrespective of the option".
>
> So, I think the original parameter name 'include_generated_columns'
> might be better here because IMO "include" seems more like "add them
> if they are not already specified", which is exactly what this idea is
> doing.
>
> Thoughts?

I have verified the various scenarios discussed here and the patch
works as expected with v32 version patch shared at [1]:

Test presetup:
-- publisher
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int GENERATED ALWAYS
AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int GENERATED ALWAYS
AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (d * 2) STORED);

-- subscriber
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int, gen2 int);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int, gen2 int);

Test1: Publisher replicates the column list data including generated
columns even though publish_generated_columns option is false:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=false);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

--t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
   |   |    2 |
(1 row)

Test2: Publisher does not replicate generated columns if the
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2 WITH (publish_generated_columns=false);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

Test3: Publisher does not replicate generated columns if the
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

Test4: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1(a,b,gen2), t2 WITH
(publish_generated_columns=true);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b, gen2 (e.g. what column list says)
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)


Test5: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=true);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
   |   |    2 |
(1 row)

Test6: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2 WITH (publish_generated_columns=true);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

Test7: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true);
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

[1] - https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some more review comments for patch v30-0001.
> >
> > ======
> > src/sgml/ref/create_publication.sgml
> >
> > 1.
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > It should say "subscriber-side"; not "publisher-side". The same was
> > already reported by Sawada-San [1].
> >
> > ~~~
> >
> > 2.
> > +         <para>
> > +         This parameter can only be set <literal>true</literal> if
> > <literal>copy_data</literal> is
> > +         set to <literal>false</literal>.
> > +         </para>
> >
> > IMO this limitation should be addressed by patch 0001 like it was
> > already done in the previous patches (e.g. v22-0002). I think
> > Sawada-san suggested the same [1].
> >
> > Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
> > mentioned like this without any reference to the SUBSCRIPTION seems
> > like a cut/paste error from the previous implementation.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 3. pub_collist_validate
> > - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > - ereport(ERROR,
> > - errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > - errmsg("cannot use generated column \"%s\" in publication column list",
> > -    colname));
> > -
> >
> > Instead of just removing this ERROR entirely here, I thought it would
> > be more user-friendly to give a WARNING if the PUBLICATION's explicit
> > column list includes generated cols when the option
> > "publish_generated_columns" is false. This combination doesn't seem
> > like something a user would do intentionally, so just silently
> > ignoring it (like the current patch does) is likely going to give
> > someone unexpected results/grief.
> >
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 4. logicalrep_write_tuple, and logicalrep_write_attrs:
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > Why aren't you also checking the new PUBLICATION option here and
> > skipping all gencols if the "publish_generated_columns" option is
> > false? Or is the BMS of pgoutput_column_list_init handling this case?
> > Maybe there should be an Assert for this?
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 5. send_relation_and_attrs
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > Same question as #4.
> >
> > ~~~
> >
> > 6. prepare_all_columns_bms and pgoutput_column_list_init
> >
> > + if (att->attgenerated && !pub->pubgencolumns)
> > + cols = bms_del_member(cols, i + 1);
> >
> > IIUC, the algorithm seems overly tricky filling the BMS with all
> > columns, before straight away conditionally removing the generated
> > columns. Can't it be refactored to assign all the correct columns
> > up-front, to avoid calling bms_del_member()?
> >
> > ======
> > src/bin/pg_dump/pg_dump.c
> >
> > 7. getPublications
> >
> > IIUC, there is lots of missing SQL code here (for all older versions)
> > that should be saying "false AS pubgencolumns".
> > e.g. compare the SQL with how "false AS pubviaroot" is used.
> >
> > ======
> > src/bin/pg_dump/t/002_pg_dump.pl
> >
> > 8. Missing tests?
> >
> > I expected to see a pg_dump test for this new PUBLICATION option.
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 9. Missing tests?
> >
> > How about adding another test case that checks this new option must be
> > "Boolean"?
> >
> > ~~~
> >
> > 10. Missing tests?
> >
> > --- error: generated column "d" can't be in list
> > +-- ok: generated columns can be in the list too
> >  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> > +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
> >
> > (see my earlier comment #3)
> >
> > IMO there should be another test case for a WARNING here if the user
> > attempts to include generated column 'd' in an explicit PUBLICATION
> > column list while the "publish_generated_columns" is false.
> >
> > ======
> > [1]  https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com
> >
>
> I have fixed all the comments. The attached patches contain the desired changes.
> Also the merging of 0001 and 0002 can be done once there are no
> comments on the patch to help in reviewing.

The warning message appears to be incorrect. Even though
publish_generated_columns is set to true, the warning indicates that
it is false.
CREATE TABLE t1 (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
postgres=# CREATE PUBLICATION pub1 FOR table t1(gen1) WITH
(publish_generated_columns=true);
WARNING:  specified generated column "gen1" in publication column list
for publication with publish_generated_columns as false
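
To spell out the condition I would expect for this warning, here is a
small C sketch. should_warn_gencol_in_collist() is a hypothetical helper
name used only for illustration; the real check would presumably be done
inline where the column list is validated, and the patch may structure
it differently.

```c
#include <stdbool.h>

/*
 * Warn only when a generated column is named in the column list AND
 * the publication has publish_generated_columns set to false. With the
 * option set to true there is nothing surprising to report.
 */
static bool
should_warn_gencol_in_collist(bool attgenerated,
                              bool publish_generated_columns)
{
    return attgenerated && !publish_generated_columns;
}
```

In the example above the option is true, so under this condition no
warning would be raised for gen1.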

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have fixed all the comments. The attached patches contain the desired changes.
> Also the merging of 0001 and 0002 can be done once there are no
> comments on the patch to help in reviewing.

Few comments:
1) This commit message seems wrong; currently, irrespective of
publish_generated_columns, the columns specified in the column list take
precedence:
When 'publish_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

2) Since we have added pubgencols to pg_publication.h we can specify
"Bump catversion" in the commit message.

3) In the CREATE PUBLICATION column list/publish_generated_columns
documentation, we should mention that generated columns specified in
the column list will be replicated irrespective of the
publish_generated_columns option.

4) This warning should be emitted only if publish_generated_columns is false:
                if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
+                       ereport(WARNING,
                                errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated column \"%s\" in publication column list",
+                                       errmsg("specified generated column \"%s\" in publication column list for publication with publish_generated_columns as false",
                                                   colname));

5) These tests are not required for this feature:
+       'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
+               create_order => 51,
+               create_sql =>
+                 'ALTER PUBLICATION pub5 ADD TABLE
dump_test.test_table WHERE (col1 > 0);',
+               regexp => qr/^
+                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
dump_test.test_table WHERE ((col1 > 0));\E
+                       /xm,
+               like => { %full_runs, section_post_data => 1, },
+               unlike => {
+                       exclude_dump_test_schema => 1,
+                       exclude_test_table => 1,
+               },
+       },
+
+       'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE
(col2 = \'test\');'
+         => {
+               create_order => 52,
+               create_sql =>
+                 'ALTER PUBLICATION pub5 ADD TABLE
dump_test.test_second_table WHERE (col2 = \'test\');',
+               regexp => qr/^
+                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
dump_test.test_second_table WHERE ((col2 = 'test'::text));\E
+                       /xm,
+               like => { %full_runs, section_post_data => 1, },
+               unlike => { exclude_dump_test_schema => 1, },
+         },

Regards,
Vignesh