Thread: Pgoutput not capturing the generated columns

Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:
Hi PG Users.

We are using Debezium to capture the CDC events into Kafka.
With the decoderbufs and wal2json plugins, the connector is able to capture the generated columns in the table, but not with
the pgoutput plugin.

We tested with the following example:

CREATE TABLE employees (
   id SERIAL PRIMARY KEY,
   first_name VARCHAR(50),
   last_name VARCHAR(50),
   full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

-- Inserted a few records while the connector was running

INSERT INTO employees (first_name, last_name) VALUES ('ABC', 'XYZ');


With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in the above example. But
with pgoutput the generated column was not captured.
Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

Thanks.
Rajendra.


Re: Pgoutput not capturing the generated columns

From
"Euler Taveira"
Date:
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
> With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But with pgoutput the generated column was not captured.

wal2json materializes the generated columns before delivering the output. I
decided to materialize the generated columns in the output plugin because the
target consumers expect a complete row.

> Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

I wouldn't say limitation but a design decision.

The logical replication design computes the generated columns on the
subscriber side. It was a wise decision aimed at optimization (it doesn't
overload the publisher, which is *already* in charge of logical decoding).

Should pgoutput provide a complete row? Probably. If it is an option that
defaults to false and doesn't impact performance.

Feature requests should be made on this mailing list.


--
Euler Taveira

Re: Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:

Thanks Euler,

Greatly appreciate your inputs.


> Should pgoutput provide a complete row? Probably. If it is an option that defaults to false and doesn't impact performance.


Yes, it would be great if this feature can be implemented.


> The logical replication design decides to compute the generated columns at subscriber side.


If I understand correctly, this approach involves establishing a function on the subscriber's side that emulates the operation executed to derive the generated column values.

If yes, I see one potential issue where disparities might surface between the values of generated columns on the subscriber's side and those computed within Postgres. This could happen if the generated column's value relies on the current_time function.

Please let me know how we can track the feature requests and the discussions around them.

Thanks,
Rajendra.

Re: Pgoutput not capturing the generated columns

From
Rajendra Kumar Dangwal
Date:
Hi PG Hackers.

We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.

Many thanks.
Rajendra.




Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
<dangwalrajendra888@gmail.com> wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
> such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.

The attached patch has the changes to support capturing generated
column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
'include_generated_columns' option is specified, the generated column
information and generated column data will also be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE publication pub1 for all tables;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for the generated column is not possible as copy command does not
support getting data for the generated column. If this feature is
required we can remove this limitation from the copy command and then
add it as a subscription option later.
Thoughts?

Thanks and Regards,
Shubham Khanna.


RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for creating a patch! Here are high-level comments.

1.
Please document the feature. If it is hard to describe, we should change the API.

2.
Currently, the option is implemented as a streaming option. Is there any reason
to choose this way? Another approach is to implement it as a slot option, like failover
and temporary.

3.
You said that the subscription option is not supported for now. Does it mean
that the logical replication feature cannot be used for generated columns? If so,
the restriction won't be acceptable. If the combination of this and initial
sync is problematic, can't we exclude them in CreateSubscription and AlterSubscription?
E.g., create_slot option cannot be set if slot_name is NONE.

4.
Regarding the test_decoding plugin, it is already able to decode the
generated columns. So, in the first place, is the proposed option really needed
for the plugin? Why do you include it?
If you want to add the option anyway, the default value should be on, which keeps
the current behavior.

5.
Assuming that the feature becomes usable for logical replication, should we
change the protocol version at that time? Nodes prior to PG17 may
not want to receive values for generated columns. Can we control this only by the option?

6. logicalrep_write_tuple()

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
```

Hmm, does above mean that generated columns are decoded even if they are not in
the column list? If so, why? I think such columns should not be sent.
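
To make the concern concrete, here is a minimal, self-contained sketch that compares the skip decision of the v1 hunk with a stricter variant in which the column list applies to every column. The remaining checks in skip_v1() follow the v1 hunk as quoted later in this thread. ToyAttr, toy_column_in_list(), skip_v1() and skip_strict() are hypothetical stand-ins invented for illustration; they are not the PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few attribute fields used here. */
typedef struct ToyAttr
{
    int  attnum;        /* 1-based column number */
    bool attisdropped;
    bool attgenerated;
} ToyAttr;

/*
 * Toy column list: an array of attnums.  NULL models "no column list",
 * in which case every column counts as listed.
 */
bool
toy_column_in_list(int attnum, const int *list, size_t n)
{
    if (list == NULL)
        return true;
    for (size_t i = 0; i < n; i++)
        if (list[i] == attnum)
            return true;
    return false;
}

/* Skip decision in the shape of the v1 hunk. */
bool
skip_v1(const ToyAttr *att, const int *list, size_t n, bool publish_gen)
{
    assert(att != NULL);
    if (att->attisdropped)
        return true;
    /* A generated column never takes this branch, listed or not. */
    if (!toy_column_in_list(att->attnum, list, n) && !att->attgenerated)
        return true;
    if (att->attgenerated && !publish_gen)
        return true;
    return false;
}

/* Stricter variant: the column list is honoured for all columns. */
bool
skip_strict(const ToyAttr *att, const int *list, size_t n, bool publish_gen)
{
    assert(att != NULL);
    if (att->attisdropped)
        return true;
    if (att->attgenerated && !publish_gen)
        return true;
    if (!toy_column_in_list(att->attnum, list, n))
        return true;
    return false;
}
```

With a column list containing only column 1 and the option enabled, skip_v1() does not skip a generated column 2, while skip_strict() does. That difference is the behaviour being questioned here.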

7.

Some functions refer to data->publish_generated_column many times. Can we store
the value in a variable?

Below comments are for test_decoding part, but they may be not needed.

=====

a. pg_decode_startup()

```
+        else if (strcmp(elem->defname, "include_generated_columns") == 0)
```

Other options for test_decoding do not have underscore. It should be
"include-generated-columns".
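
As a rough illustration of the naming and default discussed in comments 4 and a, the sketch below parses a toy option list, accepts only the hyphenated spelling, and defaults the flag to true so that existing output is unchanged. ToyOption, toy_parse_bool() and toy_parse_options() are hypothetical names; treating a missing value as true is an assumption of this sketch, and none of this is the actual test_decoding code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair, loosely modelled on a plugin option. */
typedef struct ToyOption
{
    const char *name;
    const char *value;      /* NULL means "no value given" */
} ToyOption;

/* Parse a few boolean spellings; returns false if the value is not valid. */
bool
toy_parse_bool(const char *value, bool *result)
{
    assert(result != NULL);
    if (value == NULL || strcmp(value, "1") == 0 ||
        strcmp(value, "true") == 0 || strcmp(value, "on") == 0)
    {
        *result = true;     /* assumption: a missing value means true */
        return true;
    }
    if (strcmp(value, "0") == 0 ||
        strcmp(value, "false") == 0 || strcmp(value, "off") == 0)
    {
        *result = false;
        return true;
    }
    return false;
}

/* Returns false on an unknown option name or an invalid value. */
bool
toy_parse_options(const ToyOption *opts, int nopts,
                  bool *include_generated_columns)
{
    assert(include_generated_columns != NULL);
    *include_generated_columns = true;  /* default keeps current behavior */

    for (int i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, "include-generated-columns") == 0)
        {
            if (!toy_parse_bool(opts[i].value, include_generated_columns))
                return false;
        }
        else
            return false;   /* e.g. the underscore spelling is rejected */
    }
    return true;
}
```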

b. pg_decode_change()

data->include_generated_columns is referenced four times in the function.
Can you store the value in a variable?


c. pg_decode_change()

```
-                                    true);
+                                    true, data->include_generated_columns );
```

Please remove the blank.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some review comments for the patch v1-0001.

======
GENERAL

G.1. Use consistent names

It seems to add unnecessary complications by having different names
for all the new options, fields and API parameters.

e.g. sometimes 'include_generated_columns'
e.g. sometimes 'publish_generated_columns'

Won't it be better to just use identical names everywhere for
everything? I don't mind which one you choose; I just felt you only
need one name, not two. This comment overrides everything else in this
post so whatever name you choose, make adjustments for all my other
review comments as necessary.

======

G.2. Is it possible to just use the existing bms?

A very large part of this patch is adding more API parameters to
delegate the 'publish_generated_columns' flag value down to when it is
finally checked and used. e.g.

The functions:
- logicalrep_write_insert(), logicalrep_write_update(),
logicalrep_write_delete()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_tuple

The functions:
- logicalrep_write_rel()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_attrs

AFAICT in all these places the API is already passing a "Bitmapset
*columns". I was wondering if it might be possible to modify the
"Bitmapset *columns" BEFORE any of those functions get called so that
the "columns" BMS either does or doesn't include generated cols (as
appropriate according to the option).

Well, it might not be so simple because there are some NULL BMS
considerations also, but I think it would be worth investigating at
least, because if indeed you can find some common place (somewhere
like pgoutput_change()?) where the columns BMS can be filtered to
remove bits for generated cols then it could mean none of those other
patch API changes are needed at all -- then the patch would only be
1/2 the size.

======
Commit message

1.
Now if include_generated_columns option is specified, the generated
column information and generated column data also will be sent.

Usage from pgoutput plugin:
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

~

I think there needs to be more background information given here. This
commit message doesn't seem to describe anything about what is the
problem and how this patch fixes it. It just jumps straight into
giving usages of a 'include_generated_columns' option.

It also doesn't say that this is an option that was newly *introduced*
by the patch -- it refers to it as though the reader should already
know about it.

Furthermore, your hacker's post says "Currently it is not supported as
a subscription option because table sync for the generated column is
not possible as copy command does not support getting data for the
generated column. If this feature is required we can remove this
limitation from the copy command and then add it as a subscription
option later." IMO that all seems like the kind of information that
ought to also be mentioned in this commit message.

======
contrib/test_decoding/sql/ddl.sql

2.
+-- check include_generated_columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1');
+DROP TABLE gencoltable;
+

2a.
Perhaps you should include both option values to demonstrate the
difference in behaviour:

'include_generated_columns', '0'
'include_generated_columns', '1'

~

2b.
I think you maybe need to include some more test combinations where
there is and isn't a COLUMN LIST, because I am not 100% sure I
understand the current logic/expectations for all combinations.

e.g. When the generated column is in a column list but
'publish_generated_columns' is false then what should happen? etc.
Also if there are any special rules then those should be mentioned in
the commit message.

======
src/backend/replication/logical/proto.c

3.
For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

~~~

4. logicalrep_write_tuple:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

That code seems confusing. Shouldn't the logic be exactly the same as in
logicalrep_write_attrs()?

e.g. Shouldn't they both look like this:

if (att->attisdropped)
  continue;

if (att->attgenerated && !publish_generated_column)
  continue;

if (!column_in_column_list(att->attnum, columns))
  continue;

======
src/backend/replication/pgoutput/pgoutput.c

5.
 static void send_relation_and_attrs(Relation relation, TransactionId xid,
  LogicalDecodingContext *ctx,
- Bitmapset *columns);
+ Bitmapset *columns,
+ bool publish_generated_column);

Use plural. /publish_generated_column/publish_generated_columns/

~~~

6. parse_output_parameters

  bool origin_option_given = false;
+ bool generate_column_option_given = false;

  data->binary = false;
  data->streaming = LOGICALREP_STREAM_OFF;
  data->messages = false;
  data->two_phase = false;
+ data->publish_generated_column = false;

I think the 1st var should be 'include_generated_columns_option_given'
for consistency with the name of the actual option that was given.

======
src/include/replication/logicalproto.h

7.
(Same as a previous review comment)

For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

======
src/include/replication/pgoutput.h

8.
  bool publish_no_origin;
+ bool publish_generated_column;
 } PGOutputData;

/publish_generated_column/publish_generated_columns/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
Hi Kuroda-san,

Thanks for reviewing the patch. I have addressed some of the comments.

> 2.
> Currently, the option is implemented as a streaming option. Is there any reason
> to choose this way? Another approach is to implement it as a slot option, like failover
> and temporary.
I think the current approach is appropriate. The options such as
failover and temporary seem like properties of a slot and I think
decoding of generated column should not be slot specific. Also adding
a new option for slot may create an overhead.

> 3.
> You said that the subscription option is not supported for now. Does it mean
> that the logical replication feature cannot be used for generated columns? If so,
> the restriction won't be acceptable. If the combination of this and initial
> sync is problematic, can't we exclude them in CreateSubscription and AlterSubscription?
> E.g., create_slot option cannot be set if slot_name is NONE.
Added an option 'generated_column' for CREATE SUBSCRIPTION. Currently
it allows setting the 'generated_column' option to true only if 'copy_data'
is set to false.
Also we don't allow user to alter the 'generated_column' option.
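
The two restrictions described above can be sketched as plain validation functions. This is only an illustration under stated assumptions: ToySubOpts, both function names and the message strings are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the CREATE SUBSCRIPTION options discussed here. */
typedef struct ToySubOpts
{
    bool copy_data;
    bool generated_column;
} ToySubOpts;

/*
 * Returns NULL when the combination is accepted, otherwise a static
 * message.  The message text is made up for this sketch.
 */
const char *
toy_check_create_subscription(const ToySubOpts *opts)
{
    assert(opts != NULL);
    if (opts->generated_column && opts->copy_data)
        return "generated_column = true requires copy_data = false";
    return NULL;
}

/* ALTER: any attempt to specify the option is rejected in this version. */
const char *
toy_check_alter_subscription(bool generated_column_specified)
{
    if (generated_column_specified)
        return "generated_column cannot be altered";
    return NULL;
}
```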

> 6. logicalrep_write_tuple()
>
> ```
> -        if (!column_in_column_list(att->attnum, columns))
> +        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> +            continue;
> ```
>
> Hmm, does above mean that generated columns are decoded even if they are not in
> the column list? If so, why? I think such columns should not be sent.
Fixed

Thanks and Regards,
Shlok Kyal


Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
Hi,

On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> <dangwalrajendra888@gmail.com> wrote:
> >
> > Hi PG Hackers.
> >
> > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
> > > such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
>
> The attached patch has the changes to support capturing generated
> column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> 'include_generated_columns' option is specified, the generated column
> information and generated column data will also be sent.

As Euler mentioned earlier, I think it's a decision not to replicate
generated columns because we don't know the target table on the
subscriber has the same expression and there could be locale issues
even if it looks the same. I can see that a benefit of this proposal
would be to save cost to compute generated column values if the user
wants the target table on the subscriber to have exactly the same data
as the publisher's one. Are there other benefits or use cases?

>
> Usage from pgoutput plugin:
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> CREATE publication pub1 for all tables;
> SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> Currently it is not supported as a subscription option because table
> sync for the generated column is not possible as copy command does not
> support getting data for the generated column. If this feature is
> required we can remove this limitation from the copy command and then
> add it as a subscription option later.
> Thoughts?

I think that if we want to support an option to replicate generated
columns, the initial tablesync should support it too. Otherwise, we
end up filling the target columns data with NULL during the initial
tablesync but with replicated data during the streaming changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Mon, 20 May 2024 at 13:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
> > > > tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > 'include_generated_columns' option is specified, the generated column
> > information and generated column data will also be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?

I think this will be useful mainly for the use cases where the
publisher has generated columns and the subscriber does not have
generated columns.
In the case where both the publisher and subscriber have generated
columns, the current patch will overwrite the generated column values
based on the expression for the generated column in the subscriber.
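
The two cases can be sketched as follows, assuming a single integer input column and a toy apply step. ToySubscriberColumn, toy_double(), toy_triple() and toy_apply_value() are hypothetical names for illustration and do not reflect the actual apply worker code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the target column on the subscriber. */
typedef struct ToySubscriberColumn
{
    bool is_generated;
    int  (*generate)(int a);    /* subscriber's own expression, if generated */
} ToySubscriberColumn;

/* Example generation expressions: (a * 2) and (a * 3). */
int toy_double(int a) { return a * 2; }
int toy_triple(int a) { return a * 3; }

/*
 * Value stored for column b on the subscriber.  A generated target
 * column recomputes from a and ignores the replicated value; a plain
 * target column stores what the publisher sent.
 */
int
toy_apply_value(const ToySubscriberColumn *col, int a, int replicated_b)
{
    assert(col != NULL);
    if (col->is_generated)
    {
        assert(col->generate != NULL);
        return col->generate(a);
    }
    return replicated_b;
}
```

If the publisher computes b with toy_double() and the subscriber column is plain, the subscriber stores the published value. If the subscriber column is generated with toy_triple(), the published value is discarded and the two sides diverge, which is also why a differing subscriber-side expression was raised as a concern earlier in the thread.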

> >
> > Usage from pgoutput plugin:
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > CREATE publication pub1 for all tables;
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > Currently it is not supported as a subscription option because table
> > sync for the generated column is not possible as copy command does not
> > support getting data for the generated column. If this feature is
> > required we can remove this limitation from the copy command and then
> > add it as a subscription option later.
> > Thoughts?
>
> I think that if we want to support an option to replicate generated
> columns, the initial tablesync should support it too. Otherwise, we
> end up filling the target columns data with NULL during the initial
> tablesync but with replicated data during the streaming changes.

+1 for supporting initial sync.
Currently the combination of copy_data = true and generated_column = true is not
supported; this limitation will be removed in one of the upcoming
patches.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

AFAICT the difference between this v2-0001 patch and v1 is mostly about adding
the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
addressing any of my previous review comments for patch v1 [1]. So
these comments below are limited only to the new option code; all my
previous review comments probably still apply.

======
Commit message

1. (General)
The commit message is seriously lacking background explanation to describe:
- What is the current behaviour w.r.t. generated columns
- What is the problem with the current behaviour?
- What exactly is this patch doing to address that problem?

~

2.
New option generated_option is added in create subscription. Now if this
option is specified as 'true' during create subscription, generated
columns in the tables, present in publisher (to which this subscription is
subscribed) can also be replicated.

-

2A.
"generated_option" is not the name of the new option.

~

2B.
"create subscription" stmt should be UPPERCASE; will also be more
readable if the option name is quoted.

~

2C.
Needs more information like under what condition is this option ignored etc.

======
doc/src/sgml/ref/create_subscription.sgml

3.
+       <varlistentry id="sql-createsubscription-params-with-generated-column">
+        <term><literal>generated-column</literal> (<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>
+
+         <para>
+          This parameter can only be set true if copy_data is set to false.
+          This option works fine when a generated column (in publisher) is replicated to a
+          non-generated column (in subscriber). Else if it is replicated to a generated
+          column, it will ignore the replicated data and fill the column with computed or
+          default data.
+         </para>
+        </listitem>
+       </varlistentry>

3A.
There is a typo in the name "generated-column" because we should use
underscores (not hyphens) for the option names.

~

3B.
This is not a good option name because there is no verb, so it
doesn't mean anything to set it true/false -- actually there IS a verb
"generate" but we are not saying generate = true/false, so this name
is also quite confusing.

I think "include_generated_columns" would be much better, but if
others think that name is too long then maybe "include_generated_cols"
or "include_gen_cols" or similar. Of course, whatever the final
decision is, it should be propagated consistently through all the code
comments, params, fields, etc.

~

3C.
copy_data and false should be marked up as <literal> fonts in the sgml

~

3D.

Suggest re-word this part. Don't need to explain when it "works fine".

BEFORE
This option works fine when a generated column (in publisher) is
replicated to a non-generated column (in subscriber). Else if it is
replicated to a generated column, it will ignore the replicated data
and fill the column with computed or default data.

SUGGESTION
If the subscriber-side column is also a generated column then this
option has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

======
src/backend/commands/subscriptioncmds.c

4. AlterSubscription
    SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
    SUBOPT_PASSWORD_REQUIRED |
    SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);

Hmm. Is this correct? If ALTER is not allowed (later in this patch
there is a message "toggling generated_column option is not allowed."),
then why are we even saying that SUBOPT_GENERATED_COLUMN is a
supported_opt for ALTER?

~~~

5.
+ if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("toggling generated_column option is not allowed.")));
+ }

5A.
I suspect this is not even needed if the 'supported_opt' is fixed per
the previous comment.

~

5B.
But if this message is still needed then I think it should say "ALTER
is not allowed" (not "toggling is not allowed") and also the option
name should be quoted as per the new guidelines for error messages.
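
To illustrate comments 4 and 5A: if an option bit is left out of the supported set, a generic "is everything specified also supported?" check already rejects it, so a dedicated error branch becomes unnecessary. The TOY_SUBOPT_* constants and both helpers below are hypothetical; they only mimic the bitmask style and are not the definitions in subscriptioncmds.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits, mimicking the SUBOPT_* style. */
#define TOY_SUBOPT_ORIGIN           (UINT32_C(1) << 0)
#define TOY_SUBOPT_FAILOVER         (UINT32_C(1) << 1)
#define TOY_SUBOPT_GENERATED_COLUMN (UINT32_C(1) << 2)

/* True when the single-bit flag is present in opts. */
bool
toy_is_set(uint32_t opts, uint32_t flag)
{
    /* flag must be exactly one bit */
    assert(flag != 0 && (flag & (flag - 1)) == 0);
    return (opts & flag) != 0;
}

/* True when every specified option is also in the supported set. */
bool
toy_options_supported(uint32_t supported_opts, uint32_t specified_opts)
{
    return (specified_opts & ~supported_opts) == 0;
}
```

With TOY_SUBOPT_GENERATED_COLUMN omitted from the ALTER set, toy_options_supported() returns false as soon as the user specifies it, and no later toy_is_set() check is needed.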

======
src/backend/replication/logical/proto.c


6. logicalrep_write_tuple

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+

Calling column_in_column_list() might be a more expensive operation
than checking just generated columns flag so maybe reverse the order
and check the generated columns first for a tiny performance gain.

~~

7.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

~~

8. logicalrep_write_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

ditto #6

~~

9.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

======
src/include/catalog/pg_subscription.h


10. CATALOG

+ bool subgeneratedcolumn; /* True if generated colums must be published */

/colums/columns/

======
src/test/regress/sql/publication.sql

11.
--- error: generated column "d" can't be in list
+-- ok


Maybe change "ok" to say something like "ok: generated cols can be in the list too"

======

12.
GENERAL - Missing CREATE SUBSCRIPTION test?
GENERAL - Missing ALTER SUBSCRIPTION test?

How come this patch adds a new CREATE SUBSCRIPTION option but does not
seem to include any test case for that option in either the CREATE
SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Eisentraut
Date:
On 08.05.24 09:13, Shubham Khanna wrote:
> The attached patch has the changes to support capturing generated
> column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> 'include_generated_columns' option is specified, the generated column
> information and generated column data will also be sent.

It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
into the replication output stream.

I think having an option for including stored generated columns is in 
general ok.


[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the API.

I have documented the feature.

> 4.
> Regarding the test_decoding plugin, it is already able to decode the
> generated columns. So, in the first place, is the proposed option really needed
> for the plugin? Why do you include it?
> If you want to add the option anyway, the default value should be on, which keeps
> the current behavior.

I have made the generated column option default to true for the test_decoding
plugin, so by default we will send generated column data.

> 5.
> Assuming that the feature becomes usable for logical replication, should we
> change the protocol version at that time? Nodes prior to PG17 may
> not want to receive values for generated columns. Can we control this only by the option?

I verified the backward compatibility test by using the generated
column option and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer to data->publish_generated_column many times. Can we store
> the value in a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =====
>
> a. pg_decode_startup()
>
> ```
> +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not have underscore. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referenced four times in the function.
> Can you store the value in a variable?
>
>
> c. pg_decode_change()
>
> ```
> -                                    true);
> +                                    true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.


RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for updating the patch! I checked your patches briefly. Here are my comments.

01. API

Since the option for test_decoding is enabled by default, I think it should be renamed.
E.g., "skip-generated-columns" or something.

02. ddl.sql

```
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+                            data                             
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
```

We should also test the non-default case, in which the generated columns are not output.

03. ddl.sql

I am not sure the new tests are in the correct place. Should we add a new file and move the tests to it?
Thoughts?

04. protocol.sgml

Please keep the format of the sgml file.

05. protocol.sgml

The option is implemented as a streaming option of the pgoutput plugin, so it should be
documented under the "Logical Streaming Replication Parameters" section.

05. AlterSubscription

```
+                if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+                {
+                    ereport(ERROR,
+                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                             errmsg("toggling generated_column option is not allowed.")));
+                }
```

If you don't want to support the option here, you can remove the SUBOPT_GENERATED_COLUMN
macro from the function. But can you clarify why you do not want to support it?

07. logicalrep_write_tuple

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
+
+        if (att->attgenerated && !publish_generated_column)
             continue;
```

I think the changes in v2 were reverted or wrongly merged.

08. test code

Can you add tests verifying that generated columns are replicated by logical replication?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 23 May 2024 at 09:19, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> > Dear Shubham,
> >
> > Thanks for creating a patch! Here are high-level comments.
>
> > 1.
> > Please document the feature. If it is hard to describe, we should change the API.
>
> I have added the feature in the document.
>
> > 4.
> > Regarding the test_decoding plugin, it has already been able to decode the
> > generated columns. So... as the first place, is the proposed option really needed
> > for the plugin? Why do you include it?
> > If you anyway want to add the option, the default value should be on - which keeps
> > current behavior.
>
> I have made the generated column options as true for test_decoding
> plugin so by default we will send generated column data.
>
> > 5.
> > Assuming that the feature becomes usable for logical replication. Not sure,
> > should we change the protocol version at that time? Nodes prior to PG17 may
> > not want to receive values for generated columns. Can we control this only by the option?
>
> I verified the backward compatibility test by using the generated
> column option and it worked fine. I think there is no need to make any
> further changes.
>
> > 7.
> >
> > Some functions refer data->publish_generated_column many times. Can we store
> > the value to a variable?
> >
> > Below comments are for test_decoding part, but they may be not needed.
> >
> > =====
> >
> > a. pg_decode_startup()
> >
> > ```
> > +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > ```
> >
> > Other options for test_decoding do not have underscore. It should be
> > "include-generated-columns".
> >
> > b. pg_decode_change()
> >
> > data->include_generated_columns is referenced four times in the function.
> > Can you store the value in a variable?
> >
> >
> > c. pg_decode_change()
> >
> > ```
> > -                                    true);
> > +                                    true, data->include_generated_columns );
> > ```
> >
> > Please remove the blank.
>
> Fixed.
> The attached v3 Patch has the changes for the same.

Few comments:
1) Since this check is removed, the tupdesc variable is no longer required:
+++ b/src/backend/catalog/pg_publication.c
@@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns,
                                        errmsg("cannot use system column \"%s\" in publication column list",
                                                   colname));

-               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
-                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated column \"%s\" in publication column list",
-                                                  colname));

2) In test_decoding, the include_generated_columns option is used:
+               else if (strcmp(elem->defname, "include_generated_columns") == 0)
+               {
+                       if (elem->arg == NULL)
+                               continue;
+                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
+                                                               strVal(elem->arg), elem->defname)));
+               }

In the subscription code the option is named generated_column; we should try to use the
same option name in both places:
+               else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
+                                strcmp(defel->defname, "generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts, SUBOPT_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
+                       opts->generated_column = defGetBoolean(defel);
+               }

3) Tab completion can be added for CREATE SUBSCRIPTION to include the
generated_column option.

4) There are a few whitespace issues when applying the patch; check with
git diff --check.

5) Add a few tests for the new option.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for the patch v3-0001.

I don't think v3 addressed any of my previous review comments for
patches v1 and v2. [1][2]

So the comments below are limited only to the new code (i.e. the v3
versus v2 differences). Meanwhile, all my previous review comments may
still apply.

======
GENERAL

The patch applied gives whitespace warnings:

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150: trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202: trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730: trailing whitespace.
warning: 3 lines add whitespace errors.

======
contrib/test_decoding/test_decoding.c

1. pg_decode_change

  MemoryContext old;
+ bool include_generated_columns;
+

I'm not really convinced this variable saves any code.

======
doc/src/sgml/protocol.sgml

2.
+        <varlistentry>
+         <term><replaceable class="parameter">include-generated-columns</replaceable></term>
+         <listitem>
+        <para>
+        The include-generated-columns option controls whether generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. This allows users to customize the output format based on whether they want to include these columns or not.
+         </para>
+         </listitem>
+         </varlistentry>

2a.
Something is not correct when this name has hyphens and all the nearby
parameter names do not. Shouldn't it be all uppercase like the other
boolean parameter?

~

2b.
Text in the SGML file should be wrapped properly.

~

2c.
IMO the comment can be more terse and it also needs to specify that it
is a boolean type, and what is the default value if not passed.

SUGGESTION

INCLUDE_GENERATED_COLUMNS [ boolean ]

If true, then generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
src/backend/replication/logical/proto.c

3. logicalrep_write_tuple

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

3a.
This code seems overcomplicated checking the same flag multiple times.

SUGGESTION
if (att->attgenerated)
{
  if (!publish_generated_column)
    continue;
}
else
{
  if (!column_in_column_list(att->attnum, columns))
    continue;
}

~

3b.
The same logic occurs several times in logicalrep_write_tuple

~~~

4. logicalrep_write_attrs

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

Shouldn't these code fragments (2x in this function) look the same as
in logicalrep_write_tuple? See the above review comments.
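
As an illustration only (not code from the patch), the restructuring suggested in comment 3a can be checked in isolation. The sketch below models the v3 test and the suggested test as two standalone predicates and compares them over all eight input combinations. The function names and the plain `bool` parameters are hypothetical stand-ins for `att->attgenerated`, `column_in_column_list(...)` and `publish_generated_column`.

```c
#include <stdbool.h>

/* v3 form: two consecutive tests, with attgenerated consulted in both. */
static bool
demo_skip_column_v3(bool attgenerated, bool in_column_list,
                    bool publish_generated_column)
{
    if (!in_column_list && !attgenerated)
        return true;            /* corresponds to the first "continue" */
    if (attgenerated && !publish_generated_column)
        return true;            /* corresponds to the second "continue" */
    return false;
}

/* Suggested form: branch once on attgenerated. */
static bool
demo_skip_column_suggested(bool attgenerated, bool in_column_list,
                           bool publish_generated_column)
{
    if (attgenerated)
        return !publish_generated_column;
    return !in_column_list;
}

/* Returns true when both forms agree for every combination of inputs. */
static bool
demo_forms_agree(void)
{
    for (int bits = 0; bits < 8; bits++)
    {
        bool gen = (bits & 1) != 0;
        bool in_list = (bits & 2) != 0;
        bool publish = (bits & 4) != 0;

        if (demo_skip_column_v3(gen, in_list, publish) !=
            demo_skip_column_suggested(gen, in_list, publish))
            return false;
    }
    return true;
}
```

Under these assumptions the two forms are behaviourally identical, so the rewrite is purely a readability change. Note that in both forms a generated column bypasses the column list entirely when the flag is set.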

======
src/backend/replication/pgoutput/pgoutput.c

5. maybe_send_schema

  TransactionId topxid = InvalidTransactionId;
+ bool publish_generated_column = data->publish_generated_column;

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->streaming).

~~~

6. pgoutput_change
-
+ bool publish_generated_column = data->publish_generated_column;
+

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->binary).

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
[2] My v2 review -
https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for the patch v1-0001.
>
> ======
> GENERAL
>
> G.1. Use consistent names
>
> It seems to add unnecessary complications by having different names
> for all the new options, fields and API parameters.
>
> e.g. sometimes 'include_generated_columns'
> e.g. sometimes 'publish_generated_columns'
>
> Won't it be better to just use identical names everywhere for
> everything? I don't mind which one you choose; I just felt you only
> need one name, not two. This comment overrides everything else in this
> post so whatever name you choose, make adjustments for all my other
> review comments as necessary.

I have updated the name to 'include_generated_columns' everywhere in the Patch.

> ======
>
> G.2. Is it possible to just use the existing bms?
>
> A very large part of this patch is adding more API parameters to
> delegate the 'publish_generated_columns' flag value down to when it is
> finally checked and used. e.g.
>
> The functions:
> - logicalrep_write_insert(), logicalrep_write_update(),
> logicalrep_write_delete()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_tuple
>
> The functions:
> - logicalrep_write_rel()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_attrs
>
> AFAICT in all these places the API is already passing a "Bitmapset
> *columns". I was wondering if it might be possible to modify the
> "Bitmapset *columns" BEFORE any of those functions get called so that
> the "columns" BMS either does or doesn't include generated cols (as
> appropriate according to the option).
>
> Well, it might not be so simple because there are some NULL BMS
> considerations also, but I think it would be worth investigating at
> least, because if indeed you can find some common place (somewhere
> like pgoutput_change()?) where the columns BMS can be filtered to
> remove bits for generated cols then it could mean none of those other
> patch API changes are needed at all -- then the patch would only be
> 1/2 the size.

I will analyse and reply to this in the next version.
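
For readers following the G.2 idea, here is a minimal sketch of what "filter the column set once, before the write functions are called" could look like. It is not PostgreSQL code: a `uint64_t` mask stands in for `Bitmapset *columns`, `DemoAttr` is a hypothetical stand-in for a tuple descriptor entry, and `has_column_list = false` models the NULL-Bitmapset case (no column list, meaning all columns). The checks follow the order proposed in the v1 review (dropped, then generated, then column list).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one attribute of a tuple descriptor. */
typedef struct DemoAttr
{
    bool        is_dropped;
    bool        is_generated;
} DemoAttr;

/*
 * Compute the set of attributes to send, as a bit mask (bit i = attribute i).
 * Once this mask exists, downstream code only needs to test membership and
 * no longer needs a separate include_generated_columns parameter.
 */
static uint64_t
demo_effective_columns(const DemoAttr *attrs, int natts,
                       bool has_column_list, uint64_t column_list,
                       bool include_generated_columns)
{
    uint64_t    result = 0;

    for (int i = 0; i < natts && i < 64; i++)
    {
        if (attrs[i].is_dropped)
            continue;
        if (attrs[i].is_generated && !include_generated_columns)
            continue;
        /* No column list means every remaining column is published. */
        if (has_column_list && (column_list & (UINT64_C(1) << i)) == 0)
            continue;

        result |= UINT64_C(1) << i;
    }
    return result;
}
```

The main cost of this approach is the NULL case mentioned in the review: a missing column list has to be expanded into an explicit set so that generated columns can be left out of it.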

> ======
> Commit message
>
> 1.
> Now if include_generated_columns option is specified, the generated
> column information and generated column data also will be sent.
>
> Usage from pgoutput plugin:
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> ~
>
> I think there needs to be more background information given here. This
> commit message doesn't seem to describe anything about what is the
> problem and how this patch fixes it. It just jumps straight into
> giving usages of a 'include_generated_columns' option.
>
> It also doesn't say that this is an option that was newly *introduced*
> by the patch -- it refers to it as though the reader should already
> know about it.
>
> Furthermore, your hacker's post says "Currently it is not supported as
> a subscription option because table sync for the generated column is
> not possible as copy command does not support getting data for the
> generated column. If this feature is required we can remove this
> limitation from the copy command and then add it as a subscription
> option later." IMO that all seems like the kind of information that
> ought to also be mentioned in this commit message.

I have updated the commit message to include the suggested information.

> ======
> contrib/test_decoding/sql/ddl.sql
>
> 2.
> +-- check include_generated_columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1');
> +DROP TABLE gencoltable;
> +
>
> 2a.
> Perhaps you should include both option values to demonstrate the
> difference in behaviour:
>
> 'include_generated_columns', '0'
> 'include_generated_columns', '1'

Added the other option value to demonstrate the difference in behaviour.

> 2b.
> I think you maybe need to include some more test combinations where
> there is and isn't a COLUMN LIST, because I am not 100% sure I
> understand the current logic/expectations for all combinations.
>
> e.g. When the generated column is in a column list but
> 'publish_generated_columns' is false then what should happen? etc.
> Also if there are any special rules then those should be mentioned in
> the commit message.

A test case has been added, and the behaviour is described in the documentation.

> ======
> src/backend/replication/logical/proto.c
>
> 3.
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 4. logicalrep_write_tuple:
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
>
> That code seems confusing. Shouldn't the logic be exactly as also in
> logicalrep_write_attrs()?
>
> e.g. Shouldn't they both look like this:
>
> if (att->attisdropped)
>   continue;
>
> if (att->attgenerated && !publish_generated_column)
>   continue;
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;

Fixed.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5.
>  static void send_relation_and_attrs(Relation relation, TransactionId xid,
>   LogicalDecodingContext *ctx,
> - Bitmapset *columns);
> + Bitmapset *columns,
> + bool publish_generated_column);
>
> Use plural. /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 6. parse_output_parameters
>
>   bool origin_option_given = false;
> + bool generate_column_option_given = false;
>
>   data->binary = false;
>   data->streaming = LOGICALREP_STREAM_OFF;
>   data->messages = false;
>   data->two_phase = false;
> + data->publish_generated_column = false;
>
> I think the 1st var should be 'include_generated_columns_option_given'
> for consistency with the name of the actual option that was given.

Updated the name to 'include_generated_columns_option_given'

> ======
> src/include/replication/logicalproto.h
>
> 7.
> (Same as a previous review comment)
>
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> ======
> src/include/replication/pgoutput.h
>
> 8.
>   bool publish_no_origin;
> + bool publish_generated_column;
>  } PGOutputData;
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, May 21, 2024 at 12:23 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> AFAICT this v2-0001 patch differences from v1 is mostly about adding
> the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
> addressing any of my previous review comments for patch v1. [1]. So
> these comments below are limited only to the new option code; All my
> previous review comments probably still apply.
>
> ======
> Commit message
>
> 1. (General)
> The commit message is seriously lacking background explanation to describe:
> - What is the current behaviour w.r.t. generated columns
> - What is the problem with the current behaviour?
> - What exactly is this patch doing to address that problem?

Added the information related to this inside the Patch.

> 2.
> New option generated_option is added in create subscription. Now if this
> option is specified as 'true' during create subscription, generated
> columns in the tables, present in publisher (to which this subscription is
> subscribed) can also be replicated.
>
> -
>
> 2A.
> "generated_option" is not the name of the new option.
>
> ~
>
> 2B.
> "create subscription" stmt should be UPPERCASE; will also be more
> readable if the option name is quoted.
>
> ~
>
> 2C.
> Needs more information like under what condition is this option ignored etc.

Fixed.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 3.
> +       <varlistentry id="sql-createsubscription-params-with-generated-column">
> +        <term><literal>generated-column</literal> (<type>boolean</type>)</term>
> +        <listitem>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. The default is
> +          <literal>false</literal>.
> +         </para>
> +
> +         <para>
> +          This parameter can only be set true if copy_data is set to false.
> +          This option works fine when a generated column (in publisher) is replicated to a
> +          non-generated column (in subscriber). Else if it is replicated to a generated
> +          column, it will ignore the replicated data and fill the column with computed or
> +          default data.
> +         </para>
> +        </listitem>
> +       </varlistentry>
>
> 3A.
> There is a typo in the name "generated-column" because we should use
> underscores (not hyphens) for the option names.
>
> ~
>
> 3B.
> This is not a good option name because there is no verb so it
> doesn't mean anything to set it true/false -- actually there IS a verb
> "generate" but we are not saying generate = true/false, so this name
> is also quite confusing.
>
> I think "include_generated_columns" would be much better, but if
> others think that name is too long then maybe "include_generated_cols"
> or "include_gen_cols" or similar. Of course, whatever the final
> decision is, it should be propagated consistently through all the code comments, params,
> fields, etc.
>
> ~
>
> 3C.
> copy_data and false should be marked up as <literal> fonts in the sgml
>
> ~
>
> 3D.
>
> Suggest re-word this part. Don't need to explain when it "works fine".
>
> BEFORE
> This option works fine when a generated column (in publisher) is
> replicated to a non-generated column (in subscriber). Else if it is
> replicated to a generated column, it will ignore the replicated data
> and fill the column with computed or default data.
>
> SUGGESTION
> If the subscriber-side column is also a generated column then this
> option has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.

Fixed.

> ======
> src/backend/commands/subscriptioncmds.c
>
> 4. AlterSubscription
>     SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
>     SUBOPT_PASSWORD_REQUIRED |
>     SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
> -   SUBOPT_ORIGIN);
> +   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);
>
> Hmm. Is this correct? If ALTER is not allowed (later in this patch
> there is a message "toggling generated_column option is not allowed.")
> then why are we even saying that SUBOPT_GENERATED_COLUMN is a
> support_opt for ALTER?

Fixed.

> 5.
> + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("toggling generated_column option is not allowed.")));
> + }
>
> 5A.
> I suspect this is not even needed if the 'supported_opt' is fixed per
> the previous comment.
>
> ~
>
> 5B.
> But if this message is still needed then I think it should say "ALTER
> is not allowed" (not "toggling is not allowed") and also the option
> name should be quoted as per the new guidelines for error messages.
>
> ======
> src/backend/replication/logical/proto.c

Fixed.
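
To make the point of comments 4 and 5A concrete, the following is a simplified sketch of mask-based option parsing. All names (`DEMO_SUBOPT_*`, `demo_parse_option`, `DemoParseResult`) are hypothetical; it only mirrors the `IsSet(supported_opts, ...)` pattern visible in the quoted patch and is not the actual subscriptioncmds.c code. When a command leaves an option out of its supported mask, the parser already rejects that option as unrecognized, so a later explicit "not allowed" check is never reached.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SUBOPT_ORIGIN            (UINT32_C(1) << 0)
#define DEMO_SUBOPT_GENERATED_COLUMN  (UINT32_C(1) << 1)

typedef enum DemoParseResult
{
    DEMO_OK,
    DEMO_UNRECOGNIZED,          /* option not supported by this command */
    DEMO_CONFLICT               /* option given more than once */
} DemoParseResult;

/*
 * Accept an option only if the calling command lists it in supported_opts,
 * and record it in *specified_opts on success.
 */
static DemoParseResult
demo_parse_option(const char *name, uint32_t supported_opts,
                  uint32_t *specified_opts)
{
    uint32_t    flag;

    if (strcmp(name, "origin") == 0)
        flag = DEMO_SUBOPT_ORIGIN;
    else if (strcmp(name, "generated_column") == 0)
        flag = DEMO_SUBOPT_GENERATED_COLUMN;
    else
        return DEMO_UNRECOGNIZED;

    if ((supported_opts & flag) == 0)
        return DEMO_UNRECOGNIZED;
    if ((*specified_opts & flag) != 0)
        return DEMO_CONFLICT;

    *specified_opts |= flag;
    return DEMO_OK;
}
```

In this model, a CREATE-style caller would pass both flags in the mask, while an ALTER-style caller that passes only `DEMO_SUBOPT_ORIGIN` gets `DEMO_UNRECOGNIZED` for `generated_column` without any extra check.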

> 6. logicalrep_write_tuple
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> +
>
> Calling column_in_column_list() might be a more expensive operation
> than checking just generated columns flag so maybe reverse the order
> and check the generated columns first for a tiny performance gain.

Fixed.

> 7.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6

Fixed.

> 8. logicalrep_write_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> ditto #6

Fixed.

> 9.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6
>
> ======
> src/include/catalog/pg_subscription.h

Fixed.

> 10. CATALOG
>
> + bool subgeneratedcolumn; /* True if generated colums must be published */
>
> /colums/columns/
>
> ======
> src/test/regress/sql/publication.sql

Fixed.

> 11.
> --- error: generated column "d" can't be in list
> +-- ok
>
>
> Maybe change "ok" to say like "ok: generated cols can be in the list too"

Fixed.

> 12.
> GENERAL - Missing CREATE SUBSCRIPTION test?
> GENERAL - Missing ALTER SUBSCRIPTION test?
>
> How come this patch adds a new CREATE SUBSCRIPTION option but does not
> seem to include any test case for that option in either the CREATE
> SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

Added the test cases for the same.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 23, 2024 at 10:56 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I checked your patches briefly. Here are my comments.
>
> 01. API
>
> Since the option for test_decoding is enabled by default, I think it should be renamed.
> E.g., "skip-generated-columns" or something.

Let's keep the same name 'include_generated_columns' for both the cases.

> 02. ddl.sql
>
> ```
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> ```
>
> We should also test the non-default case, in which the generated columns are not output.

Added the non-default case, in which the generated columns are not output.

> 03. ddl.sql
>
> Not sure new tests are in the correct place. Do we have to add new file and move tests to it?
> Thought?

Added the new tests in the 'decoding_into_rel.out' file.

> 04. protocol.sgml
>
> Please keep the format of the sgml file.

Fixed.

> 05. protocol.sgml
>
> The option is implemented as the streaming option of pgoutput plugin, so they should be
> located under "Logical Streaming Replication Parameters" section.

Fixed.

> 05. AlterSubscription
>
> ```
> +                               if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
> +                               {
> +                                       ereport(ERROR,
> +                                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                                        errmsg("toggling generated_column option is not allowed.")));
> +                               }
> ```
>
> If you don't want to support the option, you can remove SUBOPT_GENERATED_COLUMN
> macro from the function. But can you clarify the reason why you do not want?

Fixed.

> 07. logicalrep_write_tuple
>
> ```
> -               if (!column_in_column_list(att->attnum, columns))
> +               if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> +                       continue;
> +
> +               if (att->attgenerated && !publish_generated_column)
>                         continue;
> ```
>
> I think the changes in v2 were reverted or wrongly merged.

Fixed.

> 08. test code
>
> Can you add tests that generated columns are replicated by the logical replication?

Added the test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, May 23, 2024 at 5:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 23 May 2024 at 09:19, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > > Dear Shubham,
> > >
> > > Thanks for creating a patch! Here are high-level comments.
> >
> > > 1.
> > > Please document the feature. If it is hard to describe, we should change the API.
> >
> > I have added the feature in the document.
> >
> > > 4.
> > > Regarding the test_decoding plugin, it has already been able to decode the
> > > generated columns. So... as the first place, is the proposed option really needed
> > > for the plugin? Why do you include it?
> > > If you anyway want to add the option, the default value should be on - which keeps
> > > current behavior.
> >
> > I have made the generated column options as true for test_decoding
> > plugin so by default we will send generated column data.
> >
> > > 5.
> > > Assuming that the feature becomes usable for logical replication. Not sure,
> > > should we change the protocol version at that time? Nodes prior to PG17 may
> > > not want to receive values for generated columns. Can we control this only by the option?
> >
> > I verified the backward compatibility test by using the generated
> > column option and it worked fine. I think there is no need to make any
> > further changes.
> >
> > > 7.
> > >
> > > Some functions refer data->publish_generated_column many times. Can we store
> > > the value to a variable?
> > >
> > > Below comments are for test_decoding part, but they may be not needed.
> > >
> > > =====
> > >
> > > a. pg_decode_startup()
> > >
> > > ```
> > > +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > > ```
> > >
> > > Other options for test_decoding do not have underscore. It should be
> > > "include-generated-columns".
> > >
> > > b. pg_decode_change()
> > >
> > > data->include_generated_columns is referenced four times in the function.
> > > Can you store the value in a variable?
> > >
> > >
> > > c. pg_decode_change()
> > >
> > > ```
> > > -                                    true);
> > > +                                    true, data->include_generated_columns );
> > > ```
> > >
> > > Please remove the blank.
> >
> > Fixed.
> > The attached v3 Patch has the changes for the same.
>
> Few comments:
> 1) Since this is removed, tupdesc variable is not required anymore:
> +++ b/src/backend/catalog/pg_publication.c
> @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns,
>                                         errmsg("cannot use system column \"%s\" in publication column list",
>                                                    colname));
>
> -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -                       ereport(ERROR,
> -                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -                                       errmsg("cannot use generated column \"%s\" in publication column list",
> -                                                  colname));

Fixed.

> 2) In test_decoding include_generated_columns option is used:
> +               else if (strcmp(elem->defname, "include_generated_columns") == 0)
> +               {
> +                       if (elem->arg == NULL)
> +                               continue;
> +                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
> +                                                               strVal(elem->arg), elem->defname)));
> +               }
>
> In subscription we have used generated_column, we can try to use the
> same option in both places:
> +               else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
> +                                strcmp(defel->defname, "generated_column") == 0)
> +               {
> +                       if (IsSet(opts->specified_opts, SUBOPT_GENERATED_COLUMN))
> +                               errorConflictingDefElem(defel, pstate);
> +
> +                       opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
> +                       opts->generated_column = defGetBoolean(defel);
> +               }

Will update the name to 'include_generated_columns' in the next
version of the Patch.

> 3) Tab completion can be added for create subscription to include
> generated_column option

Fixed.

> 4) There are few whitespace issues while applying the patch, check for
> git diff --check

Fixed.

> 5) Add few tests for the new option added

Added new test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, May 24, 2024 at 8:26 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for the patch v3-0001.
>
> I don't think v3 addressed any of my previous review comments for
> patches v1 and v2. [1][2]
>
> So the comments below are limited only to the new code (i.e. the v3
> versus v2 differences). Meanwhile, all my previous review comments may
> still apply.

Patch v4-0001 addresses the previous review comments for patches v1
and v2. [1][2]

> ======
> GENERAL
>
> The patch applied gives whitespace warnings:
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
> trailing whitespace.
> warning: 3 lines add whitespace errors.

Fixed.

> ======
> contrib/test_decoding/test_decoding.c
>
> 1. pg_decode_change
>
>   MemoryContext old;
> + bool include_generated_columns;
> +
>
> I'm not really convinced this variable saves any code.

Fixed.

> ======
> doc/src/sgml/protocol.sgml
>
> 2.
> +        <varlistentry>
> +         <term><replaceable
> class="parameter">include-generated-columns</replaceable></term>
> +         <listitem>
> +        <para>
> +        The include-generated-columns option controls whether
> generated columns should be included in the string representation of
> tuples during logical decoding in PostgreSQL. This allows users to
> customize the output format based on whether they want to include
> these columns or not.
> +         </para>
> +         </listitem>
> +         </varlistentry>
>
> 2a.
> Something is not correct when this name has hyphens and all the nearby
> parameter names do not. Shouldn't it be all uppercase like the other
> boolean parameter?
>
> ~
>
> 2b.
> Text in the SGML file should be wrapped properly.
>
> ~
>
> 2c.
> IMO the comment can be more terse and it also needs to specify that it
> is a boolean type, and what is the default value if not passed.
>
> SUGGESTION
>
> INCLUDE_GENERATED_COLUMNS [ boolean ]
>
> If true, then generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ======
> src/backend/replication/logical/proto.c
>
> 3. logicalrep_write_tuple
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
>
> 3a.
> This code seems overcomplicated checking the same flag multiple times.
>
> SUGGESTION
> if (att->attgenerated)
> {
>   if (!publish_generated_column)
>     continue;
> }
> else
> {
>   if (!column_in_column_list(att->attnum, columns))
>     continue;
> }
>
> ~
>
> 3b.
> The same logic occurs several times in logicalrep_write_tuple

Fixed.

> 4. logicalrep_write_attrs
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> Shouldn't these code fragments (2x in this function) look the same as
> in logicalrep_write_tuple? See the above review comments.

Fixed.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5. maybe_send_schema
>
>   TransactionId topxid = InvalidTransactionId;
> + bool publish_generated_column = data->publish_generated_column;
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->streaming).

Fixed.

> 6. pgoutput_change
> -
> + bool publish_generated_column = data->publish_generated_column;
> +
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->binary).

Fixed.

> ======
> [1] My v1 review -
> https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
> [2] My v2 review -
> https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Mon, 3 Jun 2024 at 13:03, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for the patch v1-0001.
> >
> > ======
> > GENERAL
> >
> > G.1. Use consistent names
> >
> > It seems to add unnecessary complications by having different names
> > for all the new options, fields and API parameters.
> >
> > e.g. sometimes 'include_generated_columns'
> > e.g. sometimes 'publish_generated_columns'
> >
> > Won't it be better to just use identical names everywhere for
> > everything? I don't mind which one you choose; I just felt you only
> > need one name, not two. This comment overrides everything else in this
> > post so whatever name you choose, make adjustments for all my other
> > review comments as necessary.
>
> I have updated the name to 'include_generated_columns' everywhere in the Patch.
>
> > ======
> >
> > G.2. Is it possible to just use the existing bms?
> >
> > A very large part of this patch is adding more API parameters to
> > delegate the 'publish_generated_columns' flag value down to when it is
> > finally checked and used. e.g.
> >
> > The functions:
> > - logicalrep_write_insert(), logicalrep_write_update(),
> > logicalrep_write_delete()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_tuple
> >
> > The functions:
> > - logicalrep_write_rel()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_attrs
> >
> > AFAICT in all these places the API is already passing a "Bitmapset
> > *columns". I was wondering if it might be possible to modify the
> > "Bitmapset *columns" BEFORE any of those functions get called so that
> > the "columns" BMS either does or doesn't include generated cols (as
> > appropriate according to the option).
> >
> > Well, it might not be so simple because there are some NULL BMS
> > considerations also, but I think it would be worth investigating at
> > least, because if indeed you can find some common place (somewhere
> > like pgoutput_change()?) where the columns BMS can be filtered to
> > remove bits for generated cols then it could mean none of those other
> > patch API changes are needed at all -- then the patch would only be
> > 1/2 the size.
>
> I will analyse and reply to this in the next version.
>
> > ======
> > Commit message
> >
> > 1.
> > Now if include_generated_columns option is specified, the generated
> > column information and generated column data also will be sent.
> >
> > Usage from pgoutput plugin:
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > ~
> >
> > I think there needs to be more background information given here. This
> > commit message doesn't seem to describe anything about what is the
> > problem and how this patch fixes it. It just jumps straight into
> > giving usages of a 'include_generated_columns' option.
> >
> > It also doesn't say that this is an option that was newly *introduced*
> > by the patch -- it refers to it as though the reader should already
> > know about it.
> >
> > Furthermore, your hacker's post says "Currently it is not supported as
> > a subscription option because table sync for the generated column is
> > not possible as copy command does not support getting data for the
> > generated column. If this feature is required we can remove this
> > limitation from the copy command and then add it as a subscription
> > option later." IMO that all seems like the kind of information that
> > ought to also be mentioned in this commit message.
>
> I have updated the Commit message mentioning the suggested changes.
>
> > ======
> > contrib/test_decoding/sql/ddl.sql
> >
> > 2.
> > +-- check include_generated_columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> > +DROP TABLE gencoltable;
> > +
> >
> > 2a.
> > Perhaps you should include both option values to demonstrate the
> > difference in behaviour:
> >
> > 'include_generated_columns', '0'
> > 'include_generated_columns', '1'
>
> Added the other option values to demonstrate the difference in behaviour:
>
> > 2b.
> > I think you may need to include some more test combinations where
> > there is and isn't a COLUMN LIST, because I am not 100% sure I
> > understand the current logic/expectations for all combinations.
> >
> > e.g. When the generated column is in a column list but
> > 'publish_generated_columns' is false then what should happen? etc.
> > Also if there are any special rules then those should be mentioned in
> > the commit message.
>
> Test case is added and the same is mentioned in the documentation.
>
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 3.
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > 4. logicalrep_write_tuple:
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > - if (!column_in_column_list(att->attnum, columns))
> > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> > + continue;
> > +
> > + if (att->attgenerated && !publish_generated_column)
> >   continue;
> > That code seems confusing. Shouldn't the logic be exactly as also in
> > logicalrep_write_attrs()?
> >
> > e.g. Shouldn't they both look like this:
> >
> > if (att->attisdropped)
> >   continue;
> >
> > if (att->attgenerated && !publish_generated_column)
> >   continue;
> >
> > if (!column_in_column_list(att->attnum, columns))
> >   continue;
>
> Fixed.
>
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 5.
> >  static void send_relation_and_attrs(Relation relation, TransactionId xid,
> >   LogicalDecodingContext *ctx,
> > - Bitmapset *columns);
> > + Bitmapset *columns,
> > + bool publish_generated_column);
> >
> > Use plural. /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > 6. parse_output_parameters
> >
> >   bool origin_option_given = false;
> > + bool generate_column_option_given = false;
> >
> >   data->binary = false;
> >   data->streaming = LOGICALREP_STREAM_OFF;
> >   data->messages = false;
> >   data->two_phase = false;
> > + data->publish_generated_column = false;
> >
> > I think the 1st var should be 'include_generated_columns_option_given'
> > for consistency with the name of the actual option that was given.
>
> Updated the name to 'include_generated_columns_option_given'
>
> > ======
> > src/include/replication/logicalproto.h
> >
> > 7.
> > (Same as a previous review comment)
> >
> > For all the API changes the new parameter name should be plural.
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> > ======
> > src/include/replication/pgoutput.h
> >
> > 8.
> >   bool publish_no_origin;
> > + bool publish_generated_column;
> >  } PGOutputData;
> >
> > /publish_generated_column/publish_generated_columns/
>
> Updated the name to 'include_generated_columns'
>
> The attached Patch contains the suggested changes.

Thanks for the updated patch. A few comments:
1) The option name is inconsistent: include_generated_column is
specified in one place and include_generated_columns in another:

+               else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+                                strcmp(defel->defname, "include_generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts, SUBOPT_INCLUDE_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
+                       opts->include_generated_column = defGetBoolean(defel);
+               }

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d453e224d9..e8ff752fd9 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
                COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
                                          "disable_on_error", "enabled", "failover", "origin",
                                          "password_required", "run_as_owner", "slot_name",
-                                         "streaming", "synchronous_commit", "two_phase");
+                                         "streaming", "synchronous_commit", "two_phase","include_generated_columns");

2) This small test table does not need a primary key column: the key
creates an index, so every insert also has to update the index.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');

3) Please add a test case for this:
+          set to <literal>false</literal>. If the subscriber-side column is also a
+          generated column then this option has no effect; the replicated data will
+          be ignored and the subscriber column will be filled as normal with the
+          subscriber-side computed or default data.

4) You can use the new style of ereport, which drops the extra
parentheses around errcode:
4.a)
+                       else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not parse value \"%s\" for parameter \"%s\"",
+                                                               strVal(elem->arg), elem->defname)));

4.b) similarly here too:
+               ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+               /*- translator: both %s are strings of the form "option = value" */
+                                       errmsg("%s and %s are mutually exclusive options",
+                                               "copy_data = true", "include_generated_column = true")));

4.c) similarly here too:
+                       if (include_generated_columns_option_given)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting or redundant options")));

5) These variable names can be shortened to something like gencol,
generatedcol, or gencolumn:
+++ b/src/include/catalog/pg_subscription.h
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegeneratedcolumn; /* True if generated columns must be published */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -157,6 +159,7 @@ typedef struct Subscription
  List    *publications; /* List of publication names to subscribe to */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
>
> The attached Patch contains the suggested changes.
>

Hi,

Currently, COPY command does not work for generated columns and
therefore, COPY of generated column is not supported during tablesync
process. So, in patch v4-0001 we added a check to allow replication of
the generated column only if 'copy_data = false'.

I am attaching patches to resolve the above issues.

v5-0001: not changed
v5-0002: Support COPY of generated column
v5-0003: Support COPY of generated column during tablesync process

Thoughts?


Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0001.

======
GENERAL G.1

The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
maybe before they were always ignored, but now they are not?

OTOH, when 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list, right?

These kinds of points should be noted in the commit message and in the
(col-list?) documentation.
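
To make the question concrete, here is a minimal C sketch of one possible
reading of these rules. It is not code from the patch: the helper name
column_is_published() and its parameters are hypothetical, and
'in_column_list' is assumed to be true when the publication has no column
list at all.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the patch): decide whether one
 * attribute would be sent to the subscriber under the rules asked
 * about above.
 */
static bool
column_is_published(bool attisdropped, bool attgenerated,
                    bool in_column_list, bool include_generated_columns)
{
    /* Dropped columns are never sent. */
    if (attisdropped)
        return false;

    /* Generated columns are sent only when the option is enabled. */
    if (attgenerated && !include_generated_columns)
        return false;

    /* Every remaining column is subject to the publication column list. */
    return in_column_list;
}
```

Under this reading, a generated column named in a col-list is still skipped
when 'include_generated_columns' is false. Whether a generated column must
also be named in the col-list when the option is true is a separate choice
that the commit message should state explicitly.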

======
Commit message

General 1a.
IMO the commit message needs some background to say something like:
"Currently generated column values are not replicated because it is
assumed that the corresponding subscriber-side table will generate its
own values for those columns."

~

General 1b.
Somewhere in this commit message, you need to give all the other
special rules --- e.g. the docs say "If the subscriber-side column is
also a generated column then this option has no effect"

~~~

2.
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes. This option is
particularly useful for scenarios where applications require access to
generated column values for downstream processing or synchronization.

~

I don't think the sentence "This option is particularly useful..." is
helpful. It seems like just saying "This commit supports option XXX.
This is particularly useful if you want XXX".

~~~

3.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
the new parameter being used in it?

~~~

4.
Currently copy_data option with include_generated_columns option is
not supported. A future patch will remove this limitation.

~

Suggest to single-quote those parameter names for better readability.

~~~

5.
This commit aims to enhance the flexibility and utility of logical
replication by allowing users to include generated column information
in replication streams, paving the way for more robust data
synchronization and processing workflows.

~

IMO this paragraph can be omitted.

======
.../test_decoding/sql/decoding_into_rel.sql

6.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+INSERT INTO gencoltable (a) VALUES (4), (5), (6);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0');
+DROP TABLE gencoltable;
+

6a.
I felt some additional explicit comments might help the readability of
the output file.

e.g.1
-- When 'include-generated-columns' = '1' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes...

e.g.2
-- When 'include-generated-columns' = '0' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_get_changes...

~~

6b.
Suggest adding one more test case (where 'include-generated-columns'
is not set) to confirm/demonstrate the default behaviour for
replicated generated cols.

======
doc/src/sgml/protocol.sgml

7.
+    <varlistentry>
+     <term><replaceable
class="parameter">include-generated-columns</replaceable></term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns.
+        The include-generated-columns option controls whether generated
+        columns should be included in the string representation of tuples
+        during logical decoding in PostgreSQL. This allows users to
+        customize the output format based on whether they want to include
+        these columns or not. The default is false.
+       </para>
+      </listitem>
+    </varlistentry>

7a.
It doesn't render properly. e.g. Should not be bold italic (probably
the class is wrong?), because none of the nearby parameters look this
way.

~

7b.
The name here should NOT have hyphens. It needs underscores same as
all other nearby protocol parameters.

~

7c.
The description seems overly verbose.

SUGGESTION
Boolean option to enable generated columns. This option controls
whether generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
doc/src/sgml/ref/create_subscription.sgml

8.
+
+       <varlistentry
id="sql-createsubscription-params-with-include-generated-column">
+        <term><literal>include_generated_column</literal>
(<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>

The parameter name should be plural (include_generated_columns).

======
src/backend/commands/subscriptioncmds.c

9.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x00010000

Should be plural COLUMNS

~~~

10.
+ else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
+ strcmp(defel->defname, "include_generated_column") == 0)

The new subscription parameter should be plural ("include_generated_columns").

~~~

11.
+
+ /*
+ * Do additional checking for disallowed combination when copy_data and
+ * include_generated_column are true. COPY of generated columns is not supported
+ * yet.
+ */
+ if (opts->copy_data && opts->include_generated_column)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: both %s are strings of the form "option = value" */
+ errmsg("%s and %s are mutually exclusive options",
+ "copy_data = true", "include_generated_column = true")));
+ }

/combination/combinations/

The parameter name should be plural in the comment and also in the
error message.

======
src/bin/psql/tab-complete.c

12.
  COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
    "disable_on_error", "enabled", "failover", "origin",
    "password_required", "run_as_owner", "slot_name",
-   "streaming", "synchronous_commit", "two_phase");
+   "streaming", "synchronous_commit", "two_phase","include_generated_columns");

The new param should be added in alphabetical order same as all the others.
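
As a rough illustration of the ordering meant here, the following
self-contained C sketch keeps an illustrative copy of the option list with
'include_generated_columns' in its alphabetical position, plus a small
checker. The array name subscription_opts and the helper is_alphabetical()
are made up for this example; they are not part of tab-complete.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative copy of the option list, new parameter in alphabetical order. */
static const char *const subscription_opts[] = {
    "binary", "connect", "copy_data", "create_slot",
    "disable_on_error", "enabled", "failover",
    "include_generated_columns", "origin",
    "password_required", "run_as_owner", "slot_name",
    "streaming", "synchronous_commit", "two_phase",
};

/* Return true if the n strings are in strictly ascending strcmp() order. */
static bool
is_alphabetical(const char *const *opts, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (strcmp(opts[i - 1], opts[i]) >= 0)
            return false;
    }
    return true;
}
```

With this ordering the new parameter sits between "failover" and "origin".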

======
src/include/catalog/pg_subscription.h

13.
+ bool subincludegeneratedcolumn; /* True if generated columns must be
published */
+

The field name should be plural.

~~~

14.
+ bool includegeneratedcolumn; /* publish generated column data */
 } Subscription;

The field name should be plural.

======
src/include/replication/walreceiver.h

15.
  * prepare time */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool include_generated_column; /* publish generated columns */
  } logical;
  } proto;
 } WalRcvStreamOptions;

~

This new field name should be plural.

======
src/test/subscription/t/011_generated.pl

16.
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
+ CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION pub2 WITH (include_generated_column = true)
+));
+ok( $stderr =~
+   qr/copy_data = true and include_generated_column = true are mutually exclusive options/,
+ 'cannot use both include_generated_column and copy_data as true');

Isn't this mutual exclusiveness of options something that could have
been tested in the regress test suite instead of TAP tests? e.g. AFAIK
you won't require a connection to test this case.

~~~

17. Missing test?

IIUC there is a missing test scenario. You can add another subscriber
table TAB3 which *already* has generated cols (e.g. generating
different values to the publisher) so then you can verify they are NOT
overwritten, even when 'include_generated_columns' is true.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0002.

======
GENERAL

G1.
IIUC now you are unconditionally allowing all generated columns to be copied.

I think this is assuming that the table sync code (in the next patch
0003?) is going to explicitly name all the columns it wants to copy
(so if it wants to get generated cols then it will name the generated
cols, and if it doesn't want generated cols then it won't name them).

Maybe that is OK for the logical replication tablesync case, but I am
not sure if it will be desirable to *always* copy generated columns in
other user scenarios.

e.g. I was wondering if there should be a new COPY command option
introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
then the current HEAD behaviour is unaffected unless that option is
enabled.

~~~

G2.
The current COPY command documentation [1] says "If no column list is
specified, all columns of the table except generated columns will be
copied."

But this 0002 patch has changed that documented behaviour, and so the
documentation needs to be changed as well, right?

======
Commit Message

1.
Currently COPY command do not copy generated column. With this commit
added support for COPY for generated column.

~

The grammar/cardinality is not good here. Try some tool (Grammarly or
chatGPT, etc) to help correct it.

======
src/backend/commands/copy.c

======
src/test/regress/expected/generated.out

======
src/test/regress/sql/generated.sql

2.
I think these COPY test cases require some explicit comments to
describe what they are doing, and what are the expected results.

Currently, I have doubts about some of this test input/output

e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
explanation. Are you expecting this generated col value to be
ignored/overwritten or what?

COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
5 1
6 1
\.

e.g.2. what is the reason for this new 'missing data for column "b"'
error? Or is it some introduced quirk because "b" now cannot be
generated since there is no value for "a"? I don't know if the
expected *.out here is OK or not, so some test comments may help to
clarify it.

======
[1] https://www.postgresql.org/docs/devel/sql-copy.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Here are some review comments for patch v5-0003.

======
0. Whitespace warnings when the patch was applied.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
trailing whitespace.
          has no effect; the replicated data will be ignored and the subscriber
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
trailing whitespace.
          column will be filled as normal with the subscriber-side computed or
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
trailing whitespace.
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
warning: 3 lines add whitespace errors.

======
src/backend/commands/subscriptioncmds.c

1.
- res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
+ column_count = (!include_generated_column && check_gen_col) ? 4 : (check_columnlist ? 3 : 2);
+ res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);

The 'column_count' seems out of control. Won't it be far simpler to
assign/increment the value dynamically only as required instead of the
tricky calculation at the end which is unnecessarily difficult to
understand?
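
Something along these lines is what I have in mind. This is only a sketch:
result_column_count() is a hypothetical helper, and it assumes the
generated-column count is requested only together with the column list, so
that it is always the fourth result column. If that assumption does not
hold, the later slot_getattr() index would have to be tracked the same way.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: number of result columns to request from the
 * publisher, built up step by step instead of via a nested ternary.
 */
static int
result_column_count(bool check_columnlist, bool check_gen_col,
                    bool include_generated_columns)
{
    int         column_count = 2;   /* the two columns that are always fetched */

    if (check_columnlist)
        column_count++;             /* the column list */

    /* Assumption: the count column is only added after the column list. */
    if (check_columnlist && check_gen_col && !include_generated_columns)
        column_count++;             /* the generated-column count */

    return column_count;
}
```

Note that this differs from the patch's expression in one case: the patch
yields 4 whenever the generated-column check applies, even without a column
list, whereas the sketch never counts a column that was not requested.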

~~~

2.
+ /*
+ * If include_generated_column option is false and all the column of the table in the
+ * publication are generated then we should throw an error.
+ */
+ if (!isnull && !include_generated_column && check_gen_col)
+ {
+ attlist = DatumGetArrayTypeP(attlistdatum);
+ gen_col_count = DatumGetInt32(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);
+
+ attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
+
+ if (attcount != 0 && attcount == gen_col_count)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use only generated column for table \"%s.%s\" in publication when generated_column option is false",
+    nspname, relname));
+ }
+

Why do you think this new logic/error is necessary?

IIUC the 'include_generated_columns' should be false to match the
existing HEAD behavior. So this scenario where your publisher-side
table *only* has generated columns is something that could already
happen, right? IOW, this introduced error could be a candidate for
another discussion/thread/patch, but is it really required for this
current patch?

======
src/backend/replication/logical/tablesync.c

3.
  lrel->remoteid,
- (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
-   "AND a.attgenerated = ''" : ""),
+ (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ (walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000 ||
+ !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),

This ternary within one big appendStringInfo seems quite complicated.
Won't it be better to split the appendStringInfo into multiple parts
so the generated-cols calculation can be done more simply?
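
One possible shape for that split, as a standalone sketch rather than the actual tablesync code: the decision is computed in a small helper and the filter is appended as a separate step. The names exclude_generated_columns and append_attgenerated_filter are hypothetical, and a plain char buffer stands in for StringInfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether the attribute query should filter
 * out generated columns. The version checks mirror the quoted diff as
 * written.
 */
bool
exclude_generated_columns(int server_version, bool include_generated_columns)
{
    /* Below 120000 the quoted code never adds the filter. */
    if (server_version < 120000)
        return false;

    /* At or below 160000 the quoted code always adds the filter. */
    if (server_version <= 160000)
        return true;

    /* Otherwise the subscription option decides. */
    return !include_generated_columns;
}

/* Append the filter to an existing query string as a separate step. */
size_t
append_attgenerated_filter(char *query, size_t size, int server_version,
                           bool include_generated_columns)
{
    assert(query != NULL);
    assert(strlen(query) < size);

    if (exclude_generated_columns(server_version, include_generated_columns))
        strncat(query, " AND a.attgenerated = ''", size - strlen(query) - 1);

    return strlen(query);
}
```

With the condition named and isolated, the main query-building call no longer needs an embedded ternary.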

======
src/test/subscription/t/011_generated.pl

4.
I think there should be a variety of different tablesync scenarios
(when 'include_generated_columns' is true) tested here instead of just
one, and all varieties with lots of comments to say what they are
doing, expectations etc.

a. publisher-side gen-col "a" replicating to subscriber-side NOT
gen-col "a" (ok, value gets replicated)
b. publisher-side gen-col "a" replicating to subscriber-side gen-col
(ok, but ignored)
c. publisher-side NOT gen-col "b" replicating to subscriber-side
gen-col "b" (error?)

~~

5.
+$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
+is( $result, qq(1|2
+2|4
+3|6), 'generated columns initial sync with include_generated_column = true');

Should this say "ORDER BY..." so it will not fail if the row order
happens to be something unanticipated?

======

99.
Also, see the attached file with numerous other nitpicks:
- plural param- and var-names
- typos in comments
- missing spaces
- SQL keyword should be UPPERCASE
- etc.

Please apply any/all of these if you agree with them.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> >
> > The attached Patch contains the suggested changes.
> >
>
> Hi,
>
> Currently, COPY command does not work for generated columns and
> therefore, COPY of generated column is not supported during tablesync
> process. So, in patch v4-0001 we added a check to allow replication of
> the generated column only if 'copy_data = false'.
>
> I am attaching patches to resolve the above issues.
>
> v5-0001: not changed
> v5-0002: Support COPY of generated column
> v5-0003: Support COPY of generated column during tablesync process
>

Hi Shlok, I have a question about patch v5-0003.

According to the patch 0001 docs "If the subscriber-side column is
also a generated column then this option has no effect; the replicated
data will be ignored and the subscriber column will be filled as
normal with the subscriber-side computed or default data".

Doesn't this mean it will be a waste of effort/resources to COPY any
column value where the subscriber-side column is generated since we
know that any copied value will be ignored anyway?

But I don't recall seeing any comment or logic for this kind of copy
optimisation in the patch 0003. Is this already accounted for
somewhere and I missed it, or is my understanding wrong?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok and Shubham,

Thanks for updating the patch!

I briefly checked v5-0002. IIUC, your patch allows generated columns to be
copied unconditionally. I think this behavior would affect many users, so it
would be hard to reach agreement on it.

Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is
off, we can keep the current behavior.

Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Thanks for the updated patch, few comments:
> 1) The option name seems wrong here:
> In one place include_generated_column is specified and other place
> include_generated_columns is specified:
>
> +               else if (IsSet(supported_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> +                                strcmp(defel->defname,
> "include_generated_column") == 0)
> +               {
> +                       if (IsSet(opts->specified_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN))
> +                               errorConflictingDefElem(defel, pstate);
> +
> +                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
> +                       opts->include_generated_column = defGetBoolean(defel);
> +               }

Fixed.

> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index d453e224d9..e8ff752fd9 100644
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
>                 COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>                                           "disable_on_error",
> "enabled", "failover", "origin",
>                                           "password_required",
> "run_as_owner", "slot_name",
> -                                         "streaming",
> "synchronous_commit", "two_phase");
> +                                         "streaming",
> "synchronous_commit", "two_phase","include_generated_columns");
>
> 2) This small data table need not have a primary key column as it will
> create an index and insertion will happen in the index too.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');

Fixed.

> 3) Please add a test case for this:
> +          set to <literal>false</literal>. If the subscriber-side
> column is also a
> +          generated column then this option has no effect; the
> replicated data will
> +          be ignored and the subscriber column will be filled as
> normal with the
> +          subscriber-side computed or default data.

Added the required test case.

> 4) You can use a new style of ereport to remove the brackets around errcode
> 4.a)
> +                       else if (!parse_bool(strVal(elem->arg),
> &data->include_generated_columns))
> +                               ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
>
> 4.b) similarly here too:
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_SYNTAX_ERROR),
> +               /*- translator: both %s are strings of the form
> "option = value" */
> +                                       errmsg("%s and %s are mutually
> exclusive options",
> +                                               "copy_data = true",
> "include_generated_column = true")));
>
> 4.c) similarly here too:
> +                       if (include_generated_columns_option_given)
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                                errmsg("conflicting
> or redundant options")));

Fixed.

> 5) These variable names can be changed to keep it smaller, something
> like gencol or generatedcol or gencolumn, etc
> +++ b/src/include/catalog/pg_subscription.h
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>  #ifdef CATALOG_VARLEN /* variable-length fields start here */
>   /* Connection string to the publisher */
>   text subconninfo BKI_FORCE_NOT_NULL;
> @@ -157,6 +159,7 @@ typedef struct Subscription
>   List    *publications; /* List of publication names to subscribe to */
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;

Fixed.

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0001.
>
> ======
> GENERAL G.1
>
> The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
> maybe before they were always ignored, but now they are not?
>
> OTOH, when 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list, right?
>
> These kinds of points should be noted in the commit message and in the
> (col-list?) documentation.

Fixed.

> ======
> Commit message
>
> General 1a.
> IMO the commit message needs some background to say something like:
> "Currently generated column values are not replicated because it is
> assumed that the corresponding subscriber-side table will generate its
> own values for those columns."
>
> ~
>
> General 1b.
> Somewhere in this commit message, you need to give all the other
> special rules --- e.g. the docs says "If the subscriber-side column is
> also a generated column then this option has no effect"
>
> ~~~

Fixed.

> 2.
> This commit enables support for the 'include_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. This option is
> particularly useful for scenarios where applications require access to
> generated column values for downstream processing or synchronization.
>
> ~
>
> I don't think the sentence "This option is particularly useful..." is
> helpful. It seems like just saying "This commit supports option XXX.
> This is particularly useful if you want XXX".
>

Fixed.

>
> 3.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
> 'publication pub1;
>
> ~
>
> What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
> the new parameter being used in it?
>

Added the description for this in the Patch.

>
> 4.
> Currently copy_data option with include_generated_columns option is
> not supported. A future patch will remove this limitation.
>
> ~
>
> Suggest to single-quote those parameter names for better readability.
>

Fixed.

>
> 5.
> This commit aims to enhance the flexibility and utility of logical
> replication by allowing users to include generated column information
> in replication streams, paving the way for more robust data
> synchronization and processing workflows.
>
> ~
>
> IMO this paragraph can be omitted.

Fixed.

> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> 6.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +DROP TABLE gencoltable;
> +
>
> 6a.
> I felt some additional explicit comments might help the readability of
> the output file.
>
> e.g.1
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes...
>
> e.g.2
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_get_changes...

Added the required description for this.

> 6b.
> Suggest adding one more test case (where 'include-generated-columns'
> is not set) to confirm/demonstrate the default behaviour for
> replicated generated cols.

Added the required Test case.

> ======
> doc/src/sgml/protocol.sgml
>
> 7.
> +    <varlistentry>
> +     <term><replaceable
> class="parameter">include-generated-columns</replaceable></term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns.
> +        The include-generated-columns option controls whether generated
> +        columns should be included in the string representation of tuples
> +        during logical decoding in PostgreSQL. This allows users to
> +        customize the output format based on whether they want to include
> +        these columns or not. The default is false.
> +       </para>
> +      </listitem>
> +    </varlistentry>
>
> 7a.
> It doesn't render properly. e.g. Should not be bold italic (probably
> the class is wrong?), because none of the nearby parameters look this
> way.
>
> ~
>
> 7b.
> The name here should NOT have hyphens. It needs underscores same as
> all other nearby protocol parameters.
>
> ~
>
> 7c.
> The description seems overly verbose.
>
> SUGGESTION
> Boolean option to enable generated columns. This option controls
> whether generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 8.
> +
> +       <varlistentry
> id="sql-createsubscription-params-with-include-generated-column">
> +        <term><literal>include_generated_column</literal>
> (<type>boolean</type>)</term>
> +        <listitem>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. The default is
> +          <literal>false</literal>.
> +         </para>
>
> The parameter name should be plural (include_generated_columns).

Fixed.

> ======
> src/backend/commands/subscriptioncmds.c
>
> 9.
>  #define SUBOPT_ORIGIN 0x00008000
> +#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x00010000
>
> Should be plural COLUMNS
>
Fixed.

> 10.
> + else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> + strcmp(defel->defname, "include_generated_column") == 0)
>
> The new subscription parameter should be plural ("include_generated_columns").

Fixed.

> 11.
> +
> + /*
> + * Do additional checking for disallowed combination when copy_data and
> + * include_generated_column are true. COPY of generated columns is
> not supported
> + * yet.
> + */
> + if (opts->copy_data && opts->include_generated_column)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + /*- translator: both %s are strings of the form "option = value" */
> + errmsg("%s and %s are mutually exclusive options",
> + "copy_data = true", "include_generated_column = true")));
> + }
>
> /combination/combinations/
>
> The parameter name should be plural in the comment and also in the
> error message.

Fixed.

> ======
> src/bin/psql/tab-complete.c
>
> 12.
>   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>     "disable_on_error", "enabled", "failover", "origin",
>     "password_required", "run_as_owner", "slot_name",
> -   "streaming", "synchronous_commit", "two_phase");
> +   "streaming", "synchronous_commit", "two_phase","include_generated_columns");
>
> The new param should be added in alphabetical order same as all the others.

Fixed.

> ======
> src/include/catalog/pg_subscription.h
>
> 13.
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>
> The field name should be plural.

Fixed.

>
> 14.
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;
>
> The field name should be plural.

Fixed.

> ======
> src/include/replication/walreceiver.h
>
> 15.
>   * prepare time */
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool include_generated_column; /* publish generated columns */
>   } logical;
>   } proto;
>  } WalRcvStreamOptions;
>
> ~
>
> This new field name should be plural.

Fixed.

> ======
> src/test/subscription/t/011_generated.pl
>
> 16.
> +my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
> + CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION
> pub2 WITH (include_generated_column = true)
> +));
> +ok( $stderr =~
> +   qr/copy_data = true and include_generated_column = true are
> mutually exclusive options/,
> + 'cannot use both include_generated_column and copy_data as true');
>
> Isn't this mutual exclusiveness of options something that could have
> been tested in the regress test suite instead of TAP tests? e.g. AFAIK
> you won't require a connection to test this case.


> 17. Missing test?
>
> IIUC there is a missing test scenario. You can add another subscriber
> table TAB3 which *already* has generated cols (e.g. generating
> different values to the publisher) so then you can verify they are NOT
> overwritten, even when 'include_generated_columns' is true.
>
> ======

Moved this test case to the Regression test.

Patch v6-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJn6EiyAitJbbvkvVV2d45fV3Wjr2VmWFugm3RsbaU%2BRg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 4 Jun 2024 at 10:21, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0002.
>
> ======
> GENERAL
>
> G1.
> IIUC now you are unconditionally allowing all generated columns to be copied.
>
> I think this is assuming that the table sync code (in the next patch
> 0003?) is going to explicitly name all the columns it wants to copy
> (so if it wants to get generated cols then it will name the generated
> cols, and if it doesn't want generated cols then it won't name them).
>
> Maybe that is OK for the logical replication tablesync case, but I am
> not sure if it will be desirable to *always* copy generated columns in
> other user scenarios.
>
> e.g. I was wondering if there should be a new COPY command option
> introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
> then the current HEAD behaviour is unaffected unless that option is
> enabled.
>
> ~~~
>
> G2.
> The current COPY command documentation [1] says "If no column list is
> specified, all columns of the table except generated columns will be
> copied."
>
> But this 0002 patch has changed that documented behaviour, and so the
> documentation needs to be changed as well, right?
>
> ======
> Commit Message
>
> 1.
> Currently COPY command do not copy generated column. With this commit
> added support for COPY for generated column.
>
> ~
>
> The grammar/cardinality is not good here. Try some tool (Grammarly or
> chatGPT, etc) to help correct it.
>
> ======
> src/backend/commands/copy.c
>
> ======
> src/test/regress/expected/generated.out
>
> ======
> src/test/regress/sql/generated.sql
>
> 2.
> I think these COPY test cases require some explicit comments to
> describe what they are doing, and what are the expected results.
>
> Currently, I have doubts about some of this test input/output
>
> e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
> explanation. Are you expecting this generated col value to be
> ignored/overwritten or what?
>
> COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
> 5 1
> 6 1
> \.
>
> e.g.2. what is the reason for this new 'missing data for column "b"'
> error? Or is it some introduced quirk because "b" now cannot be
> generated since there is no value for "a"? I don't know if the
> expected *.out here is OK or not, so some test comments may help to
> clarify it.
>
> ======
> [1] https://www.postgresql.org/docs/devel/sql-copy.html
>
Hi Peter,

I have removed the changes to the COPY command. I came up with an
approach that requires changes only in the tablesync code: we can copy
generated columns during tablesync using the syntax 'COPY (SELECT
column_name FROM table) TO STDOUT'.

I have attached the patch for the same.
v7-0001 : Not Modified
v7-0002: Support replication of generated columns during initial sync.
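
For readers following along, a rough standalone sketch of how such a command string could be assembled from a column list. This is not the v7-0002 code: build_copy_select is a hypothetical name, a plain char buffer stands in for StringInfo, and identifier quoting (which real code would need) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "COPY (SELECT c1, c2 FROM rel) TO STDOUT"
 * into buf. Returns 0 on success, -1 if buf is too small.
 * Identifier quoting is deliberately omitted in this sketch.
 */
int
build_copy_select(char *buf, size_t size, const char *relname,
                  const char *const *cols, int ncols)
{
    size_t used;
    int    n;

    assert(buf != NULL && relname != NULL && cols != NULL && ncols > 0);

    n = snprintf(buf, size, "COPY (SELECT ");
    if (n < 0 || (size_t) n >= size)
        return -1;
    used = (size_t) n;

    /* Name every wanted column explicitly, generated ones included. */
    for (int i = 0; i < ncols; i++)
    {
        n = snprintf(buf + used, size - used, "%s%s",
                     i > 0 ? ", " : "", cols[i]);
        if (n < 0 || (size_t) n >= size - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, size - used, " FROM %s) TO STDOUT", relname);
    if (n < 0 || (size_t) n >= size - used)
        return -1;

    return 0;
}
```

Because the columns are named in a SELECT, the generated column values come back as ordinary query output and the COPY command itself needs no change.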

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 4 Jun 2024 at 15:01, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0003.
>
> ======
> 0. Whitespace warnings when the patch was applied.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
> trailing whitespace.
>           has no effect; the replicated data will be ignored and the subscriber
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
> trailing whitespace.
>           column will be filled as normal with the subscriber-side computed or
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
> trailing whitespace.
> (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> warning: 3 lines add whitespace errors.
>
Fixed

> ======
> src/backend/commands/subscriptioncmds.c
>
> 1.
> - res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
> + column_count = (!include_generated_column && check_gen_col) ? 4 :
> (check_columnlist ? 3 : 2);
> + res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);
>
> The 'column_count' seems out of control. Won't it be far simpler to
> assign/increment the value dynamically only as required instead of the
> tricky calculation at the end which is unnecessarily difficult to
> understand?
>
I have removed this piece of code.

> ~~~
>
> 2.
> + /*
> + * If include_generated_column option is false and all the column of
> the table in the
> + * publication are generated then we should throw an error.
> + */
> + if (!isnull && !include_generated_column && check_gen_col)
> + {
> + attlist = DatumGetArrayTypeP(attlistdatum);
> + gen_col_count = DatumGetInt32(slot_getattr(slot, 4, &isnull));
> + Assert(!isnull);
> +
> + attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
> +
> + if (attcount != 0 && attcount == gen_col_count)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use only generated column for table \"%s.%s\" in
> publication when generated_column option is false",
> +    nspname, relname));
> + }
> +
>
> Why do you think this new logic/error is necessary?
>
> IIUC the 'include_generated_columns' should be false to match the
> existing HEAD behavior. So this scenario where your publisher-side
> table *only* has generated columns is something that could already
> happen, right? IOW, this introduced error could be a candidate for
> another discussion/thread/patch, but is it really required for this
> current patch?
>
Yes, this scenario can also happen in HEAD. For this patch I have
removed this check.

> ======
> src/backend/replication/logical/tablesync.c
>
> 3.
>   lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
> -   "AND a.attgenerated = ''" : ""),
> + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> + (walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000 ||
> + !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),
>
> This ternary within one big appendStringInfo seems quite complicated.
> Won't it be better to split the appendStringInfo into multiple parts
> so the generated-cols calculation can be done more simply?
>
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 4.
> I think there should be a variety of different tablesync scenarios
> (when 'include_generated_columns' is true) tested here instead of just
> one, and all varieties with lots of comments to say what they are
> doing, expectations etc.
>
> a. publisher-side gen-col "a" replicating to subscriber-side NOT
> gen-col "a" (ok, value gets replicated)
> b. publisher-side gen-col "a" replicating to subscriber-side gen-col
> (ok, but ignored)
> c. publisher-side NOT gen-col "b" replicating to subscriber-side
> gen-col "b" (error?)
>
Added the tests

> ~~
>
> 5.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
> +is( $result, qq(1|2
> +2|4
> +3|6), 'generated columns initial sync with include_generated_column = true');
>
> Should this say "ORDER BY..." so it will not fail if the row order
> happens to be something unanticipated?
>
Fixed

> ======
>
> 99.
> Also, see the attached file with numerous other nitpicks:
> - plural param- and var-names
> - typos in comments
> - missing spaces
> - SQL keyword should be UPPERCASE
> - etc.
>
> Please apply any/all of these if you agree with them.
Fixed

Patch v7-0002 contains all the changes. Please refer to [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Wed, 5 Jun 2024 at 05:49, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > >
> > > The attached Patch contains the suggested changes.
> > >
> >
> > Hi,
> >
> > Currently, COPY command does not work for generated columns and
> > therefore, COPY of generated column is not supported during tablesync
> > process. So, in patch v4-0001 we added a check to allow replication of
> > the generated column only if 'copy_data = false'.
> >
> > I am attaching patches to resolve the above issues.
> >
> > v5-0001: not changed
> > v5-0002: Support COPY of generated column
> > v5-0003: Support COPY of generated column during tablesync process
> >
>
> Hi Shlok, I have a question about patch v5-0003.
>
> According to the patch 0001 docs "If the subscriber-side column is
> also a generated column then this option has no effect; the replicated
> data will be ignored and the subscriber column will be filled as
> normal with the subscriber-side computed or default data".
>
> Doesn't this mean it will be a waste of effort/resources to COPY any
> column value where the subscriber-side column is generated since we
> know that any copied value will be ignored anyway?
>
> But I don't recall seeing any comment or logic for this kind of copy
> optimisation in the patch 0003. Is this already accounted for
> somewhere and I missed it, or is my understanding wrong?
Your understanding is correct.
With v7-0002, if a subscriber-side column is generated, then we do not
include that column in the column list during COPY. This will address
the above issue.
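
A minimal sketch of that filtering step, under stated assumptions: SketchColumn and select_columns_to_copy are hypothetical names for illustration, not structures from the patch, and the per-column flag is assumed to be already known from the subscriber's catalog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column info, as the subscriber might see it. */
typedef struct SketchColumn
{
    const char *name;
    bool        generated_on_subscriber;
} SketchColumn;

/*
 * Collect the names of columns worth copying: anything that is a
 * generated column on the subscriber is skipped, because a copied
 * value for it would be ignored anyway. Returns the number of names
 * stored in out, which must have room for ncols entries.
 */
int
select_columns_to_copy(const SketchColumn *cols, int ncols, const char **out)
{
    int nout = 0;

    assert(cols != NULL && out != NULL);

    for (int i = 0; i < ncols; i++)
    {
        if (!cols[i].generated_on_subscriber)
            out[nout++] = cols[i].name;
    }

    return nout;
}
```

The resulting list is what would then be named in the COPY column list during the initial sync.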

Patch v7-0002 contains all the changes. Please refer to [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 6 Jun 2024 at 08:29, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok and Shubham,
>
> Thanks for updating the patch!
>
> I briefly checked the v5-0002. IIUC, your patch allows to copy generated
> columns unconditionally. I think the behavior affects many people so that it is
> hard to get agreement.
>
> Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is set
> to off, we can keep the current specification.
>
> Thought?
Hi Kuroda-san,

I agree that we should not allow generated columns to be copied
unconditionally. With patch v7-0002, I have used a different approach that
does not require any code changes in COPY.

Please refer to [1] for patch v7-0002.
[1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> The attached Patch contains the suggested changes.
>

Hi Shubham, thanks for providing a patch.
I have some comments for v6-0001.

1. create_subscription.sgml
There is repetition of the same line.

+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. If the
+          subscriber-side column is also a generated column then this option
+          has no effect; the replicated data will be ignored and the subscriber
+          column will be filled as normal with the subscriber-side computed or
+          default data.
+          <literal>false</literal>.
+         </para>
+
+         <para>
+          This parameter can only be set true if
<literal>copy_data</literal> is
+          set to <literal>false</literal>. If the subscriber-side
column is also a
+          generated column then this option has no effect; the
replicated data will
+          be ignored and the subscriber column will be filled as
normal with the
+          subscriber-side computed or default data.
+         </para>

==============================
2. subscriptioncmds.c

2a. The macro name should be in uppercase. We can use a short name
like 'SUBOPT_INCLUDE_GEN_COL'. Thoughts?
+#define SUBOPT_include_generated_columns 0x00010000

2b. Update macro name accordingly
+ if (IsSet(supported_opts, SUBOPT_include_generated_columns))
+ opts->include_generated_columns = false;

2c. Update macro name accordingly
+ else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
+ strcmp(defel->defname, "include_generated_columns") == 0)
+ {
+ if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
+ errorConflictingDefElem(defel, pstate);
+
+ opts->specified_opts |= SUBOPT_include_generated_columns;
+ opts->include_generated_columns = defGetBoolean(defel);
+ }

2d. Update macro name accordingly
+   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+   SUBOPT_include_generated_columns);


==============================

3. decoding_into_rel.out

3a. In comment, I think it should be "When 'include-generated-columns'
= '1' the generated column 'b' values will be replicated"
+-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT

3b. In comment, I think it should be "When 'include-generated-columns'
= '0' the generated column 'b' values will not be replicated"
+-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
+                      data
+------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:4
+ table public.gencoltable: INSERT: a[integer]:5
+ table public.gencoltable: INSERT: a[integer]:6
+ COMMIT
+(5 rows)

=========================

4. Here the names of both tests are the same. I think we should use
different names.

+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> > Thanks for the updated patch, few comments:
> > 1) The option name seems wrong here:
> > In one place include_generated_column is specified and other place
> > include_generated_columns is specified:
> >
> > +               else if (IsSet(supported_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> > +                                strcmp(defel->defname,
> > "include_generated_column") == 0)
> > +               {
> > +                       if (IsSet(opts->specified_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN))
> > +                               errorConflictingDefElem(defel, pstate);
> > +
> > +                       opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN;
> > +                       opts->include_generated_column = defGetBoolean(defel);
> > +               }
>
> Fixed.
>
> > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> > index d453e224d9..e8ff752fd9 100644
> > --- a/src/bin/psql/tab-complete.c
> > +++ b/src/bin/psql/tab-complete.c
> > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
> >                 COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> >                                           "disable_on_error",
> > "enabled", "failover", "origin",
> >                                           "password_required",
> > "run_as_owner", "slot_name",
> > -                                         "streaming",
> > "synchronous_commit", "two_phase");
> > +                                         "streaming",
> > "synchronous_commit", "two_phase","include_generated_columns");
> >
> > 2) This small data table need not have a primary key column as it will
> > create an index and insertion will happen in the index too.
> > +-- check include-generated-columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include-generated-columns', '1');
>
> Fixed.
>
> > 3) Please add a test case for this:
> > +          set to <literal>false</literal>. If the subscriber-side
> > column is also a
> > +          generated column then this option has no effect; the
> > replicated data will
> > +          be ignored and the subscriber column will be filled as
> > normal with the
> > +          subscriber-side computed or default data.
>
> Added the required test case.
>
> > 4) You can use a new style of ereport to remove the brackets around errcode
> > 4.a)
> > +                       else if (!parse_bool(strVal(elem->arg),
> > &data->include_generated_columns))
> > +                               ereport(ERROR,
> > +
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                                                errmsg("could not
> > parse value \"%s\" for parameter \"%s\"",
> > +
> > strVal(elem->arg), elem->defname)));
> >
> > 4.b) similarly here too:
> > +               ereport(ERROR,
> > +                               (errcode(ERRCODE_SYNTAX_ERROR),
> > +               /*- translator: both %s are strings of the form
> > "option = value" */
> > +                                       errmsg("%s and %s are mutually
> > exclusive options",
> > +                                               "copy_data = true",
> > "include_generated_column = true")));
> >
> > 4.c) similarly here too:
> > +                       if (include_generated_columns_option_given)
> > +                               ereport(ERROR,
> > +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> > +                                                errmsg("conflicting
> > or redundant options")));
>
> Fixed.
>
> > 5) These variable names can be changed to keep it smaller, something
> > like gencol or generatedcol or gencolumn, etc
> > +++ b/src/include/catalog/pg_subscription.h
> > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> > BKI_SHARED_RELATION BKI_ROW
> >   * slots) in the upstream database are enabled
> >   * to be synchronized to the standbys. */
> >
> > + bool subincludegeneratedcolumn; /* True if generated columns must be
> > published */
> > +
> >  #ifdef CATALOG_VARLEN /* variable-length fields start here */
> >   /* Connection string to the publisher */
> >   text subconninfo BKI_FORCE_NOT_NULL;
> > @@ -157,6 +159,7 @@ typedef struct Subscription
> >   List    *publications; /* List of publication names to subscribe to */
> >   char    *origin; /* Only publish data originating from the
> >   * specified origin */
> > + bool includegeneratedcolumn; /* publish generated column data */
> >  } Subscription;
>
> Fixed.
>
> The attached Patch contains the suggested changes.

Few comments:
1) Here tab1 and tab2 are exactly the same tables, just check if the
table tab1 itself can be used for your tests.
@@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres',
        "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED)"
 );
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED)"
+);

2) We can document that the include_generated_columns option cannot be altered.

3) You can mention that include-generated-columns is true by default
and generated column data will be selected
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

4) The comment seems to be wrong here, the comment says b will not be
replicated but b is being selected:
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
                            data
-------------------------------------------------------------
 BEGIN
 table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
 table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
 table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
 COMMIT
(5 rows)

5) Similarly here too the comment seems to be wrong, the comment says
b will be replicated but b is not being selected:
INSERT INTO gencoltable (a) VALUES (4), (5), (6);
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
                      data
------------------------------------------------
 BEGIN
 table public.gencoltable: INSERT: a[integer]:4
 table public.gencoltable: INSERT: a[integer]:5
 table public.gencoltable: INSERT: a[integer]:6
 COMMIT
(5 rows)

6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to
keep the name consistent:
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -72,6 +72,7 @@
 #define SUBOPT_FAILOVER                                0x00002000
 #define SUBOPT_LSN                                     0x00004000
 #define SUBOPT_ORIGIN                          0x00008000
+#define SUBOPT_include_generated_columns               0x00010000

7) The comment style seems to be inconsistent, both of them can start
in lower case
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
+
+-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated

8) This could be changed to remove the insert statements by using
pg_logical_slot_peek_changes:
-- When 'include-generated-columns' is not set
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
INSERT INTO gencoltable (a) VALUES (4), (5), (6);
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
to:
-- When 'include-generated-columns' is not set
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');

9) In the commit message the option used is wrong;
include_generated_columns should actually be
include-generated-columns:
Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
                                      'include_generated_columns','1');

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for patch v7-0001.

======
1. GENERAL - \dRs+

Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
(e.g. \dRs+ mysub) the same as the other boolean parameters?

======
Commit message

2.
When 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list

~

Maybe you don't need to mention "PUBLICATION col-list" twice.

SUGGESTION
When 'include_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

~~~

2.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
'publication pub1;

~

2a.
(I've questioned this one in previous reviews)

What exactly is the purpose of this statement in the commit message?
Was this supposed to demonstrate the usage of the
'include_generated_columns' parameter?

~

2b.
/publication/PUBLICATION/


~~~

3.
If the subscriber-side column is also a generated column then
thisoption has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

~

Missing space: /thisoption/this option/

======
.../expected/decoding_into_rel.out

4.
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                            data
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

Why is the default value here equivalent to
'include-generated-columns' = '1' instead of '0'? The default for
the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
false, and IMO it seems confusing for these 2 defaults to be
different. Here I think it should default to '0' *regardless* of what
the previous functionality might have done -- e.g. this is a "test
decoder" so the parameter should behave sensibly.

======
.../test_decoding/sql/decoding_into_rel.sql

NITPICK - wrong comments.

======
doc/src/sgml/protocol.sgml

5.
+    <varlistentry>
+     <term>include_generated_columns</term>
+      <listitem>
+       <para>
+        Boolean option to enable generated columns. This option controls
+        whether generated columns should be included in the string
+        representation of tuples during logical decoding in PostgreSQL.
+        The default is false.
+       </para>
+      </listitem>
+    </varlistentry>
+

Does the protocol version need to be bumped to support this new option
and should that be mentioned on this page similar to how all other
version values are mentioned?

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - some missing words/sentence.
NITPICK - some missing <literal> tags.
NITPICK - remove duplicated sentence.
NITPICK - add another <para>.

======
src/backend/commands/subscriptioncmds.c

6.
 #define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_include_generated_columns 0x00010000

Please use UPPERCASE for consistency with other macros.

======
.../libpqwalreceiver/libpqwalreceiver.c

7.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'on'");
+

IMO it makes more sense to say 'true' here instead of 'on'. It seems
like this was just cut/paste from the above code (where 'on' was
sensible).

======
src/include/catalog/pg_subscription.h

8.
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegencol; /* True if generated columns must be published */
+

Not fixed as claimed. This field name ought to be plural.

/subincludegencol/subincludegencols/

~~~

9.
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ bool includegencol; /* publish generated column data */
 } Subscription;

Not fixed as claimed. This field name ought to be plural.

/includegencol/includegencols/

======
src/test/subscription/t/031_column_list.pl

10.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 10) STORED)"
+);
+
 $node_subscriber->safe_psql('postgres',
  "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
 );

+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
+);
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 20) STORED)"
+);

IMO the test needs lots more comments to describe what it is doing:

For example, the setup deliberately has made:
* publisher-side tab2 has generated col 'b' but subscriber-side tab2
has NON-generated col 'b'.
* publisher-side tab3 has generated col 'b' but subscriber-side tab3
has DIFFERENT COMPUTATION generated col 'b'.

So it will be better to have comments to explain all this instead of
having to figure it out.

~~~

11.
 # data for initial sync

 $node_publisher->safe_psql('postgres',
  "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab2 (a) VALUES (1), (2), (3)");

 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub2 FOR TABLE tab2");
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION pub3 FOR TABLE tab3");
+

# Wait for initial sync of all subscriptions
$node_subscriber->wait_for_subscription_sync;

my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
is( $result, qq(1|22
2|44
3|66), 'generated columns initial sync');

~

IMO (and for completeness) it would be better to INSERT data for all
the tables and also to validate that tables tab2 and tab3 have zero
rows replicated. Yes, I know there is 'copy_data=false', but it is
just easier to see all the tables instead of guessing why some are
omitted, and anyway this test case will be needed after the next patch
implements the COPY support for gen-cols.

~~~

12.
+$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');
+

Here also I think there should be explicit comments about what these
cases are testing, what results you are expecting, and why. The
comments will look something like the message parameter of those
is(...) calls.

e.g.
# confirm generated columns ARE replicated when the subscriber-side
column is not generated

e.g.
# confirm generated columns are NOT replicated when the
subscriber-side column is also generated

======

99.
Please also see my nitpicks attachment patch for various other
cosmetic and docs problems, and apply these if you agree:
- documentation wording/rendering
- wrong comments
- spacing
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for patch v7-0002

======
Commit Message

NITPICKS
- rearrange paragraphs
- typo "donot"
- don't start a sentence with "And"
- etc.

Please see the attachment for my suggested commit message text updates
and take from it whatever you agree with.

======
doc/src/sgml/ref/create_subscription.sgml

1.
+          If the subscriber-side column is also a generated column
then this option
+          has no effect; the replicated data will be ignored and the subscriber
+          column will be filled as normal with the subscriber-side computed or
+          default data. And during table synchronization, the data
corresponding to
+          the generated column on subscriber-side will not be sent from the
+          publisher to the subscriber.

This text already mentions subscriber-side generated cols. IMO you
don't need to say anything at all about table synchronization --
that's just an internal code optimization, which is not something the
user needs to know about. IOW, the entire last sentence ("And
during...") should be removed.

======
src/backend/replication/logical/relation.c

2. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencol && attr->attgenerated))
  {
  entry->attrmap->attnums[i] = -1;
  continue;

~

Maybe I'm mistaken, but isn't this code for skipping checking for
"missing" subscriber-side (aka local) columns? Can't it just
unconditionally skip every attr->attgenerated -- i.e. why does it
matter if the MySubscription->includegencol was set or not?

======
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

- for (i = 0; i < rel->remoterel.natts; i++)
+ desc = RelationGetDescr(rel->localrel);
+
+ for (i = 0; i < desc->natts; i++)
  {
- attnamelist = lappend(attnamelist,
-   makeString(rel->remoterel.attnames[i]));
+ int attnum;
+ Form_pg_attribute attr = TupleDescAttr(desc, i);
+
+ if (!attr->attgenerated)
+ continue;
+
+ attnum = logicalrep_rel_att_by_name(&rel->remoterel,
+ NameStr(attr->attname));
+
+ /*
+ * Check if subscription table have a generated column with same
+ * column name as a non-generated column in the corresponding
+ * publication table.
+ */
+ if (attnum >=0 && !attgenlist[attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" is missing
replicated column: \"%s\"",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
+
+ if (attnum >= 0)
+ gencollist = lappend_int(gencollist, attnum);
  }

~

NITPICK - Use C99-style for loop variables
NITPICK - Typo in comment
NITPICK - spaces

~

3a.
I think above code should be refactored so there is only one check for
"if (attnum >= 0)" -- e.g. other condition should be nested.

~

3b.
That ERROR message says "missing replicated column", but that doesn't
seem much like what the code-comment was saying this code is about.

~~~

4.
+ for (i = 0; i < rel->remoterel.natts; i++)
+ {
+
+ if (gencollist != NIL && j < gencollist->length &&
+ list_nth_int(gencollist, j) == i)
+ j++;
+ else
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

NITPICK - Use C99-style for loop variables
NITPICK - Unnecessary blank lines

~

IIUC the subscriber-side table and the publisher-side table do NOT
have to have all the columns in identical order for the logical
replication to work correctly. AFAIK it works fine so long as the
column names match for the replicated columns. Therefore, I am
suspicious that this new patch code seems to be imposing some new
ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
not current requirements.
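
One possible way to avoid any ordering assumption is to mark the remote attribute numbers to skip in a boolean array and then filter by membership. The following is only a standalone sketch of that idea; build_copy_names(), MAX_ATTS and the plain arrays are hypothetical and do not use the PostgreSQL List API.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound on columns, chosen only for this sketch. */
#define MAX_ATTS 64

/*
 * Copy into out[] every remote column name whose index is NOT listed in
 * skip_attnums. The skip list may be in any order, so nothing here
 * depends on the relative column order of the two tables.
 * Returns the number of names kept.
 */
int
build_copy_names(const char **remote_names, int nremote,
                 const int *skip_attnums, int nskip,
                 const char **out)
{
    bool skip[MAX_ATTS] = {false};
    int kept = 0;

    assert(nremote >= 0 && nremote <= MAX_ATTS);

    /* First pass: mark the remote columns to leave out. */
    for (int j = 0; j < nskip; j++)
    {
        assert(skip_attnums[j] >= 0 && skip_attnums[j] < nremote);
        skip[skip_attnums[j]] = true;
    }

    /* Second pass: keep everything that was not marked. */
    for (int i = 0; i < nremote; i++)
    {
        if (!skip[i])
            out[kept++] = remote_names[i];
    }

    return kept;
}
```

With this shape the skip list can be built in local-table order (as make_copy_attnamelist does) without requiring that order to match the publisher's.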

~~~

copy_table:

NITPICK - comment typo
NITPICK - comment wording

~

5.
+ int i = 0;
+ ListCell *l;
+
  appendStringInfoString(&cmd, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach(l, attnamelist)
  {
- appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(&cmd, quote_identifier(strVal(lfirst(l))));
+ if (i < attnamelist->length - 1)
  appendStringInfoString(&cmd, ", ");
+ i++;
  }
IIUC for new code like this, it is preferred to use the foreach*
macros instead of ListCell.
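
Separately from the choice of iteration macro (pg_list.h provides typed variants such as foreach_ptr), the manual counter compared against the list length can be avoided by emitting the separator before every element except the first. Below is a standalone sketch of that pattern over a plain C array; join_names() is a hypothetical helper, not PostgreSQL code, and it writes into a caller-supplied buffer instead of a StringInfo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write the names into buf as "a, b, c". The separator starts out empty
 * and becomes ", " after the first element, so the loop body never has
 * to compare a position against the list length.
 * Returns the resulting string length, or -1 if buf is too small.
 */
int
join_names(const char **names, int n, char *buf, size_t bufsize)
{
    const char *sep = "";
    size_t used = 0;

    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';

    for (int i = 0; i < n; i++)
    {
        size_t seplen = strlen(sep);
        size_t need = seplen + strlen(names[i]);

        /* Leave room for the terminating NUL. */
        if (used + need + 1 > bufsize)
            return -1;

        strcpy(buf + used, sep);
        strcpy(buf + used + seplen, names[i]);
        used += need;
        sep = ", ";
    }

    return (int) used;
}
```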

======
src/test/regress/sql/subscription.sql

6.
--- fail - copy_data and include_generated_columns are mutually
exclusive options
-CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
PUBLICATION testpub WITH (include_generated_columns = true);
-ERROR:  copy_data = true and include_generated_columns = true are
mutually exclusive options

It is OK to delete this test now, but IMO there still need to be some
"include_generated_columns must be boolean" test cases (e.g. the same as
there were for two_phase). Actually, this should probably be done by the
0001 patch.

======
src/test/subscription/t/011_generated.pl

7.
All the PRIMARY KEY stuff may be overkill. Are primary keys really
needed for these tests?

~~~

8.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab4 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab5 (a int PRIMARY KEY, b int)"
+);
+

Maybe add comments on what is special about all these tables, so readers
don't have to read the tests later to deduce their purpose.

tab4: publisher-side generated col 'b' and 'c'  ==> subscriber-side
non-generated col 'b', and generated-col 'c'
tab5: publisher-side non-generated col 'b' --> subscriber-side
non-generated col 'b'

~~~

9.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION sub4 CONNECTION '$publisher_connstr'
PUBLICATION pub4 WITH (include_generated_columns = true)"
+ );
+

All the publications are created together, and all the subscriptions
are created together except for 'sub5'. Consider including a comment
to say why you deliberately created the 'sub5' subscription separate
from all others.

======

99.
Please also see my code nitpicks attachment patch for various other
cosmetic problems, and apply them if you agree.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Hi Shubham, thanks for providing a patch.
> I have some comments for v6-0001.
>
> 1. create_subscription.sgml
> There is repetition of the same line.
>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the subscription should be replicated. If the
> +          subscriber-side column is also a generated column then this option
> +          has no effect; the replicated data will be ignored and the subscriber
> +          column will be filled as normal with the subscriber-side computed or
> +          default data.
> +          <literal>false</literal>.
> +         </para>
> +
> +         <para>
> +          This parameter can only be set true if
> <literal>copy_data</literal> is
> +          set to <literal>false</literal>. If the subscriber-side
> column is also a
> +          generated column then this option has no effect; the
> replicated data will
> +          be ignored and the subscriber column will be filled as
> normal with the
> +          subscriber-side computed or default data.
> +         </para>
>
> ==============================
> 2. subscriptioncmds.c
>
> 2a. The macro name should be in uppercase. We can use a short name
> like 'SUBOPT_INCLUDE_GEN_COL'. Thought?
> +#define SUBOPT_include_generated_columns 0x00010000
>
> 2b. Update macro name accordingly
> + if (IsSet(supported_opts, SUBOPT_include_generated_columns))
> + opts->include_generated_columns = false;
>
> 2c. Update macro name accordingly
> + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
> + strcmp(defel->defname, "include_generated_columns") == 0)
> + {
> + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
> + errorConflictingDefElem(defel, pstate);
> +
> + opts->specified_opts |= SUBOPT_include_generated_columns;
> + opts->include_generated_columns = defGetBoolean(defel);
> + }
>
> 2d. Update macro name accordingly
> +   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
> +   SUBOPT_include_generated_columns);
>
>
> ==============================
>
> 3. decoding_into_rel.out
>
> 3a. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will be replicated"
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
>
> 3b. In comment, I think it should be "When 'include-generated-columns'
> = '0' the generated column 'b' values will not be replicated"
> +-- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +                      data
> +------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:4
> + table public.gencoltable: INSERT: a[integer]:5
> + table public.gencoltable: INSERT: a[integer]:6
> + COMMIT
> +(5 rows)
>
> =========================
>
> 4. Here names for both the tests are the same. I think we should use
> different names.
>
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
> Few comments:
> 1) Here tab1 and tab2 are exactly the same tables, just check if the
> table tab1 itself can be used for your tests.
> @@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres',
>         "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
>  );
> +$node_publisher->safe_psql('postgres',
> +       "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
> +);

On the subscription side the tables have different descriptions, so we
need to have different tables on the publisher side.

> 2) We can document that the include_generated_columns option cannot be altered.
>
> 3) You can mention that include-generated-columns is true by default
> and generated column data will be selected
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> 4)  The comment seems to be wrong here, the comment says b will not be
> replicated but b is being selected:
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
>                             data
> -------------------------------------------------------------
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
>  table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
>  table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
>  COMMIT
> (5 rows)
>
> 5) Similarly here too the comment seems to be wrong, the comment says
> b will be replicated but b is not being selected:
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>                       data
> ------------------------------------------------
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:4
>  table public.gencoltable: INSERT: a[integer]:5
>  table public.gencoltable: INSERT: a[integer]:6
>  COMMIT
> (5 rows)
>
> 6) Change SUBOPT_include_generated_columns to SUBOPT_GENERATED to
> keep the name consistent:
> --- a/src/backend/commands/subscriptioncmds.c
> +++ b/src/backend/commands/subscriptioncmds.c
> @@ -72,6 +72,7 @@
>  #define SUBOPT_FAILOVER                                0x00002000
>  #define SUBOPT_LSN                                     0x00004000
>  #define SUBOPT_ORIGIN                          0x00008000
> +#define SUBOPT_include_generated_columns               0x00010000
>
> 7) The comment style seems to be inconsistent, both of them can start
> in lower case
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> +
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
>
> 8) This could be changed to remove the insert statements by using
> pg_logical_slot_peek_changes:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> to:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>
> 9) In the commit message, the option used is wrong;
> include_generated_columns should actually be
> include-generated-columns:
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
>                                       'include_generated_columns','1');

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jun 17, 2024 at 1:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v7-0001.
>
> ======
> 1. GENERAL - \dRs+
>
> Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
> (e.g. \dRs+ mysub) the same as the other boolean parameters?
>
> ======
> Commit message
>
> 2.
> When 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list
>
> ~
>
> Maybe you don't need to mention "PUBLICATION col-list" twice.
>
> SUGGESTION
> When 'include_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> ~~~
>
> 2.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
> 'publication pub1;
>
> ~
>
> 2a.
> (I've questioned this one in previous reviews)
>
> What exactly is the purpose of this statement in the commit message?
> Was this supposed to demonstrate the usage of the
> 'include_generated_columns' parameter?
>
> ~
>
> 2b.
> /publication/PUBLICATION/
>
>
> ~~~
>
> 3.
> If the subscriber-side column is also a generated column then
> thisoption has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.
>
> ~
>
> Missing space: /thisoption/this option/
>
> ======
> .../expected/decoding_into_rel.out
>
> 4.
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +                            data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> Why is the default value here equivalent to
> 'include-generated-columns' = '1' here instead of '0'? The default for
> the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
> false, and IMO it seems confusing for these 2 defaults to be
> different. Here I think it should default to '0' *regardless* of what
> the previous functionality might have done -- e.g. this is a "test
> decoder" so the parameter should behave sensibly.
>
> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> NITPICK - wrong comments.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> +    <varlistentry>
> +     <term>include_generated_columns</term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns. This option controls
> +        whether generated columns should be included in the string
> +        representation of tuples during logical decoding in PostgreSQL.
> +        The default is false.
> +       </para>
> +      </listitem>
> +    </varlistentry>
> +
>
> Does the protocol version need to be bumped to support this new option
> and should that be mentioned on this page similar to how all other
> version values are mentioned?

I already did the backward compatibility test earlier and decided that
a protocol bump is not needed.

> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - some missing words/sentence.
> NITPICK - some missing <literal> tags.
> NITPICK - remove duplicated sentence.
> NITPICK - add another <para>.
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 6.
>  #define SUBOPT_ORIGIN 0x00008000
> +#define SUBOPT_include_generated_columns 0x00010000
>
> Please use UPPERCASE for consistency with other macros.
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 7.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'on'");
> +
>
> IMO it makes more sense to say 'true' here instead of 'on'. It seems
> like this was just cut/paste from the above code (where 'on' was
> sensible).
>
> ======
> src/include/catalog/pg_subscription.h
>
> 8.
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegencol; /* True if generated columns must be published */
> +
>
> Not fixed as claimed. This field name ought to be plural.
>
> /subincludegencol/subincludegencols/
>
> ~~~
>
> 9.
>   char    *origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegencol; /* publish generated column data */
>  } Subscription;
>
> Not fixed as claimed. This field name ought to be plural.
>
> /includegencol/includegencols/
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 10.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 10) STORED)"
> +);
> +
>  $node_subscriber->safe_psql('postgres',
>   "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
>  );
>
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
> +);
> +
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 20) STORED)"
> +);
>
> IMO the test needs lots more comments to describe what it is doing:
>
> For example, the setup deliberately has made:
> * publisher-side tab2 has generated col 'b' but subscriber-side tab2
> has NON-generated col 'b'.
> * publisher-side tab3 has generated col 'b' but subscriber-side tab3
> has DIFFERENT COMPUTATION generated col 'b'.
>
> So it will be better to have comments to explain all this instead of
> having to figure it out.
>
> ~~~
>
> 11.
>  # data for initial sync
>
>  $node_publisher->safe_psql('postgres',
>   "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO tab2 (a) VALUES (1), (2), (3)");
>
>  $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub2 FOR TABLE tab2");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub3 FOR TABLE tab3");
> +
>
> # Wait for initial sync of all subscriptions
> $node_subscriber->wait_for_subscription_sync;
>
> my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
> is( $result, qq(1|22
> 2|44
> 3|66), 'generated columns initial sync');
>
> ~
>
> IMO (and for completeness) it would be better to INSERT data for all
> the tables and also to validate that tables tab2 and tab3 have zero
> rows replicated. Yes, I know there is 'copy_data=false', but it is
> just easier to see all the tables instead of guessing why some are
> omitted, and anyway this test case will be needed after the next patch
> implements the COPY support for gen-cols.
>
> ~~~
>
> 12.
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub2');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');
> +
>
> Here also I think there should be explicit comments about what these
> cases are testing, what results you are expecting, and why. The
> comments will look something like the message parameter of those
> safe_psql(...)
>
> e.g.
> # confirm generated columns ARE replicated when the subscriber-side
> column is not generated
>
> e.g.
> # confirm generated columns are NOT replicated when the
> subscriber-side column is also generated
>
> ======
>
> 99.
> Please also see my nitpicks attachment patch for various other
> cosmetic and docs problems, and apply these if you agree:
> - documentation wording/rendering
> - wrong comments
> - spacing
> - etc.

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Eisentraut
Date:
On 19.06.24 13:22, Shubham Khanna wrote:
> All the comments are handled.
> 
> The attached Patch contains all the suggested changes.

Please also take a look at the proposed patch for virtual generated 
columns [0] and consider how that would affect your patch.  I think your 
feature can only replicate *stored* generated columns.  So perhaps the 
documentation and terminology in your patch should reflect that.


[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org




Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for v8-0001.

======
Commit message.

1.
It seems like the patch name was accidentally omitted, so it became a
mess when it defaulted to the 1st paragraph of the commit message.

======
contrib/test_decoding/test_decoding.c

2.
+ data->include_generated_columns = true;

I previously posted a comment [1, #4] that this should default to
false; IMO it is unintuitive for the test_decoding to have an
*opposite* default behaviour compared to CREATE SUBSCRIPTION.
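
To illustrate the default being asked for, here is a minimal standalone
sketch (not the actual test_decoding code). The helper name
resolve_include_generated_columns and the name/value array layout are
assumptions made up for this example; the point is only that the option
stays false unless it is explicitly given as '1'.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of test_decoding): scan name/value option
 * pairs and return the effective "include-generated-columns" setting.
 * pairs[] holds 2 * npairs strings: name, value, name, value, ...
 */
bool
resolve_include_generated_columns(const char *const *pairs, int npairs)
{
	bool		result = false; /* default when the option is not given */

	for (int i = 0; i < npairs; i++)
	{
		const char *name = pairs[2 * i];
		const char *value = pairs[2 * i + 1];

		/* Only an explicit '1' turns the option on in this sketch. */
		if (strcmp(name, "include-generated-columns") == 0)
			result = (strcmp(value, "1") == 0);
	}

	return result;
}
```

With this shape, omitting the option behaves the same as passing '0',
which matches the CREATE SUBSCRIPTION default of false.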

======
doc/src/sgml/ref/create_subscription.sgml

NITPICK - remove the inconsistent blank line in SGML

======
src/backend/commands/subscriptioncmds.c

3.
+#define SUBOPT_include_generated_columns 0x00010000

I previously posted a comment [1, #6] that this should be UPPERCASE,
but it is not yet fixed.

======
src/bin/psql/describe.c

NITPICK - move and reword the bogus comment

~

4.
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+ ", subincludegencols AS \"%s\"\n",
+ gettext_noop("include_generated_columns"));

4a.
For consistency with every other parameter, that column title should
be written in words "Include generated columns" (not
"include_generated_columns").

~

4b.
IMO this new column belongs with the other subscription parameter
columns (e.g. put it ahead of the "Conninfo" column).

======
src/test/subscription/t/011_generated.pl

NITPICK - fixed a comment

5.
IMO, it would be better for readability if all the matching CREATE
TABLE for publisher and subscriber are kept together, instead of the
current code which is creating all publisher tables and then creating
all subscriber tables.

~~~

6.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
...
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

6a.
These SELECT all need ORDER BY to protect against the SELECT *
returning rows in some unexpected order.

~

6b.
IMO there should be more comments here to explain how you can tell the
column was NOT replicated. E.g. it is because the result value of 'b'
is the subscriber-side computed value (which you made deliberately
different to the publisher-side computed value).
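
As a worked illustration of that reasoning, the sketch below uses the
tab3 expressions from the test (publisher: a + 10, subscriber: a + 20).
The function names are hypothetical and exist only for this example.

```c
/* Publisher-side tab3: b GENERATED ALWAYS AS (a + 10) STORED */
int
publisher_b(int a)
{
	return a + 10;
}

/* Subscriber-side tab3: b GENERATED ALWAYS AS (a + 20) STORED */
int
subscriber_b(int a)
{
	return a + 20;
}

/*
 * Hypothetical check: returns 1 when the observed value of 'b' can only
 * have come from the subscriber-side expression, i.e. the publisher's
 * generated value was NOT applied.
 */
int
b_was_computed_locally(int a, int observed_b)
{
	return observed_b == subscriber_b(a) && observed_b != publisher_b(a);
}
```

For a = 4 the publisher would have produced 14 and the subscriber
produces 24, so seeing 4|24 on the subscriber shows the replicated value
was ignored.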

======

99.
Please also refer to the attached nitpicks top-up patch for minor
cosmetic stuff.

======
[1] https://www.postgresql.org/message-id/CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb%2BV%3Dp2YHL8Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 19.06.24 13:22, Shubham Khanna wrote:
> > All the comments are handled.
> >
> > The attached Patch contains all the suggested changes.
>
> Please also take a look at the proposed patch for virtual generated
> columns [0] and consider how that would affect your patch.  I think your
> feature can only replicate *stored* generated columns.  So perhaps the
> documentation and terminology in your patch should reflect that.

This patch is unable to manage virtual generated columns because it
stores NULL values for them. Along with the documentation, the
generated initial sync command should also be changed to sync data
exclusively for stored generated columns, omitting virtual ones. I
suggest treating these changes as a separate patch (0003) for future
merging or a separate commit, depending on the order of patch
acceptance.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 18 Jun 2024 at 10:57, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v7-0002
>
> ======
> Commit Message
>
> NITPICKS
> - rearrange paragraphs
> - typo "donot"
> - don't start a sentence with "And"
> - etc.
>
> Please see the attachment for my suggested commit message text updates
> and take from it whatever you agree with.
>
Fixed

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 1.
> +          If the subscriber-side column is also a generated column
> then this option
> +          has no effect; the replicated data will be ignored and the subscriber
> +          column will be filled as normal with the subscriber-side computed or
> +          default data. And during table synchronization, the data
> corresponding to
> +          the generated column on subscriber-side will not be sent from the
> +          publisher to the subscriber.
>
> This text already mentions subscriber-side generated cols. IMO you
> don't need to say anything at all about table synchronization --
> that's just an internal code optimization, which is not something the
> user needs to know about. IOW, the entire last sentence ("And
> during...") should be removed.
>
Fixed

> ======
> src/backend/replication/logical/relation.c
>
> 2. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencol && attr->attgenerated))
>   {
>   entry->attrmap->attnums[i] = -1;
>   continue;
>
> ~
>
> Maybe I'm mistaken, but isn't this code for skipping checking for
> "missing" subscriber-side (aka local) columns? Can't it just
> unconditionally skip every attr->attgenerated -- i.e. why does it
> matter if the MySubscription->includegencol was set or not?
>
When 'include_generated_columns' is 'true', the column list in
remoterel will have an entry for generated columns.
So, in this case, if we skip every attr->attgenerated, we will get a
missing column error.

> ======
> src/backend/replication/logical/tablesync.c
>
> 3. make_copy_attnamelist
>
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> +
> + for (i = 0; i < desc->natts; i++)
>   {
> - attnamelist = lappend(attnamelist,
> -   makeString(rel->remoterel.attnames[i]));
> + int attnum;
> + Form_pg_attribute attr = TupleDescAttr(desc, i);
> +
> + if (!attr->attgenerated)
> + continue;
> +
> + attnum = logicalrep_rel_att_by_name(&rel->remoterel,
> + NameStr(attr->attname));
> +
> + /*
> + * Check if subscription table have a generated column with same
> + * column name as a non-generated column in the corresponding
> + * publication table.
> + */
> + if (attnum >=0 && !attgenlist[attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" is missing
> replicated column: \"%s\"",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
> +
> + if (attnum >= 0)
> + gencollist = lappend_int(gencollist, attnum);
>   }
>
> ~
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Typo in comment
> NITPICK - spaces
>
> ~
>
> 3a.
> I think above code should be refactored so there is only one check for
> "if (attnum >= 0)" -- e.g. other condition should be nested.
>
> ~
>
> 3b.
> That ERROR message says "missing replicated column", but that doesn't
> seem much like what the code-comment was saying this code is about.
>
Fixed

> ~~~
>
> 4.
> + for (i = 0; i < rel->remoterel.natts; i++)
> + {
> +
> + if (gencollist != NIL && j < gencollist->length &&
> + list_nth_int(gencollist, j) == i)
> + j++;
> + else
> + attnamelist = lappend(attnamelist,
> +   makeString(rel->remoterel.attnames[i]));
> + }
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Unnecessary blank lines
>
> ~
>
> IIUC the subscriber-side table and the publisher-side table do NOT
> have to have all the columns in identical order for the logical
> replication to work correctly. AFAIK it works fine so long as the
> column names match for the replicated columns. Therefore, I am
> suspicious that this new patch code seems to be imposing some new
> ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
> not current requirements.
>
> ~~~
>
> copy_table:
>
> NITPICK - comment typo
> NITPICK - comment wording
>
Fixed

> ~
>
> 5.
> + int i = 0;
> + ListCell *l;
> +
>   appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach(l, attnamelist)
>   {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(lfirst(l))));
> + if (i < attnamelist->length - 1)
>   appendStringInfoString(&cmd, ", ");
> + i++;
>   }
> IIUC for new code like this, it is preferred to use the foreach*
> macros instead of ListCell.
>
Fixed

> ======
> src/test/regress/sql/subscription.sql
>
> 6.
> --- fail - copy_data and include_generated_columns are mutually
> exclusive options
> -CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
> PUBLICATION testpub WITH (include_generated_columns = true);
> -ERROR:  copy_data = true and include_generated_columns = true are
> mutually exclusive options
>
> It is OK to delete this test now but IMO still needs to be some
> "include_generated_columns must be boolean" test cases (e.g. same as
> there was two_phase). Actually, this should probably be done by the
> 0001 patch.
>
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 7.
> All the PRIMARY KEY stuff may be overkill. Are primary keys really
> needed for these tests?
>
Fixed

> ~~~
>
> 8.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab4 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab5 (a int PRIMARY KEY, b int)"
> +);
> +
>
> Maybe add comments on what is special about all these tables, so don't
> have to read the tests later to deduce their purpose.
>
> tab4: publisher-side generated col 'b' and 'c'  ==> subscriber-side
> non-generated col 'b', and generated-col 'c'
> tab5: publisher-side non-generated col 'b' --> subscriber-side
> non-generated col 'b'
>
Fixed

> ~~~
>
> 9.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE SUBSCRIPTION sub4 CONNECTION '$publisher_connstr'
> PUBLICATION pub4 WITH (include_generated_columns = true)"
> + );
> +
>
> All the publications are created together, and all the subscriptions
> are created together except for 'sub5'. Consider including a comment
> to say why you deliberately created the 'sub5' subscription separate
> from all others.
>
Fixed

> ======
>
> 99.
> Please also see my code nitpicks attachment patch for various other
> cosmetic problems, and apply them if you agree.
>
Applied the changes

I have fixed the comments and attached the patches. I have also
attached the v9-0003 patch, which resolves the issue raised by
Vignesh in [1]. I have also updated the documentation accordingly.
v9-0001 - Not Modified
v9-0002 - Support replication of generated columns during initial sync.
v9-0003 - Fix behaviour of tablesync for Virtual Generated Columns.

[1]: https://www.postgresql.org/message-id/CALDaNm3Ufg872XqgPvBVzXHvUVenu-8%2BGz2dyEuKq3CN0UxfKw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 20 Jun 2024 at 12:52, vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 19.06.24 13:22, Shubham Khanna wrote:
> > > All the comments are handled.
> > >
> > > The attached Patch contains all the suggested changes.
> >
> > Please also take a look at the proposed patch for virtual generated
> > columns [0] and consider how that would affect your patch.  I think your
> > feature can only replicate *stored* generated columns.  So perhaps the
> > documentation and terminology in your patch should reflect that.
>
> This patch is unable to manage virtual generated columns because it
> stores NULL values for them. Along with the documentation, the
> generated initial sync command should also be changed to sync data
> exclusively for stored generated columns, omitting virtual ones. I
> suggest treating these changes as a separate patch (0003) for future
> merging or a separate commit, depending on the order of patch
> acceptance.
>

I have addressed the issue, updated the documentation accordingly,
and created a new 0003 patch.
Please refer to the v9-0003 patch in [1].

[1]: https://www.postgresql.org/message-id/CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S%2BjRzzzXo2RavNWw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.

======
src/test/subscription/t/011_generated.pl

1.
Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
don't think so.

~~~

2.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
for readability and to avoid any surprises.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for patch v9-0002.

======
src/backend/replication/logical/relation.c

1. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencols && attr->attgenerated))

You replied to my question from the previous review [1, #2] as follows:
When 'include_generated_columns' is 'true', the column list in
remoterel will have an entry for generated columns. So, in this case,
if we skip every attr->attgenerated, we will get a missing column
error.

~

TBH, the reason seems very subtle to me. Perhaps that
"(!MySubscription->includegencols && attr->attgenerated))" condition
should be coded as a separate "if", so then you can include a comment
similar to your answer, to explain it.
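
Something along these lines is what I have in mind. This is only a
standalone sketch: skip_local_attribute is a hypothetical function
taking plain booleans in place of attr->attisdropped,
attr->attgenerated and MySubscription->includegencols, so it is not
the real logicalrep_rel_open code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the check in logicalrep_rel_open: decide
 * whether a local (subscriber-side) attribute is left out of the
 * attribute map.
 */
bool
skip_local_attribute(bool attisdropped, bool attgenerated,
					 bool includegencols)
{
	/* Dropped columns never take part in replication. */
	if (attisdropped)
		return true;

	/*
	 * A local generated column is skipped only when the subscription does
	 * not include generated columns.  When it does, the remote column list
	 * has an entry for the generated column, so skipping the local one
	 * would lead to a missing column error.
	 */
	if (attgenerated && !includegencols)
		return true;

	return false;
}
```

Splitting the condition this way gives each case its own comment.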

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

NITPICK - punctuation in function comment
NITPICK - add/reword some more comments
NITPICK - rearrange comments to be closer to the code they are commenting

~

2. make_copy_attnamelist.

+ /*
+ * Construct column list for COPY.
+ */
+ for (int i = 0; i < rel->remoterel.natts; i++)
+ {
+ if(!gencollist[i])
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

IIUC, this assumes that the attribute number (aka column order)
is the same on the subscriber side (e.g. gencollist idx) and on the
publisher side (e.g. remoterel.attnames[i]). AFAIK logical
replication does not require the ordering to match like that;
therefore, I am suspicious this new logic is accidentally imposing new
unwanted assumptions/restrictions. I had asked the same question
before [1, #4] about this code, but there was no reply.

Ideally, there would be more test cases for when the columns
(including the generated ones) are all in different orders on the
pub/sub tables.
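
To show what I mean by not depending on column order, here is a
standalone sketch that decides by column *name* which remote columns go
into the COPY list. The function build_copy_column_list and its plain
string-array arguments are assumptions invented for this example; the
real code works with LogicalRepRelation and a tuple descriptor.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: copy into out[] every remote (publisher-side)
 * column name that is not a generated column on the local (subscriber)
 * side.  Matching is done by name only, so the column order of the two
 * tables does not matter.  Returns the number of names written to out[],
 * which must have room for nremote entries.
 */
int
build_copy_column_list(const char *const *remote_names, int nremote,
					   const char *const *local_gen_names, int nlocalgen,
					   const char **out)
{
	int			nout = 0;

	for (int i = 0; i < nremote; i++)
	{
		bool		local_generated = false;

		/* Look the remote column up among the local generated columns. */
		for (int j = 0; j < nlocalgen; j++)
		{
			if (strcmp(remote_names[i], local_gen_names[j]) == 0)
			{
				local_generated = true;
				break;
			}
		}

		if (!local_generated)
			out[nout++] = remote_names[i];
	}

	return nout;
}
```

Because nothing here indexes one table's array with the other table's
position, the result is the same whatever order the subscriber declares
its columns in.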

~~~

3. General - varnames.

It would help with understanding if the 'attgenlist' variables in all
these functions are re-named to make it very clear that this is
referring to the *remote* (publisher-side) table genlist, not the
subscriber table genlist.

~~~

4.
+ int i = 0;
+
  appendStringInfoString(&cmd, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(ListCell, l, attnamelist)
  {
- appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(&cmd, quote_identifier(strVal(l)));
+ if (i < attnamelist->length - 1)
  appendStringInfoString(&cmd, ", ");
+ i++;
  }

4a.
I think the purpose of the new macros is to avoid using ListCell, and
also 'l' is an unhelpful variable name. Shouldn't this code be more
like:
foreach_node(String, att_name, attnamelist)

~

4b.
The code can be far simpler if you just put the comma (", ") always
up-front except the *first* iteration, so you can avoid checking the
list length every time. For example:

if (i++)
  appendStringInfoString(&cmd, ", ");
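
For clarity, here is the same separator idea as a standalone sketch
using a plain char buffer instead of StringInfo. The function name
append_column_list is hypothetical, and the caller is assumed to pass a
buffer of at least 1 byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: build "name1, name2, ..." in buf.  The separator
 * is written *before* every name except the first, so the list length
 * never needs to be checked inside the loop.  Output is truncated if it
 * does not fit in bufsize bytes (bufsize must be >= 1).
 */
void
append_column_list(char *buf, size_t bufsize,
				   const char *const *names, int nnames)
{
	int			i = 0;

	buf[0] = '\0';

	for (const char *const *name = names; name < names + nnames; name++)
	{
		/* i is 0 only on the first iteration, so no leading comma. */
		if (i++)
			strncat(buf, ", ", bufsize - strlen(buf) - 1);

		strncat(buf, *name, bufsize - strlen(buf) - 1);
	}
}
```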

======
src/test/subscription/t/011_generated.pl

5. General.

Hmm. This patch 0002 included many formatting changes to tables tab2
and tab3 and subscriptions sub2 and sub3 but they do not belong here.
The proper formatting for those needs to be done back in patch 0001
where they were introduced. Patch 0002 should just concentrate only on
the new stuff for patch 0002.

~

6. CREATE TABLES would be better in pairs

IMO it will be better if the matching CREATE TABLE for pub and sub are
kept together, instead of separating them by doing all pub then all
sub. I previously made the same comment for patch 0001, so maybe it
will be addressed next time...

~

7. SELECT *

+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");

It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
*", for readability and so there are no surprises.

======

99.
Please also refer to my attached nitpicks diff for numerous cosmetic
changes, and apply if you agree.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok.

FYI, my patch describing the current PG17 behaviour of logical
replication of generated columns was recently pushed [1].

Note that this will have some impact on your patch set. e.g. You are
changing the current replication behaviour, so the "Generated Columns"
section note will now need to be modified by your patches.

======
[1] https://github.com/postgres/postgres/commit/7a089f6e6a14ca3a5dc8822c393c6620279968b9
[2]

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, Here are some review comments for patch v9-0003

======
Commit Message

/fix/fixes/

======
1.
General. Is tablesync enough?

I don't understand why the patch is only concerned with tablesync.
Does it make sense to prevent VIRTUAL column replication during
tablesync, if you aren't also going to prevent VIRTUAL columns from
normal logical replication (e.g. when copy_data = false)? Or is this
already handled somewhere?

~~~

2.
General. Missing test.

Add some test cases to verify behaviour is different for STORED versus
VIRTUAL generated columns

======
src/sgml/ref/create_subscription.sgml

NITPICK - consider rearranging as shown in my nitpicks diff
NITPICK - use <literal> sgml markup for STORED

======
src/backend/replication/logical/tablesync.c

3.
- if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
- walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
- !MySubscription->includegencols)
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) < 170000)
+ {
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
  appendStringInfo(&cmd, " AND a.attgenerated = ''");
+ }
+ else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000)
+ {
+ if(MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
+ else
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+ }

IMO this logic is too tricky to remain uncommented -- please add some comments.
Also, it seems somewhat complex. I think you can achieve the same more simply:

SUGGESTION

if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
{
  bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000
    && MySubscription->includegencols;
  if (gencols_allowed)
  {
    /* Replication of generated cols is supported, but not VIRTUAL cols. */
    appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
  }
  else
  {
    /* Replication of generated cols is not supported. */
    appendStringInfo(&cmd, " AND a.attgenerated = ''");
  }
}
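
As a cross-check of the suggestion above, here is a minimal stand-alone sketch (not PostgreSQL code) that compares the nested form from the patch with the simplified form. The function names `attgenerated_filter`, `attgenerated_filter_nested` and `filters_agree` are hypothetical, and the server version and the subscription option are passed in as plain parameters instead of coming from walrcv_server_version() and MySubscription.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified form: returns the WHERE fragment that would be appended,
 * or NULL when nothing is appended (servers older than PG12).
 */
const char *
attgenerated_filter(int server_version, bool includegencols)
{
	if (server_version >= 120000)
	{
		bool		gencols_allowed = server_version >= 170000 && includegencols;

		/* Generated cols are replicated, but not VIRTUAL ones. */
		if (gencols_allowed)
			return " AND a.attgenerated != 'v'";

		/* Generated cols are not replicated at all. */
		return " AND a.attgenerated = ''";
	}
	return NULL;
}

/* Nested form, mirroring the '+' lines of the reviewed hunk. */
const char *
attgenerated_filter_nested(int server_version, bool includegencols)
{
	if (server_version < 170000)
	{
		if (server_version >= 120000)
			return " AND a.attgenerated = ''";
	}
	else if (server_version >= 170000)
	{
		if (includegencols)
			return " AND a.attgenerated != 'v'";
		else
			return " AND a.attgenerated = ''";
	}
	return NULL;
}

/* True when both forms produce the same fragment for the given inputs. */
bool
filters_agree(int server_version, bool includegencols)
{
	const char *a = attgenerated_filter(server_version, includegencols);
	const char *b = attgenerated_filter_nested(server_version, includegencols);

	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}
```

Calling `filters_agree()` for versions on both sides of 120000 and 170000, with the option on and off, is enough to cover every branch of both forms.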

======

99.
Please refer also to my attached nitpick diffs and apply those if you agree.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok.

FYI, there is some other documentation page that mentions generated
column replication messages.

This page [1] says:
"Next, the following message part appears for each column included in
the publication (except generated columns):"

But, with the introduction of your new feature, I think that the
"except generated columns" wording is not strictly correct anymore.
IOW that docs page needs updating but AFAICT your patches have not
addressed this yet.

======
[1] https://www.postgresql.org/docs/17/protocol-logicalrep-message-formats.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jun 20, 2024 at 9:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for v8-0001.
>
> ======
> Commit message.
>
> 1.
> It seems like the patch name was accidentally omitted, so it became a
> mess when it defaulted to the 1st paragraph of the commit message.
>
> ======
> contrib/test_decoding/test_decoding.c
>
> 2.
> + data->include_generated_columns = true;
>
> I previously posted a comment [1, #4] that this should default to
> false; IMO it is unintuitive for the test_decoding to have an
> *opposite* default behaviour compared to CREATE SUBSCRIPTION.
>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - remove the inconsistent blank line in SGML
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 3.
> +#define SUBOPT_include_generated_columns 0x00010000
>
> I previously posted a comment [1, #6] that this should be UPPERCASE,
> but it is not yet fixed.
>
> ======
> src/bin/psql/describe.c
>
> NITPICK - move and reword the bogus comment
>
> ~
>
> 4.
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> + ", subincludegencols AS \"%s\"\n",
> + gettext_noop("include_generated_columns"));
>
> 4a.
> For consistency with every other parameter, that column title should
> be written in words "Include generated columns" (not
> "include_generated_columns").
>
> ~
>
> 4b.
> IMO this new column belongs with the other subscription parameter
> columns (e.g. put it ahead of the "Conninfo" column).
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - fixed a comment
>
> 5.
> IMO, it would be better for readability if all the matching CREATE
> TABLE for publisher and subscriber are kept together, instead of the
> current code which is creating all publisher tables and then creating
> all subscriber tables.
>
> ~~~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> ...
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> 6a.
> These SELECT all need ORDER BY to protect against the SELECT *
> returning rows in some unexpected order.
>
> ~
>
> 6b.
> IMO there should be more comments here to explain how you can tell the
> column was NOT replicated. E.g. it is because the result value of 'b'
> is the subscriber-side computed value (which you made deliberately
> different to the publisher-side computed value).
>
> ======
>
> 99.
> Please also refer to the attached nitpicks top-up patch for minor
> cosmetic stuff.

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Jun 21, 2024 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1.
> Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
> don't think so.
>
> ~~~
>
> 2.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
> for readability and to avoid any surprises.

Both the comments are handled.

Patch v9-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2B6kwOGmn5MsRaTmciJDxtvNsyszMoPXV62OGPGzkxrCg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Hi Shubham,

Thanks for sharing the new patch! You shared it as v9, but it should be v10, right?
Also, since there was no commitfest entry, I registered one [1]. You can rename the
title as needed. Currently CFbot says OK.

Anyway, below are my comments.

01. General
Your patch contains unnecessary changes. Please remove all of them. E.g., 

```
                          " s.subpublications,\n");
-
```
And
```
         appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n"
-                             " s.subenabled,\n");
+                            " s.subenabled,\n");
```

02. General
Again, please run the pgindent/pgperltidy.

03. test_decoding
Previously I suggested that the default value of include_generated_columns
should be true, so you modified it that way. However, Peter suggested the opposite
[3] and you just revised it accordingly. I think either way might be okay, but
at least you must clarify why you preferred to set the default to false and
changed it accordingly.

04. decoding_into_rel.sql
According to the comment atop this file, this test should insert the result into a table.
But the added case does not, so we should put it somewhere else, i.e., create another
file.

05. decoding_into_rel.sql
```
+-- when 'include-generated-columns' is not set
```
Can you clarify the expected behavior as a comment?

06. getSubscriptions
```
+    else
+        appendPQExpBufferStr(query,
+                        " false AS subincludegencols,\n");
```
I think the comma is not needed.
Also, this error means that you did not test pg_dump against instances prior to PG16.
Please verify whether we can dump subscriptions and restore them accordingly.

[1]: https://commitfest.postgresql.org/48/5068/
[2]:
https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some patch v9-0001 comments.

I saw Kuroda-san has already posted comments for this patch so there
may be some duplication here.

======
GENERAL

1.
The later patches 0002 etc are checking to support only STORED
gencols. But, doesn't that restriction belong in this patch 0001 so
VIRTUAL columns are not decoded etc in the first place... (??)

~~~

2.
The "Generated Columns" docs mentioned in my previous review comment
[2] should be modified by this 0001 patch.

~~~

3.
I think the "Message Format" page mentioned in my previous review
comment [3] should be modified by this 0001 patch.

======
Commit message

4.
The patch name is still broken as previously mentioned [1, #1]

======
doc/src/sgml/protocol.sgml

5.
Should these docs be referring to STORED generated columns, instead of
just generated columns?

======
doc/src/sgml/ref/create_subscription.sgml

6.
Should these docs be referring to STORED generated columns, instead of
just generated columns?

======
src/bin/pg_dump/pg_dump.c

getSubscriptions:
NITPICK - tabs
NITPICK - patch removed a blank line it should not be touching
NITPICK - patch altered indents it should not be touching
NITPICK - a missing blank line that was previously present

7.
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols,\n");

There is an unwanted comma here.

~

dumpSubscription
NITPICK - patch altered indents it should not be touching

======
src/bin/pg_dump/pg_dump.h

NITPICK - unnecessary blank line

======
src/bin/psql/describe.c

describeSubscriptions
NITPICK - bad indentation

8.
In my previous review [1, #4b] I suggested this new column should be
in a different order (e.g. adjacent to the other ones ahead of
'Conninfo'), but this is not yet addressed.

======
src/test/subscription/t/011_generated.pl

NITPICK - missing space in comment
NITPICK - misleading "because" wording in the comment

======

99.
See also my attached nitpicks diff, for cosmetic issues. Please apply
whatever you agree with.

======
[1] My v8-0001 review -
https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for patch v9-0002.
>
> ======
> src/backend/replication/logical/relation.c
>
> 1. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencols && attr->attgenerated))
>
> You replied to my question from the previous review [1, #2] as follows:
> In case 'include_generated_columns' is 'true'. column list in
> remoterel will have an entry for generated columns. So, in this case
> if we skip every attr->attgenerated, we will get a missing column
> error.
>
> ~
>
> TBH, the reason seems very subtle to me. Perhaps that
> "(!MySubscription->includegencols && attr->attgenerated))" condition
> should be coded as a separate "if", so then you can include a comment
> similar to your answer, to explain it.
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> NITPICK - punctuation in function comment
> NITPICK - add/reword some more comments
> NITPICK - rearrange comments to be closer to the code they are commenting
Applied the changes

> ~
>
> 2. make_copy_attnamelist.
>
> + /*
> + * Construct column list for COPY.
> + */
> + for (int i = 0; i < rel->remoterel.natts; i++)
> + {
> + if(!gencollist[i])
> + attnamelist = lappend(attnamelist,
> +   makeString(rel->remoterel.attnames[i]));
> + }
>
> IIUC isn't this assuming that the attribute number (aka column order)
> is the same on the subscriber side (e.g. gencollist idx) and on the
> publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> replication does not require the ordering to match like that,
> therefore I am suspicious this new logic is accidentally imposing new
> unwanted assumptions/restrictions. I had asked the same question
> before [1-#4] about this code, but there was no reply.
>
> Ideally, there would be more test cases for when the columns
> (including the generated ones) are all in different orders on the
> pub/sub tables.
'gencollist' is set according to the remoterel
+           gencollist[attnum] = true;
where attnum is the attribute number of the corresponding column on remote rel.

I have also added the tests to confirm the behaviour
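
To illustrate the point about column order, here is a minimal stand-alone sketch (not the patch code) of marking publisher-side columns by name, so that the result is indexed by the remote attribute position regardless of the order on the subscriber. The `LocalCol` struct and the `mark_remote_gencols` function are hypothetical simplifications of the real relation descriptors.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a subscriber-side column. */
typedef struct
{
	const char *name;
	bool		generated;
} LocalCol;

/*
 * For every remote (publisher) column, set skip[r] to true when the
 * subscriber has a generated column of the same name. The array is indexed
 * by the remote column position, so differing column orders do not matter.
 */
void
mark_remote_gencols(const char *const *remote_names, int remote_natts,
					const LocalCol *local, int local_natts, bool *skip)
{
	for (int r = 0; r < remote_natts; r++)
		skip[r] = false;

	for (int l = 0; l < local_natts; l++)
	{
		if (!local[l].generated)
			continue;

		/* Find the matching remote column by name, not by position. */
		for (int r = 0; r < remote_natts; r++)
		{
			if (strcmp(remote_names[r], local[l].name) == 0)
			{
				skip[r] = true;
				break;
			}
		}
	}
}
```

With publisher columns (a, b, c) and subscriber columns (c generated, a, b), only the entry for the remote column c is marked, even though c is first on the subscriber.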

> ~~~
>
> 3. General - varnames.
>
> It would help with understanding if the 'attgenlist' variables in all
> these functions are re-named to make it very clear that this is
> referring to the *remote* (publisher-side) table genlist, not the
> subscriber table genlist.
Fixed

> ~~~
>
> 4.
> + int i = 0;
> +
>   appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(ListCell, l, attnamelist)
>   {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> + if (i < attnamelist->length - 1)
>   appendStringInfoString(&cmd, ", ");
> + i++;
>   }
>
> 4a.
> I think the purpose of the new macros is to avoid using ListCell, and
> also 'l' is an unhelpful variable name. Shouldn't this code be more
> like:
> foreach_node(String, att_name, attnamelist)
>
> ~
>
> 4b.
> The code can be far simpler if you just put the comma (", ") always
> up-front except the *first* iteration, so you can avoid checking the
> list length every time. For example:
>
> if (i++)
>   appendStringInfoString(&cmd, ", ");
Fixed
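
For reference, the comma-first idiom from comment 4b can be sketched outside PostgreSQL as follows. This is an illustration only: `join_column_names` is a hypothetical helper, it writes into a caller-supplied buffer with snprintf() instead of a StringInfo, and it walks a NULL-terminated array to mimic a list iteration that has no index of its own.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Join the NULL-terminated array 'names' into 'out', separated by ", ".
 * Returns 0 on success, or -1 if the buffer is too small.
 */
int
join_column_names(char *out, size_t outsz, const char *const *names)
{
	size_t		used = 0;
	int			i = 0;

	if (outsz == 0)
		return -1;
	out[0] = '\0';

	for (const char *const *p = names; *p != NULL; p++)
	{
		/* The separator goes in front of every item except the first. */
		int			n = snprintf(out + used, outsz - used, "%s%s",
								 (i++) ? ", " : "", *p);

		if (n < 0 || (size_t) n >= outsz - used)
			return -1;
		used += (size_t) n;
	}
	return 0;
}
```

The list length is never consulted, which is the simplification the review comment asks for.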

> ======
> src/test/subscription/t/011_generated.pl
>
> 5. General.
>
> Hmm. This patch 0002 included many formatting changes to tables tab2
> and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> The proper formatting for those needs to be done back in patch 0001
> where they were introduced. Patch 0002 should just concentrate only on
> the new stuff for patch 0002.
Fixed

> ~
>
> 6. CREATE TABLES would be better in pairs
>
> IMO it will be better if the matching CREATE TABLE for pub and sub are
> kept together, instead of separating them by doing all pub then all
> sub. I previously made the same comment for patch 0001, so maybe it
> will be addressed next time...
Fixed

> ~
>
> 7. SELECT *
>
> +$result =
> +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
>
> It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> *", for readability and so there are no surprises.
Fixed

> ======
>
> 99.
> Please also refer to my attached nitpicks diff for numerous cosmetic
> changes, and apply if you agree.
Applied the changes.

> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

I have attached a v10 patch to address the comments:
v10-0001 - Not Modified
v10-0002 - Support replication of generated columns during initial sync.
v10-0003 - Fix behaviour for Virtual Generated Columns.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 21 Jun 2024 at 12:51, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are some review comments for patch v9-0003
>
> ======
> Commit Message
>
> /fix/fixes/
Fixed

> ======
> 1.
> General. Is tablesync enough?
>
> I don't understand why the patch is only concerned with tablesync.
> Does it make sense to prevent VIRTUAL column replication during
> tablesync, if you aren't also going to prevent VIRTUAL columns from
> normal logical replication (e.g. when copy_data = false)? Or is this
> already handled somewhere?
I checked the behaviour during incremental changes. I saw during
decoding 'null' values are present for Virtual Generated Columns. I
made the relevant changes to not support replication of Virtual
generated columns.

> ~~~
>
> 2.
> General. Missing test.
>
> Add some test cases to verify behaviour is different for STORED versus
> VIRTUAL generated columns
I have not added the tests as it would give an error in cfbot.
I have added a TODO note for the same. This can be done once the
VIRTUAL generated columns patch is committed.

> ======
> src/sgml/ref/create_subscription.sgml
>
> NITPICK - consider rearranging as shown in my nitpicks diff
> NITPICK - use <literal> sgml markup for STORED
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> 3.
> - if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> - walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
> - !MySubscription->includegencols)
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) < 170000)
> + {
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
>   appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
> + else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000)
> + {
> + if(MySubscription->includegencols)
> + appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
> + else
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
>
> IMO this logic is too tricky to remain uncommented -- please add some comments.
> Also, it seems somewhat complex. I think you can achieve the same more simply:
>
> SUGGESTION
>
> if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000)
> {
>   bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 170000
>     && MySubscription->includegencols;
>   if (gencols_allowed)
>   {
>     /* Replication of generated cols is supported, but not VIRTUAL cols. */
>     appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
>   }
>   else
>   {
>     /* Replication of generated cols is not supported. */
>     appendStringInfo(&cmd, " AND a.attgenerated = ''");
>   }
> }
Fixed

> ======
>
> 99.
> Please refer also to my attached nitpick diffs and apply those if you agree.
Applied the changes.

I have attached the updated patch v10 here in [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUMCk6cCbw0vVZWo8FRd6ae9CmKG%3DgKP-9Q67jLn8HqtQ%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some review comments for the patch v10-0002.

======
Commit Message

1.
Note that we don't copy columns when the subscriber-side column is also
generated. Those will be filled as normal with the subscriber-side computed or
default data.

~

Now this patch also introduced some errors etc, so I think that patch
comment should be written differently to explicitly spell out
behaviour of every combination, something like the below:

Summary

when (include_generated_columns = true)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: The
publisher generated column value is copied.

* publisher generated column => subscriber generated column: The
publisher generated column value is not copied. The subscriber
generated column will be filled with the subscriber-side computed or
default data.

when (include_generated_columns = false)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: This
will replicate nothing. The publisher generated column is not replicated.
The subscriber column will be filled with the subscriber-side default
data.

* publisher generated column => subscriber generated column:  This
will replicate nothing. The publisher generated column is not replicated.
The subscriber generated column will be filled with the
subscriber-side computed or default data.
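
The eight combinations listed above can also be written down as a small decision function. This is only a sketch of the summary as worded in this review, not of the patch itself; the enum `GenColOutcome` and the function `gencol_outcome` are hypothetical names.

```c
#include <stdbool.h>

typedef enum
{
	OUTCOME_NORMAL_REPLICATION,		/* plain column to plain column */
	OUTCOME_ERROR,					/* plain column to generated column */
	OUTCOME_PUBLISHER_VALUE_COPIED,	/* generated value is replicated */
	OUTCOME_SUBSCRIBER_DEFAULT,		/* nothing replicated, default is used */
	OUTCOME_SUBSCRIBER_COMPUTED		/* nothing replicated, subscriber computes */
} GenColOutcome;

/* Maps one publisher/subscriber column pairing to the outcome in the summary. */
GenColOutcome
gencol_outcome(bool include_generated_columns,
			   bool pub_is_generated, bool sub_is_generated)
{
	/* A not-generated publisher column behaves the same for both settings. */
	if (!pub_is_generated)
		return sub_is_generated ? OUTCOME_ERROR : OUTCOME_NORMAL_REPLICATION;

	/* A generated subscriber column is always filled on the subscriber. */
	if (sub_is_generated)
		return OUTCOME_SUBSCRIBER_COMPUTED;

	/* Only here does the option change the result. */
	return include_generated_columns
		? OUTCOME_PUBLISHER_VALUE_COPIED
		: OUTCOME_SUBSCRIBER_DEFAULT;
}
```

Written this way, it is visible that the option only changes the result for the "publisher generated => subscriber not-generated" pairing.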

======
src/backend/replication/logical/relation.c

2.
logicalrep_rel_open:

I tested some of the "missing column" logic, and got the following results:

Scenario A:
PUB
test_pub=# create table t2(a int, b int);
test_pub=# create publication pub2 for table t2;
SUB
test_sub=# create table t2(a int, b int generated always as (a*2) stored);
test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub2 with (include_generated_columns = false);
Result:
ERROR:  logical replication target relation "public.t2" is missing
replicated column: "b"

~

Scenario B:
PUB/SUB identical to above, but subscription sub2 created "with
(include_generated_columns = true);"
Result:
ERROR:  logical replication target relation "public.t2" has a
generated column "b" but corresponding column on source relation is
not a generated column

~~~

2a. Question

Why should we get 2 different error messages for what is essentially
the same problem according to whether the 'include_generated_columns'
is false or true? Isn't the 2nd error message the more correct and
useful one for scenarios like this involving generated columns?

Thoughts?

~

2b. Missing tests?

I also noticed there seems to be no TAP test for the current "missing
replicated column" message. IMO there should be a new test introduced
for this because the loop involves too much bms logic to go
untested...

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
NITPICK - minor comment tweak
NITPICK - add some spaces after "if" code

3.
Should you pfree the gencollist at the bottom of this function when
you no longer need it, for tidiness?

~~~

4.
 static void
-fetch_remote_table_info(char *nspname, char *relname,
+fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
  LogicalRepRelation *lrel, List **qual)
 {
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
  Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
- Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
+ Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
  Oid qualRow[] = {TEXTOID};
  bool isnull;
+ bool    *remotegenlist_res;

IMO the names 'remotegenlist' and 'remotegenlist_res' should be
swapped the other way around, because it is the function parameter
that is the "result", whereas the 'remotegenlist_res' is just the
local working var for it.

~~~

5. fetch_remote_table_info

Now that walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
places, I think it will be better to assign this to a 'server_version'
variable to be used everywhere instead of having multiple function
calls.

~~~

6.
  "SELECT a.attnum,"
  "       a.attname,"
  "       a.atttypid,"
- "       a.attnum = ANY(i.indkey)"
+ "       a.attnum = ANY(i.indkey),"
+ " a.attgenerated != ''"
  "  FROM pg_catalog.pg_attribute a"
  "  LEFT JOIN pg_catalog.pg_index i"
  "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
  " WHERE a.attnum > 0::pg_catalog.int2"
- "   AND NOT a.attisdropped %s"
+ "   AND NOT a.attisdropped", lrel->remoteid);
+
+ if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+

If the server version is < PG12 then AFAIK there was no such thing as
"a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
part also be guarded by some version checking condition like in the
WHERE? Otherwise won't it cause an ERROR for old servers?
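
One possible shape for such a guard is sketched below. It is an assumption for illustration, not what the patch does: `build_attr_select_list` is a hypothetical helper, the server version is passed in once as a plain integer (in the spirit of comment 5), and a constant `false` is substituted for the generated-column flag on servers older than PG12 so that the result still has the same number of columns.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the attribute select-list into 'buf'. Returns the snprintf()
 * result, i.e. the length the full string needs.
 */
int
build_attr_select_list(char *buf, size_t bufsz, int server_version)
{
	/* attgenerated only exists in PG12 and later. */
	const char *gencol_expr = (server_version >= 120000)
		? "a.attgenerated != ''"
		: "false";

	return snprintf(buf, bufsz,
					"SELECT a.attnum,"
					"       a.attname,"
					"       a.atttypid,"
					"       a.attnum = ANY(i.indkey),"
					"       %s",
					gencol_expr);
}
```

Keeping the column count the same on old servers means the row type expected by the caller (the five-entry attrRow[] in the hunk above) would not need a separate code path.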

~~~

7.
  /*
- * For non-tables and tables with row filters, we need to do COPY
- * (SELECT ...), but we can't just do SELECT * because we need to not
- * copy generated columns. For tables with any row filters, build a
- * SELECT query with OR'ed row filters for COPY.
+ * For non-tables and tables with row filters and when
+ * 'include_generated_columns' is specified as 'true', we need to do
+ * COPY (SELECT ...), as normal COPY of generated column is not
+ * supported. For tables with any row filters, build a SELECT query
+ * with OR'ed row filters for COPY.
  */

NITPICK. I felt this was not quite right. AFAIK the reasons for using
this COPY (SELECT ...) syntax are different for row-filters and
generated-columns. Anyway, I updated the comment slightly in my
nitpicks attachment. Please have a look at it to see if you agree with
the suggestions. Maybe I am wrong.

~~~

8.
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(String, att_name, attnamelist)

I'm not 100% sure, but isn't foreach_node the macro to use here,
rather than foreach_ptr?

======
src/test/subscription/t/011_generated.pl

9.
Please discuss with Shubham how to make all the tab1, tab2, tab3,
tab4, tab5, tab6 comments use the same kind of style/wording.
Currently, the patches 0001 and 0002 test comments are a bit
inconsistent.

~~~

10.
Related to above -- now that patch 0002 supports copy_data=true I
don't see why we need to test generated columns *both* for
copy_data=false and also for copy_data=true. IOW, is it really
necessary to have so many tables/tests? For example, I am thinking
some of those tests from patch 0001 can be re-used or just removed now
that copy_data=true works.

~~~

NITPICK - minor comment tweak

~~~

11.
For tab4 and tab6 I saw the initial sync and normal replication data
tests are all merged together, but I had expected to see the initial
sync and normal replication data tests separated so it would be
consistent with the earlier tab1, tab2, tab3 tests.

======

99.
Also, I have attached a nitpicks diff for some of the cosmetic review
comments mentioned above. Please apply whatever of these that you
agree with.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

Thanks for updating patches! Below are my comments, maybe only for 0002.

01. General

IIUC, we have not discussed why ALTER SUBSCRIPTION ... SET include_generated_columns
is prohibited. Previously, it seemed okay because there were exclusive options. But now,
such restrictions are gone. Do you have a reason in mind? Or has it just not been considered
yet?

02. General

According to the docs, we allow altering a column into a non-generated one with the ALTER
TABLE ... ALTER COLUMN ... DROP EXPRESSION command. I am not sure what should happen
when the command is executed on the subscriber while the data is being copied. Should
we continue the copy or restart it? What do you think?

03. Test code

IIUC, REFRESH PUBLICATION can also trigger table synchronization. Can you add
a test for that?

04. Test code (maybe for 0001)

Please test the combination with the ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION command.

05. logicalrep_rel_open

```
+            /*
+             * In case 'include_generated_columns' is 'false', we should skip the
+             * check of missing attrs for generated columns.
+             * In case 'include_generated_columns' is 'true', we should check if
+             * corresponding column for the generated column in publication column
+             * list is present in the subscription table.
+             */
+            if (!MySubscription->includegencols && attr->attgenerated)
+            {
+                entry->attrmap->attnums[i] = -1;
+                continue;
+            }
```

This comment is not very clear to me, because here we do not skip anything.
Can you clarify the reason why attnums[i] is set to -1 and how will it be used?

06. make_copy_attnamelist

```
+    gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
```

I think this array is too large. Can we reduce the size to (desc->natts * sizeof(bool))?
Also, it should be freed once it is no longer needed.

07. make_copy_attnamelist

```
+    /* Loop to handle subscription table generated columns. */
+    for (int i = 0; i < desc->natts; i++)
```

IIUC, the loop is needed to find generated columns on the subscriber side, right?
Can you clarify that in the comment?

08. copy_table

```
+    /*
+     * Regular table with no row filter and 'include_generated_columns'
+     * specified as 'false' during creation of subscription.
+     */
```

I think this comment is not correct. After patching, every tablesync command becomes
COPY (SELECT ...) if include_generated_columns is set to true. Is that right?
Can we restrict this to cases where the table has generated columns?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok. Here are my review comments for patch v10-0003

======
General.

1.
The patch has lots of conditions like:
if (att->attgenerated && (att->attgenerated !=
ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
 continue;

IMO these are hard to read. Although more verbose, please consider if
all those (for the sake of readability) would be better re-written
like below :

if (att->attgenerated)
{
  if (!include_generated_columns)
    continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
    continue;
}
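
To confirm that the expanded form keeps the same behaviour, here is a minimal stand-alone sketch comparing both conditions. The functions `skip_compact` and `skip_expanded` are hypothetical; the macro is redefined locally with the value 's', and a non-generated column is represented by '\0'.

```c
#include <stdbool.h>

/* Local stand-in for the catalog macro, defined here for illustration. */
#define ATTRIBUTE_GENERATED_STORED	's'

/* The one-line condition used throughout the patch. */
bool
skip_compact(char attgenerated, bool include_generated_columns)
{
	return attgenerated &&
		(attgenerated != ATTRIBUTE_GENERATED_STORED ||
		 !include_generated_columns);
}

/* The expanded form proposed in this review. */
bool
skip_expanded(char attgenerated, bool include_generated_columns)
{
	if (attgenerated)
	{
		/* Generated columns were not requested at all. */
		if (!include_generated_columns)
			return true;

		/* Only STORED generated columns are supported. */
		if (attgenerated != ATTRIBUTE_GENERATED_STORED)
			return true;
	}
	return false;
}
```

Both functions skip every generated column when the option is off, and skip only non-STORED generated columns when it is on.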

======
contrib/test_decoding/test_decoding.c

tuple_to_stringinfo:

NITPICK - refactored the code and comments a bit here to make it easier
NITPICK - there is no need to mention "virtual". Instead, say we only
support STORED

======
src/backend/catalog/pg_publication.c

publication_translate_columns:

NITPICK - introduced variable 'att' to simplify this code

~

2.
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot use virtual generated column \"%s\" in publication
column list",
+    colname));

Is it better to avoid referring to "virtual" at all? Instead, consider
rearranging the wording to say something like "generated column \"%s\"
is not STORED so cannot be used in a publication column list"

~~~

pg_get_publication_tables:

NITPICK - split the condition code for readability

======
src/backend/replication/logical/relation.c

3. logicalrep_rel_open

+ if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

Isn't this missing some code to say "entry->attrmap->attnums[i] =
-1;", same as all the other nearby code is doing?

~~~

4.
I felt all the "generated column" logic should be kept together, so
this new condition should be combined with the other generated column
condition, like:

if (attr->attgenerated)
{
  /* comment... */
  if (!MySubscription->includegencols)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }

  /* comment... */
  if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }
}

======
src/backend/replication/logical/tablesync.c

5.
+ if (gencols_allowed)
+ {
+ /* Replication of generated cols is supported, but not VIRTUAL cols. */
+ appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
+ }

Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
of the hardwired 'v'? (Maybe add another TODO comment to revisit
this).

Alternatively, consider if it is more future-proof to rearrange so it
just says what *is* supported instead of what *isn't* supported:
e.g. "AND a.attgenerated IN ('', 's')"
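
A minimal sketch of the "say what *is* supported" variant follows. It uses snprintf into a caller-supplied buffer instead of StringInfo so that it compiles on its own; demo_append_gencol_filter and DEMO_GENERATED_STORED are hypothetical names, and the "= ''" branch for the not-allowed case is my assumption about the pre-patch filter rather than something shown in this review.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for ATTRIBUTE_GENERATED_STORED ('s' in pg_attribute.h). */
#define DEMO_GENERATED_STORED 's'

/*
 * Write the attgenerated filter for the remote column query into buf.
 * Returns whatever snprintf returns (the length that was needed).
 */
int
demo_append_gencol_filter(char *buf, size_t buflen, int gencols_allowed)
{
    if (gencols_allowed)
    {
        /* List the supported kinds: not generated, or STORED. */
        return snprintf(buf, buflen,
                        " AND a.attgenerated IN ('', '%c')",
                        DEMO_GENERATED_STORED);
    }

    /* Assumed pre-patch behaviour: skip every generated column. */
    return snprintf(buf, buflen, " AND a.attgenerated = ''");
}
```

Formatting the macro with %c avoids hardwiring the letter in the SQL text, and an allow-list keeps working unchanged if further attgenerated kinds are added later.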

======
src/test/subscription/t/011_generated.pl

NITPICK - some comments are missing the word "stored"
NITPICK - sometimes "col" should be plural "cols"
NITPICK - typo "GNERATED"

======

6.
In a previous review [1, comment #3] I mentioned that there should be
some docs updates on the "Logical Replication Message Formats" section
53.9. So, I expected patch 0001 would make some changes and then patch
0003 would have to update it again to say something about "STORED".
But all that is missing from the v10* patches.

======

99.
See also my nitpicks diff which is a top-up patch addressing all the
nitpick comments mentioned above. Please apply all of these that you
agree with.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are some review comments for patch v9-0002.
> >
> > ======
> > src/backend/replication/logical/relation.c
> >
> > 1. logicalrep_rel_open
> >
> > - if (attr->attisdropped)
> > + if (attr->attisdropped ||
> > + (!MySubscription->includegencols && attr->attgenerated))
> >
> > You replied to my question from the previous review [1, #2] as follows:
> > In case 'include_generated_columns' is 'true'. column list in
> > remoterel will have an entry for generated columns. So, in this case
> > if we skip every attr->attgenerated, we will get a missing column
> > error.
> >
> > ~
> >
> > TBH, the reason seems very subtle to me. Perhaps that
> > "(!MySubscription->includegencols && attr->attgenerated))" condition
> > should be coded as a separate "if", so then you can include a comment
> > similar to your answer, to explain it.
> Fixed
>
> > ======
> > src/backend/replication/logical/tablesync.c
> >
> > make_copy_attnamelist:
> >
> > NITPICK - punctuation in function comment
> > NITPICK - add/reword some more comments
> > NITPICK - rearrange comments to be closer to the code they are commenting
> Applied the changes
>
> > ~
> >
> > 2. make_copy_attnamelist.
> >
> > + /*
> > + * Construct column list for COPY.
> > + */
> > + for (int i = 0; i < rel->remoterel.natts; i++)
> > + {
> > + if(!gencollist[i])
> > + attnamelist = lappend(attnamelist,
> > +   makeString(rel->remoterel.attnames[i]));
> > + }
> >
> > IIUC isn't this assuming that the attribute number (aka column order)
> > is the same on the subscriber side (e.g. gencollist idx) and on the
> > publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> > replication does not require that the ordering match like that,
> > therefore I am suspicious this new logic is accidentally imposing new
> > unwanted assumptions/restrictions. I had asked the same question
> > before [1-#4] about this code, but there was no reply.
> >
> > Ideally, there would be more test cases for when the columns
> > (including the generated ones) are all in different orders on the
> > pub/sub tables.
> 'gencollist' is set according to the remoterel
> +           gencollist[attnum] = true;
> where attnum is the attribute number of the corresponding column on remote rel.
>
> I have also added the tests to confirm the behaviour
>
> > ~~~
> >
> > 3. General - varnames.
> >
> > It would help with understanding if the 'attgenlist' variables in all
> > these functions are re-named to make it very clear that this is
> > referring to the *remote* (publisher-side) table genlist, not the
> > subscriber table genlist.
> Fixed
>
> > ~~~
> >
> > 4.
> > + int i = 0;
> > +
> >   appendStringInfoString(&cmd, "COPY (SELECT ");
> > - for (int i = 0; i < lrel.natts; i++)
> > + foreach_ptr(ListCell, l, attnamelist)
> >   {
> > - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> > - if (i < lrel.natts - 1)
> > + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> > + if (i < attnamelist->length - 1)
> >   appendStringInfoString(&cmd, ", ");
> > + i++;
> >   }
> >
> > 4a.
> > I think the purpose of the new macros is to avoid using ListCell, and
> > also 'l' is an unhelpful variable name. Shouldn't this code be more
> > like:
> > foreach_node(String, att_name, attnamelist)
> >
> > ~
> >
> > 4b.
> > The code can be far simpler if you just put the comma (", ") always
> > up-front except the *first* iteration, so you can avoid checking the
> > list length every time. For example:
> >
> > if (i++)
> >   appendStringInfoString(&cmd, ", ");
> Fixed
>
> > ======
> > src/test/subscription/t/011_generated.pl
> >
> > 5. General.
> >
> > Hmm. This patch 0002 included many formatting changes to tables tab2
> > and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> > The proper formatting for those needs to be done back in patch 0001
> > where they were introduced. Patch 0002 should just concentrate only on
> > the new stuff for patch 0002.
> Fixed
>
> > ~
> >
> > 6. CREATE TABLES would be better in pairs
> >
> > IMO it will be better if the matching CREATE TABLE for pub and sub are
> > kept together, instead of separating them by doing all pub then all
> > sub. I previously made the same comment for patch 0001, so maybe it
> > will be addressed next time...
> Fixed
>
> > ~
> >
> > 7. SELECT *
> >
> > +$result =
> > +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
> >
> > It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> > *", for readability and so there are no surprises.
> Fixed
>
> > ======
> >
> > 99.
> > Please also refer to my attached nitpicks diff for numerous cosmetic
> > changes, and apply if you agree.
> Applied the changes.
>
> > ======
> > [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com
>
> I have attached a v10 patch to address the comments:
> v10-0001 - Not Modified
> v10-0002 - Support replication of generated columns during initial sync.
> v10-0003 - Fix behaviour for Virtual Generated Columns.
>
> Thanks and Regards,
> Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Sun, Jun 23, 2024 at 10:28 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Hi Shubham,
>
> Thanks for sharing new patch! You shared as v9, but it should be v10, right?
> Also, since there was no commitfest entry, I registered [1]. You can rename the
> title based on the needs. Currently CFbot said OK.
>
> Anyway, below are my comments.
>
> 01. General
> Your patch contains unnecessary changes. Please remove all of them. E.g.,
>
> ```
>                                                  " s.subpublications,\n");
> -
> ```
> And
> ```
>                 appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n"
> -                                                        " s.subenabled,\n");
> +                                                       " s.subenabled,\n");
> ```
>
> 02. General
> Again, please run the pgindent/pgperltidy.
>
> 03. test_decoding
> Previously I suggested that the default value of include_generated_columns
> should be true, so you modified it like that. However, Peter suggested the
> opposite [3] and you just revised accordingly. I think either way might be okay, but
> at least you must clarify the reason why you preferred to set the default to false and
> changed accordingly.

I have set the default value to true for test_decoding. The reason is
that generated columns were already being output even before this
feature was implemented.

> 04. decoding_into_rel.sql
> According to the comment atop this file, this test should insert results into a table.
> But the added case does not - we should put it in another place, i.e., create another
> file.
>
> 05. decoding_into_rel.sql
> ```
> +-- when 'include-generated-columns' is not set
> ```
> Can you clarify the expected behavior as a comment?
>
> 06. getSubscriptions
> ```
> +       else
> +               appendPQExpBufferStr(query,
> +                                               " false AS subincludegencols,\n");
> ```
> I think the comma is not needed.
> Also, this error means that you did not test pg_dump against instances prior to PG16.
> Please verify whether we can dump subscriptions and restore them accordingly.
>
> [1]: https://commitfest.postgresql.org/48/5068/
> [2]: https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com

All the comments are handled.

The attached patches contain all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jun 24, 2024 at 8:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some patch v9-0001 comments.
>
> I saw Kuroda-san has already posted comments for this patch so there
> may be some duplication here.
>
> ======
> GENERAL
>
> 1.
> The later patches 0002 etc are checking to support only STORED
> gencols. But, doesn't that restriction belong in this patch 0001 so
> VIRTUAL columns are not decoded etc in the first place... (??)
>
> ~~~
>
> 2.
> The "Generated Columns" docs mentioned in my previous review comment
> [2] should be modified by this 0001 patch.
>
> ~~~
>
> 3.
> I think the "Message Format" page mentioned in my previous review
> comment [3] should be modified by this 0001 patch.
>
> ======
> Commit message
>
> 4.
> The patch name is still broken as previously mentioned [1, #1]
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> Should these docs refer to STORED generated columns, instead of
> just generated columns?
>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 6.
> Should these docs refer to STORED generated columns, instead of
> just generated columns?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> getSubscriptions:
> NITPICK - tabs
> NITPICK - patch removed a blank line it should not be touching
> NITPICK - patch altered indents it should not be touching
> NITPICK - a missing blank line that was previously present
>
> 7.
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols,\n");
>
> There is an unwanted comma here.
>
> ~
>
> dumpSubscription
> NITPICK - patch altered indents it should not be touching
>
> ======
> src/bin/pg_dump/pg_dump.h
>
> NITPICK - unnecessary blank line
>
> ======
> src/bin/psql/describe.c
>
> describeSubscriptions
> NITPICK - bad indentation
>
> 8.
> In my previous review [1, #4b] I suggested this new column should be
> in a different order (e.g. adjacent to the other ones ahead of
> 'Conninfo'), but this is not yet addressed.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - missing space in comment
> NITPICK - misleading "because" wording in the comment
>
> ======
>
> 99.
> See also my attached nitpicks diff, for cosmetic issues. Please apply
> whatever you agree with.
>
> ======
> [1] My v8-0001 review -
> https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

All the comments are handled.

I have attached the updated patch v11 here in [1]. See [1] for the
changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJpS_XDkR6OrsmMZtCBZNxeYoCdENhC0%3Dbe0rLmNvhiQw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
>All the comments are handled.
>
> The attached Patches contain all the suggested changes.

The v11-0003 patch did not apply cleanly, so here are the updated
patches.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some patch v11-0001 comments.

(BTW, I had difficulty reviewing this because something seemed strange
with the changes this patch made to the test_decoding tests).

======
General

1. Patch name

Patch name does not need to quote 'logical replication'

~

2. test_decoding tests

Multiple test_decoding tests were failing for me. There is something
very suspicious about the unexplained changes the patch made to the
expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all
those changes in my nitpicks top-up to get everything working. Please
re-confirm that all the test_decoding tests are OK!

======
Commit Message

3.
Since you are including the example usage for test_decoding, I think
it's better to include the example usage of CREATE SUBSCRIPTION also.

======
contrib/test_decoding/expected/binary.out

4.
 SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
- ?column?
-----------
- init
-(1 row)
-
+ERROR:  replication slot "regression_slot" already exists

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

======
.../expected/decoding_into_rel.out

5.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
- ?column?
-----------
- stop
-(1 row)
-

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

======
.../test_decoding/sql/decoding_into_rel.sql

6.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
+SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');

Huh, Why does this patch change this code at all? I REVERTED this
change. See my nitpicks diff.

======
.../test_decoding/sql/generated_columns.sql

(see my nitpicks replacement file for this test)

7.
+-- test that we can insert the result of a 'include_generated_columns'
+-- into the tables created. That's really not a good idea in practical terms,
+-- but provides a nice test.

NITPICK - I didn't understand the point of this comment.  I updated
the comment according to my understanding.

~

NITPICK - The comment "when 'include-generated-columns' is not set
then columns will not be replicated" is the opposite of what the
result is. I changed this comment.

NITPICK - modified and unified wording of some of the other comments

NITPICK - changed some blank lines

======
contrib/test_decoding/test_decoding.c

8.
+ else if (strcmp(elem->defname, "include-generated-columns") == 0)
+ {
+ if (elem->arg == NULL)
+ data->include_generated_columns = true;

Is there any way to test that "elem->arg == NULL" in the
generated.sql? OTOH, if it is not possible to get here then is the
code even needed?

======
doc/src/sgml/ddl.sgml

9.
      <para>
-      Generated columns are skipped for logical replication and cannot be
-      specified in a <command>CREATE PUBLICATION</command> column list.
+      'include_generated_columns' option controls whether generated columns
+      should be included in the string representation of tuples during
+      logical decoding in PostgreSQL. The default is <literal>true</literal>.
      </para>

NITPICK - Use proper markup instead of single quotes for the parameter.

NITPICK - I think this can be reworded slightly to provide a
cross-reference to the CREATE SUBSCRIPTION parameter for more details
(which means then we can avoid repeating details like the default
value here). PSA my nitpicks diff for an example of how I thought docs
should look.

======
doc/src/sgml/protocol.sgml

10.
+        The default is true.

No, it isn't. AFAIK you made the default behaviour true only for
'test_decoding', but the default for CREATE SUBSCRIPTION remains
*false* because that is the existing PG17 behaviour. And the default
for the 'include_generated_columns' in the protocol is *also* false to
match the CREATE SUBSCRIPTION default.

e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'"
when options->proto.logical.include_generated_columns
e.g. worker.c says: options->proto.logical.include_generated_columns =
MySubscription->includegencols;
e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false;

(This confirmed my previous review expectation that using different
default behaviours for test_decoding and pgoutput would surely lead to
confusion)
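
To spell out why the protocol default is effectively false, here is a rough standalone sketch of how the value flows from the subscription options into the replication start options. All names in it (DemoSubOpts, demo_init_sub_opts, demo_build_proto_options) are hypothetical, and the "proto_version '4'" prefix is only a placeholder; the one literal taken from the code quoted above is ", include_generated_columns 'true'".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the parsed CREATE SUBSCRIPTION options. */
typedef struct DemoSubOpts
{
    bool include_generated_columns;
} DemoSubOpts;

/* Models the subscriptioncmds.c default: the option starts out false. */
void
demo_init_sub_opts(DemoSubOpts *opts)
{
    opts->include_generated_columns = false;
}

/*
 * Models the libpqwalreceiver.c behaviour described above: the parameter
 * is sent only when it is true, so an absent parameter means false.
 */
int
demo_build_proto_options(char *buf, size_t buflen, const DemoSubOpts *opts)
{
    return snprintf(buf, buflen, "proto_version '4'%s",
                    opts->include_generated_columns
                    ? ", include_generated_columns 'true'"
                    : "");
}
```

Because nothing is sent in the false case, the output plugin must treat a missing parameter as false, which is why the docs cannot say the default is true.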

~~~

11.
-     <para>
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
-     </para>
-

AFAIK you cannot just remove this entire paragraph because I thought
it was still relevant to talking about "... the following message
part". But, if you don't want to explain and cross-reference about
'include_generated_columns' then maybe it is OK just to remove the
"(except generated columns)" part?

======
src/test/subscription/t/011_generated.pl

NITPICK - comment typo /tab2/tab3/
NITPICK - remove some blank lines

~~~

12.
# the column was NOT replicated (the result value of 'b' is the
subscriber-side computed value)

NITPICK - I think this comment is wrong for the tab2 test because here
col 'b' IS replicated. I have added much more substantial test case
comments in the attached nitpicks diff. PSA.

======
src/test/subscription/t/031_column_list.pl

13.
NITPICK - IMO there is a missing word "list" in the comment. This bug
existed already on HEAD but since this patch is modifying this comment
I think we can also fix this in passing.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jun 27, 2024 at 2:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some patch v11-0001 comments.
>
> (BTW, I had difficulty reviewing this because something seemed strange
> with the changes this patch made to the test_decoding tests).
>
> ======
> General
>
> 1. Patch name
>
> Patch name does not need to quote 'logical replication'
>
> ~
>
> 2. test_decoding tests
>
> Multiple test_decoding tests were failing for me. There is something
> very suspicious about the unexplained changes the patch made to the
> expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all
> those changes in my nitpicks top-up to get everything working. Please
> re-confirm that all the test_decoding tests are OK!
>
> ======
> Commit Message
>
> 3.
> Since you are including the example usage for test_decoding, I think
> it's better to include the example usage of CREATE SUBSCRIPTION also.
>
> ======
> contrib/test_decoding/expected/binary.out
>
> 4.
>  SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
> - ?column?
> -----------
> - init
> -(1 row)
> -
> +ERROR:  replication slot "regression_slot" already exists
>
> Huh? Why is this unrelated expected output changed by this patch?
>
> The test_decoding test fails for me unless I REVERT this change!! See
> my nitpicks diff.
>
> ======
> .../expected/decoding_into_rel.out
>
> 5.
> -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
> - ?column?
> -----------
> - stop
> -(1 row)
> -
>
> Huh? Why is this unrelated expected output changed by this patch?
>
> The test_decoding test fails for me unless I REVERT this change!! See
> my nitpicks diff.
>
> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> 6.
> -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
> +SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
>
> Huh, Why does this patch change this code at all? I REVERTED this
> change. See my nitpicks diff.
>
> ======
> .../test_decoding/sql/generated_columns.sql
>
> (see my nitpicks replacement file for this test)
>
> 7.
> +-- test that we can insert the result of a 'include_generated_columns'
> +-- into the tables created. That's really not a good idea in practical terms,
> +-- but provides a nice test.
>
> NITPICK - I didn't understand the point of this comment.  I updated
> the comment according to my understanding.
>
> ~
>
> NITPICK - The comment "when 'include-generated-columns' is not set
> then columns will not be replicated" is the opposite of what the
> result is. I changed this comment.
>
> NITPICK - modified and unified wording of some of the other comments
>
> NITPICK - changed some blank lines
>
> ======
> contrib/test_decoding/test_decoding.c
>
> 8.
> + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> + {
> + if (elem->arg == NULL)
> + data->include_generated_columns = true;
>
> Is there any way to test that "elem->arg == NULL" in the
> generated.sql? OTOH, if it is not possible to get here then is the
> code even needed?
>

Currently I could not find a case where the
'include_generated_columns' option is specified without a value, but I
was hesitant to remove this code because the other options
follow the same pattern. Thoughts?

> ======
> doc/src/sgml/ddl.sgml
>
> 9.
>       <para>
> -      Generated columns are skipped for logical replication and cannot be
> -      specified in a <command>CREATE PUBLICATION</command> column list.
> +      'include_generated_columns' option controls whether generated columns
> +      should be included in the string representation of tuples during
> +      logical decoding in PostgreSQL. The default is <literal>true</literal>.
>       </para>
>
> NITPICK - Use proper markup instead of single quotes for the parameter.
>
> NITPICK - I think this can be reworded slightly to provide a
> cross-reference to the CREATE SUBSCRIPTION parameter for more details
> (which means then we can avoid repeating details like the default
> value here). PSA my nitpicks diff for an example of how I thought docs
> should look.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 10.
> +        The default is true.
>
> No, it isn't. AFAIK you made the default behaviour true only for
> 'test_decoding', but the default for CREATE SUBSCRIPTION remains
> *false* because that is the existing PG17 behaviour. And the default
> for the 'include_generated_columns' in the protocol is *also* false to
> match the CREATE SUBSCRIPTION default.
>
> e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'"
> when options->proto.logical.include_generated_columns
> e.g. worker.c says: options->proto.logical.include_generated_columns =
> MySubscription->includegencols;
> e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false;
>
> (This confirmed my previous review expectation that using different
> default behaviours for test_decoding and pgoutput would surely lead to
> confusion)
>
> ~~~
>
> 11.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> AFAIK you cannot just remove this entire paragraph because I thought
> it was still relevant to talking about "... the following message
> part". But, if you don't want to explain and cross-reference about
> 'include_generated_columns' then maybe it is OK just to remove the
> "(except generated columns)" part?
>
> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - comment typo /tab2/tab3/
> NITPICK - remove some blank lines
>
> ~~~
>
> 12.
> # the column was NOT replicated (the result value of 'b' is the
> subscriber-side computed value)
>
> NITPICK - I think this comment is wrong for the tab2 test because here
> col 'b' IS replicated. I have added much more substantial test case
> comments in the attached nitpicks diff. PSA.
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 13.
> NITPICK - IMO there is a missing word "list" in the comment. This bug
> existed already on HEAD but since this patch is modifying this comment
> I think we can also fix this in passing.

All the comments are handled.

The attached Patches contain all the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
>...
> > 8.
> > + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> > + {
> > + if (elem->arg == NULL)
> > + data->include_generated_columns = true;
> >
> > Is there any way to test that "elem->arg == NULL" in the
> > generated.sql? OTOH, if it is not possible to get here then is the
> > code even needed?
> >
>
> Currently I could not find a case where the
> 'include_generated_columns' option is not specifying any value, but  I
> was hesitant to remove this from here as the other options mentioned
> follow the same rules. Thoughts?
>

If you do manage to find a scenario for this then I think a test for
it would be good. But, I agree that the code seems OK because now I
see it is the same pattern as similar nearby code.

~~~

Thanks for the updated patch. Here are some review comments for patch v13-0001.

======
.../expected/generated_columns.out

nitpicks (see generated_columns.sql)

======
.../test_decoding/sql/generated_columns.sql

nitpick - use plural /column/columns/
nitpick - use consistent wording in the comments
nitpick - IMO it is better to INSERT different values for each of the tests

======
doc/src/sgml/protocol.sgml

nitpick - I noticed that none of the other boolean parameters on this
page mention a default, so maybe here we should do the same and
omit that information.

~~~

1.
-     <para>
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
-     </para>
-

In a previous review [1 comment #11] I wrote that you can't just
remove this paragraph because AFAIK it is still meaningful. A minimal
change might be to just remove the "(except generated columns)" part.
Alternatively, you could give a more detailed explanation mentioning
the include_generated_columns protocol parameter.

I provided some updated text for this paragraph in my NITPICKS top-up
patch, Please have a look at that for ideas.

======
src/backend/commands/subscriptioncmds.c

It looks like pgindent needs to be run on this file.

======
src/include/catalog/pg_subscription.h

nitpick - comment /publish/Publish/ for consistency

======
src/include/replication/walreceiver.h

nitpick - comment /publish/Publish/ for consistency

======
src/test/regress/expected/subscription.out

nitpicks - (see subscription.sql)

======
src/test/regress/sql/subscription.sql

nitpick - combine the invalid option combinations test with all the
others (no special comment needed)
nitpick - rename subscription as 'regress_testsub2' same as all its peers.

======
src/test/subscription/t/011_generated.pl

nitpick - add/remove blank lines

======
src/test/subscription/t/031_column_list.pl

nitpick - rewording for a comment. This issue was not strictly caused
by this patch, but since you are modifying the same comment we can fix
this in passing.

======
99.
Please also see the attached top-up patch for all those nitpicks
identified above.

======
[1] v11-0001 review
https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

As you can see, most of my recent review comments for patch 0001 are
only cosmetic nitpicks. But, there is still one long-unanswered design
question from a month ago [1, #G.2]

A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
is related to the introduction and passing everywhere of new
'include_generated_columns' function parameters. These same functions
are also always passing "Bitmapset *columns" representing the
publication column list.

My question was about whether we can't make use of the existing BMS
parameter instead of introducing all the new API parameters.

The idea might go something like this:

* If 'include_generated_columns' option is specified true and if no
column list was already specified then perhaps the relentry->columns
can be used for a "dummy" column list that has everything including
all the generated columns.

* By doing this:
 -- you may be able to avoid passing the extra
'include_generated_columns' everywhere
 -- you may be able to avoid checking for generated columns deeper in
the code (since it is already checked up-front when building the
column list BMS)

~~

I'm not saying this design idea is guaranteed to work, but it might be
worth considering, because if it does work then there is potential to
make the current 0001 patch significantly shorter.
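
To show roughly what I mean, here is a standalone sketch of the idea. It is not based on the patch: a 64-bit mask stands in for the Bitmapset, DemoColumn stands in for the attribute metadata, and both function names are invented. Note that the sketch builds the set up front in every case (not only when the option is true), which goes a little further than the idea as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one publisher-side column. */
typedef struct DemoColumn
{
    bool dropped;    /* models attisdropped */
    char generated;  /* models attgenerated; '\0' means not generated */
} DemoColumn;

/*
 * Build the set of published columns once. An explicit_list of 0 means
 * that the publication has no column list; otherwise it is used as-is.
 * Bit i of the result is set when column i should be sent.
 */
uint64_t
demo_build_column_set(const DemoColumn *cols, int natts,
                      uint64_t explicit_list, bool include_gencols)
{
    uint64_t set = 0;

    if (explicit_list != 0)
        return explicit_list;

    for (int i = 0; i < natts && i < 64; i++)
    {
        if (cols[i].dropped)
            continue;
        /* The generated-column decision is made here, and only here. */
        if (cols[i].generated && !include_gencols)
            continue;
        set |= UINT64_C(1) << i;
    }
    return set;
}

/* Deeper code only needs a membership test, with no extra parameter. */
bool
demo_column_is_published(uint64_t set, int attidx)
{
    return attidx >= 0 && attidx < 64 && ((set >> attidx) & 1) != 0;
}
```

With something like this, the functions further down would only consult the set, so the new boolean would not need to be threaded through each of them.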

======
[1] https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 25 Jun 2024 at 11:56, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for the patch v10-0002.
>
> ======
> Commit Message
>
> 1.
> Note that we don't copy columns when the subscriber-side column is also
> generated. Those will be filled as normal with the subscriber-side computed or
> default data.
>
> ~
>
> Now this patch also introduced some errors etc, so I think that patch
> comment should be written differently to explicitly spell out
> behaviour of every combination, something like the below:
>
> Summary
>
> when (include_generated_columns = true)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: The
> publisher generated column value is copied.
>
> * publisher generated column => subscriber generated column: The
> publisher generated column value is not copied. The subscriber
> generated column will be filled with the subscriber-side computed or
> default data.
>
> when (include_generated_columns = false)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: This
> will replicate nothing. The publisher generated column is not replicated.
> The subscriber column will be filled with the subscriber-side default
> data.
>
> * publisher generated column => subscriber generated column: This
> will replicate nothing. The publisher generated column is not replicated.
> The subscriber generated column will be filled with the
> subscriber-side computed or default data.
Modified
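
For reference, the combinations in the summary above can be condensed into one small decision function. This is only an illustrative sketch of the described behaviour, not code from the patch; DemoOutcome and demo_gencol_outcome are hypothetical names.

```c
#include <stdbool.h>

/* Possible outcomes for one publisher/subscriber column pair. */
typedef enum DemoOutcome
{
    DEMO_REPLICATE,        /* the publisher value is copied */
    DEMO_ERROR,            /* replication reports an ERROR */
    DEMO_SUBSCRIBER_FILLS  /* nothing is copied; the subscriber fills the
                            * column with computed or default data */
} DemoOutcome;

DemoOutcome
demo_gencol_outcome(bool pub_generated, bool sub_generated,
                    bool include_gencols)
{
    /* A non-generated publisher column behaves the same either way. */
    if (!pub_generated)
        return sub_generated ? DEMO_ERROR : DEMO_REPLICATE;

    /* generated => generated: the subscriber always computes its own. */
    if (sub_generated)
        return DEMO_SUBSCRIBER_FILLS;

    /* generated => not-generated: copied only when the option is true. */
    return include_gencols ? DEMO_REPLICATE : DEMO_SUBSCRIBER_FILLS;
}
```

Only the "publisher generated => subscriber not-generated" pair depends on the option; the other three combinations behave identically for true and false.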

> ======
> src/backend/replication/logical/relation.c
>
> 2.
> logicalrep_rel_open:
>
> I tested some of the "missing column" logic, and got the following results:
>
> Scenario A:
> PUB
> test_pub=# create table t2(a int, b int);
> test_pub=# create publication pub2 for table t2;
> SUB
> test_sub=# create table t2(a int, b int generated always as (a*2) stored);
> test_sub=# create subscription sub2 connection 'dbname=test_pub'
> publication pub2 with (include_generated_columns = false);
> Result:
> ERROR:  logical replication target relation "public.t2" is missing
> replicated column: "b"
>
> ~
>
> Scenario B:
> PUB/SUB identical to above, but subscription sub2 created "with
> (include_generated_columns = true);"
> Result:
> ERROR:  logical replication target relation "public.t2" has a
> generated column "b" but corresponding column on source relation is
> not a generated column
>
> ~~~
>
> 2a. Question
>
> Why should we get 2 different error messages for what is essentially
> the same problem according to whether the 'include_generated_columns'
> is false or true? Isn't the 2nd error message the more correct and
> useful one for scenarios like this involving generated columns?
>
> Thoughts?
Modified the code to give the same error in both cases.

> ~
>
> 2b. Missing tests?
>
> I also noticed there seems no TAP test for the current "missing
> replicated column" message. IMO there should be a new test introduced
> for this because the loop involved too much bms logic to go
> untested...
Added the tests in 004_sync.pl.

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> NITPICK - minor comment tweak
> NITPICK - add some spaces after "if" code
Applied the changes

> 3.
> Should you pfree the gencollist at the bottom of this function when
> you no longer need it, for tidiness?
Fixed

> ~~~
>
> 4.
>  static void
> -fetch_remote_table_info(char *nspname, char *relname,
> +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
>   LogicalRepRelation *lrel, List **qual)
>  {
>   WalRcvExecResult *res;
>   StringInfoData cmd;
>   TupleTableSlot *slot;
>   Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
> - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
> + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
>   Oid qualRow[] = {TEXTOID};
>   bool isnull;
> + bool    *remotegenlist_res;
>
> IMO the names 'remotegenlist' and 'remotegenlist_res' should be
> swapped the other way around, because it is the function parameter
> that is the "result", whereas the 'remotegenlist_res' is just the
> local working var for it.
Fixed

> ~~~
>
> 5. fetch_remote_table_info
>
> Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
> places, I think it will be better to assign this to a 'server_version'
> variable to be used everywhere instead of having multiple function
> calls.
Fixed

> ~~~
>
> 6.
>   "SELECT a.attnum,"
>   "       a.attname,"
>   "       a.atttypid,"
> - "       a.attnum = ANY(i.indkey)"
> + "       a.attnum = ANY(i.indkey),"
> + " a.attgenerated != ''"
>   "  FROM pg_catalog.pg_attribute a"
>   "  LEFT JOIN pg_catalog.pg_index i"
>   "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>   " WHERE a.attnum > 0::pg_catalog.int2"
> - "   AND NOT a.attisdropped %s"
> + "   AND NOT a.attisdropped", lrel->remoteid);
> +
> + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> + walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
> + !MySubscription->includegencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> +
>
> If the server version is < PG12 then AFAIK there was no such thing as
> "a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
> part also be guarded by some version checking condition like in the
> WHERE? Otherwise won't it cause an ERROR for old servers?
Fixed
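
One possible shape for such a guard is sketched below. It assumes the fix selects a constant "false" when the publisher is older than PG12; the thread does not show what v14 actually does, and the function names and the shortened query are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Expression to use for the "is generated" output column.
 * pg_attribute.attgenerated exists only on PG12 and later, so older
 * servers get a constant instead of a reference to the missing column.
 */
const char *
gencol_select_expr(int server_version)
{
    return (server_version >= 120000) ? "a.attgenerated != ''" : "false";
}

/*
 * Build a simplified version of the attribute query into buf.
 * Returns true if the whole query fit into the buffer.
 */
bool
build_attr_query(char *buf, size_t buflen, int server_version)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen,
                 "SELECT a.attnum, a.attname, a.atttypid, %s"
                 "  FROM pg_catalog.pg_attribute a"
                 " WHERE a.attnum > 0::pg_catalog.int2"
                 "   AND NOT a.attisdropped",
                 gencol_select_expr(server_version));

    return n >= 0 && (size_t) n < buflen;
}
```

Keeping the number of output columns the same for all server versions means the result-type array (attrRow[] in the patch) does not need a version-dependent length.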

> ~~~
>
> 7.
>   /*
> - * For non-tables and tables with row filters, we need to do COPY
> - * (SELECT ...), but we can't just do SELECT * because we need to not
> - * copy generated columns. For tables with any row filters, build a
> - * SELECT query with OR'ed row filters for COPY.
> + * For non-tables and tables with row filters and when
> + * 'include_generated_columns' is specified as 'true', we need to do
> + * COPY (SELECT ...), as normal COPY of generated column is not
> + * supported. For tables with any row filters, build a SELECT query
> + * with OR'ed row filters for COPY.
>   */
>
> NITPICK. I felt this was not quite right. AFAIK the reasons for using
> this COPY (SELECT ...) syntax is different for row-filters and
> generated-columns. Anyway, I updated the comment slightly in my
> nitpicks attachment. Please have a look at it to see if you agree with
> the suggestions. Maybe I am wrong.
Fixed

> ~~~
>
> 8.
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(String, att_name, attnamelist)
>
> I'm not 100% sure, but isn't foreach_node the macro to use here,
> rather than foreach_ptr?
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 9.
> Please discuss with Shubham how to make all the tab1, tab2, tab3,
> tab4, tab5, tab6 comments use the same kind of style/wording.
> Currently, the patches 0001 and 0002 test comments are a bit
> inconsistent.
Fixed

> ~~~
>
> 10.
> Related to above -- now that patch 0002 supports copy_data=true I
> don't see why we need to test generated columns *both* for
> copy_data=false and also for copy_data=true. IOW, is it really
> necessary to have so many tables/tests? For example, I am thinking
> some of those tests from patch 0001 can be re-used or just removed now
> that copy_data=true works.
Fixed

> ~~~
>
> NITPICK - minor comment tweak
Fixed

> ~~~
>
> 11.
> For tab4 and tab6 I saw the initial sync and normal replication data
> tests are all merged together, but I had expected to see the initial
> sync and normal replication data tests separated so it would be
> consistent with the earlier tab1, tab2, tab3 tests.
Fixed

> ======
>
> 99.
> Also, I have attached a nitpicks diff for some of the cosmetic review
> comments mentioned above. Please apply whatever of these that you
> agree with.
Applied the relevant changes

I have attached a v14 to fix the comments.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 25 Jun 2024 at 18:49, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok,
>
> Thanks for updating patches! Below are my comments, maybe only for 0002.
>
> 01. General
>
> IIUC, we have not discussed why ALTER SUBSCRIPTION ... SET include_generated_columns
> is prohibited. Previously, it seemed okay because there were exclusive options. But now,
> such restrictions are gone. Do you have a reason in mind? Or has it just not been
> considered yet?
We do not support using ALTER SUBSCRIPTION to change
'include_generated_columns'. Suppose the user initially has a logical
replication setup where the publisher has table t1 with columns
(c1 int, c2 int generated always as (c1*2)), the subscriber has table
t1 with columns (c1 int, c2 int), and 'include_generated_columns' is
initially true.
If we then use ALTER SUBSCRIPTION to set 'include_generated_columns' to
false, the initial rows will have data for c2 in the subscriber table,
but rows replicated after the ALTER will not. This may be inconsistent
behaviour.


> 02. General
>
> According to the doc, we allow altering a column to a non-generated one with the ALTER
> TABLE ... ALTER COLUMN ... DROP EXPRESSION command. I am not sure what should happen
> when the command is executed on the subscriber while copying the data. Should
> we continue the copy or restart it? What do you think?
The COPY of data happens in a single transaction, so if we execute an
'ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION' command, it will
take effect only after the whole COPY command has finished. So I think
it will not create any issue.

> 03. Test code
>
> IIUC, REFRESH PUBLICATION can also lead the table synchronization. Can you add
> a test for that?
Added

> 04. Test code (maybe for 0001)
>
> Please test the combination with TABLE ... ALTER COLUMN ... DROP EXPRESSION command.
Added

> 05. logicalrep_rel_open
>
> ```
> +            /*
> +             * In case 'include_generated_columns' is 'false', we should skip the
> +             * check of missing attrs for generated columns.
> +             * In case 'include_generated_columns' is 'true', we should check if
> +             * corresponding column for the generated column in publication column
> +             * list is present in the subscription table.
> +             */
> +            if (!MySubscription->includegencols && attr->attgenerated)
> +            {
> +                entry->attrmap->attnums[i] = -1;
> +                continue;
> +            }
> ```
>
> This comment is not very clear to me, because here we do not skip anything.
> Can you clarify the reason why attnums[i] is set to -1 and how will it be used?
This part of the code is removed to address some comments.

> 06. make_copy_attnamelist
>
> ```
> +    gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
> ```
>
> I think this array is too large. Can we reduce a size to (desc->natts * sizeof(bool))?
> Also, the free'ing should be done.
I have changed the name 'gencollist' to 'localgenlist' to make the
name more consistent. Also, the size should be
(rel->remoterel.natts * sizeof(bool)) because I record whether a column
is generated with 'localgenlist[attnum] = true;', where 'attnum' is the
corresponding attribute number on the publisher side.
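
A minimal sketch of that sizing and indexing, using plain calloc/free in place of palloc0/pfree so it compiles outside the backend. The function name and its parameters are hypothetical, and remote attribute numbers are taken to be 0-based array indexes here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Return a zero-initialised array with one flag per *remote* column.
 * gen_remote_attnums lists the remote attribute numbers whose matching
 * local column is generated. The caller must free() the result.
 */
bool *
make_localgenlist(int remote_natts, const int *gen_remote_attnums, int ngen)
{
    bool *localgenlist;

    assert(remote_natts > 0);

    /* Sized by the remote column count, not MaxTupleAttributeNumber. */
    localgenlist = calloc((size_t) remote_natts, sizeof(bool));
    if (localgenlist == NULL)
        return NULL;

    for (int i = 0; i < ngen; i++)
    {
        int remote_attnum = gen_remote_attnums[i];

        /* The index must stay inside the remote column range. */
        assert(remote_attnum >= 0 && remote_attnum < remote_natts);
        localgenlist[remote_attnum] = true;
    }

    return localgenlist;
}
```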

> 07. make_copy_attnamelist
>
> ```
> +    /* Loop to handle subscription table generated columns. */
> +    for (int i = 0; i < desc->natts; i++)
> ```
>
> IIUC, the loop is needed to find generated columns on the subscriber side, right?
> Can you clarify as comment?
Fixed

> 08. copy_table
>
> ```
> +    /*
> +     * Regular table with no row filter and 'include_generated_columns'
> +     * specified as 'false' during creation of subscription.
> +     */
> ```
>
> I think this comment is not correct. After patching, all tablesync command becomes
> like COPY (SELECT ...) if include_generated_columns is set to true. Is it right?
> Can we restrict only when the table has generated ones?
Fixed

Please refer to v14 patch for the changes [1].


[1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Wed, 26 Jun 2024 at 08:06, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok. Here are my review comments for patch v10-0003
>
> ======
> General.
>
> 1.
> The patch has lots of conditions like:
> if (att->attgenerated && (att->attgenerated !=
> ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
>  continue;
>
> IMO these are hard to read. Although more verbose, please consider if
> all those (for the sake of readability) would be better re-written
> like below :
>
> if (att->attgenerated)
> {
>   if (!include_generated_columns)
>     continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>     continue;
> }
Fixed
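
The two forms discussed above can be checked for equivalence exhaustively, since there are only a few input combinations. The sketch below uses standalone helper functions (hypothetical names) that return true where the patch code would "continue"; 'v' stands for the virtual kind mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define ATTRIBUTE_GENERATED_STORED 's'   /* same value as in pg_attribute.h */

/* Compact single-expression form, as quoted from the patch. */
bool
skip_compact(char attgenerated, bool include_generated_columns)
{
    return attgenerated &&
        (attgenerated != ATTRIBUTE_GENERATED_STORED ||
         !include_generated_columns);
}

/* Expanded form suggested in the review. */
bool
skip_expanded(char attgenerated, bool include_generated_columns)
{
    if (attgenerated)
    {
        if (!include_generated_columns)
            return true;

        if (attgenerated != ATTRIBUTE_GENERATED_STORED)
            return true;
    }
    return false;
}

/* Assert that both forms agree for every kind/option combination. */
void
check_forms_agree(void)
{
    const char kinds[] = {'\0', 's', 'v'};

    for (int k = 0; k < 3; k++)
        for (int inc = 0; inc <= 1; inc++)
            assert(skip_compact(kinds[k], inc != 0) ==
                   skip_expanded(kinds[k], inc != 0));
}
```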

> ======
> contrib/test_decoding/test_decoding.c
>
> tuple_to_stringinfo:
>
> NITPICK - refactored the code and comments a bit here to make it easier
> NITPICK - there is no need to mention "virtual". Instead, say we only
> support STORED
Fixed

> ======
> src/backend/catalog/pg_publication.c
>
> publication_translate_columns:
>
> NITPICK - introduced variable 'att' to simplify this code
Fixed

> ~
>
> 2.
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> + errmsg("cannot use virtual generated column \"%s\" in publication
> column list",
> +    colname));
>
> Is it better to avoid referring to "virtual" at all? Instead, consider
> rearranging the wording to say something like "generated column \"%s\"
> is not STORED so cannot be used in a publication column list"
Fixed

> ~~~
>
> pg_get_publication_tables:
>
> NITPICK - split the condition code for readability
Fixed

> ======
> src/backend/replication/logical/relation.c
>
> 3. logicalrep_rel_open
>
> + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
> + continue;
> +
>
> Isn't this missing some code to say "entry->attrmap->attnums[i] =
> -1;", same as all the other nearby code is doing?
Fixed

> ~~~
>
> 4.
> I felt all the "generated column" logic should be kept together, so
> this new condition should be combined with the other generated column
> condition, like:
>
> if (attr->attgenerated)
> {
>   /* comment... */
>   if (!MySubscription->includegencols)
>   {
>     entry->attrmap->attnums[i] = -1;
>     continue;
>   }
>
>   /* comment... */
>   if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   {
>     entry->attrmap->attnums[i] = -1;
>     continue;
>   }
> }
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> 5.
> + if (gencols_allowed)
> + {
> + /* Replication of generated cols is supported, but not VIRTUAL cols. */
> + appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
> + }
>
> Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
> of the hardwired 'v'? (Maybe add another TODO comment to revisit
> this).
>
> Alternatively, consider if it is more future-proof to rearrange so it
> just says what *is* supported instead of what *isn't* supported:
> e.g. "AND a.attgenerated IN ('', 's')"
I feel we should use ATTRIBUTE_GENERATED_VIRTUAL macro. Added a TODO.
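
For comparison, the difference between the two filters raised in the review comment can be sketched as follows. The helper names are hypothetical, and 'x' is a made-up future attgenerated kind used only to show where the two approaches diverge.

```c
#include <assert.h>
#include <stdbool.h>

/* Deny-list: everything except VIRTUAL is treated as replicable. */
bool
replicable_by_denylist(char attgenerated)
{
    return attgenerated != 'v';
}

/* Allow-list: only non-generated ('\0') and STORED ('s') are replicable. */
bool
replicable_by_allowlist(char attgenerated)
{
    return attgenerated == '\0' || attgenerated == 's';
}

/* Both filters agree on the kinds discussed in this thread. */
void
check_filters_agree_today(void)
{
    const char kinds[] = {'\0', 's', 'v'};

    for (int k = 0; k < 3; k++)
        assert(replicable_by_denylist(kinds[k]) ==
               replicable_by_allowlist(kinds[k]));
}
```

With a hypothetical new kind 'x', the deny-list would silently accept it while the allow-list would reject it until it is explicitly added.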

> ======
> src/test/subscription/t/011_generated.pl
>
> NITPICK - some comments are missing the word "stored"
> NITPICK - sometimes "col" should be plural "cols"
> NITPICK - typo "GNERATED"
Added the relevant changes.

> ======
>
> 6.
> In a previous review [1, comment #3] I mentioned that there should be
> some docs updates on the "Logical Replication Message Formats" section
> 53.9. So, I expected patch 0001 would make some changes and then patch
> 0003 would have to update it again to say something about "STORED".
> But all that is missing from the v10* patches.
>
> ======
Will fix in upcoming version

>
> 99.
> See also my nitpicks diff which is a top-up patch addressing all the
> nitpick comments mentioned above. Please apply all of these that you
> agree with.
Applied Relevant changes

Please refer to the v14 patch for the changes [1].


[1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are my review comments for v14-0002.

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

nitpick - remove excessive parentheses in palloc0 call.

nitpick - Code is fine AFAICT except it's not immediately obvious
localgenlist is indexed by the *remote* attribute number. So I renamed
'attrnum' variable in my nitpicks diff. OTOH, if you think no change
is necessary, that is OK too (in that case maybe add a comment).

~~~

1. fetch_remote_table_info

+ if ((server_version >= 120000 && server_version <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

Should this say < 180000 instead of <= 160000?
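
The reason the upper bound matters: since PostgreSQL 10 the numeric server version is major * 10000 + minor, so 16.4 is 160004 and already falls outside "<= 160000". A small sketch, assuming (as the review comment implies) that publishers older than 18 cannot send generated columns; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric version for PostgreSQL 10 and later, e.g. 16.4 -> 160004. */
int
pg_version_num(int major, int minor)
{
    assert(major >= 10 && minor >= 0 && minor < 10000);
    return major * 10000 + minor;
}

/* Range test as written in the v14 patch. */
bool
must_filter_gencols_v14(int server_version)
{
    return server_version >= 120000 && server_version <= 160000;
}

/* Range test as suggested in the review. */
bool
must_filter_gencols_suggested(int server_version)
{
    return server_version >= 120000 && server_version < 180000;
}
```

With the v14 form, minor releases of 16 and every release of 17 are not covered by the version part of the condition.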

~~~

copy_table:

nitpick - uppercase in comment

nitpick - missing space after "if"

~~~

2. copy_table

+ attnamelist = make_copy_attnamelist(relmapentry, remotegenlist);
+
  /* Start copy on the publisher. */
  initStringInfo(&cmd);

- /* Regular table with no row filter */
- if (lrel.relkind == RELKIND_RELATION && qual == NIL)
+ /* check if remote column list has generated columns */
+ if(MySubscription->includegencols)
+ {
+ for (int i = 0; i < relmapentry->remoterel.natts; i++)
+ {
+ if(remotegenlist[i])
+ {
+ remote_has_gencol = true;
+ break;
+ }
+ }
+ }
+

There is some subtle logic going on here:

For example, the comment here says "Check if the remote column list
has generated columns", and it then proceeds to iterate the remote
attributes checking the remotegenlist[i]. But the remotegenlist[] was
returned from a prior call to make_copy_attnamelist() and according to
the make_copy_attnamelist logic, it is NOT returning all remote
generated-cols in that list. Specifically, it is stripping some of
them -- "Do not include generated columns of the subscription table in
the [remotegenlist] column list.".

So, actually this loop seems to be only finding cases (setting
remote_has_gencol = true) where the remote column is generated but the
matching local column is *not* generated. Maybe this was the intended
logic all along but then certainly the comment should be improved to
describe it better.

~~~

3.
+ /*
+ * Regular table with no row filter and 'include_generated_columns'
+ * specified as 'false' during creation of subscription.
+ */
+ if (lrel.relkind == RELKIND_RELATION && qual == NIL && !remote_has_gencol)

nitpick - This comment also needs improving. For example, just because
remote_has_gencol is false, it does not follow that
'include_generated_columns' was specified as 'false' -- maybe the
parameter was 'true' but the table just had no generated columns
anyway... I've modified the comment already in my nitpicks diff, but
probably you can improve on that.

~

nitpick - "else" comment is modified slightly too. Please see the nitpicks diff.

~

4.
In hindsight, I felt your variable 'remote_has_gencol' was not
well-named because it is not for saying the remote table has a
generated column -- it is saying the remote table has a generated
column **that we have to copy**. So, rather it should be named
something like 'gencol_copy_needed' (but I didn't change this name in
the nitpick diffs...)

======
src/test/subscription/t/004_sync.pl

nitpick - changes to comment style to make the test case separations
much more obvious
nitpick - minor comment wording tweaks

5.
Here, you are confirming we get an ERROR when replicating from a
non-generated column to a generated column. But I think your patch
also added exactly that same test scenario in the 011_generated (as
the sub5 test). So, maybe this one here should be removed?

======
src/test/subscription/t/011_generated.pl

nitpick - comment wrapping at 80 chars
nitpick - add/remove blank lines for readability
nitpick - typo /subsriber/subscriber/
nitpick - prior to the ALTER test, tab6 is unsubscribed. So add
another test to verify its initial data
nitpick - sometimes the msg 'add a new table to existing publication'
is misplaced
nitpick - the tests for tab6 and tab5 were in the opposite of the
expected order, so swapped them.

======
99.
Please see also the attached diff which implements all the nitpicks
described in this post.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jul 2, 2024 at 9:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> >...
> > > 8.
> > > + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> > > + {
> > > + if (elem->arg == NULL)
> > > + data->include_generated_columns = true;
> > >
> > > Is there any way to test that "elem->arg == NULL" in the
> > > generated.sql? OTOH, if it is not possible to get here then is the
> > > code even needed?
> > >
> >
> > Currently I could not find a case where the
> > 'include_generated_columns' option is not specifying any value, but  I
> > was hesitant to remove this from here as the other options mentioned
> > follow the same rules. Thoughts?
> >
>
> If you do manage to find a scenario for this then I think a test for
> it would be good. But, I agree that the code seems OK because now I
> see it is the same pattern as similar nearby code.
>
> ~~~
>
> Thanks for the updated patch. Here are some review comments for patch v13-0001.
>
> ======
> .../expected/generated_columns.out
>
> nitpicks (see generated_columns.sql)
>
> ======
> .../test_decoding/sql/generated_columns.sql
>
> nitpick - use plural /column/columns/
> nitpick - use consistent wording in the comments
> nitpick - IMO it is better to INSERT different values for each of the tests
>
> ======
> doc/src/sgml/protocol.sgml
>
> nitpick - I noticed that none of the other boolean parameters on this
> page mention about a default, so maybe here we should do the same and
> omit that information.
>
> ~~~
>
> 1.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> In a previous review [1 comment #11] I wrote that you can't just
> remove this paragraph because AFAIK it is still meaningful. A minimal
> change might be to just remove the "(except generated columns)" part.
> Alternatively, you could give a more detailed explanation mentioning
> the include_generated_columns protocol parameter.
>
> I provided some updated text for this paragraph in my NITPICKS top-up
> patch, Please have a look at that for ideas.
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> It looks like pgindent needs to be run on this file.
>
> ======
> src/include/catalog/pg_subscription.h
>
> nitpick - comment /publish/Publish/ for consistency
>
> ======
> src/include/replication/walreceiver.h
>
> nitpick - comment /publish/Publish/ for consistency
>
> ======
> src/test/regress/expected/subscription.out
>
> nitpicks - (see subscription.sql)
>
> ======
> src/test/regress/sql/subscription.sql
>
> nitpick - combine the invalid option combinations test with all the
> others (no special comment needed)
> nitpick - rename subscription as 'regress_testsub2' same as all its peers.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> nitpick - add/remove blank lines
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> nitpick - rewording for a comment. This issue was not strictly caused
> by this patch, but since you are modifying the same comment we can fix
> this in passing.
>
> ======
> 99.
> Please also see the attached top-up patch for all those nitpicks
> identified above.
>
> ======
> [1] v11-0001 review
> https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

All the comments are handled.

The attached Patches contain all the suggested changes. Here, v15-0001
is modified to fix the comments, v15-0002 is not modified and v15-0003
is modified according to the changes in v15-0001 patch.
Thanks Shlok Kyal for modifying the v15-0003 Patch.


Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Jul 2, 2024 at 10:59 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham,
>
> As you can see, most of my recent review comments for patch 0001 are
> only cosmetic nitpicks. But, there is still one long-unanswered design
> question from a month ago [1, #G.2]
>
> A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
> is related to the introduction and passing everywhere of new
> 'include_generated_columns' function parameters. These same functions
> are also always passing "BitMapSet *columns" representing the
> publication column list.
>
> My question was about whether we can't make use of the existing BMS
> parameter instead of introducing all the new API parameters.
>
> The idea might go something like this:
>
> * If 'include_generated_columns' option is specified true and if no
> column list was already specified then perhaps the relentry->columns
> can be used for a "dummy" column list that has everything including
> all the generated columns.
>
> * By doing this:
>  -- you may be able to avoid passing the extra
> 'include_generated_columns' everywhere
>  -- you may be able to avoid checking for generated columns deeper in
> the code (since it is already checked up-front when building the
> column list BMS)
>
> ~~
>
> I'm not saying this design idea is guaranteed to work, but it might be
> worth considering, because if it does work then there is potential to
> make the current 0001 patch significantly shorter.
>
> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

I have fixed this issue in the latest Patches.

Please refer to the updated v15 patches in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2B%3Dhn--ALJQvzzu7meX3LuO3tJKppDS7eO1BGvNFYBAbg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are review comments for v15-0001

======
doc/src/sgml/ddl.sgml

nitpick - there was a comma (,) which should be a period (.)

======
.../libpqwalreceiver/libpqwalreceiver.c

1.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'true'");
+

Should now say >= 180000

======
src/backend/replication/pgoutput/pgoutput.c

nitpick - comment wording for RelationSyncEntry.collist.

~~

2.
pgoutput_column_list_init:

I found the current logic to be quite confusing. I assume the code is
working OK, because AFAIK there are plenty of tests and they are all
passing, but the logic seems somewhat repetitive and there are also no
comments to explain it adding to my confusion.

IIUC, PRIOR TO THIS PATCH:

BMS field 'columns' represented the "columns of the column list" or it
was NULL if there was no publication column list (and it was also NULL
if the column list contained every column).

IIUC NOW, WITH THIS PATCH:

The BMS field 'columns' meaning is changed slightly to be something
like "columns to be replicated" or NULL if all columns are to be
replicated. This is almost the same thing except we are now handling
the generated columns up-front, so generated columns will or won't
appear in the BMS according to the "include_generated_columns"
parameter. See how this is all a bit subtle which is why copious new
comments are required to explain it...

So, although the test result evidence suggests this is working OK, I
have many questions/issues about it. Here are some to start with:

2a. It needs a lot more (summary and detailed) comments explaining the
logic now that the meaning is slightly different.

2b. What is the story with the FOR ALL TABLES case now? Previously,
there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
comment still says so. But now you've tacked on a 2nd pass of
iterations to build the BMS outside of the "if (!pub->alltables)"
check. Is that OK?

2c. The following logic seemed unexpected:
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts &&
+ data->include_generated_columns)
  {
  bms_free(cols);
  cols = NULL;

I had thought the above code would look different -- more like:
if (att->attgenerated && !data->include_generated_columns)
  continue;

nliveatts++;
...

2d. Was so much duplicated code necessary? It feels like the whole
"Get the number of live attributes." and assignment of cols to NULL
might be made common to both code paths.

2e. I'm beginning to question the pros/cons of the new BMS logic; I
had suggested trying this way (processing the generated columns
up-front in the BMS 'columns' list) to reduce patch code and simplify
all the subsequent API delegation of "include_generated_columns"
everywhere like it was in v14-0001. Indeed, that part was a success
and the patch is now smaller. But I don't like much that we've traded
reduced code overall for increased confusing code in that BMS
function. If all this BMS code can be refactored and commented to be
easier to understand then maybe all will be well, but if it can't then
maybe this BMS change was a bridge too far. I haven't given up on it
just yet, but I wonder what was your opinion about it, and do other
people have thoughts about whether this was the good direction to
take?

======
src/bin/pg_dump/pg_dump.c

3.
+ if (fout->remoteVersion >= 170000)
+ appendPQExpBufferStr(query,
+ " s.subincludegencols\n");
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols\n");

Should now say >= 180000

======
src/bin/psql/describe.c

4.
+ /* include_generated_columns is only supported in v18 and higher */
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+   ", subincludegencols AS \"%s\"\n",
+   gettext_noop("Include generated columns"));
+

Should now say >= 180000

======
src/include/catalog/pg_subscription.h

nitpick - let's make the comment the same as in WalRcvStreamOptions

======
src/include/replication/logicalproto.h

nitpick - extern for logicalrep_write_update should be unchanged by this patch

======
src/test/regress/sql/subscription.sql

nitpick - the comment "include_generated_columns and copy_data = true
are mutually exclusive" is not necessary because this all falls under
the existing comment "fail - invalid option combinations"

nitpick - let's explicitly put "copy_data = true" in the CREATE
SUBSCRIPTION to make it more obvious

======
99. Please also refer to the attached 'diffs' patch which implements
all of my nitpicks issues mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, Here are some review comments for patch v15-0003.

======
src/backend/catalog/pg_publication.c

1. publication_translate_columns

The function comment says:
 * Translate a list of column names to an array of attribute numbers
 * and a Bitmapset with them; verify that each attribute is appropriate
 * to have in a publication column list (no system or generated attributes,
 * no duplicates).  Additional checks with replica identity are done later;
 * see pub_collist_contains_invalid_column.

That part about "[no] generated attributes" seems to have gone stale
-- e.g. not quite correct anymore. Should it say no VIRTUAL generated
attributes?

======
src/backend/replication/logical/proto.c

2. logicalrep_write_tuple and logicalrep_write_attrs

I thought all the code fragments like this:

+ if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

don't need to be in the code anymore, because of the BitMapSet (BMS)
processing done to make the "column list" for publication where
disallowed generated cols should already be excluded from the BMS,
right?

So shouldn't all these be detected by the following statement:
if (!column_in_column_list(att->attnum, columns))
  continue;

======
src/backend/replication/logical/tablesync.c
3.
+ if(server_version >= 120000)
+ {
+ bool gencols_allowed = server_version >= 170000 &&
MySubscription->includegencols;
+
+ if (gencols_allowed)
+ {

Should say server_version >= 180000, instead of 170000

======
src/backend/replication/pgoutput/pgoutput.c

4. send_relation_and_attrs

(this is a similar comment for #2 above)

IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001 to
process the generated columns up-front is that there is no need to check
them again in code like this.

They should be discovered anyway in the subsequent check:
/* Skip this attribute if it's not present in the column list */
if (columns != NULL && !bms_is_member(att->attnum, columns))
  continue;

======
src/test/subscription/t/011_generated.pl

5.
AFAICT there are still multiple comments (e.g. for the "TEST tab<n>"
comments) where it still says "generated" instead of "stored
generated". I did not make a "nitpicks" diff for these because those
comments are inherited from the prior patch 0002 which still has
outstanding review comments on it too. Please just search/replace
them.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 5 Jul 2024 at 13:47, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for v14-0002.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> 2. copy_table
>
> + attnamelist = make_copy_attnamelist(relmapentry, remotegenlist);
> +
>   /* Start copy on the publisher. */
>   initStringInfo(&cmd);
>
> - /* Regular table with no row filter */
> - if (lrel.relkind == RELKIND_RELATION && qual == NIL)
> + /* check if remote column list has generated columns */
> + if(MySubscription->includegencols)
> + {
> + for (int i = 0; i < relmapentry->remoterel.natts; i++)
> + {
> + if(remotegenlist[i])
> + {
> + remote_has_gencol = true;
> + break;
> + }
> + }
> + }
> +
>
> There is some subtle logic going on here:
>
> For example, the comment here says "Check if the remote column list
> has generated columns", and it then proceeds to iterate the remote
> attributes checking the remotegenlist[i]. But the remotegenlist[] was
> returned from a prior call to make_copy_attnamelist() and according to
> the make_copy_attnamelist logic, it is NOT returning all remote
> generated-cols in that list. Specifically, it is stripping some of
> them -- "Do not include generated columns of the subscription table in
> the [remotegenlist] column list.".
>
> So, actually this loop seems to be only finding cases (setting
> remote_has_gencol = true) where the remote column is generated but the
> matching local column is *not* generated. Maybe this was the intended
> logic all along but then certainly the comment should be improved to
> describe it better.

'remotegenlist' is actually constructed in function 'fetch_remote_table_info'
and it has an entry for every column in the column list specifying
whether that column is generated or not.
In the function 'make_copy_attnamelist' we are not modifying the list.
So, I think the current comment would be sufficient. Thoughts?

> ======
> src/test/subscription/t/004_sync.pl
>
> nitpick - changes to comment style to make the test case separations
> much more obvious
> nitpick - minor comment wording tweaks
>
> 5.
> Here, you are confirming we get an ERROR when replicating from a
> non-generated column to a generated column. But I think your patch
> also added exactly that same test scenario in the 011_generated (as
> the sub5 test). So, maybe this one here should be removed?

For 004_sync.pl, it is tested when 'include_generated_columns' is not
specified. Whereas for the test in 011_generated
'include_generated_columns = true' is specified.
I thought we should have a test for both cases to test if the error
message format is the same for both cases. Thoughts?

I have attached the patches and I have addressed the rest of the
comment and added changes in v16-0002. I have not modified the
v16-0001 patch.


Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, Here are some review comments for patch v15-0003.
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1. publication_translate_columns
>
> The function comment says:
>  * Translate a list of column names to an array of attribute numbers
>  * and a Bitmapset with them; verify that each attribute is appropriate
>  * to have in a publication column list (no system or generated attributes,
>  * no duplicates).  Additional checks with replica identity are done later;
>  * see pub_collist_contains_invalid_column.
>
> That part about "[no] generated attributes" seems to have gone stale
> -- e.g. not quite correct anymore. Should it say no VIRTUAL generated
> attributes?
Yes, we should use VIRTUAL generated attributes. I have modified it.

> ======
> src/backend/replication/logical/proto.c
>
> 2. logicalrep_write_tuple and logicalrep_write_attrs
>
> I thought all the code fragments like this:
>
> + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> + continue;
> +
>
> don't need to be in the code anymore, because of the BitMapSet (BMS)
> processing done to make the "column list" for publication where
> disallowed generated cols should already be excluded from the BMS,
> right?
>
> So shouldn't all these be detected by the following statement:
> if (!column_in_column_list(att->attnum, columns))
>   continue;
The current BMS logic does not handle the Virtual Generated Columns.
There can be cases where we do not want a virtual generated column but
it would be present in BMS.
To address this I have added the above logic. I have added this logic
similar to the checks of 'attr->attisdropped'.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 4. send_relation_and_attrs
>
> (this is a similar comment for #2 above)
>
> IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001
> (processing the generated columns up-front) is that there is no need
> to check them again in code like this.
>
> They should be discovered anyway in the subsequent check:
> /* Skip this attribute if it's not present in the column list */
> if (columns != NULL && !bms_is_member(att->attnum, columns))
>   continue;
Same explanation as above.

I have addressed all the comments in v16-0003 patch. Please refer [1].
[1]: https://www.postgresql.org/message-id/CANhcyEXw%3DBFFVUqohWES9EPkdq-ZMC5QRBVQqQPzrO%3DQ7uzFQw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, Here are my review comments for v16-0002

======
src/backend/replication/logical/tablesync.c

1. fetch_remote_table_info

+ if ((server_version >= 120000 && server_version < 180000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

I felt this condition was a bit complicated. It needs a comment to
explain that "attgenerated" has been supported only since >= PG12 and
'include_generated_columns' is supported only since >= PG18. The more
I look at this, the more I think this is a bug. For example, what
happens if the server is *before* PG12 and include_generated_columns
is false; won't it then try to build SQL using the "attgenerated"
column which will cause an ERROR on the server?

IIRC this condition is already written properly in your patch 0003.
So, most of that 0003 condition refactoring should be done here in
patch 0002 instead.
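
To make the concern concrete, here is a minimal standalone sketch of the decision I would expect. It assumes that "attgenerated" exists only on servers >= PG12 and that 'include_generated_columns' is understood only by servers >= PG18. The function name and its parameters are hypothetical, they are not taken from the patch:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: decide whether the query built
 * in fetch_remote_table_info should append " AND a.attgenerated = ''".
 */
bool
need_attgenerated_filter(int server_version, bool include_gencols)
{
    /* pg_attribute.attgenerated does not exist before PG12. */
    if (server_version < 120000)
        return false;

    /* PG12..PG17 publishers cannot send generated columns at all. */
    if (server_version < 180000)
        return true;

    /* PG18 or later: filter only when the subscription did not ask for them. */
    return !include_gencols;
}
```

With this shape, a pre-PG12 server never sees the "attgenerated" column in the SQL, whatever the parameter value is.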

~~~

2. copy_table

> > So, actually this loop seems to be only finding cases (setting
> > remote_has_gencol = true) where the remote column is generated but the
> > matching local column is *not* generated. Maybe this was the intended
> > logic all along but then certainly the comment should be improved to
> > describe it better.
>
> 'remotegenlist' is actually constructed in function 'fetch_remote_table_info'
> and it has an entry for every column in the column list specifying
> whether that column is generated or not.
> In the function 'make_copy_attnamelist' we are not modifying the list.
> So, I think the current comment would be sufficient. Thoughts?

Yes, I was mistaken thinking the list is "modified". OTOH, I still
feel the existing comment ("Check if remote column list has any
generated column") is misleading because the remote table might have
generated cols but we are not even interested in them if the
equivalent subscriber column is also generated. Please see the nitpicks
diff for my suggestion on how to update this comment.
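
As I understand it, the case of interest can be sketched in isolation like this. This is only an illustration under my own assumptions: the helper name and the two parallel arrays are hypothetical and do not appear in the patch, where the information comes from 'remotegenlist' and the local relation's attributes:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch.
 * remote_is_generated[i] says whether remote column i is generated;
 * local_is_generated[i] says the same for the subscriber column that
 * remote column i maps to.
 */
bool
remote_gencol_needs_copy(const bool *remote_is_generated,
                         const bool *local_is_generated,
                         int natts)
{
    for (int i = 0; i < natts; i++)
    {
        /* Only a generated remote column feeding a regular local column counts. */
        if (remote_is_generated[i] && !local_is_generated[i])
            return true;
    }

    return false;
}
```

A remote generated column whose subscriber-side counterpart is also generated does not set the flag, which is why a comment that only says "remote column list has generated columns" reads as too broad to me.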

~~~

nitpick - add space after "if"

======
src/test/subscription/t/004_sync.pl

> > 5.
> > Here, you are confirming we get an ERROR when replicating from a
> > non-generated column to a generated column. But I think your patch
> > also added exactly that same test scenario in the 011_generated (as
> > the sub5 test). So, maybe this one here should be removed?
>
> For 004_sync.pl, it is tested when 'include_generated_columns' is not
> specified. Whereas for the test in 011_generated
> 'include_generated_columns = true' is specified.
> I thought we should have a test for both cases to test if the error
> message format is the same for both cases. Thoughts?

3.
Sorry, I missed that there was a parameter flag difference. Anyway,
since the code-path to reach this error is the same regardless of the
'include_generated_columns' parameter value IMO having too many tests
might be overkill. YMMV.

Anyway, whether you decide to keep both test cases or not, I think all
testing related to generated column replication belongs in the new
011_generated.pl TAP file -- not here in 004_sync.pl.

======
src/test/subscription/t/011_generated.pl

4. Untested scenarios for "missing col"?

I have seen (in 004_sync.pl) missing column test cases for:
- publisher not-generated col ==> subscriber missing column

Maybe I am mistaken, but I don't recall seeing any test cases for:
- publisher generated-col ==> subscriber missing col

Unless they are already done somewhere, I think this scenario should
be in 011_generated.pl. Furthermore, maybe it needs to be tested for
both include_generated_columns = true / false, because if the
parameter is false it should be OK, but if the parameter is true it
should give ERROR.

~~~

5.
-# publisher-side tab3 has generated col 'b' but subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.
+# tab3:
+# publisher-side tab3 has generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.

I think this change is only improving a comment that was introduced by
patch 0001. This all belongs back in patch 0001, then patch 0002 has
nothing to do here.

======
99.
Please also refer to the attached diffs patch which implements any
nitpicks mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shlok, here are my review comments for v16-0003.

======
src/backend/replication/logical/proto.c


On Mon, Jul 8, 2024 at 10:04 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > 2. logicalrep_write_tuple and logicalrep_write_attrs
> >
> > I thought all the code fragments like this:
> >
> > + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> > + continue;
> > +
> >
> > don't need to be in the code anymore, because of the BitMapSet (BMS)
> > processing done to make the "column list" for publication where
> > disallowed generated cols should already be excluded from the BMS,
> > right?
> >
> > So shouldn't all these be detected by the following statement:
> > if (!column_in_column_list(att->attnum, columns))
> >   continue;
> The current BMS logic does not handle the Virtual Generated Columns.
> There can be cases where we do not want a virtual generated column but
> it would be present in BMS.
> To address this I have added the above logic. I have added this logic
> similar to the checks of 'attr->attisdropped'.
>

Hmm. I thought the BMS idea of patch 0001 is to discover what columns
should be replicated up-front. If they should not be replicated (e.g.
virtual generated columns cannot be) then they should never be in the
BMS.

So what you said ("There can be cases where we do not want a virtual
generated column but it would be present in BMS") should not be
happening. If that is happening then it sounds more like a bug in the
new BMS logic of pgoutput_column_list_init() function. In other words,
if what you say is true, then it seems like the current extra
conditions you have in patch 0004 are just a band-aid to cover a
problem of the BMS logic of patch 0001. Am I mistaken?

> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 4. send_relation_and_attrs
> >
> > (this is a similar comment for #2 above)
> >
> > IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001
> > (processing the generated columns up-front) is that there is no need
> > to check them again in code like this.
> >
> > They should be discovered anyway in the subsequent check:
> > /* Skip this attribute if it's not present in the column list */
> > if (columns != NULL && !bms_is_member(att->attnum, columns))
> >   continue;
> Same explanation as above.

As above.

======
src/test/subscription/t/011_generated.pl

I'm not sure if you needed to say "STORED" generated cols for the
subscriber-side columns but anyway, whatever is done needs to be done
consistently. FYI, below you did *not* say STORED for subscriber-side
generated cols, but in other comments for subscriber-side generated
columns, you did say STORED.

# tab3:
# publisher-side tab3 has STORED generated col 'b' but
# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.

~

# tab4:
# publisher-side tab4 has STORED generated cols 'b' and 'c' but
# subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
# where columns on publisher/subscriber are in a different order

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham/Shlok, I was thinking some more about the suggested new
BitMapSet (BMS) idea of patch 0001 that changes the 'columns' meaning
to include generated cols also where necessary.

I feel it is a bit risky to change lots of code without being 100%
confident it will still be in the final push. It's also going to make
the reviewing job harder if stuff gets added and then later removed.

IMO it might be better to revert all the patches (mostly 0001, but
also parts of subsequent patches) to their pre-BMS-change ~v14* state.
Then all the BMS "improvement" can be kept isolated in a new patch
0004.

Some more reasons to split this off into a separate patch are:

* The BMS change is essentially a redesign/cleanup of the code but is
nothing to do with the actual *functionality* of the new "generated
columns" feature.

* Apart from the BMS change I think the rest of the patches are nearly
stable now. So it might be good to get it all finished so the BMS
change can be tackled separately.

* By isolating the BMS change, then we will be able to see exactly
what is the code cost/benefit (e.g. removal of redundant code versus
adding new logic) which is part of the judgement to decide whether to
do it this way or not.

* By isolating the BMS change, then it makes it convenient for testing
before/after in case there are any performance concerns

* By isolating the BMS change, if some unexpected obstacle is
encountered that makes it unfeasible then we can just throw away patch
0004 and everything else (patches 0001,0002,0003) will still be good
to go.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 8, 2024 at 10:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are review comments for v15-0001
>
> ======
> doc/src/sgml/ddl.sgml
>
> nitpick - there was a comma (,) which should be a period (.)
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 1.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'true'");
> +
>
> Should now say >= 180000
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> nitpick - comment wording for RelationSyncEntry.collist.
>
> ~~
>
> 2.
> pgoutput_column_list_init:
>
> I found the current logic to be quite confusing. I assume the code is
> working OK, because AFAIK there are plenty of tests and they are all
> passing, but the logic seems somewhat repetitive and there are also no
> comments to explain it adding to my confusion.
>
> IIUC, PRIOR TO THIS PATCH:
>
> BMS field 'columns' represented the "columns of the column list" or it
> was NULL if there was no publication column list (and it was also NULL
> if the column list contained every column).
>
> IIUC NOW, WITH THIS PATCH:
>
> The BMS field 'columns' meaning is changed slightly to be something
> like "columns to be replicated" or NULL if all columns are to be
> replicated. This is almost the same thing except we are now handling
> the generated columns up-front, so generated columns will or won't
> appear in the BMS according to the "include_generated_columns"
> parameter. See how this is all a bit subtle which is why copious new
> comments are required to explain it...
>
> So, although the test result evidence suggests this is working OK, I
> have many questions/issues about it. Here are some to start with:
>
> 2a. It needs a lot more (summary and detailed) comments explaining the
> logic now that the meaning is slightly different.
>
> 2b. What is the story with the FOR ALL TABLES case now? Previously,
> there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
> comment still says so. But now you've tacked on a 2nd pass of
> iterations to build the BMS outside of the "if (!pub->alltables)"
> check. Is that OK?
>
> 2c. The following logic seemed unexpected:
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts &&
> + data->include_generated_columns)
>   {
>   bms_free(cols);
>   cols = NULL;
>
> I had thought the above code would look different -- more like:
> if (att->attgenerated && !data->include_generated_columns)
>   continue;
>
> nliveatts++;
> ...
>
> 2d. Was so much duplicated code necessary? It feels like the whole
> "Get the number of live attributes." and assignment of cols to NULL
> might be made common to both code paths.
>
> 2e. I'm beginning to question the pros/cons of the new BMS logic; I
> had suggested trying this way (processing the generated columns
> up-front in the BMS 'columns' list) to reduce patch code and simplify
> all the subsequent API delegation of "include_generated_columns"
> everywhere like it was in v14-0001. Indeed, that part was a success
> and the patch is now smaller. But I don't like much that we've traded
> reduced code overall for increased confusing code in that BMS
> function. If all this BMS code can be refactored and commented to be
> easier to understand then maybe all will be well, but if it can't then
> maybe this BMS change was a bridge too far. I haven't given up on it
> just yet, but I wonder what was your opinion about it, and do other
> people have thoughts about whether this was the good direction to
> take?

I have created a separate patch (v17-0004) for this idea. I will address
this comment in the next version of the patches.

> ======
> src/bin/pg_dump/pg_dump.c
>
> 3.
> + if (fout->remoteVersion >= 170000)
> + appendPQExpBufferStr(query,
> + " s.subincludegencols\n");
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols\n");
>
> Should now say >= 180000
>
> ======
> src/bin/psql/describe.c
>
> 4.
> + /* include_generated_columns is only supported in v18 and higher */
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> +   ", subincludegencols AS \"%s\"\n",
> +   gettext_noop("Include generated columns"));
> +
>
> Should now say >= 180000
>
> ======
> src/include/catalog/pg_subscription.h
>
> nitpick - let's make the comment the same as in WalRcvStreamOptions
>
> ======
> src/include/replication/logicalproto.h
>
> nitpick - extern for logicalrep_write_update should be unchanged by this patch
>
> ======
> src/test/regress/sql/subscription.sql
>
> nitpick - the comment "include_generated_columns and copy_data = true
> are mutually exclusive" is not necessary because this all falls under
> the existing comment "fail - invalid option combinations"
>
> nitpick - let's explicitly put "copy_data = true" in the CREATE
> SUBSCRIPTION to make it more obvious
>
> ======
> 99. Please also refer to the attached 'diffs' patch which implements
> all of my nitpicks issues mentioned above.

The attached patches contain all the suggested changes. Here, v17-0001
is modified to fix the comments, v17-0002 and v17-0003 are modified
according to the changes in the v17-0001 patch, and the v17-0004 patch
contains the changes related to the Bitmapset (BMS) idea that changes
the 'columns' meaning to include generated cols also where necessary.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Jul 10, 2024 at 4:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham/Shlok, I was thinking some more about the suggested new
> BitMapSet (BMS) idea of patch 0001 that changes the 'columns' meaning
> to include generated cols also where necessary.
>
> I feel it is a bit risky to change lots of code without being 100%
> confident it will still be in the final push. It's also going to make
> the reviewing job harder if stuff gets added and then later removed.
>
> IMO it might be better to revert all the patches (mostly 0001, but
> also parts of subsequent patches) to their pre-BMS-change ~v14* state.
> Then all the BMS "improvement" can be kept isolated in a new patch
> 0004.
>
> Some more reasons to split this off into a separate patch are:
>
> * The BMS change is essentially a redesign/cleanup of the code but is
> nothing to do with the actual *functionality* of the new "generated
> columns" feature.
>
> * Apart from the BMS change I think the rest of the patches are nearly
> stable now. So it might be good to get it all finished so the BMS
> change can be tackled separately.
>
> * By isolating the BMS change, then we will be able to see exactly
> what is the code cost/benefit (e.g. removal of redundant code versus
> adding new logic) which is part of the judgement to decide whether to
> do it this way or not.
>
> * By isolating the BMS change, then it makes it convenient for testing
> before/after in case there are any performance concerns
>
> * By isolating the BMS change, if some unexpected obstacle is
> encountered that makes it unfeasible then we can just throw away patch
> 0004 and everything else (patches 0001,0002,0003) will still be good
> to go.

As suggested, I have created a separate patch for the Bitmapset (BMS)
idea of patch 0001 that changes the 'columns' meaning to include
generated cols also where necessary.
Please refer to the updated v17 patches in [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjJ0gAUd62PvBRXCPYy2oTNZWEY-Qe8cBNzQaJPVMZCeGA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham.

Thanks for separating the new BMS 'columns' modification.

Here are my review comments for the latest patch v17-0001.

======

1. src/backend/replication/pgoutput/pgoutput.c

  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;

With this latest 0001 there is now no change to the original
interpretation of RelationSyncEntry BMS 'columns'. So, I think this
field comment should remain unchanged; i.e. it should be the same as
the current HEAD comment.

======
src/test/subscription/t/011_generated.pl

nitpick - comment changes for 'tab2' and 'tab3' to make them more consistent.

======
99.
Please refer to the attached diff patch which implements any nitpicks
described above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments about patch v17-0003

======
1.
Missing a docs change?

Previously, (v16-0002) the patch included a change to
doc/src/sgml/protocol.sgml like below to say STORED generated instead
of just generated.

        <para>
-        Boolean option to enable generated columns. This option controls
-        whether generated columns should be included in the string
-        representation of tuples during logical decoding in PostgreSQL.
+        Boolean option to enable <literal>STORED</literal> generated columns.
+        This option controls whether <literal>STORED</literal> generated columns
+        should be included in the string representation of tuples during logical
+        decoding in PostgreSQL.
        </para>

Why is that v16 change no longer present in patch v17-0003?

======
src/backend/catalog/pg_publication.c

2.
Previously, (v16-0003) this patch included a change to clarify the
kind of generated cols that are allowed in a column list.

  * Translate a list of column names to an array of attribute numbers
  * and a Bitmapset with them; verify that each attribute is appropriate
- * to have in a publication column list (no system or generated attributes,
- * no duplicates).  Additional checks with replica identity are done later;
- * see pub_collist_contains_invalid_column.
+ * to have in a publication column list (no system or virtual generated
+ * attributes, no duplicates). Additional checks with replica identity
+ * are done later; see pub_collist_contains_invalid_column.

Why is that v16 change no longer present in patch v17-0003?

======
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

- if (!attr->attgenerated)
+ if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

IIUC this logic is checking to make sure the subscriber-side table
column was not a generated column (because we don't replicate on top
of generated columns). So, does the distinction of STORED/VIRTUAL
really matter here?

~~~

fetch_remote_table_info:
nitpick - Should not change any spaces unrelated to the patch

======

send_relation_and_attrs:

- if (att->attgenerated && !include_generated_columns)
+ if (att->attgenerated && (att->attgenerated != ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
  continue;

nitpick - It seems over-complicated. Conditions can be split so the
code fragment looks the same as in other places in this patch.
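
For reference, the two forms can be compared side by side in a small standalone sketch. The function names are hypothetical, and ATTRIBUTE_GENERATED_STORED is redefined locally as a stand-in for the constant in pg_attribute.h. Any other non-zero attgenerated value (I use 'v' in testing, which is an assumption) represents a generated column that is not STORED:

```c
#include <stdbool.h>

/* Local stand-in for the constant defined in catalog/pg_attribute.h. */
#define ATTRIBUTE_GENERATED_STORED 's'

/* The single combined condition, as written in the patch fragment. */
bool
skip_attr_combined(char attgenerated, bool include_generated_columns)
{
    return attgenerated &&
        (attgenerated != ATTRIBUTE_GENERATED_STORED ||
         !include_generated_columns);
}

/* The same test split into the nested form used elsewhere in the patch. */
bool
skip_attr_split(char attgenerated, bool include_generated_columns)
{
    if (attgenerated)
    {
        if (!include_generated_columns)
            return true;

        if (attgenerated != ATTRIBUTE_GENERATED_STORED)
            return true;
    }

    return false;
}
```

Both functions give the same answer for every input, so the split form costs nothing and keeps this code fragment consistent with the others.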

======
99.
Please see the attached diffs patch that implements any nitpicks
mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, I had a quick look at the patch v17-0004 which is the split-off
new BMS logic.

IIUC this 0004 is currently undergoing some refactoring and
cleaning-up, so I won't comment much about it except to give the
following observation below.

======
src/backend/replication/logical/proto.c

I did not expect to see any code fragments that are still checking
generated columns like below:

logicalrep_write_tuple:

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;
~

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

~~~

logicalrep_write_attrs:

  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;

~
  if (att->attgenerated)
  {
- if (!include_generated_columns)
- continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
  continue;
~~~


AFAIK, with this patch the check for generated column support is done
once, when the BMS 'columns' is assigned, so all the cases above should
already be covered by the existing check:

if (!column_in_column_list(att->attnum, columns))
  continue;

======

BTW there is a subtle but significant difference in this 0004 patch.
IOW, we are introducing a difference between the list of published
columns VERSUS a publication column list. So please make sure that all
code comments are adjusted appropriately so they are not misleading by
calling these "column lists" still.

BEFORE: BMS 'columns' means "columns of the column list" or NULL if
there was no publication column list
AFTER: BMS 'columns' means "columns to be replicated" or NULL if all
columns are to be replicated
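
The AFTER meaning can be sketched in a simplified standalone form as follows. Everything here is an assumption made for illustration: a uint64_t bitmask stands in for the real Bitmapset (so attnums must stay below 64), SketchAttr is a cut-down stand-in for a pg_attribute row, both function names are hypothetical, and the "NULL means all columns" shortcut is left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the constant defined in catalog/pg_attribute.h. */
#define ATTRIBUTE_GENERATED_STORED 's'

/* Cut-down, hypothetical stand-in for a pg_attribute row. */
typedef struct SketchAttr
{
    int         attnum;         /* 1-based attribute number, must be < 64 */
    bool        attisdropped;
    char        attgenerated;   /* '\0' means not generated */
} SketchAttr;

/* Decide once, up-front, which columns are to be replicated. */
uint64_t
build_published_columns(const SketchAttr *attrs, int natts,
                        bool include_generated_columns)
{
    uint64_t    columns = 0;

    for (int i = 0; i < natts; i++)
    {
        const SketchAttr *att = &attrs[i];

        if (att->attisdropped)
            continue;

        if (att->attgenerated)
        {
            if (!include_generated_columns)
                continue;

            /* Only STORED generated columns can be replicated. */
            if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
                continue;
        }

        columns |= UINT64_C(1) << att->attnum;
    }

    return columns;
}

/* Per-tuple code then needs only a membership test. */
bool
column_is_published(uint64_t columns, int attnum)
{
    return (columns & (UINT64_C(1) << attnum)) != 0;
}
```

The point of the sketch is that all knowledge about generated columns lives in the function that builds the set, so the write paths in proto.c would not need their own attgenerated checks.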

======
Kind Regards,
Peter Smith.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 9 Jul 2024 at 07:14, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, Here are my review comments for v16-0002
>
> ======
> src/test/subscription/t/004_sync.pl
>
> > > 5.
> > > Here, you are confirming we get an ERROR when replicating from a
> > > non-generated column to a generated column. But I think your patch
> > > also added exactly that same test scenario in the 011_generated (as
> > > the sub5 test). So, maybe this one here should be removed?
> >
> > For 004_sync.pl, it is tested when 'include_generated_columns' is not
> > specified. Whereas for the test in 011_generated
> > 'include_generated_columns = true' is specified.
> > I thought we should have a test for both cases to test if the error
> > message format is the same for both cases. Thoughts?
>
> 3.
> Sorry, I missed that there was a parameter flag difference. Anyway,
> since the code-path to reach this error is the same regardless of the
> 'include_generated_columns' parameter value IMO having too many tests
> might be overkill. YMMV.
>
> Anyway, whether you decide to keep both test cases or not, I think all
> testing related to generated column replication belongs in the new
> 011_generated.pl TAP file -- not here in 004_sync.pl
I have removed the test.

> ======
> src/test/subscription/t/011_generated.pl
>
> 4. Untested scenarios for "missing col"?
>
> I have seen (in 004_sync.pl) missing column test cases for:
> - publisher not-generated col ==> subscriber missing column
>
> Maybe I am mistaken, but I don't recall seeing any test cases for:
> - publisher generated-col ==> subscriber missing col
>
> Unless they are already done somewhere, I think this scenario should
> be in 011_generated.pl. Furthermore, maybe it needs to be tested for
> both include_generated_columns = true / false, because if the
> parameter is false it should be OK, but if the parameter is true it
> should give ERROR.
I have added the tests in 011_generated.pl.

I have also addressed the remaining comments. Please find the updated
v18 patches:

v18-0001 - Rebased the patch on HEAD
v18-0002 - Addressed the comments
v18-0003 - Addressed the comments
v18-0004 - Rebased the patch

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Tue, 9 Jul 2024 at 09:53, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok, here are my review comments for v16-0003.
>
> ======
> src/backend/replication/logical/proto.c
>
>
> On Mon, Jul 8, 2024 at 10:04 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Mon, 8 Jul 2024 at 13:20, Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > >
> > > 2. logicalrep_write_tuple and logicalrep_write_attrs
> > >
> > > I thought all the code fragments like this:
> > >
> > > + if (att->attgenerated && att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> > > + continue;
> > > +
> > >
> > > don't need to be in the code anymore, because of the BitMapSet (BMS)
> > > processing done to make the "column list" for publication where
> > > disallowed generated cols should already be excluded from the BMS,
> > > right?
> > >
> > > So shouldn't all these be detected by the following statement:
> > > if (!column_in_column_list(att->attnum, columns))
> > >   continue;
> > The current BMS logic does not handle the Virtual Generated Columns.
> > There can be cases where we do not want a virtual generated column but
> > it would be present in BMS.
> > To address this I have added the above logic. I have added this logic
> > similar to the checks of 'attr->attisdropped'.
> >
>
> Hmm. I thought the BMS idea of patch 0001 is to discover what columns
> should be replicated up-front. If they should not be replicated (e.g.
> virtual generated columns cannot be) then they should never be in the
> BMS.
>
> So what you said ("There can be cases where we do not want a virtual
> generated column but it would be present in BMS") should not be
> happening. If that is happening then it sounds more like a bug in the
> new BMS logic of pgoutput_column_list_init() function. In other words,
> if what you say is true, then it seems like the current extra
> conditions you have in patch 0004 are just a band-aid to cover a
> problem of the BMS logic of patch 0001. Am I mistaken?
>
We have created a 0004 patch to use the BMS approach. It will be
addressed in the future 0004 patch.

> > > ======
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 4. send_relation_and_attrs
> > >
> > > (this is a similar comment for #2 above)
> > >
> > > IIUC, one of the advantages of the BitMapSet (BMS) idea in patch 0001
> > > (processing the generated columns up-front) is that there is no need
> > > to check them again in code like this.
> > >
> > > They should be discovered anyway in the subsequent check:
> > > /* Skip this attribute if it's not present in the column list */
> > > if (columns != NULL && !bms_is_member(att->attnum, columns))
> > >   continue;
> > Same explanation as above.
>
> As above.
>
We have created a separate 0004 patch for the BMS approach. This comment
will be addressed in a future version of that 0004 patch.

> ======
> src/test/subscription/t/011_generated.pl
>
> I'm not sure if you needed to say "STORED" generated cols for the
> subscriber-side columns but anyway, whatever is done needs to be done
> consistently. FYI, below you did *not* say STORED for subscriber-side
> generated cols, but in other comments for subscriber-side generated
> columns, you did say STORED.
>
> # tab3:
> # publisher-side tab3 has STORED generated col 'b' but
> # subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.
>
> ~
>
> # tab4:
> # publisher-side tab4 has STORED generated cols 'b' and 'c' but
> # subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
> # where columns on publisher/subscriber are in a different order
>
Fixed

Please find the updated v18-0003 patch at [1].

[1]: https://www.postgresql.org/message-id/CANhcyEW3LVJpRPScz6VBa%3DZipEMV7b-u76PDEALNcNDFURCYMA%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Mon, 15 Jul 2024 at 08:08, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments about patch v17-0003

I have addressed the comments in v18-0003 patch [1].

[1]: https://www.postgresql.org/message-id/CANhcyEW3LVJpRPScz6VBa%3DZipEMV7b-u76PDEALNcNDFURCYMA%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Jul 12, 2024 at 12:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham.
>
> Thanks for separating the new BMS 'columns' modification.
>
> Here are my review comments for the latest patch v17-0001.
>
> ======
>
> 1. src/backend/replication/pgoutput/pgoutput.c
>
>   /*
>   * Columns included in the publication, or NULL if all columns are
>   * included implicitly.  Note that the attnums in this bitmap are not
> + * publication and include_generated_columns option: other reasons should
> + * be checked at user side.  Note that the attnums in this bitmap are not
>   * shifted by FirstLowInvalidHeapAttributeNumber.
>   */
>   Bitmapset  *columns;
> With this latest 0001 there is now no change to the original
> interpretation of RelationSyncEntry BMS 'columns'. So, I think this
> field comment should remain unchanged; i.e. it should be the same as
> the current HEAD comment.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> nitpick - comment changes for 'tab2' and 'tab3' to make them more consistent.
>
> ======
> 99.
> Please refer to the attached diff patch which implements any nitpicks
> described above.

The attached Patches contain all the suggested changes.

v19-0001 - Addressed the comments.
v19-0002 - Rebased the Patch.
v19-0003 - Rebased the Patch.
v19-0004 - Addressed all the comments related to Bitmapset (BMS).

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 15, 2024 at 11:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, I had a quick look at the patch v17-0004 which is the split-off
> new BMS logic.
>
> IIUC this 0004 is currently undergoing some refactoring and
> cleaning-up, so I won't comment much about it except to give the
> following observation below.
>
> ======
> src/backend/replication/logical/proto.c.
>
> I did not expect to see any code fragments that are still checking
> generated columns like below:
>
> logicalrep_write_tuple:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~~~
>
> logicalrep_write_attrs:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~
> if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~~~
>
>
> AFAIK, now checking support of generated columns will be done when the
> BMS 'columns' is assigned, so the continuation code will be handled
> like this:
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> ======
>
> BTW there is a subtle but significant difference in this 0004 patch.
> IOW, we are introducing a difference between the list of published
> columns VERSUS a publication column list. So please make sure that all
> code comments are adjusted appropriately so they are not misleading by
> calling these "column lists" still.
>
> BEFORE: BMS 'columns'  means "columns of the column list" or NULL if
> there was no publication column list
> AFTER: BMS 'columns' means "columns to be replicated" or NULL if all
> columns are to be replicated
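
The BEFORE/AFTER distinction quoted above can be illustrated with a small
standalone sketch. This is not the PostgreSQL code: ToyAttr, GenKind and every
function name below are hypothetical stand-ins, and a plain 64-bit mask
replaces the real Bitmapset (so the "NULL means all columns" case is not
modelled). It only shows the idea that the decision about generated columns is
made once, when the set of columns to be replicated is built, so the per-column
loop needs a single membership test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL definitions. */
typedef enum { GEN_NONE, GEN_STORED, GEN_VIRTUAL } GenKind;

typedef struct
{
    int     attnum;     /* 1-based column number, must be < 64 here */
    bool    dropped;    /* column was dropped */
    GenKind generated;  /* kind of generated column, if any */
} ToyAttr;

/*
 * Decide up-front which columns are replicated. Dropped and virtual
 * generated columns are never included; stored generated columns are
 * included only when the (assumed) include_generated flag is set.
 */
static uint64_t
build_replicated_columns(const ToyAttr *atts, size_t natts,
                         bool include_generated)
{
    uint64_t cols = 0;

    for (size_t i = 0; i < natts; i++)
    {
        assert(atts[i].attnum >= 1 && atts[i].attnum < 64);

        if (atts[i].dropped)
            continue;
        if (atts[i].generated == GEN_VIRTUAL)
            continue;
        if (atts[i].generated == GEN_STORED && !include_generated)
            continue;

        cols |= UINT64_C(1) << atts[i].attnum;
    }
    return cols;
}

/* The only check the sending loop needs once the mask exists. */
static bool
column_is_replicated(int attnum, uint64_t cols)
{
    assert(attnum >= 1 && attnum < 64);
    return ((cols >> attnum) & 1u) != 0;
}

/* Count the columns a write loop would send for this mask. */
static size_t
count_sent_columns(const ToyAttr *atts, size_t natts, uint64_t cols)
{
    size_t n = 0;

    for (size_t i = 0; i < natts; i++)
    {
        if (!column_is_replicated(atts[i].attnum, cols))
            continue;
        n++;
    }
    return n;
}

/* Sample table: plain, stored generated, virtual generated, dropped. */
static size_t
demo_sent_count(bool include_generated)
{
    const ToyAttr atts[] = {
        {1, false, GEN_NONE},
        {2, false, GEN_STORED},
        {3, false, GEN_VIRTUAL},
        {4, true, GEN_NONE},
    };
    size_t   natts = sizeof(atts) / sizeof(atts[0]);
    uint64_t cols = build_replicated_columns(atts, natts, include_generated);

    return count_sent_columns(atts, natts, cols);
}
```

In this toy model demo_sent_count(true) sends the plain and the stored
generated column, while demo_sent_count(false) sends only the plain column;
the loop in count_sent_columns never inspects the generated-column kind itself.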

I have addressed all the comments in v19-0004 patch.
Please refer to the updated v19-0004 patch at [1].

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are my review comments for patch v19-0001.

======
src/backend/replication/pgoutput/pgoutput.c

1.
  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;
You replied [1] "The attached Patches contain all the suggested
changes." but as I previously commented [2, #1], since there is no
change to the interpretation of the 'columns' BMS caused by this
patch, then I expected this comment would be unchanged (i.e. same as
HEAD code). But this fix was missed in v19-0001.

OTOH, if you do think there was a reason to change the comment then
the above is still not good because "are not publication and
include_generated_columns option" wording doesn't make sense.

======
src/test/subscription/t/011_generated.pl

Observation -- I added (in nitpicks diffs) some more comments for
'tab1' (to make all comments consistent with the new tests added). But
when I was doing that I observed that tab1 and tab3 test scenarios are
very similar. It seems only the subscription parameter is not
specified (so 'include_generated_cols' default will be tested). IIRC
the default for that parameter is "false", so tab1 is not really
testing that properly -- e.g. I thought maybe to test the default
parameter it's better the subscriber-side 'b' should be not-generated?
But doing that would make 'tab1' the same as 'tab2'. Anyway, something
seems amiss -- it seems either something is not tested or is duplicate
tested. Please revisit what the tab1 test intention was and make sure
we are doing the right thing for it...

======
99.
The attached nitpicks diff patch has some tweaked comments.

======
[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com


Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for v19-0002

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
nitpick - tweak function comment
nitpick - tweak other comments

~~~

fetch_remote_table_info:
nitpick - add space after "if"
nitpick - removed a comment because logic is self-evident from the variable name

======
src/test/subscription/t/004_sync.pl

1.
This new test is not related to generated columns. IIRC, this is just
some test that we discovered missing during review of this thread. As
such, I think this change can be posted/patched separately from this
thread.

======
src/test/subscription/t/011_generated.pl

nitpick - change some comment wording to be more consistent with patch 0001.

======
99.
Please see the nitpicks diff attachment which implements any nitpicks
mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are some review comments for patch v19-0003

======
src/backend/catalog/pg_publication.c

1.
/*
 * Translate a list of column names to an array of attribute numbers
 * and a Bitmapset with them; verify that each attribute is appropriate
 * to have in a publication column list (no system or generated attributes,
 * no duplicates).  Additional checks with replica identity are done later;
 * see pub_collist_contains_invalid_column.
 *
 * Note that the attribute numbers are *not* offset by
 * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this
 * is okay.
 */
static void
publication_translate_columns(Relation targetrel, List *columns,
  int *natts, AttrNumber **attrs)

~

I thought the above comment ought to change: /or generated
attributes/or virtual generated attributes/

IIRC this was already addressed back in v16, but somehow that fix has
been lost (???).

======
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:
nitpick - missing end space in this comment /* TODO: use
ATTRIBUTE_GENERATED_VIRTUAL*/

======

2.
(in patch v19-0001)
+# tab3:
+# publisher-side tab3 has generated col 'b'.
+# subscriber-side tab3 has generated col 'b', using a different computation.

(here, in patch v19-0003)
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# publisher-side tab3 has stored generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'.

It has become difficult to review these TAP tests, particularly when
different patches are modifying the same comment. e.g. I post
suggestions to modify comments for patch 0001. Those get addressed OK,
only to vanish in subsequent patches like has happened in the above
example.

Really this patch 0003 was only supposed to add the word "stored", not
revert the entire comment to something from an earlier version. Please
take care that all comment changes are carried forward correctly from
one patch to the next.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for v19-0002
> ======
> src/test/subscription/t/004_sync.pl
>
> 1.
> This new test is not related to generated columns. IIRC, this is just
> some test that we discovered missing during review of this thread. As
> such, I think this change can be posted/patched separately from this
> thread.
>
I have removed the test from this thread.

I have also addressed the remaining comments for v19-0002 patch.
Please find the latest patches.

v20-0001 - Not modified
v20-0002 - Addressed the comments
v20-0003 - Addressed the comments
v20-0004 - Not modified

Thanks and Regards,
Shlok Kyal

Attachment

Re: Pgoutput not capturing the generated columns

From
Shlok Kyal
Date:
On Fri, 19 Jul 2024 at 04:59, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for patch v19-0003
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1.
> /*
>  * Translate a list of column names to an array of attribute numbers
>  * and a Bitmapset with them; verify that each attribute is appropriate
>  * to have in a publication column list (no system or generated attributes,
>  * no duplicates).  Additional checks with replica identity are done later;
>  * see pub_collist_contains_invalid_column.
>  *
>  * Note that the attribute numbers are *not* offset by
>  * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this
>  * is okay.
>  */
> static void
> publication_translate_columns(Relation targetrel, List *columns,
>   int *natts, AttrNumber **attrs)
>
> ~
>
> I thought the above comment ought to change: /or generated
> attributes/or virtual generated attributes/
>
> IIRC this was already addressed back in v16, but somehow that fix has
> been lost (???).
Modified the comment

> ======
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
> nitpick - missing end space in this comment /* TODO: use
> ATTRIBUTE_GENERATED_VIRTUAL*/
>
Fixed

> ======
>
> 2.
> (in patch v19-0001)
> +# tab3:
> +# publisher-side tab3 has generated col 'b'.
> +# subscriber-side tab3 has generated col 'b', using a different computation.
>
> (here, in patch v19-0003)
>  # tab3:
> -# publisher-side tab3 has generated col 'b'.
> -# subscriber-side tab3 has generated col 'b', using a different computation.
> +# publisher-side tab3 has stored generated col 'b' but
> +# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'.
>
> It has become difficult to review these TAP tests, particularly when
> different patches are modifying the same comment. e.g. I post
> suggestions to modify comments for patch 0001. Those get addressed OK,
> only to vanish in subsequent patches like has happened in the above
> example.
>
> Really this patch 0003 was only supposed to add the word "stored", not
> revert the entire comment to something from an earlier version. Please
> take care that all comment changes are carried forward correctly from
> one patch to the next.
Fixed

I have addressed the comments in the v20-0003 patch. Please refer to [1].

[1]: https://www.postgresql.org/message-id/CANhcyEUzUurrX38HGvG30gV92YDz6WmnnwNRYMVY4tiga-8KZg%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are some review comments for v19-0002
> > ======
> > src/test/subscription/t/004_sync.pl
> >
> > 1.
> > This new test is not related to generated columns. IIRC, this is just
> > some test that we discovered missing during review of this thread. As
> > such, I think this change can be posted/patched separately from this
> > thread.
> >
> I have removed the test for this thread.
>
> I have also addressed the remaining comments for v19-0002 patch.

Hi, I have no more review comments for patch v20-0002 at this time.

I saw that the above test was removed from this thread as suggested,
but I could not find that any new thread was started to propose this
valuable missing test.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Jul 18, 2024 at 10:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are my review comments for patch v19-0001.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 1.
>   /*
>   * Columns included in the publication, or NULL if all columns are
>   * included implicitly.  Note that the attnums in this bitmap are not
> + * publication and include_generated_columns option: other reasons should
> + * be checked at user side.  Note that the attnums in this bitmap are not
>   * shifted by FirstLowInvalidHeapAttributeNumber.
>   */
>   Bitmapset  *columns;
> You replied [1] "The attached Patches contain all the suggested
> changes." but as I previously commented [2, #1], since there is no
> change to the interpretation of the 'columns' BMS caused by this
> patch, then I expected this comment would be unchanged (i.e. same as
> HEAD code). But this fix was missed in v19-0001.
>
> OTOH, if you do think there was a reason to change the comment then
> the above is still not good because "are not publication and
> include_generated_columns option" wording doesn't make sense.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> Observation -- I added (in nitpicks diffs) some more comments for
> 'tab1' (to make all comments consistent with the new tests added). But
> when I was doing that I observed that tab1 and tab3 test scenarios are
> very similar. It seems only the subscription parameter is not
> specified (so 'include_generated_cols' default will be tested). IIRC
> the default for that parameter is "false", so tab1 is not really
> testing that properly -- e.g. I thought maybe to test the default
> parameter it's better the subscriber-side 'b' should be not-generated?
> But doing that would make 'tab1' the same as 'tab2'. Anyway, something
> seems amiss -- it seems either something is not tested or is duplicate
> tested. Please revisit what the tab1 test intention was and make sure
> we are doing the right thing for it...
>
> ======
> 99.
> The attached nitpicks diff patch has some tweaked comments.
>
> ======
> [1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com

The attached Patches contain all the suggested changes.

v21-0001 - Addressed the comments.
v21-0002 - Added the TAP Tests for 011_generated.pl file and modified
the patch accordingly.
v21-0003 - Added the TAP Tests for 011_generated.pl file and modified
the patch accordingly.
v21-0004 - Rebased the Patch.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Thanks for the patch updates.

Here are my review comments for v21-0001.

I think this patch is mostly OK now except there are still some
comments about the TAP test.

======
Commit Message

0.
Using Create Subscription:
CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION
pub1 WITH (include_generated_columns = true, copy_data = false)"

If you are going to give an example, I think a gen-to-nogen example
would be a better choice. That's because the gen-to-gen (as you have
here) is not going to replicate anything due to the subscriber-side
column being generated.

======
src/test/subscription/t/011_generated.pl

1.
+$node_subscriber2->safe_psql('postgres',
+ "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
+);

The subscriber2 node was intended only for all the tables where we
need include_generated_columns to be true. Mostly that is the
combination tests. (tab_gen_to_nogen, tab_nogen_to_gen, etc) OTOH,
table 'tab1' already existed. I don't think we need to bother
subscribing to tab1 from subscriber2 because every combination is
already covered by the combination tests. Let's leave this one alone.


~~~

2.
Huh? Where is the "tab_nogen_to_gen" combination test that I sent to
you off-list?

~~~

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED,
a int, b int)"
+);

Maybe you can test 'tab_order' on both subscription nodes but I think
it is overkill. IMO it is enough to test it on subscription2.

~~~

4.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a
* 22) STORED)"
+);

Ditto above. Maybe you can test 'tab_alter' on both subscription nodes
but I think it is overkill. IMO it is enough to test it on
subscription2.

~~~

5.
Don't forget to add initial data for the missing nogen_to_gen table/test.

~~~

6.
 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen,
tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order");
+
 $node_subscriber->safe_psql('postgres',
  "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1"
 );

It is not a bad idea to reduce the number of publications as you have
done, but IMO jamming all the tables into 1 publication is too much
because it makes it less understandable instead of simpler.

How about this:
- leave the 'pub1' just for 'tab1'.
- have a 'pub_combo' for publishing all the gen_to_nogen,
nogen_to_gen etc combination tests.
- and a 'pub_misc' for any other misc tables like tab_order.

~~~

7.
+#####################
 # Wait for initial sync of all subscriptions
+#####################

I think you should write a note here that you have deliberately set
copy_data = false because COPY and include_generated_columns are not
allowed at the same time for patch 0001. And that is why all expected
results on subscriber2 will be empty. Also, say this limitation will
be changed in patch 0002.

~~~

(I didn't yet check 011_generated.pl file results beyond this point...
I'll wait for v22-0001 to review further)

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Jul 29, 2024 at 12:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Thanks for the patch updates.
>
> Here are my review comments for v21-0001.
>
> I think this patch is mostly OK now except there are still some
> comments about the TAP test.
>
> ======
> Commit Message
>
> 0.
> Using Create Subscription:
> CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION
> pub1 WITH (include_generated_columns = true, copy_data = false)"
>
> If you are going to give an example, I think a gen-to-nogen example
> would be a better choice. That's because the gen-to-gen (as you have
> here) is not going to replicate anything due to the subscriber-side
> column being generated.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1.
> +$node_subscriber2->safe_psql('postgres',
> + "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
> +);
>
> The subscriber2 node was intended only for all the tables where we
> need include_generated_columns to be true. Mostly that is the
> combination tests. (tab_gen_to_nogen, tab_nogen_to_gen, etc) OTOH,
> table 'tab1' already existed. I don't think we need to bother
> subscribing to tab1 from subscriber2 because every combination is
> already covered by the combination tests. Let's leave this one alone.
>
>
> ~~~
>
> 2.
> Huh? Where is the "tab_nogen_to_gen" combination test that I sent to
> you off-list?
>
> ~~~
>
> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED,
> a int, b int)"
> +);
>
> Maybe you can test 'tab_order' on both subscription nodes but I think
> it is overkill. IMO it is enough to test it on subscription2.
>
> ~~~
>
> 4.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a
> * 22) STORED)"
> +);
>
> Ditto above. Maybe you can test 'tab_alter' on both subscription nodes
> but I think it is overkill. IMO it is enough to test it on
> subscription2.
>
> ~~~
>
> 5.
> Don't forget to add initial data for the missing nogen_to_gen table/test.
>
> ~~~
>
> 6.
>  $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen,
> tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order");
> +
>  $node_subscriber->safe_psql('postgres',
>   "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1"
>  );
>
> It is not a bad idea to reduce the number of publications as you have
> done, but IMO jamming all the tables into 1 publication is too much
> because it makes it less understandable instead of simpler.
>
> How about this:
> - leave the 'pub1' just for 'tab1'.
> - have a 'pub_combo' for publishing all the gen_to_nogen,
> nogen_to_gen etc combination tests.
> - and a 'pub_misc' for any other misc tables like tab_order.
>
> ~~~
>
> 7.
> +#####################
>  # Wait for initial sync of all subscriptions
> +#####################
>
> I think you should write a note here that you have deliberately set
> copy_data = false because COPY and include_generated_columns are not
> allowed at the same time for patch 0001. And that is why all expected
> results on subscriber2 will be empty. Also, say this limitation will
> be changed in patch 0002.
>
> ~~~
>
> (I didn't yet check 011_generated.pl file results beyond this point...
> I'll wait for v22-0001 to review further)

The attached Patches contain all the suggested changes.

v22-0001 - Addressed the comments.
v22-0002 - Rebased the Patch.
v22-0003 - Rebased the Patch.
v22-0004 - Rebased the Patch.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, Here are my review comments for patch v22-0001

All comments now are only for the TAP test.

======
src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case
"gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table
because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test
(previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for nogen-to-gen combination, and
after experimenting with this I am getting a bit suspicious.

Currently, it seems that if a COPY is attempted then the error would
be like this:
2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
0001 allows replication to happen. And the generated value of the
subscriber "b" takes precedence.

I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting.  That is why it is
so important to include *every* test combination in these TAP tests --
because unless we know how it works today, we won't know if we are
accidentally breaking the current behaviour with the other (0002,
0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
to make sure the (new?) patch errors make sense and don't overstep by
giving ERRORs when they should not.
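
To make the nogen-to-gen observation above concrete, here is a minimal toy
model of what was reported, written as a standalone C sketch. It is not how
the apply worker or tablesync is implemented: ToyRow, ToyResult and both
functions are hypothetical names, and the subscriber expression "a * 22" is
borrowed from the test convention described later in this mail. The model
encodes only the two reported outcomes: an initial COPY into a
subscriber-side generated column fails, while without the COPY the row is
replicated and the subscriber's own generated value for 'b' wins over the
published one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not PostgreSQL structures. */
typedef struct
{
    int a;
    int b;
} ToyRow;

typedef enum
{
    TOY_APPLY_OK,
    TOY_ERR_GENERATED_IN_COPY   /* models "Generated columns cannot be used in COPY" */
} ToyResult;

/* Subscriber-side generation expression assumed for 'b': a * 22. */
static int
toy_subscriber_b(int a)
{
    return a * 22;
}

/*
 * Model of the reported nogen-to-gen behaviour:
 *  - copy_data = true:  the initial COPY targets generated column 'b' and fails.
 *  - copy_data = false: the change is applied, but the published value of
 *    'b' is ignored and the subscriber recomputes it.
 */
static ToyResult
toy_apply_nogen_to_gen(ToyRow published, bool copy_data, ToyRow *stored)
{
    assert(stored != NULL);

    if (copy_data)
        return TOY_ERR_GENERATED_IN_COPY;

    stored->a = published.a;
    stored->b = toy_subscriber_b(published.a);  /* subscriber value takes precedence */
    return TOY_APPLY_OK;
}
```

Writing the expected outcomes down in this form is only a way of pinning the
current behaviour; the TAP tests remain the authoritative check of what the
patches actually do.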

~~~~

Also, many other smaller issues/changes were done:

~~~

Creating tables:

nitpick - rearranged to keep all combo test SQLs in a consistent order
throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen

nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.

nitpick - tweaked some CREATE TABLE comments for consistency.

nitpick - in the v22 patch many of the generated col 'b' use different
computations for every test. It makes it unnecessarily difficult to
read/review the expected results. So, I've made them all the same. Now
computation is "a * 2" on the publisher side, and "a * 22" on the
subscriber side.

~~~

Creating Publications and Subscriptions:


nitpick - added comment for all the CREATE PUBLICATION

nitpick - added comment for all the CREATE SUBSCRIPTION

nitpick - I moved the note about copy_data = false to where all the
node_subscriber2 subscriptions are created. Also, don't explicitly
refer to "patch 000" in the comment, because that will not make any
sense after getting pushed.

nitpick - I changed many subscriber names to consistently use "sub1"
or "sub2" within the name (this is the visual cue of which
node_subscriber<n> they are on). e.g.
/regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:

nitpick - not sure if it is possible to do the initial data tests for
"nogen_to_gen" in the normal place. For now, it is just replaced by a
comment.
NOTE - Maybe this should be refactored later to put all the initial
data checks in one place. I'll think about this point more in the next
review.

~~~

nitpick - Changed cleanup to drop subscriptions before publications.

nitpick - remove the unnecessary blank line at the end.

======

Please see the attached diffs patch (apply it atop patch 0001) which
includes all the nitpick changes mentioned above.

~~

BTW, For a quicker turnaround and less churning please consider just
posting the v23-0001 by itself instead of waiting to rebase all the
subsequent patches. When 0001 settles down some more then rebase the
others.

~~

Also, please run the indentation tool over this code ASAP.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Aug 1, 2024 at 2:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are my review comments for patch v22-0001
>
> All comments now are only for the TAP test.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> 1. I added all new code for the missing combination test case
> "gen-to-missing". See nitpicks diff.
> - create a separate publication for this "tab_gen_to_missing" table
> because the test gives subscription errors.
> - for the initial data
> - for the replicated data
>
> ~~~
>
> 2. I added sub1 and sub2 subscriptions for every combo test
> (previously some were absent). See nitpicks diff.
>
> ~~~
>
> 3. There was a missing test case for nogen-to-gen combination, and
> after experimenting with this I am getting a bit suspicious,
>
> Currently, it seems that if a COPY is attempted then the error would
> be like this:
> 2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
> 2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
> be used in COPY.
>
> OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
> 0001 allows replication to happen. And the generated value of the
> subscriber "b" takes precedence.
>
> I have included these tests in the nitpicks diff of patch 0001.
>
> Those results weren't exactly what I was expecting.  That is why it is
> so important to include *every* test combination in these TAP tests --
> because unless we know how it works today, we won't know if we are
> accidentally breaking the current behaviour with the other (0002,
> 0003) patches.
>
> Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
> to make sure the (new?) patch errors make sense and don't overstep by
> giving ERRORs when they should not.
>
> ~~~~
>
> Also, many other smaller issues/changes were done:
>
> ~~~
>
> Creating tables:
>
> nitpick - rearranged to keep all combo test SQLs in a consistent order
> throughout this file
> 1/ gen-to-gen
> 2/ gen-to-nogen
> 3/ gen-to-missing
> 4/ missing-to-gen
> 5/ nogen-to-gen
>
> nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.
>
> nitpick - tweaked some CREATE TABLE comments for consistency.
>
> nitpick - in the v22 patch many of the generated col 'b' use different
> computations for every test. It makes it unnecessarily difficult to
> read/review the expected results. So, I've made them all the same. Now
> computation is "a * 2" on the publisher side, and "a * 22" on the
> subscriber side.
>
> ~~~
>
> Creating Publications and Subscriptions:
>
>
> nitpick - added comment for all the CREATE PUBLICATION
>
> nitpick - added comment for all the CREATE SUBSCRIPTION
>
> nitpick - I moved the note about copy_data = false to where all the
> node_subscriber2 subscriptions are created. Also, don't explicitly
> refer to "patch 000" in the comment, because that will not make any
> sense after getting pushed.
>
> nitpick - I changed many subscriber names to consistently use "sub1"
> or "sub2" within the name (this is the visual cue of which
> node_subscriber<n> they are on). e.g.
> /regress_sub_combo2/regress_sub2_combo/
>
> ~~~
>
> Initial Sync tests:
>
> nitpick - not sure if it is possible to do the initial data tests for
> "nogen_to_gen" in the normal place. For now, it is just replaced by a
> comment.
> NOTE - Maybe this should be refactored later to put all the initial
> data checks in one place. I'll think about this point more in the next
> review.
>
> ~~~
>
> nitpick - Changed cleanup to drop subscriptions before publications.
>
> nitpick - remove the unnecessary blank line at the end.
>
> ======
>
> Please see the attached diffs patch (apply it atop patch 0001) which
> includes all the nitpick changes mentioned above.
>
> ~~
>
> BTW, For a quicker turnaround and less churning please consider just
> posting the v23-0001 by itself instead of waiting to rebase all the
> subsequent patches. When 0001 settles down some more then rebase the
> others.
>
> ~~
>
> Also, please run the indentation tool over this code ASAP.
>
I have fixed all the comments. The attached Patch(v23-0001) contains
all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham.

Here are some more review comments for the v23-0001.

======
src/test/subscription/t/011_generated.pl

nitpick - renamed /regress_pub/regress_pub_tab1/ and
/regress_sub1/regress_sub1_tab1/
nitpick - typo /inital data /initial data/
nitpick - typo /snode_subscriber2/node_subscriber2/
nitpick - tweak the combo initial sync comments and messages
nitpick - /#Cleanup/# cleanup/
nitpick - tweak all the combo normal replication comments
nitpick - removed blank line at the end

~~~

1. Refactor tab_gen_to_missing initial sync tests.

I moved the tab_gen_to_missing initial sync for node_subscriber2 to be
back where all the other initial sync tests are done.
See the nitpicks patch file.

~~~

2. Refactor tab_nogen_to_gen initial sync tests

I moved all the tab_nogen_to_gen initial sync tests back to where the
other initial sync tests are done.
See the nitpicks patch file.

~~~

3. Added another test case:

Because the (current PG17) nogen-to-gen initial sync test case (with
copy_data=true) gives an ERROR, I have added another combination to
cover normal replication (e.g. using copy_data=false).
See the nitpicks patch file.

(This has exposed an inconsistency which IMO might be a PG17 bug. I
have included TAP test comments about this, and plan to post a
separate thread for it later).

~

4. GUC

Moving and adding more CREATE SUBSCRIPTION exceeded some default GUCs,
so extra configuration was needed.
See the nitpick patch file.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi,

Writing many new test case combinations has exposed a possible bug in
patch 0001.

In my previous post [1] there was questionable behaviour when
replicating from a normal (not generated) column on the publisher side
to a generated column on the subscriber side. Initially, I thought the
test might have exposed a possible PG17 bug, but now I think it has
really found a bug in patch 0001.

~~~

Previously (PG17) this would fail consistently both during COPY and
during normal replication. Now, patch 0001 has changed this behaviour
-- it is not always failing anymore.

The patch should not be impacting this existing behaviour. It only
introduces a new 'include_generated_columns' option, but since the
publisher-side column is not a generated column, I do not expect any
difference in behaviour for this test case. IMO the bug should be
fixed, and the TAP test expected results corrected for this scenario.

Below is an example demonstrating PG17 behaviour.

======


Publisher:
----------

(notice column "b" is not generated)

test_pub=# CREATE TABLE tab_nogen_to_gen (a int, b int);
CREATE TABLE
test_pub=# INSERT INTO tab_nogen_to_gen VALUES (1,101),(2,102);
INSERT 0 2
test_pub=# CREATE PUBLICATION pub1 for TABLE tab_nogen_to_gen;
CREATE PUBLICATION
test_pub=#

Subscriber:
-----------

(notice corresponding column "b" is generated)

test_sub=# CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED
ALWAYS AS (a * 22) STORED);
CREATE TABLE
test_sub=#

Try to create a subscription. Notice we get the error: ERROR:  logical
replication target relation "public.tab_nogen_to_gen" is missing
replicated column: "b"

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
PUBLICATION pub1;
2024-08-05 13:16:40.043 AEST [20957] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-05 13:16:40.105 AEST [29258] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-05 13:16:40.117 AEST [29260] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-05 13:16:40.172 AEST [29260] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:16:40.173 AEST [20039] LOG:  background worker "logical
replication tablesync worker" (PID 29260) exited with exit code 1
2024-08-05 13:16:45.187 AEST [29400] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-05 13:16:45.285 AEST [29400] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:16:45.286 AEST [20039] LOG:  background worker "logical
replication tablesync worker" (PID 29400) exited with exit code 1
...

Create the subscription again, but this time with copy_data = false

test_sub=# CREATE SUBSCRIPTION sub1_nocopy CONNECTION
'dbname=test_pub' PUBLICATION pub1 WITH (copy_data = false);
2024-08-05 13:22:57.719 AEST [20957] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1_nocopy" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-05 13:22:57.765 AEST [7012] LOG:  logical
replication apply worker for subscription "sub1_nocopy" has started

test_sub=#

~~~

Then insert data from the publisher to see what happens for normal replication.

test_pub=#
test_pub=# INSERT INTO tab_nogen_to_gen VALUES (3,103),(4,104);
INSERT 0 2

~~~

Notice the subscriber gets the same error as before: ERROR:  logical
replication target relation "public.tab_nogen_to_gen" is missing
replicated column: "b"

2024-08-05 13:25:14.897 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 10957) exited with exit code 1
2024-08-05 13:25:19.933 AEST [11095] LOG:  logical replication apply
worker for subscription "sub1_nocopy" has started
2024-08-05 13:25:19.966 AEST [11095] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:25:19.966 AEST [11095] CONTEXT:  processing remote data
for replication origin "pg_16390" during message type "INSERT" in
transaction 742, finished at 0/1967BB0
2024-08-05 13:25:19.968 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 11095) exited with exit code 1
2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
worker for subscription "sub1_nocopy" has started
2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
for replication origin "pg_16390" during message type "INSERT" in
transaction 742, finished at 0/1967BB0
2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 11225) exited with exit code 1
...
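
To double-check which side actually has the generated column in a setup
like the one above, the catalog can be queried on each node. This is only
a minimal sketch using the table name from this example; pg_attribute's
attgenerated is 's' for a STORED generated column and empty otherwise, so
column "b" should show 's' on test_sub and an empty value on test_pub.

```sql
-- Run on both test_pub and test_sub and compare the results.
SELECT attname, attgenerated
FROM pg_attribute
WHERE attrelid = 'public.tab_nogen_to_gen'::regclass
  AND attnum > 0            -- skip system columns
  AND NOT attisdropped      -- skip dropped columns
ORDER BY attnum;
```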

======
[1] https://www.postgresql.org/message-id/CAHut%2BPvtT8fKOfvVYr4vANx_fr92vedas%2BZRbQxvMC097rks6w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 5, 2024 at 8:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham.
>
> Here are some more review comments for the v23-0001.
>
> ======
> 011_generated.pl b/src/test/subscription/t/011_generated.pl
>
> nitpick - renamed /regress_pub/regress_pub_tab1/ and
> /regress_sub1/regress_sub1_tab1/
> nitpick - typo /inital data /initial data/
> nitpick - typo /snode_subscriber2/node_subscriber2/
> nitpick - tweak the combo initial sync comments and messages
> nitpick - /#Cleanup/# cleanup/
> nitpick - tweak all the combo normal replication comments
> nitpick - removed blank line at the end
>
> ~~~
>
> 1. Refactor tab_gen_to_missing initial sync tests.
>
> I moved the tab_gen_to_missing initial sync for node_subscriber2 to be
> back where all the other initial sync tests are done.
> See the nitpicks patch file.
>
> ~~~
>
> 2. Refactor tab_nogen_to_gen initial sync tests
>
> I moved all the tab_nogen_to_gen initial sync tests back to where the
> other initial sync tests are done.
> See the nitpicks patch file.
>
> ~~~
>
> 3. Added another test case:
>
> Because the (current PG17) nogen-to-gen initial sync test case (with
> copy_data=true) gives an ERROR, I have added another combination to
> cover normal replication (e.g. using copy_data=false).
> See the nitpicks patch file.
>
> (This has exposed an inconsistency which IMO might be a PG17 bug. I
> have included TAP test comments about this, and plan to post a
> separate thread for it later).
>
> ~
>
> 4. GUC
>
> Moving and adding more CREATE SUBSCRIPTION exceeded some default GUCs,
> so extra configuration was needed.
> See the nitpick patch file.
>

I have fixed all the comments. The attached Patch(v24-0001) contains
all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 5, 2024 at 9:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> Writing many new test case combinations has exposed a possible bug in
> patch 0001.
>
> In my previous post [1] there was questionable behaviour when
> replicating from a normal (not generated) column on the publisher side
> to a generated column on the subscriber side. Initially, I thought the
> test might have exposed a possible PG17 bug, but now I think it has
> really found a bug in patch 0001.
>
> ~~~
>
> Previously (PG17) this would fail consistently both during COPY and
> during normal replication. Now, patch 0001 has changed this behaviour
> -- it is not always failing anymore.
>
> The patch should not be impacting this existing behaviour. It only
> introduces a new 'include_generated_columns', but since the publisher
> side is not a generated column I do not expect there should be any
> difference in behaviour for this test case. IMO the TAP test expected
> results should be corrected for this scenario. And fix the bug.
>
> Below is an example demonstrating PG17 behaviour.
>
> ======
>
>
> Publisher:
> ----------
>
> (notice column "b" is not generated)
>
> test_pub=# CREATE TABLE tab_nogen_to_gen (a int, b int);
> CREATE TABLE
> test_pub=# INSERT INTO tab_nogen_to_gen VALUES (1,101),(2,102);
> INSERT 0 2
> test_pub=# CREATE PUBLICATION pub1 for TABLE tab_nogen_to_gen;
> CREATE PUBLICATION
> test_pub=#
>
> Subscriber:
> -----------
>
> (notice corresponding column "b" is generated)
>
> test_sub=# CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED
> ALWAYS AS (a * 22) STORED);
> CREATE TABLE
> test_sub=#
>
> Try to create a subscription. Notice we get the error: ERROR:  logical
> replication target relation "public.tab_nogen_to_gen" is missing
> replicated column: "b"
>
> test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
> PUBLICATION pub1;
> 2024-08-05 13:16:40.043 AEST [20957] WARNING:  subscriptions created
> by regression test cases should have names starting with "regress_"
> WARNING:  subscriptions created by regression test cases should have
> names starting with "regress_"
> NOTICE:  created replication slot "sub1" on publisher
> CREATE SUBSCRIPTION
> test_sub=# 2024-08-05 13:16:40.105 AEST [29258] LOG:  logical
> replication apply worker for subscription "sub1" has started
> 2024-08-05 13:16:40.117 AEST [29260] LOG:  logical replication table
> synchronization worker for subscription "sub1", table
> "tab_nogen_to_gen" has started
> 2024-08-05 13:16:40.172 AEST [29260] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:16:40.173 AEST [20039] LOG:  background worker "logical
> replication tablesync worker" (PID 29260) exited with exit code 1
> 2024-08-05 13:16:45.187 AEST [29400] LOG:  logical replication table
> synchronization worker for subscription "sub1", table
> "tab_nogen_to_gen" has started
> 2024-08-05 13:16:45.285 AEST [29400] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:16:45.286 AEST [20039] LOG:  background worker "logical
> replication tablesync worker" (PID 29400) exited with exit code 1
> ...
>
> Create the subscription again, but this time with copy_data = false
>
> test_sub=# CREATE SUBSCRIPTION sub1_nocopy CONNECTION
> 'dbname=test_pub' PUBLICATION pub1 WITH (copy_data = false);
> 2024-08-05 13:22:57.719 AEST [20957] WARNING:  subscriptions created
> by regression test cases should have names starting with "regress_"
> WARNING:  subscriptions created by regression test cases should have
> names starting with "regress_"
> NOTICE:  created replication slot "sub1_nocopy" on publisher
> CREATE SUBSCRIPTION
> test_sub=# 2024-08-05 13:22:57.765 AEST [7012] LOG:  logical
> replication apply worker for subscription "sub1_nocopy" has started
>
> test_sub=#
>
> ~~~
>
> Then insert data from the publisher to see what happens for normal replication.
>
> test_pub=#
> test_pub=# INSERT INTO tab_nogen_to_gen VALUES (3,103),(4,104);
> INSERT 0 2
>
> ~~~
>
> Notice the subscriber gets the same error as before: ERROR:  logical
> replication target relation "public.tab_nogen_to_gen" is missing
> replicated column: "b"
>
> 2024-08-05 13:25:14.897 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 10957) exited with exit code 1
> 2024-08-05 13:25:19.933 AEST [11095] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:19.966 AEST [11095] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:19.966 AEST [11095] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:19.968 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11095) exited with exit code 1
> 2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11225) exited with exit code 1
>
This is expected behaviour. The error message shown here has been
improved. The error is consistent, and it is being handled in the 0002
patch. Below are the logs for the same:
2024-08-07 10:47:45.977 IST [29756] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-07 10:47:46.116 IST [29756] ERROR:  logical replication target
relation "public.tab_nogen_to_gen" has a generated column "b" but
corresponding column on source relation is not a generated column
0002 Patch needs to be applied to get rid of this error.

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

Here are my review comments for patch v24-0001

I think the TAP tests have incorrect expected results for the nogen-to-gen case.

Whereas the HEAD code will cause "ERROR" for this test scenario, patch
0001 does not. IMO the behaviour should be unchanged for this scenario
which has no generated column on the publisher side. So it seems this
is a bug in patch 0001.

FYI, I have included "FIXME" comments in the attached top-up diff
patch to show which test cases I think are expecting wrong results.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Aug 7, 2024 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham,
>
> Here are my review comments for patch v24-0001
>
> I think the TAP tests have incorrect expected results for the nogen-to-gen case.
>
> Whereas the HEAD code will cause "ERROR" for this test scenario, patch
> 0001 does not. IMO the behaviour should be unchanged for this scenario
> which has no generated column on the publisher side. So it seems this
> is a bug in patch 0001.
>
> FYI, I have included "FIXME" comments in the attached top-up diff
> patch to show which test cases I think are expecting wrong results.
>

Fixed all the comments. The attached Patch(v25-0001) contains all the changes.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

I think the v25-0001 patch only half-fixes the problems reported in my
v24-0001 review.

~

Background (from the commit message):
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes.

~

The broken TAP test scenario in question is replicating from a
"not-generated" column to a "generated" column. As the generated
column is not on the publishing side, IMO the
'include_generated_columns' option should have zero effect here.

In other words, I expect this TAP test for 'include_generated_columns
= true' case should also be failing, as I wrote already yesterday:

+# FIXME
+# Since there is no generated column on the publishing side this should give
+# the same result as the previous test. -- e.g. something like:
+# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is missing
+# replicated column: "b"
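
For illustration, the scenario in question could be reproduced by hand
roughly as below. This is a sketch only: it assumes the patch under review
is applied (the 'include_generated_columns' subscription option does not
exist in released PostgreSQL), it reuses the pub1 publication and
tab_nogen_to_gen tables from my earlier example, and the subscription name
is made up.

```sql
-- Subscriber side (patched build assumed).
CREATE SUBSCRIPTION regress_sub1_nogen_to_gen_incgen
    CONNECTION 'dbname=test_pub'
    PUBLICATION pub1
    WITH (include_generated_columns = true, copy_data = false);

-- Publisher side: column "b" is a normal column here, so the option
-- above should make no difference to what the subscriber receives.
INSERT INTO tab_nogen_to_gen VALUES (5, 105);
```

If the option really has zero effect for this case, the apply worker
should report the same ERROR as it does without the option.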

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 8 Aug 2024 at 10:53, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham,
> >
> > Here are my review comments for patch v24-0001
> >
> > I think the TAP tests have incorrect expected results for the nogen-to-gen case.
> >
> > Whereas the HEAD code will cause "ERROR" for this test scenario, patch
> > 0001 does not. IMO the behaviour should be unchanged for this scenario
> > which has no generated column on the publisher side. So it seems this
> > is a bug in patch 0001.
> >
> > FYI, I have included "FIXME" comments in the attached top-up diff
> > patch to show which test cases I think are expecting wrong results.
> >
>
> Fixed all the comments. The attached Patch(v25-0001) contains all the changes.

Few comments:
1) Can we add one test with replica identity full to show that the
generated column is included in the case of an update operation with
test_decoding?

2) At the end of the file generated_columns.sql a newline is missing:
+-- when 'include-generated-columns' = '0' the generated column 'b' values will not be replicated
+INSERT INTO gencoltable (a) VALUES (7), (8), (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0');
+
+DROP TABLE gencoltable;
+
+SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
\ No newline at end of file

3)
3.a)This can be changed:
+-- when 'include-generated-columns' is not set the generated column 'b' values will be replicated
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);

to:
-- By default, 'include-generated-columns' is enabled, so the values
for the generated column 'b' will be replicated even if it is not
explicitly specified.

3.b) This can be changed:
-- when 'include-generated-columns' = '1' the generated column 'b'
values will be replicated
to:
-- when 'include-generated-columns' is enabled, the values of the
generated column 'b' will be replicated.

3.c) This can be changed:
-- when 'include-generated-columns' = '0' the generated column 'b'
values will not be replicated
to:
-- when 'include-generated-columns' is disabled, the values of the
generated column 'b' will not be replicated.

4) I did not see any test for dump; can we add one test for this?
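
For comment 1, the test could look something like the sketch below. It
assumes the test_decoding slot 'regression_slot' from the existing test
is still present and that the patch's 'include-generated-columns' option
is available; the gencoltable definition shown is an assumption, not
copied from the patch.

```sql
-- Assumed table definition; 'b' is a stored generated column.
CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);

-- With REPLICA IDENTITY FULL the old row is logged in full for UPDATE.
ALTER TABLE gencoltable REPLICA IDENTITY FULL;

INSERT INTO gencoltable (a) VALUES (1);
UPDATE gencoltable SET a = 10 WHERE a = 1;

-- Check whether 'b' appears in the decoded UPDATE.
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
    'include-xids', '0', 'skip-empty-xacts', '1',
    'include-generated-columns', '1');
```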

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Thu, 18 Jul 2024 at 13:55, Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Hi, here are some review comments for v19-0002
> > > ======
> > > src/test/subscription/t/004_sync.pl
> > >
> > > 1.
> > > This new test is not related to generated columns. IIRC, this is just
> > > some test that we discovered missing during review of this thread. As
> > > such, I think this change can be posted/patched separately from this
> > > thread.
> > >
> > I have removed the test for this thread.
> >
> > I have also addressed the remaining comments for v19-0002 patch.
>
> Hi, I have no more review comments for patch v20-0002 at this time.
>
> I saw that the above test was removed from this thread as suggested,
> but I could not find that any new thread was started to propose this
> valuable missing test.
>

I still did not find any new thread for adding the missing test case,
so I started one myself [1].

======
[1] https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hQGUHrzxeCSzANXSMKOvYiLX-EjdyNw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 12:43 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham,
> >
> > I think the v25-0001 patch only half-fixes the problems reported in my
> > v24-0001 review.
> >
> > ~
> >
> > Background (from the commit message):
> > This commit enables support for the 'include_generated_columns' option
> > in logical replication, allowing the transmission of generated column
> > information and data alongside regular table changes.
> >
> > ~
> >
> > The broken TAP test scenario in question is replicating from a
> > "not-generated" column to a "generated" column. As the generated
> > column is not on the publishing side, IMO the
> > 'include_generated_columns' option should have zero effect here.
> >
> > In other words, I expect this TAP test for 'include_generated_columns
> > = true' case should also be failing, as I wrote already yesterday:
> >
> > +# FIXME
> > +# Since there is no generated column on the publishing side this should give
> > +# the same result as the previous test. -- e.g. something like:
> > +# ERROR:  logical replication target relation
> > "public.tab_nogen_to_gen" is missing
> > +# replicated column: "b"
>
> I have fixed the given comments. The attached v26-0001 Patch contains
> the required changes.

Few comments:
1) There's no need to pass include_generated_columns in this case; we
can retrieve it from ctx->data instead:
@@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
 static void
 send_relation_and_attrs(Relation relation, TransactionId xid,
                                                LogicalDecodingContext *ctx,
-                                               Bitmapset *columns)
+                                               Bitmapset *columns, bool include_generated_columns)
 {
        TupleDesc       desc = RelationGetDescr(relation);
        int                     i;
@@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation, TransactionId xid,

2) Commit message:
If the subscriber-side column is also a generated column then this option
has no effect; the replicated data will be ignored and the subscriber
column will be filled as normal with the subscriber-side computed or
default data.

An error will occur in this case, so the message should be updated accordingly.

3) The current test is structured as follows: a) Create all required
tables b) Insert data into tables c) Create publications d) Create
subscriptions e) Perform inserts and verify
This approach can make reviewing and maintenance somewhat challenging.

Instead, could you modify it to: a) Create the required table for a
single test b) Insert data for this test c) Create the publication for
this test d) Create the subscriptions for this test e) Perform inserts
and verify f) Clean up

4) We can maintain the test as a separate 0002 patch, as it may need a
few rounds of review and final adjustments. Once it's fully completed,
we can merge it back in.

5) Once we create and drop publication/subscriptions for individual
tests, we won't need such extensive configuration; we should be able
to run them with default values:
+$node_publisher->append_conf(
+       'postgresql.conf',
+       "max_wal_senders = 20
+        max_replication_slots = 20");
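
To make the layout suggested in comment 3 concrete, one self-contained
test cycle might look roughly like the following. This is a sketch of the
SQL steps only (the real test would issue them from the TAP script); the
table, publication and subscription names are illustrative, and the
"a * 2" expression is just the publisher-side computation used elsewhere
in this thread.

```sql
-- a) + b) Publisher: table and data for this one test.
CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);

-- c) Publisher: a publication for this test only.
CREATE PUBLICATION regress_pub_gen_to_nogen FOR TABLE tab_gen_to_nogen;

-- d) Subscriber: matching table and a subscription for this test only.
CREATE TABLE tab_gen_to_nogen (a int, b int);
CREATE SUBSCRIPTION regress_sub1_gen_to_nogen
    CONNECTION 'dbname=test_pub'
    PUBLICATION regress_pub_gen_to_nogen;

-- e) Publisher: more inserts, then verify on the subscriber.
INSERT INTO tab_gen_to_nogen (a) VALUES (4), (5);
SELECT a, b FROM tab_gen_to_nogen ORDER BY a;   -- run on the subscriber

-- f) Clean up: subscription first, then publication, then tables.
DROP SUBSCRIPTION regress_sub1_gen_to_nogen;    -- subscriber
DROP PUBLICATION regress_pub_gen_to_nogen;      -- publisher
DROP TABLE tab_gen_to_nogen;                    -- both nodes
```

Because each cycle drops its own subscription, only a small number of
replication slots and WAL senders should be in use at any one time.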

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, here are my review comments for the TAP tests patch v27-0002

======
Commit message

Tap tests for 'include-generated-columns'

~

But it's more than that -- these are the TAP tests for all
combinations of replication related to generated columns, i.e. both
with and without the 'include_generated_columns' option enabled.

======
src/test/subscription/t/011_generated.pl

I was mistaken, thinking that the v27-0002 had already been refactored
according to Vignesh's last review but it is not done yet, so I am not
going to post detailed review comments until the restructuring is
completed.

~

OTOH, there are some problems I felt have crept into v26-0001 (TAP
test is same as v27-0002), so maybe try to also take care of them (see
below) in v28-0002.

In no particular order:

* I felt it is almost useless now to have the "combo"
("regress_pub_combo") publication. It used to have many tables when
you first created it, but with every version posted it is publishing
fewer and fewer, so now there are only 2 tables in it. Better to have a
specific publication for each table now and forget about "combos".

* The "TEST tab_gen_to_gen initial sync" seems to be not even checking
the table data. Why not? e.g. Even if you expect no data, you should
test for it.

* The "TEST tab_gen_to_gen replication" seems to be not even checking
the table data. Why not?

* Multiple XXX comments like "... it needs more study to determine if
the above result was actually correct, or a PG17 bug..." should be
removed. AFAIK we should well understand the expected results for all
combinations by now.

* The "TEST tab_order replication" is now getting an error saying
<missing replicated column: "c">. That may now be the correct
error for this situation, but in that case I think the test is
no longer testing what it was intended to test (i.e. that column
order does not matter). Probably the table definition needs
adjusting to make sure we are testing what we want to test, and not
just making some random scenario "PASS".

* The test "# TEST tab_alter" expected empty result also seems
unhelpful. It might be related to the previous bullet.
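
As a rough illustration of the first three bullets (the publication name
here is only a suggestion, and the table is assumed to have columns "a"
and "b" like the other combo tables), each table could get its own
publication, and the subscriber-side data could be checked explicitly
even when no rows are expected:

```sql
-- Publisher: one publication per table instead of a shared "combo".
CREATE PUBLICATION regress_pub_gen_to_gen FOR TABLE tab_gen_to_gen;

-- Subscriber: always check the table contents, for both the initial
-- sync and normal replication, even if the expected result is empty.
SELECT count(*) FROM tab_gen_to_gen;
SELECT a, b FROM tab_gen_to_gen ORDER BY a;
```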

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 19, 2024 at 11:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are my review comments for v27-0001.
>
> ======
> contrib/test_decoding/expected/generated_columns.out
> contrib/test_decoding/sql/generated_columns.sql
>
> +-- By default, 'include-generated-columns' is enabled, so the values
> for the generated column 'b' will be replicated even if it is not
> explicitly specified.
>
> nit - The "default" is only like this for "test_decoding" (e.g., the
> CREATE SUBSCRIPTION option is the opposite), so let's make the comment
> clearer about that.
> nit - Use sentence case in the comments.

I have addressed all the comments in the v28-0001 patch. Please refer
to the updated v28-0001 patch in [1].

[1] https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Aug 19, 2024 at 12:40 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, here are my review comments for the TAP tests patch v27-0002
>
> ======
> Commit message
>
> Tap tests for 'include-generated-columns'
>
> ~
>
> But, it's more than that-- these are the TAP tests for all
> combinations of replication related to generated columns. i.e. both
> with and without 'include_generated_columns' option enabled.
>
> ======
> src/test/subscription/t/011_generated.pl
>
> I was mistaken, thinking that the v27-0002 had already been refactored
> according to Vignesh's last review but it is not done yet, so I am not
> going to post detailed review comments until the restructuring is
> completed.
>
> ~
>
> OTOH, there are some problems I felt have crept into v26-0001 (TAP
> test is same as v27-0002), so maybe try to also take care of them (see
> below) in v28-0002.
>
> In no particular order:
>
> * I felt it is almost useless now to have the "combo" (
> "regress_pub_combo")  publication. It used to have many tables when
> you first created it but with every version posted it is publishing
> less and less so now there are only 2 tables in it. Better to have a
> specific publication for each table now and forget about "combos"
>
> * The "TEST tab_gen_to_gen initial sync" seems to be not even checking
> the table data. Why not? e.g. Even if you expect no data, you should
> test for it.
>
> * The "TEST tab_gen_to_gen replication" seems to be not even checking
> the table data. Why not?
>
> * Multiple XXX comments like "... it needs more study to determine if
> the above result was actually correct, or a PG17 bug..." should be
> removed. AFAIK we should well understand the expected results for all
> combinations by now.
>
> * The "TEST tab_order replication" is now getting an error saying
> <missing replicated column: "c">. That may now be the correct
> error for this situation, but in that case I think the test is
> no longer testing what it was intended to test (i.e. that column
> order does not matter). Probably the table definition needs
> adjusting to make sure we are testing what we want to test, and not
> just making some random scenario "PASS".
>
> * The test "# TEST tab_alter" expected empty result also seems
> unhelpful. It might be related to the previous bullet.

I have addressed all the comments in the v28-0002 patch. Please refer
to the updated v28-0002 patch in [1].

[1] https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 22 Aug 2024 at 10:22, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Fri, Aug 16, 2024 at 2:47 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Thu, Aug 8, 2024 at 12:43 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > Hi Shubham,
> > > >
> > > > I think the v25-0001 patch only half-fixes the problems reported in my
> > > > v24-0001 review.
> > > >
> > > > ~
> > > >
> > > > Background (from the commit message):
> > > > This commit enables support for the 'include_generated_columns' option
> > > > in logical replication, allowing the transmission of generated column
> > > > information and data alongside regular table changes.
> > > >
> > > > ~
> > > >
> > > > The broken TAP test scenario in question is replicating from a
> > > > "not-generated" column to a "generated" column. As the generated
> > > > column is not on the publishing side, IMO the
> > > > 'include_generated_columns' option should have zero effect here.
> > > >
> > > > In other words, I expect this TAP test for 'include_generated_columns
> > > > = true' case should also be failing, as I wrote already yesterday:
> > > >
> > > > +# FIXME
> > > > +# Since there is no generated column on the publishing side this should give
> > > > +# the same result as the previous test. -- e.g. something like:
> > > > > +# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is missing
> > > > > +# replicated column: "b"
> > >
> > > I have fixed the given comments. The attached v26-0001 Patch contains
> > > the required changes.
> >
> > Few comments:
> > 1) There's no need to pass include_generated_columns in this case; we
> > can retrieve it from ctx->data instead:
> > @@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
> >  static void
> >  send_relation_and_attrs(Relation relation, TransactionId xid,
> >                                                 LogicalDecodingContext *ctx,
> > -                                               Bitmapset *columns)
> > +                                               Bitmapset *columns, bool include_generated_columns)
> >  {
> >         TupleDesc       desc = RelationGetDescr(relation);
> >         int                     i;
> > @@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation, TransactionId xid,
> >
> > 2) Commit message:
> > If the subscriber-side column is also a generated column then this option
> > has no effect; the replicated data will be ignored and the subscriber
> > column will be filled as normal with the subscriber-side computed or
> > default data.
> >
> > An error will occur in this case, so the message should be updated accordingly.
> >
> > 3) The current test is structured as follows: a) Create all required
> > tables b) Insert data into tables c) Create publications d) Create
> > subscriptions e) Perform inserts and verify
> > This approach can make reviewing and maintenance somewhat challenging.
> >
> > Instead, could you modify it to: a) Create the required table for a
> > single test b) Insert data for this test c) Create the publication for
> > this test d) Create the subscriptions for this test e) Perform inserts
> > and verify f) Clean up
> >
> > 4) We can maintain the test as a separate 0002 patch, as it may need a
> > few rounds of review and final adjustments. Once it's fully completed,
> > we can merge it back in.
> >
> > 5) Once we create and drop publication/subscriptions for individual
> > tests, we won't need such extensive configuration; we should be able
> > to run them with default values:
> > +$node_publisher->append_conf(
> > +       'postgresql.conf',
> > +       "max_wal_senders = 20
> > +        max_replication_slots = 20");
>
> Fixed all the given comments. The attached patches contain the
> suggested changes.

Few comments:
1) This is already covered by the first existing test case, so maybe
this can be removed:
# =============================================================================
# Testcase start: Subscriber table with a generated column (b) on the
# subscriber, where column (b) is not present on the publisher.

This existing test:
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab1 (a) VALUES (1), (2), (3);
CREATE PUBLICATION pub1 FOR ALL TABLES;
));

$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int);
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1;
));

2) Can we also verify this test with include_generated_columns =
true, as is done for the others:
my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';

$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab1 (a) VALUES (1), (2), (3);
CREATE PUBLICATION pub1 FOR ALL TABLES;
));

$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int);
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1;
));

3) There is a typo in this comment:
3.a)  # Testcase start: Publisher table with a generated column (b)
and subscriber
# table a with regular column (b).

It should be:
# Testcase start: Publisher table with a generated column (b) and subscriber
# table with a regular column (b).

3.b) similarly here too:
# Testcase end: Publisher table with a generated column (b) and subscriber
# table a with regular column (b).

3.c) The comments are not consistent, sometimes mentioned as
column(b) and sometimes as column (b). We can keep it consistent.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > 'include_generated_columns' option is specified, the generated column
> > information and generated column data will also be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?
>

The cost is one, but another is that the user may not want the data to
be different based on volatile functions like timeofday(), or the table
on the subscriber may not have the column marked as generated. Now, considering
such use cases, is providing a subscription-level option a good idea
as the patch is doing? I understand that this can serve the purpose
but it could also lead to having the same behavior for all the tables
in all the publications for a subscription which may or may not be
what the user expects. This could lead to some performance overhead
(due to always sending generated columns for all the tables) for cases
where the user needs it only for a subset of tables.

I think we should consider it as a table-level option while defining
publication in some way. A few ideas could be: (a) We ask users to
explicitly mention the generated column in the columns list while
defining publication. This has a drawback such that users need to
specify the column list even when all columns need to be replicated.
(b) We can have some new syntax to indicate the same like: CREATE
PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
could be some challenges but we can at least investigate it.

Yet another idea is to keep this as a publication option
(include_generated_columns or publish_generated_columns) similar to
"publish_via_partition_root". Normally, "publish_via_partition_root"
is used when tables on either side have different partition
hierarchies which is somewhat the case here.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > > <dangwalrajendra888@gmail.com> wrote:
> > > >
> > > > Hi PG Hackers.
> > > >
> > > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> > >
> > > The attached patch has the changes to support capturing generated
> > > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > > 'include_generated_columns' option is specified, the generated column
> > > information and generated column data will also be sent.
> >
> > As Euler mentioned earlier, I think it's a decision not to replicate
> > generated columns because we don't know the target table on the
> > subscriber has the same expression and there could be locale issues
> > even if it looks the same. I can see that a benefit of this proposal
> > would be to save cost to compute generated column values if the user
> > wants the target table on the subscriber to have exactly the same data
> > as the publisher's one. Are there other benefits or use cases?
> >
>
> The cost is one but the other is the user may not want the data to be
> different based on volatile functions like timeofday()

Shouldn't the generation expression be immutable?

> or the table on
> subscriber won't have the column marked as generated.

Yeah, it would be another use case.

>  Now, considering
> such use cases, is providing a subscription-level option a good idea
> as the patch is doing? I understand that this can serve the purpose
> but it could also lead to having the same behavior for all the tables
> in all the publications for a subscription which may or may not be
> what the user expects. This could lead to some performance overhead
> (due to always sending generated columns for all the tables) for cases
> where the user needs it only for a subset of tables.

Yeah, it's a downside and I think it's less flexible. For example, if
users want to send both tables with generated columns and tables
without generated columns, they would have to create at least two
subscriptions. Also, they would have to include a different set of
tables to two publications.

>
> I think we should consider it as a table-level option while defining
> publication in some way. A few ideas could be: (a) We ask users to
> explicitly mention the generated column in the columns list while
> defining publication. This has a drawback such that users need to
> specify the column list even when all columns need to be replicated.
> (b) We can have some new syntax to indicate the same like: CREATE
> PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> could be some challenges but we can at least investigate it.

I think we can create a publication for a single table, so what we can
do with this feature can be done also by the idea you described below.

> Yet another idea is to keep this as a publication option
> (include_generated_columns or publish_generated_columns) similar to
> "publish_via_partition_root". Normally, "publish_via_partition_root"
> is used when tables on either side have different partition
> hierarchies which is somewhat the case here.

It sounds more useful to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>

Yes, I missed that point.

> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

Right, apart from that I am not aware of other use cases. If Euler or
Rajendra have any, I would request them to share those.

> >  Now, considering
> > such use cases, is providing a subscription-level option a good idea
> > as the patch is doing? I understand that this can serve the purpose
> > but it could also lead to having the same behavior for all the tables
> > in all the publications for a subscription which may or may not be
> > what the user expects. This could lead to some performance overhead
> > (due to always sending generated columns for all the tables) for cases
> > where the user needs it only for a subset of tables.
>
> Yeah, it's a downside and I think it's less flexible. For example, if
> users want to send both tables with generated columns and tables
> without generated columns, they would have to create at least two
> subscriptions.
>

Agreed and that would consume more resources.

> Also, they would have to include a different set of
> tables to two publications.
>
> >
> > I think we should consider it as a table-level option while defining
> > publication in some way. A few ideas could be: (a) We ask users to
> > explicitly mention the generated column in the columns list while
> > defining publication. This has a drawback such that users need to
> > specify the column list even when all columns need to be replicated.
> > (b) We can have some new syntax to indicate the same like: CREATE
> > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > could be some challenges but we can at least investigate it.
>
> I think we can create a publication for a single table, so what we can
> do with this feature can be done also by the idea you described below.
>
> > Yet another idea is to keep this as a publication option
> > (include_generated_columns or publish_generated_columns) similar to
> > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > is used when tables on either side have different partition
> > hierarchies which is somewhat the case here.
>
> It sounds more useful to me.
>

Fair enough. Let's see if anyone else has any preference among the
proposed methods or can think of a better way.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > generated columns because we don't know the target table on the
> > > > > subscriber has the same expression and there could be locale issues
> > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > would be to save cost to compute generated column values if the user
> > > > > wants the target table on the subscriber to have exactly the same data
> > > > > as the publisher's one. Are there other benefits or use cases?
> > > > >
> > > >
> > > > The cost is one but the other is the user may not want the data to be
> > > > different based on volatile functions like timeofday()
> > >
> > > Shouldn't the generation expression be immutable?
> > >
> >
> > Yes, I missed that point.
> >
> > > > or the table on
> > > > subscriber won't have the column marked as generated.
> > >
> > > Yeah, it would be another use case.
> > >
> >
> > Right, apart from that I am not aware of other use cases. If they
> > have, I would request Euler or Rajendra to share any other use case.
> >
> > > >  Now, considering
> > > > such use cases, is providing a subscription-level option a good idea
> > > > as the patch is doing? I understand that this can serve the purpose
> > > > but it could also lead to having the same behavior for all the tables
> > > > in all the publications for a subscription which may or may not be
> > > > what the user expects. This could lead to some performance overhead
> > > > (due to always sending generated columns for all the tables) for cases
> > > > where the user needs it only for a subset of tables.
> > >
> > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > users want to send both tables with generated columns and tables
> > > without generated columns, they would have to create at least two
> > > subscriptions.
> > >
> >
> > Agreed and that would consume more resources.
> >
> > > Also, they would have to include a different set of
> > > tables to two publications.
> > >
> > > >
> > > > I think we should consider it as a table-level option while defining
> > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > explicitly mention the generated column in the columns list while
> > > > defining publication. This has a drawback such that users need to
> > > > specify the column list even when all columns need to be replicated.
> > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > could be some challenges but we can at least investigate it.
> > >
> > > I think we can create a publication for a single table, so what we can
> > > do with this feature can be done also by the idea you described below.
> > >
> > > > Yet another idea is to keep this as a publication option
> > > > (include_generated_columns or publish_generated_columns) similar to
> > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > is used when tables on either side have different partition
> > > > hierarchies which is somewhat the case here.
> > >
> > > It sounds more useful to me.
> > >
> >
> > Fair enough. Let's see if anyone else has any preference among the
> > proposed methods or can think of a better way.
>
> I have fixed the current issue. I have added the option
> 'publish_generated_columns' to the publisher side and created the new
> test cases accordingly.
> The attached patches contain the desired changes.
>

Thank you for updating the patches. I have some comments:

Do we really need to add this option to test_decoding? I think it
would be good if this improves the test coverage. Otherwise, I'm not
sure we need this part. If we want to add it, I think it would be
better to have it in a separate patch.

---
+         <para>
+          If the publisher-side column is also a generated column then this option
+          has no effect; the publisher column will be filled as normal with the
+          publisher-side computed or default data.
+         </para>

I don't understand this description. Why does this option have no
effect if the publisher-side column is a generated column?

---
+         <para>
+         This parameter can only be set <literal>true</literal> if <literal>copy_data</literal> is
+         set to <literal>false</literal>.
+         </para>

If I understand this patch correctly, it doesn't disallow setting
copy_data to true when the publish_generated_columns option is
specified. But do we want to disallow it? I think it would be more
useful and understandable if we allowed using both
publish_generated_columns (publisher option) and copy_data (subscriber
option) at the same time.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
>
> Thank you for updating the patches. I have some comments:
>
> Do we really need to add this option to test_decoding?
>

I don't see any reason to have such an option in test_decoding;
otherwise, we would need a separate option for each publication option. I
guess this is a leftover from the previous subscriber-side approach.

> I think it
> would be good if this improves the test coverage. Otherwise, I'm not
> sure we need this part. If we want to add it, I think it would be
> better to have it in a separate patch.
>

Right.

> ---
> +         <para>
> +          If the publisher-side column is also a generated column then this option
> +          has no effect; the publisher column will be filled as normal with the
> +          publisher-side computed or default data.
> +         </para>
>
> I don't understand this description. Why does this option have no
> effect if the publisher-side column is a generated column?
>

Shouldn't it be subscriber-side?

I have one additional comment:
/*
- * If the publication is FOR ALL TABLES then it is treated the same as
- * if there are no column lists (even if other publications have a
- * list).
+ * If the publication is FOR ALL TABLES and include generated columns
+ * then it is treated the same as if there are no column lists (even
+ * if other publications have a list).
  */
- if (!pub->alltables)
+ if (!pub->alltables || !pub->pubgencolumns)

Why do we treat pubgencolumns at the same level as the FOR ALL TABLES
case? I thought that if the user has provided a column list, we only
need to publish the specified columns even when the
publish_generated_columns option is set.
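
As an illustration of the expectation described above, here is a rough Python model. It is not the pgoutput code: the Column type and the columns_to_publish() function are hypothetical names, and the sketch assumes that an explicit column list simply takes precedence over publish_generated_columns. What should happen when a listed column is itself generated while the option is false is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a table attribute."""
    name: str
    generated: bool = False
    dropped: bool = False


def columns_to_publish(
    columns: Sequence[Column],
    column_list: Optional[Set[str]],
    publish_generated_columns: bool,
) -> List[str]:
    """Return the names of the columns that would be sent (illustrative only)."""
    result = []
    for col in columns:
        if col.dropped:
            continue
        if column_list is not None:
            # Assumption: an explicit column list wins; publish only what is listed.
            if col.name in column_list:
                result.append(col.name)
        elif not col.generated or publish_generated_columns:
            # No column list: generated columns depend on the publication option.
            result.append(col.name)
    return result


t1 = [Column("a"), Column("b", generated=True), Column("c")]
print(columns_to_publish(t1, None, False))       # ['a', 'c']
print(columns_to_publish(t1, None, True))        # ['a', 'b', 'c']
print(columns_to_publish(t1, {"a", "c"}, True))  # ['a', 'c']
```

The last call is the case in question: the option is set, but only the listed columns are published.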

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
IIUC, previously there was a subscriber side option
'include_generated_columns', but now since v30* there is a publisher
side option 'publish_generated_columns'.

Fair enough, but in the v30* patches I can still see remnants of the
old name 'include_generated_columns' all over the place:
- in the commit message
- in the code (struct field names, param names etc)
- in the comments
- in the docs

If the decision is to call the new PUBLICATION option
'publish_generated_columns', then can't we please use that one name
*everywhere* -- e.g. replace all cases where any old name is still
lurking?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some more review comments for patch v30-0001.

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+          If the publisher-side column is also a generated column then this option
+          has no effect; the publisher column will be filled as normal with the
+          publisher-side computed or default data.
+         </para>

It should say "subscriber-side"; not "publisher-side". The same was
already reported by Sawada-San [1].

~~~

2.
+         <para>
+         This parameter can only be set <literal>true</literal> if <literal>copy_data</literal> is
+         set to <literal>false</literal>.
+         </para>

IMO this limitation should be addressed by patch 0001 like it was
already done in the previous patches (e.g. v22-0002). I think
Sawada-san suggested the same [1].

Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
mentioned like this without any reference to the SUBSCRIPTION seems
like a cut/paste error from the previous implementation.

======
src/backend/catalog/pg_publication.c

3. pub_collist_validate
- if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
- ereport(ERROR,
- errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("cannot use generated column \"%s\" in publication column list",
-    colname));
-

Instead of just removing this ERROR entirely here, I thought it would
be more user-friendly to give a WARNING if the PUBLICATION's explicit
column list includes generated cols when the option
"publish_generated_columns" is false. This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (like the current patch does) is likely going to give
someone unexpected results/grief.

======
src/backend/replication/logical/proto.c

4. logicalrep_write_tuple, and logicalrep_write_attrs:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Why aren't you also checking the new PUBLICATION option here and
skipping all gencols if the "publish_generated_columns" option is
false? Or is the BMS of pgoutput_column_list_init handling this case?
Maybe there should be an Assert for this?

======
src/backend/replication/pgoutput/pgoutput.c

5. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Same question as #4.

~~~

6. prepare_all_columns_bms and pgoutput_column_list_init

+ if (att->attgenerated && !pub->pubgencolumns)
+ cols = bms_del_member(cols, i + 1);

IIUC, the algorithm seems overly tricky filling the BMS with all
columns, before straight away conditionally removing the generated
columns. Can't it be refactored to assign all the correct columns
up-front, to avoid calling bms_del_member()?

======
src/bin/pg_dump/pg_dump.c

7. getPublications

IIUC, there is lots of missing SQL code here (for all older versions)
that should be saying "false AS pubgencolumns".
e.g. compare the SQL with how "false AS pubviaroot" is used.

======
src/bin/pg_dump/t/002_pg_dump.pl

8. Missing tests?

I expected to see a pg_dump test for this new PUBLICATION option.

======
src/test/regress/sql/publication.sql

9. Missing tests?

How about adding another test case that checks this new option must be
"Boolean"?

~~~

10. Missing tests?

--- error: generated column "d" can't be in list
+-- ok: generated columns can be in the list too
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

(see my earlier comment #3)

IMO there should be another test case for a WARNING here if the user
attempts to include generated column 'd' in an explicit PUBLICATION
column list while the "publish_generated_columns" is false.
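
The refactoring asked for in comment #6 can be sketched as follows. This is only a Python model of the idea, not the patch's C code: a set stands in for the Bitmapset, the two function names are hypothetical, and the handling of dropped columns is an assumption. It shows that deciding per column before adding gives the same result as adding and then removing.

```python
from typing import List, Set, Tuple

# Each attribute is (is_dropped, is_generated); attribute numbers are 1-based.
Attr = Tuple[bool, bool]


def all_columns_fill_then_delete(attrs: List[Attr], pubgencolumns: bool) -> Set[int]:
    """Add every live column, then remove generated ones if they are not published."""
    cols: Set[int] = set()
    for i, (dropped, generated) in enumerate(attrs):
        if dropped:
            continue
        cols.add(i + 1)
        if generated and not pubgencolumns:
            cols.discard(i + 1)
    return cols


def all_columns_up_front(attrs: List[Attr], pubgencolumns: bool) -> Set[int]:
    """Decide per column before adding, so nothing is ever removed."""
    cols: Set[int] = set()
    for i, (dropped, generated) in enumerate(attrs):
        if dropped:
            continue
        if generated and not pubgencolumns:
            continue
        cols.add(i + 1)
    return cols


attrs = [(False, False), (False, True), (True, False), (False, False)]
for flag in (False, True):
    assert all_columns_fill_then_delete(attrs, flag) == all_columns_up_front(attrs, flag)
print(sorted(all_columns_up_front(attrs, False)))  # [1, 4]
print(sorted(all_columns_up_front(attrs, True)))   # [1, 2, 4]
```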

======
[1]  https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

Here are my general comments about the v30-0002 TAP test patch.

======

1.
As mentioned in a previous post [1] there are still several references
to the old 'include_generated_columns' option remaining in this patch.
They need replacing.

~~~

2.
+# Furthermore, all combinations are tested for publish_generated_columns=false
+# (see subscription sub1 of database 'postgres'), and
+# publish_generated_columns=true (see subscription sub2 of database
+# 'test_igc_true').

Those 'see subscription' notes and 'test_igc_true' are from the old
implementation. Those need fixing. BTW, 'test_pgc_true' is a better
name for the database now that the option name is changed.

In the previous implementation, the TAP test environment was:
- a common publication pub, on the 'postgres' database
- a subscription sub1 with option include_generated_columns=false, on
the 'postgres' database
- a subscription sub2 with option include_generated_columns=true, on
the 'test_igc_true' database

Now it is like:
- a publication pub1, on the 'postgres' database, with option
publish_generated_columns=false
- a publication pub2, on the 'postgres' database, with option
publish_generated_columns=true
- a subscription sub1, on the 'postgres' database for publication pub1
- a subscription sub2, on the 'test_pgc_true' database for publication pub2

It would be good to document the above convention because knowing how
the naming/numbering works makes it a lot easier to read the
subsequent test cases. Of course, it is really important to
name/number everything consistently otherwise these tests become hard
to follow.  AFAICT it is mostly OK, but the generated -> generated
publication should be called 'regress_pub2_gen_to_gen'.

~~~

3.
+# Create table.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
+ INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
+));
+
+# Create publication with publish_generated_columns=false.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false)"
+);
+
+# Create table and subscription with copy_data=true.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int);
+ CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH (copy_data = true);
+));
+
+# Create publication with publish_generated_columns=true.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true)"
+);
+

The code can be restructured to be simpler. Both publications are
always created on the 'postgres' database at the publisher node, so
let's just create them at the same time as creating the publisher
table. It also improves readability, e.g.

# Create table, and publications
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false);
CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true);
));

AFAICT this same simplification can be repeated multiple times in this TAP file.

~~

Similarly, it would be neater to combine DROP PUBLICATION's together too.

~~~

4.
Hopefully, 'copy_data' support for generated columns can be implemented
again soon for subscriptions, and then the initial sync tests here can be
properly implemented instead of the placeholders currently in patch
0002.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPuDJToG%3DV-ogTi9_6fnhhn2S0%2BsVRGPynhcf9mEh0Q%3DLA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Because this feature is now being implemented as a PUBLICATION option,
there is another scenario that might need consideration; I am thinking
about where the same table is published by multiple PUBLICATIONS (with
different option settings) that are subscribed by a single
SUBSCRIPTION.

e.g.1
-----
CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
-----

e.g.2
-----
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
-----

Do you know if this case is supported? If yes, then which publication
option value wins?

The CREATE SUBSCRIPTION docs [1] only say "Subscriptions having
several publications in which the same table has been published with
different column lists are not supported."

Perhaps the user is supposed to deduce that the example above would
work OK if table 't1' has no generated cols. OTOH, if it did have
generated cols then the PUBLICATION column lists must be different and
therefore it is "not supported" (??).

I have not tried this to see what happens, but even if it behaves as
expected, there should probably be some comments/docs/tests for this
scenario to clarify it for the user.

Notice that "publish_via_partition_root" has a similar conundrum, but
in that case, the behaviour is documented in the CREATE PUBLICATION
docs [2]. So, maybe "publish_generated_columns" should be documented
a bit like that.

======
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] https://www.postgresql.org/docs/devel/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 10 Sept 2024 at 09:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding?
> >
>
> I don't see any reason to have such an option in test_decoding,
> otherwise, we need a separate option for each publication option. I
> guess this is leftover of the previous subscriber-side approach.
>
> > I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> Right.
>
> > ---
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> Shouldn't it be subscriber-side?
>
> I have one additional comment:
> /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * If the publication is FOR ALL TABLES and include generated columns
> + * then it is treated the same as if there are no column lists (even
> + * if other publications have a list).
>   */
> - if (!pub->alltables)
> + if (!pub->alltables || !pub->pubgencolumns)
>
> Why do we treat pubgencolumns at the same level as the FOR ALL TABLES
> case? I thought that if the user has provided a column list, we only
> need to publish the specified columns even when the
> publish_generated_columns option is set.

To handle cases where the publish_generated_columns option isn't
specified for all tables in a publication, the pubgencolumns check
needs to be performed. In such cases, we must create a column list
that excludes generated columns. This process involves:
a) Retrieving all columns for the table and adding them to the column
list.
b) Iterating through this column list and removing generated columns.
c) Checking if the remaining column count matches the total number of
columns. If they match, set the relation entry's column list to NULL,
so we don't need to check columns during data replication. If they do
not match, update the column list to include only the relevant
columns, allowing pgoutput to replicate data for these specific
columns.

This step is necessary because some tables in the publication may
include generated columns.
For tables where publish_generated_columns is set, the column list
will be set to NULL, eliminating the need for a column list check
during data publication.
However, modifying the column list based on publish_generated_columns
is not required; this is addressed in the v31 patch posted by Shubham
at [1].
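
For illustration only, steps a) to c) and the NULL column list case
described above can be modelled in a few lines of Python. This is a
sketch of the described logic, not the pgoutput C code: the Column type
and the effective_column_list() name are made up for this example, and
None stands in for a NULL column list.

```python
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    name: str
    generated: bool


def effective_column_list(columns: List[Column],
                          publish_generated_columns: bool) -> Optional[List[str]]:
    """Return the column names to publish, or None for "no column list"."""
    if publish_generated_columns:
        # Option set: every column is published, so no list is needed.
        return None
    # a) start from all columns, b) drop the generated ones.
    kept = [col.name for col in columns if not col.generated]
    # c) nothing was dropped: behave as if there were no column list.
    if len(kept) == len(columns):
        return None
    return kept
```

Returning None in the "nothing was dropped" case mirrors the point made
above that no per-column check is then needed while replicating data.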

[1] - https://www.postgresql.org/message-id/CAHv8Rj%2BinrG6EU0rpDJxih8mmYLhCUP6ouTAmMN2RDnT9tE_Gg%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 13, 2024 at 9:34 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > >
> > > > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > > > generated columns because we don't know the target table on the
> > > > > > > subscriber has the same expression and there could be locale issues
> > > > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > > > would be to save cost to compute generated column values if the user
> > > > > > > wants the target table on the subscriber to have exactly the same data
> > > > > > > as the publisher's one. Are there other benefits or use cases?
> > > > > > >
> > > > > >
> > > > > > The cost is one but the other is the user may not want the data to be
> > > > > > different based on volatile functions like timeofday()
> > > > >
> > > > > Shouldn't the generation expression be immutable?
> > > > >
> > > >
> > > > Yes, I missed that point.
> > > >
> > > > > > or the table on
> > > > > > subscriber won't have the column marked as generated.
> > > > >
> > > > > Yeah, it would be another use case.
> > > > >
> > > >
> > > > Right, apart from that I am not aware of other use cases. If they
> > > > have, I would request Euler or Rajendra to share any other use case.
> > > >
> > > > > >  Now, considering
> > > > > > such use cases, is providing a subscription-level option a good idea
> > > > > > as the patch is doing? I understand that this can serve the purpose
> > > > > > but it could also lead to having the same behavior for all the tables
> > > > > > in all the publications for a subscription which may or may not be
> > > > > > what the user expects. This could lead to some performance overhead
> > > > > > (due to always sending generated columns for all the tables) for cases
> > > > > > where the user needs it only for a subset of tables.
> > > > >
> > > > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > > > users want to send both tables with generated columns and tables
> > > > > without generated columns, they would have to create at least two
> > > > > subscriptions.
> > > > >
> > > >
> > > > Agreed and that would consume more resources.
> > > >
> > > > > Also, they would have to include a different set of
> > > > > tables to two publications.
> > > > >
> > > > > >
> > > > > > I think we should consider it as a table-level option while defining
> > > > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > > > explicitly mention the generated column in the columns list while
> > > > > > defining publication. This has a drawback such that users need to
> > > > > > specify the column list even when all columns need to be replicated.
> > > > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > > > could be some challenges but we can at least investigate it.
> > > > >
> > > > > I think we can create a publication for a single table, so what we can
> > > > > do with this feature can be done also by the idea you described below.
> > > > >
> > > > > > Yet another idea is to keep this as a publication option
> > > > > > (include_generated_columns or publish_generated_columns) similar to
> > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > > is used when tables on either side have different partitions
> > > > > > hierarchies which is somewhat the case here.
> > > > >
> > > > > It sounds more useful to me.
> > > > >
> > > >
> > > > Fair enough. Let's see if anyone else has any preference among the
> > > > proposed methods or can think of a better way.
> > >
> > > I have fixed the current issue. I have added the option
> > > 'publish_generated_columns' to the publisher side and created the new
> > > test cases accordingly.
> > > The attached patches contain the desired changes.
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding? I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> I have removed the option from the test_decoding file.
>
> > ---
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> The documentation was incorrect. Currently, replicating from a
> publisher table with a generated column to a subscriber table with a
> generated column will result in an error. This has now been updated.
>
> > ---
> > +         <para>
> > +         This parameter can only be set <literal>true</literal> if
> > <literal>copy_data</literal> is
> > +         set to <literal>false</literal>.
> > +         </para>
> >
> > If I understand this patch correctly, it doesn't disallow to set
> > copy_data to true when the publish_generated_columns option is
> > specified. But do we want to disallow it? I think it would be more
> > useful and understandable if we allow to use both
> > publish_generated_columns (publisher option) and copy_data (subscriber
> > option) at the same time.
> >
>
> Support for tablesync with generated columns was not included in the
> initial patch, and this was reflected in the documentation. The
> functionality for syncing generated column data has been introduced
> with the 0002 patch.
>

Since nothing was said otherwise, I assumed my v30-0001 comments were
addressed in v31, but the new code seems to have quite a few of my
suggested changes missing. If you haven't addressed my review comments
for patch 0001 yet, please say so. OTOH, please give reasons for any
rejected comments.

> The attached v31 patches contain the changes for the same. I won't be
> posting the test patch for now. I will share it once this patch has
> been stabilized.

How can the patch become "stabilized" without associated tests to
verify the behaviour is not broken? e.g. I can write a stable function
that says 2+2=5.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Wed, Sep 11, 2024 at 10:30 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Because this feature is now being implemented as a PUBLICATION option,
> there is another scenario that might need consideration; I am thinking
> about where the same table is published by multiple PUBLICATIONS (with
> different option settings) that are subscribed by a single
> SUBSCRIPTION.
>
> e.g.1
> -----
> CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> -----
>
> e.g.2
> -----
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> -----
>
> Do you know if this case is supported? If yes, then which publication
> option value wins?

I would expect these option values are processed with OR. That is, we
publish changes of the generated columns if at least one publication
sets publish_generated_columns to true. It seems to me that we treat
multiple row filters in the same way.
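
A tiny Python sketch of the OR rule suggested here, assuming each
publication is reduced to a set of tables plus its option value. The
Publication type and the function name are hypothetical, and nothing in
this thread confirms that pgoutput behaves this way.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Publication(NamedTuple):
    name: str
    tables: Optional[Set[str]]  # None models FOR ALL TABLES
    publish_generated_columns: bool


def generated_columns_published(pubs: Iterable[Publication], table: str) -> bool:
    """OR together the option over every publication that includes the table."""
    return any(
        pub.publish_generated_columns
        for pub in pubs
        if pub.tables is None or table in pub.tables
    )
```

Under this reading, both e.g.1 and e.g.2 would publish the generated
columns of t1, because pub1 sets the option to true in each case.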

>
> The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> several publications in which the same table has been published with
> different column lists are not supported."
>
> Perhaps the user is supposed to deduce that the example above would
> work OK if table 't1' has no generated cols. OTOH, if it did have
> generated cols then the PUBLICATION column lists must be different and
> therefore it is "not supported" (??).

With the patch, how should this feature work when users specify a
generated column in the column list and set publish_generated_columns =
false in the first place? Raise an error (as we do today)? Or always
send NULL?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Sep 17, 2024 at 7:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 10:30 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Because this feature is now being implemented as a PUBLICATION option,
> > there is another scenario that might need consideration; I am thinking
> > about where the same table is published by multiple PUBLICATIONS (with
> > different option settings) that are subscribed by a single
> > SUBSCRIPTION.
> >
> > e.g.1
> > -----
> > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > -----
> >
> > e.g.2
> > -----
> > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > -----
> >
> > Do you know if this case is supported? If yes, then which publication
> > option value wins?
>
> I would expect these option values are processed with OR. That is, we
> publish changes of the generated columns if at least one publication
> sets publish_generated_columns to true. It seems to me that we treat
> multiple row filters in the same way.
>

I thought that the option "publish_generated_columns" is more related
to "column lists" than "row filters".

Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.

Then:
PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
is equivalent to
PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);

And
PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
is equivalent to
PUBLICATION pub2 FOR TABLE t1(a,b,c);

So, I would expect this to fail because the SUBSCRIPTION docs say
"Subscriptions having several publications in which the same table has
been published with different column lists are not supported."

~~

Here's another example:
PUBLICATION pub3 FOR TABLE t1(a,b);
PUBLICATION pub4 FOR TABLE t1(c);

Won't it be strange (e.g. difficult to explain) why pub1 and pub2
table column lists are allowed to be combined in one subscription, but
pub3 and pub4 in one subscription are not supported due to the
different column lists?
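
The column-list view above can be sketched in Python as follows. It
assumes, purely for illustration, that each publication is first
expanded to an effective set of columns and that a subscription is
rejected when those sets differ for the same table. expand() and
check_same_column_lists() are hypothetical helpers, not PostgreSQL
functions.

```python
from typing import FrozenSet, Iterable, Optional, Sequence

ALL_COLUMNS = ("a", "b", "c", "gen1", "gen2")
GENERATED = frozenset({"gen1", "gen2"})


def expand(column_list: Optional[Sequence[str]] = None,
           publish_generated_columns: bool = False) -> FrozenSet[str]:
    """Effective column set of one publication of table t1."""
    if column_list is not None:
        return frozenset(column_list)
    return frozenset(
        col for col in ALL_COLUMNS
        if publish_generated_columns or col not in GENERATED
    )


def check_same_column_lists(effective: Iterable[FrozenSet[str]]) -> None:
    """Raise if the publications disagree about which columns to publish."""
    if len(set(effective)) > 1:
        raise ValueError("different column lists for the same table")


pub1 = expand(publish_generated_columns=True)    # like t1(a,b,c,gen1,gen2)
pub2 = expand(publish_generated_columns=False)   # like t1(a,b,c)
pub3 = expand(column_list=("a", "b"))
pub4 = expand(column_list=("c",))
```

With this model, subscribing to pub1 and pub2 together is rejected for
the same reason as subscribing to pub3 and pub4 together.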

> >
> > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > several publications in which the same table has been published with
> > different column lists are not supported."
> >
> > Perhaps the user is supposed to deduce that the example above would
> > work OK if table 't1' has no generated cols. OTOH, if it did have
> > generated cols then the PUBLICATION column lists must be different and
> > therefore it is "not supported" (??).
>
> With the patch, how should this feature work when users specify a
> generated column to the column list and set publish_generated_columns =
> false, in the first place? raise an error (as we do today)? or always
> send NULL?

For this scenario, I suggested (see [1] #3) that the code could give a
WARNING. As I wrote up-thread: This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (which the current patch does) is likely going to give
someone unexpected results/grief.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I thought that the option "publish_generated_columns" is more related
> to "column lists" than "row filters".
>
> Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
>

> And
> PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> is equivalent to
> PUBLICATION pub2 FOR TABLE t1(a,b,c);

This makes sense to me as it preserves the current behavior.

> Then:
> PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> is equivalent to
> PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);

This also makes sense. It would also include future generated columns.

> So, I would expect this to fail because the SUBSCRIPTION docs say
> "Subscriptions having several publications in which the same table has
> been published with different column lists are not supported."

So I agree that it would raise an error if users subscribe to both
pub1 and pub2.

And looking back at your examples,

> > > e.g.1
> > > -----
> > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > -----
> > >
> > > e.g.2
> > > -----
> > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > -----

Both examples would not be supported.

> > >
> > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > several publications in which the same table has been published with
> > > different column lists are not supported."
> > >
> > > Perhaps the user is supposed to deduce that the example above would
> > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > generated cols then the PUBLICATION column lists must be different and
> > > therefore it is "not supported" (??).
> >
> > With the patch, how should this feature work when users specify a
> > generated column to the column list and set publish_generated_columns =
> > false, in the first place? raise an error (as we do today)? or always
> > send NULL?
>
> For this scenario, I suggested (see [1] #3) that the code could give a
> WARNING. As I wrote up-thread: This combination doesn't seem
> like something a user would do intentionally, so just silently
> ignoring it (which the current patch does) is likely going to give
> someone unexpected results/grief.

It gives a WARNING, and then publishes the specified generated column
data (even if publish_generated_columns = false)? If so, it would mean
that specifying a generated column in the column list means
publishing its data regardless of the publish_generated_columns
parameter value.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I thought that the option "publish_generated_columns" is more related
> > to "column lists" than "row filters".
> >
> > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> >
>
> > And
> > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > is equivalent to
> > PUBLICATION pub2 FOR TABLE t1(a,b,c);
>
> This makes sense to me as it preserves the current behavior.
>
> > Then:
> > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > is equivalent to
> > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
>
> This also makes sense. It would also include future generated columns.
>
> > So, I would expect this to fail because the SUBSCRIPTION docs say
> > "Subscriptions having several publications in which the same table has
> > been published with different column lists are not supported."
>
> So I agree that it would raise an error if users subscribe to both
> pub1 and pub2.
>
> And looking back at your examples,
>
> > > > e.g.1
> > > > -----
> > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > > -----
> > > >
> > > > e.g.2
> > > > -----
> > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > > -----
>
> Both examples would not be supported.
>
> > > >
> > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > several publications in which the same table has been published with
> > > > different column lists are not supported."
> > > >
> > > > Perhaps the user is supposed to deduce that the example above would
> > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > generated cols then the PUBLICATION column lists must be different and
> > > > therefore it is "not supported" (??).
> > >
> > > With the patch, how should this feature work when users specify a
> > > generated column to the column list and set publish_generated_columns =
> > > false, in the first place? raise an error (as we do today)? or always
> > > send NULL?
> >
> > For this scenario, I suggested (see [1] #3) that the code could give a
> > WARNING. As I wrote up-thread: This combination doesn't seem
> > like something a user would do intentionally, so just silently
> > ignoring it (which the current patch does) is likely going to give
> > someone unexpected results/grief.
>
> It gives a WARNING, and then publishes the specified generated column
> data (even if publish_generated_columns = false)? If so, it would mean
> that specifying the generated column to the column list means to
> publish its data regardless of the publish_generated_columns parameter
> value.
>

No. I meant only that it can give a WARNING to tell the user "Hey,
there is a conflict here because you said publish_generated_columns =
false, but you also specified gencols in the column list".

But it is always the option "publish_generated_columns" that
determines the final publishing behaviour. So if it says
publish_generated_columns=false then it would NOT publish generated
columns even if there are gencols in the column list. I think this
makes sense because when there is no column list specified then that
implicitly means "all columns" and the table might have some gencols,
but still 'publish_generated_columns' is what determines the
behaviour.
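
A minimal Python sketch of the rule proposed here, where the option
always decides and a conflicting column list only produces a warning.
The function name and argument shapes are hypothetical, and Python's
warnings module merely stands in for a server-side WARNING.

```python
import warnings
from typing import List, Optional, Sequence, Set


def columns_option_wins(all_columns: Sequence[str],
                        generated: Set[str],
                        column_list: Optional[Sequence[str]],
                        publish_generated_columns: bool) -> List[str]:
    """The option decides; a conflicting column list only triggers a warning."""
    # No column list implicitly means "all columns".
    requested = list(all_columns if column_list is None else column_list)
    if publish_generated_columns:
        return requested
    listed_gencols = [c for c in requested if c in generated]
    if column_list is not None and listed_gencols:
        warnings.warn(
            "publish_generated_columns = false, but the column list names "
            "generated columns: " + ", ".join(listed_gencols)
        )
    # Option is false: generated columns are never published.
    return [c for c in requested if c not in generated]
```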

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>
> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

While speaking with one of the decoding output plugin users, I learned
that this feature will be useful when replicating data to a
non-postgres database using the plugin output, especially when the
other database doesn't have a generated column concept.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Sep 17, 2024 at 12:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > I thought that the option "publish_generated_columns" is more related
> > > to "column lists" than "row filters".
> > >
> > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> > >
> >
> > > And
> > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > is equivalent to
> > > PUBLICATION pub2 FOR TABLE t1(a,b,c);
> >
> > This makes sense to me as it preserves the current behavior.
> >
> > > Then:
> > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > is equivalent to
> > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
> >
> > This also makes sense. It would also include future generated columns.
> >
> > > So, I would expect this to fail because the SUBSCRIPTION docs say
> > > "Subscriptions having several publications in which the same table has
> > > been published with different column lists are not supported."
> >
> > So I agree that it would raise an error if users subscribe to both
> > pub1 and pub2.
> >
> > And looking back at your examples,
> >
> > > > > e.g.1
> > > > > -----
> > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > > > -----
> > > > >
> > > > > e.g.2
> > > > > -----
> > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > CREATE SUBSCRIPTION sub ... PUBLICATION pub1, pub2;
> > > > > -----
> >
> > Both examples would not be supported.
> >
> > > > >
> > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > > several publications in which the same table has been published with
> > > > > different column lists are not supported."
> > > > >
> > > > > Perhaps the user is supposed to deduce that the example above would
> > > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > > generated cols then the PUBLICATION column lists must be different and
> > > > > therefore it is "not supported" (??).
> > > >
> > > > With the patch, how should this feature work when users specify a
> > > > generated column to the column list and set publish_generated_columns =
> > > > false, in the first place? raise an error (as we do today)? or always
> > > > send NULL?
> > >
> > > For this scenario, I suggested (see [1] #3) that the code could give a
> > > WARNING. As I wrote up-thread: This combination doesn't seem
> > > like something a user would do intentionally, so just silently
> > > ignoring it (which the current patch does) is likely going to give
> > > someone unexpected results/grief.
> >
> > It gives a WARNING, and then publishes the specified generated column
> > data (even if publish_generated_columns = false)?


I think that the column list should take priority and we should
publish the generated column if it is mentioned in the column list,
irrespective of the option.

> > If so, it would mean
> > that specifying the generated column to the column list means to
> > publish its data regardless of the publish_generated_columns parameter
> > value.
> >
>
> No. I meant only that it can give a WARNING to tell the user "Hey,
> there is a conflict here because you said publish_generated_columns =
> false, but you also specified gencols in the column list".
>

Users can use a publication like "create publication pub1 for table
t1(c1, c2), t2;" where they want t1's generated column to be published
but not for t2. They can specify the generated column name in the
column list of t1 in that case even though the rest of the tables
won't publish generated columns.
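
To make the alternative concrete, here is a small Python sketch of the
"column list takes priority" rule described above. It is a model only:
the function name is hypothetical, and the table layouts in the usage
note are assumptions for the example.

```python
from typing import List, Optional, Sequence, Set


def columns_list_wins(all_columns: Sequence[str],
                      generated: Set[str],
                      column_list: Optional[Sequence[str]],
                      publish_generated_columns: bool) -> List[str]:
    """An explicit column list is published as written; the option only
    matters for tables that have no column list."""
    if column_list is not None:
        return list(column_list)
    return [
        col for col in all_columns
        if publish_generated_columns or col not in generated
    ]
```

Assuming c2 is a generated column of t1, the publication
"for table t1(c1, c2), t2" with the option left at false would then
publish c1 and c2 for t1, and only the non-generated columns of t2.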

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 12:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > I thought that the option "publish_generated_columns" is more related
> > > > to "column lists" than "row filters".
> > > >
> > > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> > > >
> > >
> > > > And
> > > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > is equivalent to
> > > > PUBLICATION pub2 FOR TABLE t1(a,b,c);
> > >
> > > This makes sense to me as it preserves the current behavior.
> > >
> > > > Then:
> > > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > is equivalent to
> > > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
> > >
> > > This also makes sense. It would also include future generated columns.
> > >
> > > > So, I would expect this to fail because the SUBSCRIPTION docs say
> > > > "Subscriptions having several publications in which the same table has
> > > > been published with different column lists are not supported."
> > >
> > > So I agree that it would raise an error if users subscribe to both
> > > pub1 and pub2.
> > >
> > > And looking back at your examples,
> > >
> > > > > > e.g.1
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > > > > >
> > > > > > e.g.2
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > >
> > > Both examples would not be supported.
> > >
> > > > > >
> > > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > > > several publications in which the same table has been published with
> > > > > > different column lists are not supported."
> > > > > >
> > > > > > Perhaps the user is supposed to deduce that the example above would
> > > > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > > > generated cols then the PUBLICATION column lists must be different and
> > > > > > therefore it is "not supported" (??).
> > > > >
> > > > > With the patch, how should this feature work when users specify a
> > > > > generated column to the column list and set publish_generated_column =
> > > > > false, in the first place? raise an error (as we do today)? or always
> > > > > send NULL?
> > > >
> > > > For this scenario, I suggested (see [1] #3) that the code could give a
> > > > WARNING. As I wrote up-thread: This combination doesn't seem
> > > > like something a user would do intentionally, so just silently
> > > > ignoring it (which the current patch does) is likely going to give
> > > > someone unexpected results/grief.
> > >
> > > It gives a WARNING, and then publishes the specified generated column
> > > data (even if publish_generated_column = false)?
>
>
> I think that the column list should take priority and we should
> publish the generated column if it is mentioned in the column list,
> irrespective of the option.

Agreed.

>
> > > If so, it would mean
> > > that specifying the generated column to the column list means to
> > > publish its data regardless of the publish_generated_column parameter
> > > value.
> > >
> >
> > No. I meant only that it can give a WARNING to tell the user "Hey,
> > there is a conflict here because you said publish_generated_column=
> > false, but you also specified gencols in the column list".
> >
>
> Users can use a publication like "create publication pub1 for table
> t1(c1, c2), t2;" where they want t1's generated column to be published
> but not for t2. They can specify the generated column name in the
> column list of t1 in that case even though the rest of the tables
> won't publish generated columns.

Agreed.

I think that users can use the publish_generated_columns option when
they want to publish all generated columns, instead of specifying all
the columns in the column list. Another advantage of this option is
that it will also include future generated columns.
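
To illustrate the point about future generated columns, here is a rough
sketch in Python. It is only a toy model of the behavior being discussed,
not PostgreSQL code: the `Table` class and the `published` function are
names made up for this example, and it assumes that an explicit column list
stays fixed once the publication is defined.

```python
# Toy model, not PostgreSQL code: compares an explicit column list with
# publish_generated_columns = true when a generated column is added later.
from dataclasses import dataclass, field


@dataclass
class Table:
    regular: list
    generated: list = field(default_factory=list)


def published(table, column_list=None, publish_generated_columns=False):
    """Columns sent for `table` under the rule discussed in this thread."""
    if column_list is not None:
        # An explicit column list is fixed when the publication is defined.
        return list(column_list)
    cols = list(table.regular)
    if publish_generated_columns:
        # Evaluated against the table's current definition.
        cols += table.generated
    return cols


t1 = Table(regular=["a", "b"], generated=["gen1"])
explicit = ["a", "b", "gen1"]

# With the current table definition both approaches publish the same columns.
same_before = published(t1, explicit) == published(t1, None, True)

# Later, a second generated column is added to t1.
t1.generated.append("gen2")

print(same_before)
print(published(t1, explicit))      # explicit list does not pick up gen2
print(published(t1, None, True))    # the option does pick up gen2
```

In this model the two publications only diverge after the table changes,
which is the case where the option saves the user from having to alter the
column list by hand.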

Given that we publish the generated columns if they are mentioned in
the column list, can we separate the patch into two if it helps
reviews? One is to allow logical replication to publish generated
columns if they are explicitly mentioned in the column list. The
second patch is to introduce the publish_generated_columns option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
...
> > I think that the column list should take priority and we should
> > publish the generated column if it is mentioned in the column list,
> > irrespective of the option.
>
> Agreed.
>
> >
...
> >
> > Users can use a publication like "create publication pub1 for table
> > t1(c1, c2), t2;" where they want t1's generated column to be published
> > but not for t2. They can specify the generated column name in the
> > column list of t1 in that case even though the rest of the tables
> > won't publish generated columns.
>
> Agreed.
>
> I think that users can use the publish_generated_column option when
> they want to publish all generated columns, instead of specifying all
> the columns in the column list. It's another advantage of this option
> that it will also include the future generated columns.
>

OK. Let me give some examples below to help understand this idea.

Please correct me if these are incorrect.

======

Assuming these tables:

t1(a,b,gen1,gen2)
t2(c,d,gen1,gen2)

Examples, when publish_generated_columns=false:

CREATE PUBLICATION pub1 FOR TABLE t1(a,b,gen2), t2 WITH
(publish_generated_columns=false)
t1 -> publishes a, b, gen2 (i.e. what the column list says)
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR TABLE t1, t2(gen1) WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes gen1 (i.e. what the column list says)

CREATE PUBLICATION pub1 FOR TABLE t1, t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

~~

Examples, when publish_generated_columns=true:

CREATE PUBLICATION pub1 FOR TABLE t1(a,b,gen2), t2 WITH
(publish_generated_columns=true)
t1 -> publishes a, b, gen2 (i.e. what the column list says)
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR TABLE t1, t2(gen1) WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes gen1 (i.e. what the column list says)

CREATE PUBLICATION pub1 FOR TABLE t1, t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2
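
To double-check the examples above, the rule as I understand it can be
written down as a small Python sketch. This is only a model of the proposed
precedence (column list first, then the parameter), not anything from the
patch; the function name `published_columns` and the two dictionaries are
made up for illustration.

```python
# Hypothetical model of the proposed precedence rule; not PostgreSQL code.
REGULAR = {"t1": ["a", "b"], "t2": ["c", "d"]}
GENERATED = {"t1": ["gen1", "gen2"], "t2": ["gen1", "gen2"]}


def published_columns(table, column_list=None, publish_generated_columns=False):
    """Return the columns published for `table` under the discussed rule."""
    if column_list is not None:
        # An explicit column list takes priority over the parameter.
        return list(column_list)
    cols = list(REGULAR[table])
    if publish_generated_columns:
        # No column list: the parameter decides whether gencols are added.
        cols += GENERATED[table]
    return cols


# First example of each group: FOR TABLE t1(a,b,gen2), t2
for option in (False, True):
    print(option,
          published_columns("t1", ["a", "b", "gen2"], option),
          published_columns("t2", None, option))
```

Note that the column-list branch never looks at the parameter at all, which
is what makes generated columns get published "irrespective of the option".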

======

The idea LGTM, although now the parameter name
('publish_generated_columns') seems a bit misleading since sometimes
generated columns get published "irrespective of the option".

So, I think the original parameter name 'include_generated_columns'
might be better here because IMO "include" seems more like "add them
if they are not already specified", which is exactly what this idea is
doing.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Users can use a publication like "create publication pub1 for table
> > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > but not for t2. They can specify the generated column name in the
> > > column list of t1 in that case even though the rest of the tables
> > > won't publish generated columns.
> >
> > Agreed.
> >
> > I think that users can use the publish_generated_column option when
> > they want to publish all generated columns, instead of specifying all
> > the columns in the column list. It's another advantage of this option
> > that it will also include the future generated columns.
> >
>
> OK. Let me give some examples below to help understand this idea.
>
> Please correct me if these are incorrect.
>
> Examples, when publish_generated_columns=true:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=true)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes gen1 (e.g. what column list says)
>

These two could be controversial because one could expect that
"publish_generated_columns=true" publishes generated columns
irrespective of whether they are mentioned in column_list. I am of the
opinion that column_list should take priority and the results should be
as mentioned by you, but let us see if anyone thinks otherwise.

>
> ======
>
> The idea LGTM, although now the parameter name
> ('publish_generated_columns') seems a bit misleading since sometimes
> generated columns get published "irrespective of the option".
>
> So, I think the original parameter name 'include_generated_columns'
> might be better here because IMO "include" seems more like "add them
> if they are not already specified", which is exactly what this idea is
> doing.
>

I still prefer 'publish_generated_columns' because it matches the
other publication option names. One could also read
'include_generated_columns' as meaning that all the generated columns
are added even when only some of them are specified in column_list.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Sep 19, 2024 at 10:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> Given that we publish the generated columns if they are mentioned in
> the column list, can we separate the patch into two if it helps
> reviews? One is to allow logical replication to publish generated
> columns if they are explicitly mentioned in the column list. The
> second patch is to introduce the publish_generated_columns option.
>

It sounds like a reasonable idea to me but I haven't looked at the
feasibility of the same. So, if it is possible without much effort, we
should split the patch as per your suggestion.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 17, 2024 at 1:14 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v31-0001 (for the docs only)
>
> There may be some overlap here with some comments already made for
> v30-0001 which are not yet addressed in v31-0001.
>
> ======
> Commit message
>
> 1.
> When introducing the 'publish_generated_columns' parameter, you must
> also say this is a PUBLICATION parameter.
>
> ~~~
>
> 2.
> With this enhancement, users can now include the 'include_generated_columns'
> option when querying logical replication slots using either the pgoutput
> plugin or the test_decoding plugin. This option, when set to 'true' or '1',
> instructs the replication system to include generated column information
> and data in the replication stream.
>
> ~
>
> The above is stale information because it still refers to the old name
> 'include_generated_columns', and to test_decoding which was already
> removed in this patch.
>
> ======
> doc/src/sgml/ddl.sgml
>
> 3.
> +      Generated columns may be skipped during logical replication
> according to the
> +      <command>CREATE PUBLICATION</command> option
> +      <link linkend="sql-createpublication-params-with-include-generated-columns">
> +      <literal>publish_generated_columns</literal></link>.
>
> 3a.
> nit - The linkend is based on the old name instead of the new name.
>
> 3b.
> nit - Better to call this a parameter instead of an option because
> that is what the CREATE PUBLICATION docs call it.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 4.
> +    <varlistentry>
> +     <term>publish_generated_columns</term>
> +      <listitem>
> +       <para>
> +        Boolean option to enable generated columns. This option controls
> +        whether generated columns should be included in the string
> +        representation of tuples during logical decoding in PostgreSQL.
> +       </para>
> +      </listitem>
> +    </varlistentry>
> +
>
> Is this even needed anymore? Now that the implementation is using a
> PUBLICATION parameter, isn't everything determined just by that
> parameter? I don't see the reason why a protocol change is needed
> anymore. And, if there is no protocol change needed, then this
> documentation change is also not needed.
>
> ~~~~
>
> 5.
>       <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> +      Next, the following message parts appear for each column included in
> +      the publication (generated columns are excluded unless the parameter
> +      <link linkend="protocol-logical-replication-params">
> +      <literal>publish_generated_columns</literal></link> specifies otherwise):
>       </para>
>
> Like the previous comment above, I think everything is now determined
> by the PUBLICATION parameter. So maybe this should just be referring
> to that instead.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 6.
> +       <varlistentry
> id="sql-createpublication-params-with-include-generated-columns">
> +        <term><literal>publish_generated_columns</literal>
> (<type>boolean</type>)</term>
> +        <listitem>
>
> nit - the ID is based on the old parameter name.
>
> ~
>
> 7.
> +         <para>
> +          This option is only available for replicating generated
> column data from the publisher
> +          to a regular, non-generated column in the subscriber.
> +         </para>
>
> IMO remove this paragraph. I really don't think you should be
> mentioning the subscriber here at all. AFAIK this parameter is only
> for determining if the generated column will be published or not. What
> happens at the other end (e.g. logic whether it gets ignored or not by
> the subscriber) is more like a matrix of behaviours that could be
> documented in the "Logical Replication" section. But not here.
>
> (I removed this in my nitpicks attachment)
>
> ~~~
>
> 8.
> +         <para>
> +         This parameter can only be set <literal>true</literal> if
> <literal>copy_data</literal> is
> +         set to <literal>false</literal>.
> +         </para>
>
> IMO remove this paragraph too. The user can create a PUBLICATION
> before a SUBSCRIPTION even exists so to say it "can only be set..." is
> not correct. Sure, your patch 0001 does not support the COPY of
> generated columns but if you want to document that then it should be
> documented in the CREATE SUBSCRIPTION docs. But not here.
>
> (I removed this in my nitpicks attachment)
>
> TBH, it would be better if patches 0001 and 0002 were merged then you
> can avoid all this. IIUC they were only separate in the first place
> because 2 different people wrote them. It is not making reviews easier
> with them split.
>
> ======
>
> Please see the attachment which implements some of the nits above.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to [1] for the updated patch and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 17, 2024 at 3:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Review comments for v31-0001.
>
> (I tried to give only new comments, but there might be some overlap
> with comments I previously made for v30-0001)
>
> ======
> src/backend/catalog/pg_publication.c
>
> 1.
> +
> + if (publish_generated_columns_given)
> + {
> + values[Anum_pg_publication_pubgencolumns - 1] =
> BoolGetDatum(publish_generated_columns);
> + replaces[Anum_pg_publication_pubgencolumns - 1] = true;
> + }
>
> nit - unnecessary whitespace above here.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 2. prepare_all_columns_bms
>
> + /* Iterate the cols until generated columns are found. */
> + cols = bms_add_member(cols, i + 1);
>
> How does the comment relate to the statement that follows it?
>
> ~~~
>
> 3.
> + * Skip generated column if pubgencolumns option was not
> + * specified.
>
> nit - /pubgencolumns option/publish_generated_columns parameter/
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 4.
> getPublications:
>
> nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)
>
> ======
> src/bin/pg_dump/pg_dump.h
>
> 5.
> + bool pubgencolumns;
>  } PublicationInfo;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ======
> src/bin/psql/describe.c
>
> 6.
>   bool has_pubviaroot;
> + bool has_pubgencol;
>
> nit - /has_pubgencol/has_pubgencols/ (plural consistency)
>
> ======
> src/include/catalog/pg_publication.h
>
> 7.
> + /* true if generated columns data should be published */
> + bool pubgencolumns;
>  } FormData_pg_publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ~~~
>
> 8.
> + bool pubgencolumns;
>   PublicationActions pubactions;
>  } Publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ======
> src/test/regress/sql/publication.sql
>
> 9.
> +-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
> +\dRp+ pub1
> +
> +CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
> +\dRp+ pub2
>
> 9a.
> nit - Use lowercase for the parameters.
>
> ~
>
> 9b.
> nit - Fix the comment to say what the test is actually doing:
> "Test the publication 'publish_generated_columns' parameter enabled or disabled"
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 10.
> Later I think you should add another test here to cover the scenario
> that I was discussing with Sawada-San -- e.g. when there are 2
> publications for the same table subscribed by just 1 subscription but
> having different values of the 'publish_generated_columns' for the
> publications.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to [1] for the updated patch and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Sep 18, 2024 at 8:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v31-0002.
>
> ======
>
> 1. General.
>
> IMO patches 0001 and 0002 should be merged when next posted. IIUC the
> reason for the split was only because there were 2 different authors
> but that seems to be not relevant anymore.
>
> ======
> Commit message
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true, we need to copy using the syntax:
> 'COPY (SELECT column_name FROM table_name) TO STDOUT'.
>
> ~
>
> 2a.
> Should clarify that 'copy_data' is a SUBSCRIPTION parameter.
>
> 2b.
> Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 3.
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
>
> Each time I review this code I am tricked into thinking it is wrong to
> use rel->remoterel.natts here for the localgenlist. AFAICT the code is
> actually fine because you do not store *all* the subscriber gencols in
> 'localgenlist' -- you only store those with matching names on the
> publisher table. It might be good if you could add an explanatory
> comment about that to prevent any future doubts.
>
> ~~~
>
> 4.
> + if (!remotegenlist[remote_attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" has a
> generated column \"%s\" "
> + "but corresponding column on source relation is not a generated column",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
>
> This error message has lots of good information. OTOH, I think when
> copy_data=false the error would report the subscriber column just as
> "missing", which is maybe less helpful. Perhaps that other
> copy_data=false "missing" case can be improved to share the same error
> message that you have here.
>

This comment is still open. Will fix this in the next set of patches.

> ~~~
>
> fetch_remote_table_info:
>
> 5.
> IIUC, this logic needs to be more sophisticated to handle the case
> that was being discussed earlier with Sawada-san [1]. e.g. when the
> same table has gencols but there are multiple subscribed publications
> where the 'publish_generated_columns' parameter differs.
>
> Also, you'll need test cases for this scenario, because it is too
> difficult to judge correctness just by visual inspection of the code.
>
> ~~~~
>
> 6.
> nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
> readability, and initialize it to 'false' to make it easy to use
> later.
>
> ~~~
>
> 7.
> - * Get column lists for each relation.
> + * Get column lists for each relation and check if any of the publication
> + * has generated column option.
>
> and
>
> + /* Check if any of the publication has generated column option */
> + if (server_version >= 180000)
>
> nit - tweak the comments to name the publication parameter properly.
>
> ~~~
>
> 8.
> foreach(lc, MySubscription->publications)
> {
> if (foreach_current_index(lc) > 0)
> appendStringInfoString(&pub_names, ", ");
> appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
> }
>
> I know this is existing code, but shouldn't all this be done by using
> the purpose-built function 'get_publications_str'
>
> ~~~
>
> 9.
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not fetch gencolumns information from publication list: %s",
> +    pub_names.data));
>
> and
>
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("failed to fetch tuple for gencols from publication list: %s",
> +    pub_names.data));
>
> nit - /gencolumns information/generated column publication
> information/ to make the errmsg more human-readable
>
> ~~~
>
> 10.
> + bool gencols_allowed = server_version >= 180000 && hasgencolpub;
> +
> + if (!gencols_allowed)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
>
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> ======
>
> Please refer to the attachment which implements some of the nits
> mentioned above.
>
> ======
> [1] https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com
>

I have addressed the comments in the v32-0002 patch. Please refer to
[1] for the updated patch and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Thu, Sep 19, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should be as
> mentioned by you but let us see if anyone thinks otherwise.

I agree with Amit. We also publish t2's future generated column in the
first example and t1's future generated columns in the second example.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should be as
> mentioned by you but let us see if anyone thinks otherwise.
>
> >
> > ======
> >
> > The idea LGTM, although now the parameter name
> > ('publish_generated_columns') seems a bit misleading since sometimes
> > generated columns get published "irrespective of the option".
> >
> > So, I think the original parameter name 'include_generated_columns'
> > might be better here because IMO "include" seems more like "add them
> > if they are not already specified", which is exactly what this idea is
> > doing.
> >
>
> I still prefer 'publish_generated_columns' because it matches with
> other publication option names. One can also deduce from
> 'include_generated_columns' that add all the generated columns even
> when some of them are specified in column_list.
>

Fair point. Anyway, to avoid surprises it will be important for the
precedence rules to be documented clearly (probably with some
examples).

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Sat, Sep 21, 2024 at 3:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > OK. Let me give some examples below to help understand this idea.
> > >
> > > Please correct me if these are incorrect.
> > >
> > > Examples, when publish_generated_columns=true:
> > >
> > > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > > (publish_generated_columns=true)
> > > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > > t2 -> publishes c, d + ALSO gen1, gen2
> > >
> > > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > > t1 -> publishes a, b + ALSO gen1, gen2
> > > t2 -> publishes gen1 (e.g. what column list says)
> > >
> >
> > These two could be controversial because one could expect that if
> > "publish_generated_columns=true" then publish generated columns
> > irrespective of whether they are mentioned in column_list. I am of the
> > opinion that column_list should take priority and the results should be as
> > mentioned by you but let us see if anyone thinks otherwise.
>
> I agree with Amit. We also publish t2's future generated column in the
> first example and t1's future generated columns in the second example.
>

Right, it would be good to have at least one test that shows future
generated columns also get published wherever applicable (like where
column_list is not given and publish_generated_columns is true).

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, Sep 23, 2024 at 4:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Sep 20, 2024 at 4:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Users can use a publication like "create publication pub1 for table
> > > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > > but not for t2. They can specify the generated column name in the
> > > > > column list of t1 in that case even though the rest of the tables
> > > > > won't publish generated columns.
> > > >
> > > > Agreed.
> > > >
> > > > I think that users can use the publish_generated_column option when
> > > > they want to publish all generated columns, instead of specifying all
> > > > the columns in the column list. It's another advantage of this option
> > > > that it will also include the future generated columns.
> > > >
> > >
> > > OK. Let me give some examples below to help understand this idea.
> > >
> > > Please correct me if these are incorrect.
> > >
> > > Examples, when publish_generated_columns=true:
> > >
> > > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > > (publish_generated_columns=true)
> > > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > > t2 -> publishes c, d + ALSO gen1, gen2
> > >
> > > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> > > t1 -> publishes a, b + ALSO gen1, gen2
> > > t2 -> publishes gen1 (e.g. what column list says)
> > >
> >
> > These two could be controversial because one could expect that if
> > "publish_generated_columns=true" then publish generated columns
> > irrespective of whether they are mentioned in column_list. I am of the
> > opinion that column_list should take priority and the results should be as
> > mentioned by you, but let us see if anyone thinks otherwise.
> >
> > >
> > > ======
> > >
> > > The idea LGTM, although now the parameter name
> > > ('publish_generated_columns') seems a bit misleading since sometimes
> > > generated columns get published "irrespective of the option".
> > >
> > > So, I think the original parameter name 'include_generated_columns'
> > > might be better here because IMO "include" seems more like "add them
> > > if they are not already specified", which is exactly what this idea is
> > > doing.
> > >
> >
> > I still prefer 'publish_generated_columns' because it matches the
> > other publication option names. One could also read
> > 'include_generated_columns' as meaning: add all the generated columns
> > even when some of them are specified in column_list.
> >
>
> Fair point. Anyway, to avoid surprises it will be important for the
> precedence rules to be documented clearly (probably with some
> examples).
>

Yeah, one or two examples would be good, but we can have a separate
doc patch that clearly documents all the rules.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 12 Sept 2024 at 11:01, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Because this feature is now being implemented as a PUBLICATION option,
> there is another scenario that might need consideration; I am thinking
> about where the same table is published by multiple PUBLICATIONS (with
> different option settings) that are subscribed by a single
> SUBSCRIPTION.
>
> e.g.1
> -----
> CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> e.g.2
> -----
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> -----
>
> Do you know if this case is supported? If yes, then which publication
> option value wins?

I have verified the various scenarios discussed here and the patch
works as expected:
Test presetup:
-- publisher
CREATE TABLE t1 (a int PRIMARY KEY, b int, c int, gen1 int GENERATED
ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2)
STORED);
-- Subscriber
CREATE TABLE t1 (a int PRIMARY KEY, b int, c int, d int, e int);

Test1: Subscriber will have only non-generated columns a,b,c
replicated from publisher:
create publication pub1 for all tables with (
publish_generated_columns = false);
INSERT INTO t1 (a,b,c) VALUES (1,1,1);

--Subscriber will have only non-generated columns a,b,c replicated
from publisher:
subscriber=# select * from t1;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 |   |
(1 row)

Test2: Subscriber will have the generated column data, in addition to a,b,c,
replicated from publisher:
create publication pub1 for all tables with ( publish_generated_columns = true);
INSERT INTO t1 (a,b,c) VALUES (1,1,1);

-- Subscriber will have the generated column data, in addition to a,b,c, replicated from publisher:
subscriber=# select * from t1;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | 2 | 2
(1 row)

Test3: A subscription cannot subscribe to publications that set
publish_generated_columns to both true and false for the same table:
-- publisher
create publication pub1 for all tables with (publish_generated_columns = false);
create publication pub2 for all tables with (publish_generated_columns = true);

-- subscriber
subscriber=# create subscription sub1 connection 'dbname=postgres
host=localhost port=5432' publication pub1,pub2;
ERROR:  cannot use different column lists for table "public.t1" in
different publications

Test4a: Warning thrown when a generated column is specified in column
list along with publish_generated_columns as false
-- publisher
postgres=# create publication pub1 for table t1(a,b,gen1) with (
publish_generated_columns = false);
WARNING:  specified generated column "gen1" in publication column list
for publication with publish_generated_columns as false
CREATE PUBLICATION
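
To make Test3 concrete, here is a small Python sketch (not PostgreSQL code) of the check it appears to exercise: each publication yields an effective column set for t1, and a subscription that combines publications with differing sets is rejected. The helper names and the dictionary model of t1 are hypothetical; only the error text is copied from the output above.

```python
from typing import Dict, FrozenSet, List

# Hypothetical model of publisher table t1: column name -> is it generated?
T1: Dict[str, bool] = {"a": False, "b": False, "c": False, "gen1": True, "gen2": True}


def effective_columns(table: Dict[str, bool], publish_generated_columns: bool) -> FrozenSet[str]:
    """Columns published for a table that has no explicit column list."""
    return frozenset(
        col for col, is_generated in table.items()
        if publish_generated_columns or not is_generated
    )


def check_subscription(table_name: str, table: Dict[str, bool], options: List[bool]) -> None:
    """Raise if the subscribed publications disagree on the table's columns."""
    distinct = {effective_columns(table, opt) for opt in options}
    if len(distinct) > 1:
        raise ValueError(
            f'cannot use different column lists for table "{table_name}" '
            "in different publications"
        )


# pub1 has publish_generated_columns = false, pub2 has it set to true.
try:
    check_subscription("public.t1", T1, [False, True])
except ValueError as err:
    print(f"ERROR:  {err}")
```

This only mirrors the observed behaviour; how the patch actually detects the mismatch is not shown in this test.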

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 04:16, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> ...
> > > I think that the column list should take priority and we should
> > > publish the generated column if it is mentioned in the column list,
> > > irrespective of the option.
> >
> > Agreed.
> >
> > >
> ...
> > >
> > > Users can use a publication like "create publication pub1 for table
> > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > but not for t2. They can specify the generated column name in the
> > > column list of t1 in that case even though the rest of the tables
> > > won't publish generated columns.
> >
> > Agreed.
> >
> > I think that users can use the publish_generated_columns option when
> > they want to publish all generated columns, instead of specifying all
> > the columns in the column list. It's another advantage of this option
> > that it will also include the future generated columns.
> >
>
> OK. Let me give some examples below to help understand this idea.
>
> Please correct me if these are incorrect.
>
> ======
>
> Assuming these tables:
>
> t1(a,b,gen1,gen2)
> t2(c,d,gen1,gen2)
>
> Examples, when publish_generated_columns=false:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=false)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> ~~
>
> Examples, when publish_generated_columns=true:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=true)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> ======
>
> The idea LGTM, although now the parameter name
> ('publish_generated_columns') seems a bit misleading since sometimes
> generated columns get published "irrespective of the option".
>
> So, I think the original parameter name 'include_generated_columns'
> might be better here because IMO "include" seems more like "add them
> if they are not already specified", which is exactly what this idea is
> doing.
>
> Thoughts?

I have verified the various scenarios discussed here and the patch
works as expected with the v32 patch shared at [1]:

Test presetup:
-- publisher
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int GENERATED ALWAYS
AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int GENERATED ALWAYS
AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (d * 2) STORED);

-- subscriber
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int, gen2 int);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int, gen2 int);

Test1: Publisher replicates the column list data including generated
columns even though publish_generated_columns option is false:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

--t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
   |   |    2 |
(1 row)

Test2: Publisher does not replicate generated columns if
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2 WITH (publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

Test3: Publisher does not replicate generated columns if
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |      |
(1 row)

Test4: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1(a,b,gen2), t2 WITH
(publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b, gen2 (e.g. what column list says)
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |      |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)


Test5: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
   |   |    2 |
(1 row)

Test6: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR  table t1, t2 WITH (publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

Test7: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=#  select * from t1;
 a | b | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
 c | d | gen1 | gen2
---+---+------+------
 1 | 1 |    2 |    2
(1 row)
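
The seven tests above all follow one rule: an explicit column list decides what is published, and only tables without a column list are affected by publish_generated_columns. As a rough illustration, that rule can be written as a short Python sketch. This is a model of the observed behaviour, not the patch's implementation; the function name and the dictionary representation of t1 and t2 are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model of the publisher tables: column name -> is it generated?
T1: Dict[str, bool] = {"a": False, "b": False, "gen1": True, "gen2": True}
T2: Dict[str, bool] = {"c": False, "d": False, "gen1": True, "gen2": True}


def published_columns(
    table: Dict[str, bool],
    column_list: Optional[List[str]],
    publish_generated_columns: bool,
) -> List[str]:
    """Return the columns published for one table, in table order."""
    if column_list is not None:
        # An explicit column list takes priority over the option.
        return [col for col in table if col in column_list]
    # No column list: generated columns are included only if the option is true.
    return [
        col for col, is_generated in table.items()
        if publish_generated_columns or not is_generated
    ]


# Test1: FOR TABLE t1, t2(gen1) WITH (publish_generated_columns=false)
print(published_columns(T1, None, False))              # ['a', 'b']
print(published_columns(T2, ["gen1"], False))          # ['gen1']

# Test4: FOR TABLE t1(a,b,gen2), t2 WITH (publish_generated_columns=true)
print(published_columns(T1, ["a", "b", "gen2"], True))  # ['a', 'b', 'gen2']
print(published_columns(T2, None, True))               # ['c', 'd', 'gen1', 'gen2']
```

A FOR ALL TABLES publication (Test3 and Test7) corresponds to calling the function with column_list=None for every table.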

[1] - https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some more review comments for patch v30-0001.
> >
> > ======
> > doc/src/sgml/ref/create_publication.sgml
> >
> > 1.
> > +         <para>
> > +          If the publisher-side column is also a generated column
> > then this option
> > +          has no effect; the publisher column will be filled as normal with the
> > +          publisher-side computed or default data.
> > +         </para>
> >
> > It should say "subscriber-side"; not "publisher-side". The same was
> > already reported by Sawada-San [1].
> >
> > ~~~
> >
> > 2.
> > +         <para>
> > +         This parameter can only be set <literal>true</literal> if
> > <literal>copy_data</literal> is
> > +         set to <literal>false</literal>.
> > +         </para>
> >
> > IMO this limitation should be addressed by patch 0001 like it was
> > already done in the previous patches (e.g. v22-0002). I think
> > Sawada-san suggested the same [1].
> >
> > Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
> > mentioned like this without any reference to the SUBSCRIPTION seems
> > like a cut/paste error from the previous implementation.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 3. pub_collist_validate
> > - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > - ereport(ERROR,
> > - errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > - errmsg("cannot use generated column \"%s\" in publication column list",
> > -    colname));
> > -
> >
> > Instead of just removing this ERROR entirely here, I thought it would
> > be more user-friendly to give a WARNING if the PUBLICATION's explicit
> > column list includes generated cols when the option
> > "publish_generated_columns" is false. This combination doesn't seem
> > like something a user would do intentionally, so just silently
> > ignoring it (like the current patch does) is likely going to give
> > someone unexpected results/grief.
> >
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 4. logicalrep_write_tuple, and logicalrep_write_attrs:
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > Why aren't you also checking the new PUBLICATION option here and
> > skipping all gencols if the "publish_generated_columns" option is
> > false? Or is the BMS of pgoutput_column_list_init handling this case?
> > Maybe there should be an Assert for this?
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 5. send_relation_and_attrs
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > Same question as #4.
> >
> > ~~~
> >
> > 6. prepare_all_columns_bms and pgoutput_column_list_init
> >
> > + if (att->attgenerated && !pub->pubgencolumns)
> > + cols = bms_del_member(cols, i + 1);
> >
> > IIUC, the algorithm seems overly tricky filling the BMS with all
> > columns, before straight away conditionally removing the generated
> > columns. Can't it be refactored to assign all the correct columns
> > up-front, to avoid calling bms_del_member()?
> >
> > ======
> > src/bin/pg_dump/pg_dump.c
> >
> > 7. getPublications
> >
> > IIUC, there is lots of missing SQL code here (for all older versions)
> > that should be saying "false AS pubgencolumns".
> > e.g. compare the SQL with how "false AS pubviaroot" is used.
> >
> > ======
> > src/bin/pg_dump/t/002_pg_dump.pl
> >
> > 8. Missing tests?
> >
> > I expected to see a pg_dump test for this new PUBLICATION option.
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 9. Missing tests?
> >
> > How about adding another test case that checks this new option must be
> > "Boolean"?
> >
> > ~~~
> >
> > 10. Missing tests?
> >
> > --- error: generated column "d" can't be in list
> > +-- ok: generated columns can be in the list too
> >  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> > +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
> >
> > (see my earlier comment #3)
> >
> > IMO there should be another test case for a WARNING here if the user
> > attempts to include generated column 'd' in an explicit PUBLICATION
> > column list while "publish_generated_columns" is false.
> >
> > ======
> > [1]  https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com
> >
>
> I have fixed all the comments. The attached patches contain the desired changes.
> Also the merging of 0001 and 0002 can be done once there are no
> comments on the patch to help in reviewing.

The warning message appears to be incorrect. Even though
publish_generated_columns is set to true, the warning indicates that
it is false.
CREATE TABLE t1 (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
postgres=# CREATE PUBLICATION pub1 FOR table t1(gen1) WITH
(publish_generated_columns=true);
WARNING:  specified generated column "gen1" in publication column list
for publication with publish_generated_columns as false
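
For clarity, the condition this report implies can be sketched in Python as follows: the warning should be produced only when the option is false and the column list names a generated column. This is an illustration of the expected behaviour, not the patch code; the function name and the dictionary model of t1 are hypothetical.

```python
from typing import Dict, List

# Hypothetical model of table t1: column name -> is it generated?
T1: Dict[str, bool] = {"a": False, "gen1": True}


def columns_to_warn_about(
    table: Dict[str, bool],
    column_list: List[str],
    publish_generated_columns: bool,
) -> List[str]:
    """Generated columns in the column list that should trigger the warning."""
    if publish_generated_columns:
        # The option is true, so the "... as false" warning does not apply.
        return []
    return [col for col in column_list if table.get(col, False)]


# The case reported above: option is true, so no warning is expected.
print(columns_to_warn_about(T1, ["gen1"], True))   # []
# Option is false: the warning applies to gen1.
print(columns_to_warn_about(T1, ["gen1"], False))  # ['gen1']
```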

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have fixed all the comments. The attached patches contain the desired changes.
> Also the merging of 0001 and 0002 can be done once there are no
> comments on the patch to help in reviewing.

Few comments:
1) This commit message seems wrong; currently, irrespective of
publish_generated_columns, the columns specified in the column list take
precedence:
When 'publish_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

2) Since we have added pubgencols to pg_publication.h, we can specify
"Bump catversion" in the commit message.

3) In the CREATE PUBLICATION documentation for column lists and
publish_generated_columns, we should mention that generated columns
specified in a column list will be replicated irrespective of the
publish_generated_columns option.

4) This warning should be emitted only if publish_generated_columns is false:
                if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
+                       ereport(WARNING,
                                        errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated column \"%s\" in publication column list",
+                                       errmsg("specified generated column \"%s\" in publication column list for publication with publish_generated_columns as false",
                                                   colname));

5) These tests are not required for this feature:
+       'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
+               create_order => 51,
+               create_sql =>
+                 'ALTER PUBLICATION pub5 ADD TABLE
dump_test.test_table WHERE (col1 > 0);',
+               regexp => qr/^
+                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
dump_test.test_table WHERE ((col1 > 0));\E
+                       /xm,
+               like => { %full_runs, section_post_data => 1, },
+               unlike => {
+                       exclude_dump_test_schema => 1,
+                       exclude_test_table => 1,
+               },
+       },
+
+       'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE
(col2 = \'test\');'
+         => {
+               create_order => 52,
+               create_sql =>
+                 'ALTER PUBLICATION pub5 ADD TABLE
dump_test.test_second_table WHERE (col2 = \'test\');',
+               regexp => qr/^
+                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
dump_test.test_second_table WHERE ((col2 = 'test'::text));\E
+                       /xm,
+               like => { %full_runs, section_post_data => 1, },
+               unlike => { exclude_dump_test_schema => 1, },
+         },

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Wed, Sep 25, 2024 at 11:15 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, I have written a new patch to document this feature.
>
> The patch adds a new section to the "Logical Replication" chapter. It
> applies atop the existing patches.
>
> v33-0001 (same as v32-0001)
> v33-0002 (same as v32-0002)
> v33-0003 (new DOCS)
>
> Review comments are welcome.

Thank you for updating the patch!

I think that the patch doesn't have regression tests to check if
generated column data is replicated to the subscriber as expected. I
think we should include some tests for this feature (especially with
other features such as column list).

Also, when testing this feature, I got the following warning message
even if the publication has publish_generated_columns = true:

=# create publication pub for table test (a, c) with
(publish_generated_columns = true);
WARNING:  specified generated column "c" in publication column list
for publication with publish_generated_columns as false

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Sep 23, 2024 at 6:19 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I have fixed all the comments. The attached patches contain the desired changes.
> > Also the merging of 0001 and 0002 can be done once there are no
> > comments on the patch to help in reviewing.
>
> Few comments:
> 1) This commit message seems wrong; currently, irrespective of
> publish_generated_columns, the columns specified in the column list take
> precedence:
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> 2) Since we have added pubgencols to pg_publication.h, we can specify
> "Bump catversion" in the commit message.
>
> 3) In the CREATE PUBLICATION documentation for column lists and
> publish_generated_columns, we should mention that generated columns
> specified in a column list will be replicated irrespective of the
> publish_generated_columns option.
>
> 4) This warning should be emitted only if publish_generated_columns is false:
>                 if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -                       ereport(ERROR,
> +                       ereport(WARNING,
>                                         errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -                                       errmsg("cannot use generated column \"%s\" in publication column list",
> +                                       errmsg("specified generated column \"%s\" in publication column list for publication with publish_generated_columns as false",
>                                                    colname));
>
> 5) These tests are not required for this feature:
> +       'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
> +               create_order => 51,
> +               create_sql =>
> +                 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_table WHERE (col1 > 0);',
> +               regexp => qr/^
> +                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_table WHERE ((col1 > 0));\E
> +                       /xm,
> +               like => { %full_runs, section_post_data => 1, },
> +               unlike => {
> +                       exclude_dump_test_schema => 1,
> +                       exclude_test_table => 1,
> +               },
> +       },
> +
> +       'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE
> (col2 = \'test\');'
> +         => {
> +               create_order => 52,
> +               create_sql =>
> +                 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_second_table WHERE (col2 = \'test\');',
> +               regexp => qr/^
> +                       \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_second_table WHERE ((col2 = 'test'::text));\E
> +                       /xm,
> +               like => { %full_runs, section_post_data => 1, },
> +               unlike => { exclude_dump_test_schema => 1, },
> +         },
>

I have addressed all the comments in the v34-0001 patch. Please refer
to [1] for the updated patch and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 24, 2024 at 5:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi. Here are my review comments for v32-0001
>
> You wrote: "I have addressed all the comments in the v32-0001 Patch.",
> however, I found multiple old review comments not addressed. Please
> give a reason if a comment is deliberately left out, otherwise, I will
> assume they are omitted by accident and so keep repeating them.
>
> There were also still some unanswered questions from previous reviews,
> so I have reminded you about those again here.
>
> ======
> Commit message
>
> 1.
> This commit enables support for the 'publish_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. The option
> 'publish_generated_columns' is a PUBLICATION parameter.
>
> ~
>
> That PUBLICATION info in the 2nd sentence would be easier to say in
> the 1st sentence.
> SUGGESTION:
> This commit supports the transmission of generated column information
> and data alongside regular table changes. This behaviour is controlled
> by a new PUBLICATION parameter ('publish_generated_columns').
>
> ~~~
>
> 2.
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> Hm. This contradicts the behaviour that Amit wanted, (e.g.
> "column-list takes precedence"). So I am not sure if this patch is
> already catering for the behaviour suggested by Amit or if that is yet
> to come in v33. For now, I am assuming that 32* has not caught up with
> the latest behaviour requirements, but that might be a wrong
> assumption; perhaps it is only this commit message that is bogus.
>
> ~~~
>
> 3. General.
>
> On the same subject, there is lots of code, like:
>
> if (att->attgenerated && !pub->pubgencols)
> continue;
>
> I suspect that might not be quite what you want for the "column-list
> takes precedence" behaviour, but I am not going to identify all those
> during this review. It needs lots of combinations of column list tests
> to verify it.
>
> ======
> doc/src/sgml/ddl.sgml
>
> 4ab.
> nit - Huh?? Not changed the linkend as told in a previous review [1-#3a]
> nit - Huh?? Not changed to call this a "parameter" instead of an
> "option" as told in a previous review [1-#3b]
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> nit -- Huh?? I don't think you can just remove this whole paragraph.
> But, probably you can just remove the "except generated columns" part.
> I posted this same comment [4 #11] 20 patch versions back.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 6abc.
> nit - Huh?? Not changed the parameter ID as told in a previous review [1-#6]
> nit - Huh?? Not removed paragraph "This option is only available..."
> as told in a previous review. See [1-#7]
> nit - Huh?? Not removed paragraph "This parameter can only be set" as
> told in a previous review. See [1-#8]
>
> ======
> src/backend/catalog/pg_publication.c
>
> 7.
>   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> - ereport(ERROR,
> + ereport(WARNING,
>   errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> - errmsg("cannot use generated column \"%s\" in publication column list",
> + errmsg("specified generated column \"%s\" in publication column list
> for publication with publish_generated_columns as false",
>      colname));
>
> I did not understand how this WARNING can know
> "publish_generated_columns as false"? Should the code be checking the
> function parameter 'pubgencols'?
>
> The errmsg also seemed a bit verbose. How about:
> "specified generated column \"%s\" in publication column list when
> publish_generated_columns = false"
>
> ======
> src/backend/replication/logical/proto.c
>
> 8.
> logicalrep_write_tuple:
> logicalrep_write_attrs:
>
> Reminder. I think I have multiple questions about this code from
> previous reviews that may be still unanswered. See [2 #4]. Maybe when
> you implement Amit's "column list takes precedence" behaviour then
> this code is fine as-is (because the replication message might include
> gencols or not-gencols regardless of the 'publish_generated_columns'
> value). But I don't think that is the current implementation, so
> something did not quite seem right. I am not sure. If you say it is
> fine then I will believe it, but the question [2 #4] remains
> unanswered.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 9.
> send_relation_and_attrs:
>
> Reminder: Here is another question that was answered from [2 #5]. I
> did not really trust it for the current implementation, but for the
> "column list takes precedence" behaviour probably it will be ok.
>
> ~~~
>
> 10.
> +/*
> + * Prepare new column list bitmap. This includes all the columns of the table.
> + */
> +static Bitmapset *
> +prepare_all_columns_bms(PGOutputData *data, RelationSyncEntry *entry,
> + TupleDesc desc)
> +{
>
> This function needs a better comment with more explanation about what
> this is REALLY doing. e.g. it says "includes all columns of the
> table", but the implementation is skipping generated cols, so clearly
> it is not "all columns of the table".
>
> ~~~
>
> 11. pgoutput_column_list_init
>
> TBH, I struggle to read the logic of this function. Rewriting some
> parts, inverting some variables, and adding more commentary might help
> a lot.
>
> 11a.
> There are too many "negatives" (with ! operator and with the word "no"
> in the variable).
>
> e.g. code is written in a backward way like:
> if (!pub_no_list)
> cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
> else
> cols = prepare_all_columns_bms(data, entry, desc);
>
> instead of what could have been said:
> if (pub_rel_has_collist)
> cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
> else
> cols = prepare_all_columns_bms(data, entry, desc);
>
> ~
>
> 11b.
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * If the publication is FOR ALL TABLES and include generated columns
> + * then it is treated the same as if there are no column lists (even
> + * if other publications have a list).
>   */
> - if (!pub->alltables)
> + if (!pub->alltables || !pub->pubgencols)
>
> The code does not appear to match the comment ("If the publication is
> FOR ALL TABLES and include generated columns"). If it did it should
> look like "if (pub->alltables && pub->pubgencols)".
>
> Also, should "and include generated column" be properly referring to
> the new PUBLICATION parameter name?
>
> Also, the comment is somewhat confusing. I saw in the thread Vignesh
> wrote an explanation like "To handle cases where the
> publish_generated_columns option isn't specified for all tables in a
> publication, the pubgencolumns check needs to be performed. In such
> cases, we must create a column list that excludes generated columns"
> [3]. IMO that was clearer information so something similar should be
> written in this code comment.
> ~
>
> 11c.
> + /* Build the column list bitmap in the per-entry context. */
> + if (!pub_no_list || !pub->pubgencols) /* when not null */
>
> I don't know what "when not null" means here. Aren't those both
> booleans? How can it be "null"?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 12. getPublications:
>
> Huh?? The code has not changed to address an old review comment I had
> posted to say there seem multiple code fragments missing that should
> say "false AS pubgencols". Refer to [2 #7].
>
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
>
> 13.
> 'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
> + create_order => 51,
> + create_sql =>
> +   'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_table WHERE (col1 > 0);',
> + regexp => qr/^
> + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_table WHERE
> ((col1 > 0));\E
> + /xm,
> + like => { %full_runs, section_post_data => 1, },
> + unlike => {
> + exclude_dump_test_schema => 1,
> + exclude_test_table => 1,
> + },
> + },
> +
> + 'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE (col2 = \'test\');'
> +   => {
> + create_order => 52,
> + create_sql =>
> +   'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_second_table
> WHERE (col2 = \'test\');',
> + regexp => qr/^
> + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_second_table
> WHERE ((col2 = 'test'::text));\E
> + /xm,
> + like => { %full_runs, section_post_data => 1, },
> + unlike => { exclude_dump_test_schema => 1, },
> +   },
> +
>
> It wasn't clear to me how these tests are related to the patch.
> Shouldn't there instead be some ALTER tests for trying to modify the
> 'publish_generated_columns' parameter?
>
> ======
> src/test/regress/expected/publication.out
> src/test/regress/sql/publication.sql
>
> 14.
> --- error: generated column "d" can't be in list
> +-- ok: generated columns can be in the list too
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR:  cannot use generated column "d" in publication column list
> +WARNING:  specified generated column "d" in publication column list
> for publication with publish_generated_columns as false
>
> I think these tests for the WARNING scenario need to be a bit more
> deliberate. This seems to have happened as a side-effect. For example,
> I was expecting more testing like:
>
> Comments about various combinations to say what you are doing and what
> you are expecting:
> - gencols in column list with publish_generated_columns=false, expecting WARNING
> - gencols in column list with publish_generated_columns=true, NOT
> expecting WARNING
> - gencols in column list with publish_generated_columns=true, then
> ALTER PUBLICATION setting publish_generated_columns=false,
> expecting WARNING
> - NO gencols in column list with publish_generated_columns=false, then
> ALTER PUBLICATION to add gencols to column list, expecting WARNING
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 15.
> -# TEST: Generated and dropped columns are not considered for the column list.
> +# TEST: Dropped columns are not considered for the column list.
>  # So, the publication having a column list except for those columns and a
> -# publication without any column (aka all columns as part of the columns
> +# publication without any column list (aka all columns as part of the column
>  # list) are considered to have the same column list.
>  $node_publisher->safe_psql(
>   'postgres', qq(
>   CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int
> GENERATED ALWAYS AS (a + 1) STORED);
>   ALTER TABLE test_mix_4 DROP COLUMN c;
>
> - CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
> - CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
> + CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = true);
> + CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = false);
>
> I felt the comment for this test ought to be saying something more
> about what you are doing with the 'publish_generated_columns'
> parameters and what behaviour it was expecting.
>
> ======
> Please refer to the attachment which addresses some of the nit
> comments mentioned above.
>
> ======
> [1] my review of v31-0001:
> https://www.postgresql.org/message-id/CAHut%2BPsv-neEP_ftvBUBahh%2BKCWw%2BqQMF9N3sGU3YHWPEzFH-Q%40mail.gmail.com
> [2] my review of v30-0001:
> https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH%3DDHK0iXmZuXKPnxZ3Q%40mail.gmail.com
> [4] https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com
>

I have addressed all the comments in the v34-0001 patch. Please refer
to the updated patch in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Sep 24, 2024 at 7:08 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi. Here are my v32-0002 review comments:
>
> ======
> src/backend/replication/logical/tablesync.c
>
> 1. fetch_remote_table_info
>
>   /*
> - * Get column lists for each relation.
> + * Get column lists for each relation, and check if any of the
> + * publications have the 'publish_generated_columns' parameter enabled.
>
> I am not 100% sure about this logic anymore. Maybe it is OK, but it
> requires careful testing because with Amit's "column lists take
> precedence" it is now possible for the publication to say
> 'publish_generated_columns=false', but the publication can still
> publish gencols *anyway* if they were specified in a column list.
>
> ~~~
>

This comment is still open. Will fix this in the next version of patches.

> 2.
>   /*
>   * Fetch info about column lists for the relation (from all the
>   * publications).
>   */
> + StringInfo pub_names = makeStringInfo();
> +
> + get_publications_str(MySubscription->publications, pub_names, true);
>   resetStringInfo(&cmd);
>   appendStringInfo(&cmd,
> ~
>
> nit - The comment here seems misplaced.
>
> ~~~
>
> 3.
> + if (server_version >= 120000)
> + {
> + has_pub_with_pubgencols = server_version >= 180000 && has_pub_with_pubgencols;
> +
> + if (!has_pub_with_pubgencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
>
> My previous review comment about this [1 #10] was:
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> nit - So the current v32 code is not what I was expecting. What I
> meant was 'has_pub_with_pubgencols' can only be true if server_version
> >= 180000, so I thought there was no reason to check it again. For
> reference, I've changed it to like I meant in the nitpicks attachment.
> Please see if that works the same.
>
> ======
> [1] my review of v31-0002.
> https://www.postgresql.org/message-id/CAHut%2BPusbhvPrL1uN1TKY%3DFd4zu3h63eDebZvsF%3Duy%2BLBKTwgA%40mail.gmail.com
>

I have addressed all the comments in the v34-0002 patch. Please refer
to the updated patch in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham. Here are my review comments for patch v34-0002.

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+         This parameter can only be set <literal>true</literal> if
<literal>copy_data</literal> is
+         set to <literal>false</literal>.
+         </para>

Huh? AFAIK the patch implements COPY for generated columns, so why does
the documentation state this limitation?

======
src/backend/replication/logical/tablesync.c

2. reminder

Previously (18/9) [1 #4] I suggested that the other copy_data=false
"missing" case error could be improved to share the same error message
that you have in make_copy_attnamelist. You replied [2] that it would
be addressed in the next patchset, but that was at least 2 versions
back and I don't see any change yet.

======
[1] 18/9 review
https://www.postgresql.org/message-id/CAHut%2BPusbhvPrL1uN1TKY%3DFd4zu3h63eDebZvsF%3Duy%2BLBKTwgA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHv8RjJ5_dmyCH58xQ0StXMdPt9gstemMMWytR79%2BLfOMAHdLw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

On Mon, Sep 23, 2024 at 10:49 PM vignesh C <vignesh21@gmail.com> wrote:
>

> 3) In create publication column list/publish_generated_columns
> documentation we should mention that if generated column is mentioned
> in column list, generated columns mentioned in column list will be
> replicated irrespective of publish_generated_columns option.
>

v34-0003 introduced a new Chapter 29 (Logical Replication) section,
"Generated Column Replication". That version also added a link from the
CREATE PUBLICATION 'publish_generated_columns' parameter to this new
section.

To address your column list point, in v35-0003 I added more
information about generated columns to the Chapter 29 section "Column
List". The CREATE PUBLICATION column list docs already link to
that section. See [1].

======
[1] v35-0003 -
https://www.postgresql.org/message-id/CAHut%2BPvoQS9HjcGFZrTHrUQZ8vzyfAcSgeTgQEoO_-f8CrhW4A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham,

The different meanings of the terms "parameter" versus "option" were
discussed in a recent thread [1], and that has made me reconsider this
generated columns feature.

Despite being in the PUBLICATION section "WITH ( publication_parameter
[= value] [, ... ] )", I think that 'publish_generated_columns' is an
"option" (not a parameter).

We should update all those places that are currently calling it a parameter:
- commit messages
- docs
- comments
- etc.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPuiRydyrYfMzR1OxOnVJf-_G8OBCLdyqu8jJ8si51d%2BEQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Thu, Oct 3, 2024 at 10:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham,
>
> The different meanings of the terms "parameter" versus "option" were
> discussed in a recent thread [1], and that has made me reconsider this
> generated columns feature.
>
> Despite being in the PUBLICATION section "WITH ( publication_parameter
> [= value] [, ... ] )", I think that 'publish_generated_columns' is an
> "option" (not a parameter).
>
> We should update all those places that are currently calling it a parameter:
> - commit messages
> - docs
> - comments
> - etc.
>
> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPuiRydyrYfMzR1OxOnVJf-_G8OBCLdyqu8jJ8si51d%2BEQ%40mail.gmail.com
>

It seems there are differing opinions on that other thread about what
term to use. Probably, it is best to just leave the above suggestion
alone for now.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Sep 30, 2024 at 12:56 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham. Here are my review comment for patch v34-0002.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 1.
> +         <para>
> +         This parameter can only be set <literal>true</literal> if
> <literal>copy_data</literal> is
> +         set to <literal>false</literal>.
> +         </para>
>
> Huh? AFAIK the patch implements COPY for generated columns, so why are
> you saying this limitation?
>
> ======

I have fixed this in the v36-0002 patch.

> src/backend/replication/logical/tablesync.c
>
> 2. reminder
>
> Previously (18/9) [1 #4] I wrote maybe that other copy_data=false
> "missing" case error can be improved to share the same error message
> that you have in make_copy_attnamelist. And you replied [2] it would
> be addressed in the next patchset, but that was at least 2 versions
> back and I don't see any change yet.
>
This comment is still open. Will fix this and post in the next version
of patches.

Please refer to the updated v36-0002 patch in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2B1RDd7AnJNzOJXk--zcbTtU3nys%3DZgU3ktB4e3DWbJgg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Shubham, I don't have any new comments for the patch v36-0002.

But, according to my records, there are multiple old comments not yet
addressed for this patch. I am giving reminders for those below so
they don't get accidentally overlooked. Please re-confirm and at the
next posted version please respond individually to each of these to
say if they are addressed or not.

======

1. General
From review v31 [1] comment #1. Patches 0001 and 0002 should be merged.

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

2.
From review v31 [1] comment #4. Make the detailed useful error message
common if possible.

~~~

fetch_remote_table_info:

3.
From review v31 [1] comment #5. I was not sure if this logic is
sophisticated enough to handle the case when the same table has
gencols but there are multiple subscribed publications and the
'publish_generated_columns' parameter differs. Is this scenario
tested?

~

4.
+ * Get column lists for each relation, and check if any of the
+ * publications have the 'publish_generated_columns' parameter enabled.

From review v32 [2] comment #1. This needs some careful testing. I was
not sure if sufficient to just check the 'publish_generated_columns'
flag. Now that "column lists take precedence" it is quite possible for
all publications to say 'publish_generated_columns=false', but the
publication can still publish gencols *anyway* if they are specified
in a column list.
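
The concern in #4 can be sketched as a small standalone model. This is
only an illustration of the "column lists take precedence" rule as it is
described in this thread; the function name and its parameters are
hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not patch code): decide whether a generated column
 * of a table is published by a single publication.
 *
 * has_collist - the publication specifies a column list for the table
 * in_collist  - the generated column is named in that column list
 * pubgencols  - the publication's 'publish_generated_columns' setting
 */
bool
model_gencol_is_published(bool has_collist, bool in_collist, bool pubgencols)
{
	/* A column list takes precedence over 'publish_generated_columns'. */
	if (has_collist)
		return in_collist;

	/* Without a column list, the publication setting decides. */
	return pubgencols;
}
```

In this model, model_gencol_is_published(true, true, false) returns true.
That is the case where a check of only the 'publish_generated_columns'
flag would conclude the generated column is not published.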

======
[1] review v31 18/9 -
https://www.postgresql.org/message-id/flat/CAHv8Rj%2BKOoh58Uf5k2MN-%3DA3VdV60kCVKCh5ftqYxgkdxFSkqg%40mail.gmail.com#f2f3b48080f96ea45e1410f5b1cd9735
[2] review v32 24/9 -
https://www.postgresql.org/message-id/CAHut%2BPu7EcK_JTgWS7GzeStHk6Asb1dmEzCJU2TJf%2BW1Zy30LQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Oct 4, 2024 at 9:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shubham, I don't have any new comments for the patch v36-0002.
>
> But, according to my records, there are multiple old comments not yet
> addressed for this patch. I am giving reminders for those below so
> they don't get accidentally overlooked. Please re-confirm and at the
> next posted version please respond individually to each of these to
> say if they are addressed or not.
>
> ======
>
> 1. General
> From review v31 [1] comment #1. Patches 0001 and 0002 should be merged.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 2.
> From review v31 [1] comment #4. Make the detailed useful error message
> common if possible.
>
> ~~~

This comment is still open. I will fix this in the next version of the patches.

>
> fetch_remote_table_info:
>
> 3.
> From review v31 [1] comment #5. I was not sure if this logic is
> sophisticated enough to handle the case when the same table has
> gencols but there are multiple subscribed publications and the
> 'publish_generated_columns' parameter differs. Is this scenario
> tested?
>
> ~
>
> 4.
> + * Get column lists for each relation, and check if any of the
> + * publications have the 'publish_generated_columns' parameter enabled.
>
> From review v32 [2] comment #1. This needs some careful testing. I was
> not sure if sufficient to just check the 'publish_generated_columns'
> flag. Now that "column lists take precedence" it is quite possible for
> all publications to say 'publish_generated_columns=false', but the
> publication can still publish gencols *anyway* if they are specified
> in a column list.
>
> ======
> [1] review v31 18/9 -
> https://www.postgresql.org/message-id/flat/CAHv8Rj%2BKOoh58Uf5k2MN-%3DA3VdV60kCVKCh5ftqYxgkdxFSkqg%40mail.gmail.com#f2f3b48080f96ea45e1410f5b1cd9735
> [2] review v32 24/9 -
> https://www.postgresql.org/message-id/CAHut%2BPu7EcK_JTgWS7GzeStHk6Asb1dmEzCJU2TJf%2BW1Zy30LQ%40mail.gmail.com
>

I have fixed the comments in the v37 patches. Please refer to the
updated patches in [1] for the changes.

[1]
https://www.postgresql.org/message-id/CAHv8Rj%2BRnw%2B_SfSyyrvWL49AfJzx4O8YVvdU9gB%2BSQdt3%3DqF%2BA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham, here are my review comments for v36-0001.
> >
> > ======
> > 1. General  - merge patches
> >
> > It is long past due when patches 0001 and 0002 should've been merged.
> > AFAIK the split was only because historically these parts had
> > different authors. But, keeping them separated is not helpful anymore.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 2.
> >  Bitmapset *
> > -pub_collist_validate(Relation targetrel, List *columns)
> > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> >
> > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > so it should also be removed.
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 3.
> >   /*
> > - * If the publication is FOR ALL TABLES then it is treated the same as
> > - * if there are no column lists (even if other publications have a
> > - * list).
> > + * To handle cases where the publish_generated_columns option isn't
> > + * specified for all tables in a publication, we must create a column
> > + * list that excludes generated columns. So, the publisher will not
> > + * replicate the generated columns.
> >   */
> > - if (!pub->alltables)
> > + if (!(pub->alltables && pub->pubgencols))
> >
> > I still found that comment hard to understand. Does this mean to say
> > something like:
> >
> > ------
> > Process potential column lists for the following cases:
> >
> > a. Any publication that is not FOR ALL TABLES.
> >
> > b. When the publication is FOR ALL TABLES and
> > 'publish_generated_columns' is false.
> > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > so all columns will be replicated by default. However, if
> > 'publish_generated_columns' is set to false, column lists must still
> > be created to exclude any generated columns from being published
> > ------
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 4.
> > +SET client_min_messages = 'WARNING';
> > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> >
> > AFAIK you don't need to keep changing 'client_min_messages',
> > particularly now that you've removed the WARNING message that was
> > previously emitted.
> >
> > ~
> >
> > 5.
> > nit - minor comment changes.
> >
> > ======
> > Please refer to the attachment which implements any nits from above.
> >
>
> I have fixed all the given comments. Also, I have created a new 0003
> patch for the TAP-Tests related to the '011_generated.pl' file. I am
> planning to merge 0001 and 0003 patches once they will get fixed.
> The attached patches contain the required changes.

A few comments:
1) Since we are no longer throwing an error for generated columns, the
function header comment also needs to be updated accordingly: "Checks
for and raises an ERROR for any; unknown columns, system columns,
duplicate columns or generated columns."
-               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
-                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated column \"%s\" in publication column list",
-                                                  colname));
-

2) Tab completion is missing for the "PUBLISH_GENERATED_COLUMNS" option in
ALTER PUBLICATION ... SET (
postgres=# alter publication pub2 set (PUBLISH
PUBLISH                     PUBLISH_VIA_PARTITION_ROOT

3) I was able to compile without this include, so maybe it is not required:
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -118,6 +118,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"

4) You can include "\dRp+ pubname" after each CREATE/ALTER PUBLICATION
command to verify the columns that will be published:
+-- Test the 'publish_generated_columns' parameter enabled or disabled for
+-- different scenarios with/without generated columns in column lists.
+CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
+
+-- Generated columns in column list, when 'publish_generated_columns'=false
+CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH (publish_generated_columns=false);
+
+-- Generated columns in column list, when 'publish_generated_columns'=true
+CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH (publish_generated_columns=true);
+
+-- Generated columns in column list, then set 'publication_generate_columns'=false
+ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
+
+-- Remove generate columns from column list, when 'publish_generated_columns'=false
+ALTER PUBLICATION pub2 SET TABLE gencols(a);
+
+-- Add generated columns in column list, when 'publish_generated_columns'=false
+ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham, here are my review comments for v36-0001.
> >
> > ======
> > 1. General  - merge patches
> >
> > It is long past due when patches 0001 and 0002 should've been merged.
> > AFAIK the split was only because historically these parts had
> > different authors. But, keeping them separated is not helpful anymore.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 2.
> >  Bitmapset *
> > -pub_collist_validate(Relation targetrel, List *columns)
> > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> >
> > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > so it should also be removed.
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 3.
> >   /*
> > - * If the publication is FOR ALL TABLES then it is treated the same as
> > - * if there are no column lists (even if other publications have a
> > - * list).
> > + * To handle cases where the publish_generated_columns option isn't
> > + * specified for all tables in a publication, we must create a column
> > + * list that excludes generated columns. So, the publisher will not
> > + * replicate the generated columns.
> >   */
> > - if (!pub->alltables)
> > + if (!(pub->alltables && pub->pubgencols))
> >
> > I still found that comment hard to understand. Does this mean to say
> > something like:
> >
> > ------
> > Process potential column lists for the following cases:
> >
> > a. Any publication that is not FOR ALL TABLES.
> >
> > b. When the publication is FOR ALL TABLES and
> > 'publish_generated_columns' is false.
> > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > so all columns will be replicated by default. However, if
> > 'publish_generated_columns' is set to false, column lists must still
> > be created to exclude any generated columns from being published
> > ------
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 4.
> > +SET client_min_messages = 'WARNING';
> > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> >
> > AFAIK you don't need to keep changing 'client_min_messages',
> > particularly now that you've removed the WARNING message that was
> > previously emitted.
> >
> > ~
> >
> > 5.
> > nit - minor comment changes.
> >
> > ======
> > Please refer to the attachment which implements any nits from above.
> >
>
> I have fixed all the given comments. Also, I have created a new 0003
> patch for the TAP-Tests related to the '011_generated.pl' file. I am
> planning to merge 0001 and 0003 patches once they will get fixed.
> The attached patches contain the required changes.

There is an inconsistency in replication when a generated column is
specified in the column list. The generated column data is not
replicated during initial sync, whereas it is replicated during
incremental sync:
-- publisher
CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
INSERT INTO t1 VALUES (1);
CREATE PUBLICATION pub1 for table t1(c1, c2);

-- subscriber
CREATE TABLE t1(c1 int, c2 int);
CREATE SUBSCRIPTION sub1 connection 'dbname=postgres host=localhost port=5432' PUBLICATION pub1;

-- Generated column data is not synced during initial sync
postgres=# select * from t1;
 c1 | c2
----+----
  1 |
(1 row)

-- publisher
INSERT INTO t1 VALUES (2);

-- Whereas generated column data is synced during incremental sync
postgres=# select * from t1;
 c1 | c2
----+----
  1 |
  2 |  4
(2 rows)
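
A possible explanation, offered only as a sketch: earlier in this thread
the tablesync code was quoted as skipping generated columns unless a
subscribed publication has 'publish_generated_columns' enabled, while the
publishing side is described as following "column lists take precedence".
If both assumptions hold, the two paths disagree for pub1 above. The
functions below are hypothetical models of those two assumptions, not
code from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical model of initial sync (assumption): generated columns are
 * copied only if some subscribed publication has
 * 'publish_generated_columns' enabled.
 */
bool
model_initial_sync_copies_gencol(bool any_pub_with_pubgencols)
{
	return any_pub_with_pubgencols;
}

/*
 * Hypothetical model of incremental sync (assumption): a column list,
 * when present, decides; otherwise 'publish_generated_columns' decides.
 */
bool
model_incremental_sends_gencol(bool has_collist, bool in_collist,
							   bool pubgencols)
{
	return has_collist ? in_collist : pubgencols;
}
```

For pub1 (column list containing c2, 'publish_generated_columns' left at
false), the first function returns false and the second returns true,
which matches the empty c2 for the row copied at initial sync and the
populated c2 for the later insert.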

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham, here are my review comments for v36-0001.
> >
> > ======
> > 1. General  - merge patches
> >
> > It is long past due when patches 0001 and 0002 should've been merged.
> > AFAIK the split was only because historically these parts had
> > different authors. But, keeping them separated is not helpful anymore.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 2.
> >  Bitmapset *
> > -pub_collist_validate(Relation targetrel, List *columns)
> > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> >
> > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > so it should also be removed.
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 3.
> >   /*
> > - * If the publication is FOR ALL TABLES then it is treated the same as
> > - * if there are no column lists (even if other publications have a
> > - * list).
> > + * To handle cases where the publish_generated_columns option isn't
> > + * specified for all tables in a publication, we must create a column
> > + * list that excludes generated columns. So, the publisher will not
> > + * replicate the generated columns.
> >   */
> > - if (!pub->alltables)
> > + if (!(pub->alltables && pub->pubgencols))
> >
> > I still found that comment hard to understand. Does this mean to say
> > something like:
> >
> > ------
> > Process potential column lists for the following cases:
> >
> > a. Any publication that is not FOR ALL TABLES.
> >
> > b. When the publication is FOR ALL TABLES and
> > 'publish_generated_columns' is false.
> > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > so all columns will be replicated by default. However, if
> > 'publish_generated_columns' is set to false, column lists must still
> > be created to exclude any generated columns from being published
> > ------
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 4.
> > +SET client_min_messages = 'WARNING';
> > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> >
> > AFAIK you don't need to keep changing 'client_min_messages',
> > particularly now that you've removed the WARNING message that was
> > previously emitted.
> >
> > ~
> >
> > 5.
> > nit - minor comment changes.
> >
> > ======
> > Please refer to the attachment which implements any nits from above.
> >
>
> I have fixed all the given comments. Also, I have created a new 0003
> patch for the TAP-Tests related to the '011_generated.pl' file. I am
> planning to merge 0001 and 0003 patches once they will get fixed.
> The attached patches contain the required changes.

A few comments:
1) I felt this change need not be part of this patch; if required, it
can be proposed as a separate patch:
+       if (server_version >= 150000)
        {
                WalRcvExecResult *pubres;
                TupleTableSlot *tslot;
                Oid                     attrsRow[] = {INT2VECTOROID};
-               StringInfoData pub_names;
-
-               initStringInfo(&pub_names);
-               foreach(lc, MySubscription->publications)
-               {
-                       if (foreach_current_index(lc) > 0)
-                               appendStringInfoString(&pub_names, ", ");
-                       appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
-               }
+               StringInfo      pub_names = makeStringInfo();

2) These two statements can be combined into a single appendStringInfo:
+       appendStringInfo(&cmd,
                                         "  FROM pg_catalog.pg_attribute a"
                                         "  LEFT JOIN pg_catalog.pg_index i"
                                         "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
                                         " WHERE a.attnum > 0::pg_catalog.int2"
-                                        "   AND NOT a.attisdropped %s"
+                                        "   AND NOT a.attisdropped", lrel->remoteid);
+
+       appendStringInfo(&cmd,
                                         "   AND a.attrelid = %u"
                                         " ORDER BY a.attnum",
-                                        lrel->remoteid,
-                                        (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
-                                         "AND a.attgenerated = ''" : ""),
                                         lrel->remoteid);

3) In which scenario will this be hit:
+       /*
+        * Construct column list for COPY, excluding columns that are subscription
+        * table generated columns.
+        */
+       for (int i = 0; i < rel->remoterel.natts; i++)
+       {
+               if (!localgenlist[i])
+                       attnamelist = lappend(attnamelist,
+                                                                 makeString(rel->remoterel.attnames[i]));
+       }

In the case of the publisher having non-generated columns:
CREATE TABLE t1(c1 int, c2 int);
and the subscriber having generated columns:
CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);

we throw an error much earlier, at
logicalrep_rel_open->logicalrep_report_missing_attrs, saying:
ERROR: logical replication target relation "public.t1" is missing replicated column: "c2"

4) To simplify the code and reduce complexity, we can refactor the
error checks to be included within the fetch_remote_table_info
function. This way, the remotegenlist will not need to be prepared and
passed to make_copy_attnamelist:
+       /*
+        * This loop checks for generated columns of the subscription table.
+        */
+       for (int i = 0; i < desc->natts; i++)
        {
-               attnamelist = lappend(attnamelist,
-                                     makeString(rel->remoterel.attnames[i]));
+               int                     remote_attnum;
+               Form_pg_attribute attr = TupleDescAttr(desc, i);
+
+               if (!attr->attgenerated)
+                       continue;
+
+               remote_attnum = logicalrep_rel_att_by_name(&rel->remoterel,
+                                                          NameStr(attr->attname));
+
+               if (remote_attnum >= 0)
+               {
+                       /*
+                        * Check if the subscription table generated column has same name
+                        * as a non-generated column in the corresponding publication
+                        * table.
+                        */
+                       if (!remotegenlist[remote_attnum])
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("logical replication target relation \"%s.%s\" has a generated column \"%s\" "
+                                                               "but corresponding column on source relation is not a generated column",
+                                                               rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
+
+                       /*
+                        * 'localgenlist' records that this is a generated column in the
+                        * subscription table. Later, we use this information to skip
+                        * adding this column to the column list for COPY.
+                        */
+                       localgenlist[remote_attnum] = true;
+               }
        }
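
To illustrate the column-list construction discussed in points 3 and 4,
here is a minimal standalone sketch. It is not the patch code:
build_copy_column_list and its array-based signature are hypothetical
stand-ins for make_copy_attnamelist and the List API, and only the rule of
skipping entries flagged in localgenlist is taken from the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for make_copy_attnamelist(): store in 'out' the
 * names of the remote columns that are NOT generated columns on the
 * subscriber. Returns the number of names written.
 */
static size_t
build_copy_column_list(const char *const *remote_names,
                       const bool *localgenlist,
                       size_t natts,
                       const char **out)
{
    size_t n = 0;

    assert(remote_names != NULL && localgenlist != NULL && out != NULL);

    for (size_t i = 0; i < natts; i++)
    {
        /* The subscriber computes this column itself, so do not COPY it. */
        if (localgenlist[i])
            continue;
        out[n++] = remote_names[i];
    }
    return n;
}
```

If localgenlist is filled in wherever the error checks run, the caller only
needs this one array to build the list.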

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham, here are my review comments for v36-0001.
> >
> > ======
> > 1. General  - merge patches
> >
> > It is long past due when patches 0001 and 0002 should've been merged.
> > AFAIK the split was only because historically these parts had
> > different authors. But, keeping them separated is not helpful anymore.
> >
> > ======
> > src/backend/catalog/pg_publication.c
> >
> > 2.
> >  Bitmapset *
> > -pub_collist_validate(Relation targetrel, List *columns)
> > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> >
> > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > so it should also be removed.
> >
> > ======
> > src/backend/replication/pgoutput/pgoutput.c
> >
> > 3.
> >   /*
> > - * If the publication is FOR ALL TABLES then it is treated the same as
> > - * if there are no column lists (even if other publications have a
> > - * list).
> > + * To handle cases where the publish_generated_columns option isn't
> > + * specified for all tables in a publication, we must create a column
> > + * list that excludes generated columns. So, the publisher will not
> > + * replicate the generated columns.
> >   */
> > - if (!pub->alltables)
> > + if (!(pub->alltables && pub->pubgencols))
> >
> > I still found that comment hard to understand. Does this mean to say
> > something like:
> >
> > ------
> > Process potential column lists for the following cases:
> >
> > a. Any publication that is not FOR ALL TABLES.
> >
> > b. When the publication is FOR ALL TABLES and
> > 'publish_generated_columns' is false.
> > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > so all columns will be replicated by default. However, if
> > 'publish_generated_columns' is set to false, column lists must still
> > be created to exclude any generated columns from being published
> > ------
> >
> > ======
> > src/test/regress/sql/publication.sql
> >
> > 4.
> > +SET client_min_messages = 'WARNING';
> > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> >
> > AFAIK you don't need to keep changing 'client_min_messages',
> > particularly now that you've removed the WARNING message that was
> > previously emitted.
> >
> > ~
> >
> > 5.
> > nit - minor comment changes.
> >
> > ======
> > Please refer to the attachment which implements any nits from above.
> >
>
> I have fixed all the given comments. Also, I have created a new 0003
> patch for the TAP-Tests related to the '011_generated.pl' file. I am
> planning to merge the 0001 and 0003 patches once they are fixed.
> The attached patches contain the required changes.

A few comments on the v37-0002 patch:
1.a) We could include the output of each command execution like
"CREATE TABLE", "INSERT 0 3" and "CREATE PUBLICATION", as is done
elsewhere, for example in [1]:
+test_pub=# CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
+test_pub=# INSERT INTO tab_gen_to_gen VALUES (1),(2),(3);
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE tab_gen_to_gen;

1.b) Similarly here too:
+test_sub=# CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 100) STORED);
+test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1;
+test_sub=# SELECT * from tab_gen_to_gen;

1.c) Similarly here too:
+<programlisting>
+test_pub=# CREATE TABLE t1 (a int PRIMARY KEY, b int,
+test_pub-#                  c int GENERATED ALWAYS AS (a + 1) STORED,
+test_pub-#                  d int GENERATED ALWAYS AS (b + 1) STORED);
+
+test_pub=# CREATE TABLE t2 (a int PRIMARY KEY, b int,
+test_pub-#                  c int GENERATED ALWAYS AS (a + 1) STORED,
+test_pub-#                  d int GENERATED ALWAYS AS (b + 1) STORED);
+</programlisting>
+<programlisting>
+test_sub=# CREATE TABLE t1 (a int PRIMARY KEY, b int,
+test_sub-#                  c int,
+test_sub-#                  d int GENERATED ALWAYS AS (b * 100) STORED);
+
+test_sub=# CREATE TABLE t2 (a int PRIMARY KEY, b int,
+test_sub-#                  c int,
+test_sub-#                  d int);

1.d) Similarly here too:
+<programlisting>
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE t1, t2(a,c)
+test_pub-#     WITH (publish_generated_columns=false);
+</programlisting>
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-#     CONNECTION 'dbname=test_pub'
+test_sub-#     PUBLICATION pub1;
+</programlisting>

1.e) Similarly here too:
+   Insert some data to the publisher tables:
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (1,2);
+test_pub=# INSERT INTO t2 VALUES (1,2);

2) All of the documentation changes to ddl.sgml, protocol.sgml and
create_publication.sgml can also be moved from the 0001 patch to the
0002 patch:
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8ab0ddb112..7b9c349343 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -514,8 +514,10 @@ CREATE TABLE people (
     </listitem>
     <listitem>
      <para>
-      Generated columns are skipped for logical replication and cannot be
-      specified in a <command>CREATE PUBLICATION</command> column list.
+      Generated columns may be skipped during logical replication according to the
+      <command>CREATE PUBLICATION</command> parameter
+      <link linkend="sql-createpublication-params-with-publish-generated-columns">
+      <literal>publish_generated_columns</literal></link>.

3) I felt "(except generated columns)" should be removed from here too:
  <variablelist>
   <varlistentry id="protocol-logicalrep-message-formats-TupleData">
    <term>TupleData</term>
    <listitem>
     <variablelist>
      <varlistentry>
       <term>Int16</term>
       <listitem>
        <para>
         Number of columns.
        </para>
       </listitem>
      </varlistentry>
     </variablelist>

     <para>
      Next, one of the following submessages appears for each column
(except generated columns):
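
For context, the TupleData layout in question can be walked as in the
sketch below. It assumes the submessage kinds listed in the protocol
documentation ('n' for NULL, 'u' for an unchanged TOASTed value, 't' for
text and 'b' for binary, the last two followed by an Int32 length and the
value bytes). tupledata_column_count is a hypothetical helper, not an
existing function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk one TupleData body and return the column count announced in its
 * Int16 header, or -1 if the buffer is truncated or malformed.
 */
static int
tupledata_column_count(const uint8_t *buf, size_t len)
{
    size_t pos = 2;
    int ncols;

    assert(buf != NULL);
    if (len < 2)
        return -1;
    ncols = (buf[0] << 8) | buf[1];     /* Int16, network byte order */

    for (int i = 0; i < ncols; i++)
    {
        uint8_t kind;
        uint32_t vlen;

        if (pos >= len)
            return -1;
        kind = buf[pos++];

        if (kind == 'n' || kind == 'u')
            continue;                   /* no payload follows */
        if (kind != 't' && kind != 'b')
            return -1;

        /* 't' and 'b' carry an Int32 length plus that many bytes. */
        if (len - pos < 4)
            return -1;
        vlen = ((uint32_t) buf[pos] << 24) | ((uint32_t) buf[pos + 1] << 16) |
               ((uint32_t) buf[pos + 2] << 8) | (uint32_t) buf[pos + 3];
        pos += 4;
        if (len - pos < vlen)
            return -1;
        pos += vlen;
    }
    return ncols;
}
```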

[1] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html#LOGICAL-REPLICATION-SUBSCRIPTION-EXAMPLES

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Mon, Oct 7, 2024 at 11:07 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> I have fixed all the given comments. Also, I have created a new 0003
> patch for the TAP-Tests related to the '011_generated.pl' file. I am
> planning to merge the 0001 and 0003 patches once they are fixed.
> The attached patches contain the required changes.
>

Regarding the 0001 patch, it seems to me that UPDATE and DELETE are
allowed on the table even if its replica identity is set to generated
columns that are not published. For example, consider the following
scenario:

create table t (a int not null, b int generated always as (a + 1) stored not null);
create unique index t_idx on t (b);
alter table t replica identity using index t_idx;
create publication pub for table t with (publish_generated_columns = false);
insert into t values (1);
update t set a = 100 where a = 1;

The publication pub doesn't include the generated column 'b' which is
the replica identity of the table 't'. Therefore, the update message
generated by the last UPDATE would have NULL for the column 'b'. I
think we should not allow UPDATE and DELETE on such a table.
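
As a rough illustration of the condition in question, the sketch below
treats each column as one bit of a mask and asks whether every replica
identity column is still published. It is not PostgreSQL code: the function
name and the bitmask representation are assumptions made for the example,
and explicit column lists are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: bit i of each mask stands for column i. UPDATE and
 * DELETE can be replicated correctly only if every replica identity
 * column is also a published column.
 */
static bool
replica_identity_is_published(uint32_t replident_cols,
                              uint32_t all_cols,
                              uint32_t generated_cols,
                              bool publish_generated_columns)
{
    uint32_t published = all_cols;

    /* Generated columns must be a subset of the table's columns. */
    assert((generated_cols & ~all_cols) == 0);

    /* Without the option, generated columns drop out of the published set. */
    if (!publish_generated_columns)
        published &= ~generated_cols;

    return (replident_cols & ~published) == 0;
}
```

For the table 't' above (column 'a' as bit 0, generated column 'b' as bit 1,
replica identity on 'b'), the check fails with
publish_generated_columns = false and passes with true.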

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are some comments for TAP test patch v37-0003.

I’m not in favour of the removal of such a large number of
'combination' and other 'misc' tests. In the commit message, please
delete me as a "co-author" of this patch.

======

1.
Any description or comment that still mentions "all combinations" is
no longer valid:

(e.g. in the commit message)
Add tests for all combinations of generated column replication.

(e.g. in the test file)
# The following test cases exercise logical replication for all combinations
# where there is a generated column on one or both sides of pub/sub:

and

# Furthermore, all combinations are tested using:

======
2.
+# --------------------------------------------------
+# Testcase: generated -> normal
+# Publisher table has generated column 'b'.
+# Subscriber table has normal column 'b'.
+# --------------------------------------------------
+

Now that COPY for generated columns is already implemented in patch
0001, shouldn't this test be using 'copy_data' enabled, so it can test
replication both for initial tablesync as well as normal replication?

That was the whole point of having the "# XXX copy_data=false for now.
This will be changed later." reminder comment in this file.

======

3.
Previously there were some misc tests to ensure that a generated
column which was then altered using DROP EXPRESSION would work as
expected. The test scenario was commented like:

+# =============================================================================
+# Misc test.
+#
+# A "normal -> generated" replication fails, reporting an error that the
+# subscriber side column is missing.
+#
+# In this test case we use DROP EXPRESSION to change the subscriber generated
+# column into a normal column, then verify replication works ok.
+# =============================================================================

Now in patch v37 this test no longer exists. Why?

======
4.
+# =============================================================================
+# The following test cases demonstrate behavior of generated column replication
+# when publish_generated_colums=false/true:
+#
+# Test: column list includes gencols, when publish_generated_columns=false
+# Test: column list does not include gencols, when publish_generated_columns=false
+#
+# Test: column list includes gencols, when publish_generated_columns=true
+# Test: column list does not include gencols, when publish_generated_columns=true
+# Test: no column list, when publish_generated_columns=true
+# =============================================================================

These tests are currently only testing the initial tablesync
replication. Since the COPY logic is different from the normal
replication logic, I think it would be better to test some normal
replication records as well, to make sure both parts work
consistently. This comment applies to all of the following test cases.

~~~

5.
+# Create table and publications.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE nogen_to_gen3 (a int, b int, gen1 int GENERATED ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
+ CREATE TABLE nogen_to_gen4 (c int, d int, gen1 int GENERATED ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
+ INSERT INTO nogen_to_gen3 VALUES (1, 1);
+ INSERT INTO nogen_to_gen4 VALUES (1, 1);
+ CREATE PUBLICATION pub1 FOR table nogen_to_gen3, nogen_to_gen4(gen1) WITH (publish_generated_columns=true);
+));
+

5a.
The code should do only what the comments say it does. So, the INSERTS
should be done separately after the CREATE PUBLICATION, but before the
CREATE SUBSCRIPTION. A similar change should be made for all of these
test cases.

# Insert some initial data
INSERT INTO nogen_to_gen3 VALUES (1, 1);
INSERT INTO nogen_to_gen4 VALUES (1, 1);

~

5b.
The tables are badly named. Why are they 'nogen_to_gen', when the
publisher side has generated cols and the subscriber side does not?
This problem seems repeated in multiple subsequent test cases.

~

6.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM gen_to_nogen ORDER BY a");
+is($result, qq(1|1||2),
+ 'gen_to_nogen initial sync, when publish_generated_columns=false');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM gen_to_nogen2 ORDER BY c");
+is($result, qq(1|1||),
+ 'gen_to_nogen2 initial sync, when publish_generated_columns=false');

IMO all the "result" queries like these ones ought to have a
comment which explains the reason for the expected results. This
review comment applies to multiple places. Please add comments to all
of them.

~~~

7.
+# --------------------------------------------------
+# Testcase: Publisher replicates the column list data excluding generated
+# columns even though publish_generated_columns option is false.
+# --------------------------------------------------
+

7a.
This is the 2nd test case, but AFAICT it would be far easier to test
this scenario just by making another table (with an appropriate column
list) for the 1st test case.

~

7b.
BTW, I don't understand this test at all. I thought according to the
comment that it intended to use a publication column list with only
normal columns in it. But that is not what the publication looks like
here:
+ CREATE PUBLICATION pub1 FOR table nogen_to_gen, nogen_to_gen2(gen1) WITH (publish_generated_columns=false);

Indeed, the way it is currently written I didn't see what this test is
doing that is any different from the prior test (???)

~~~

8.
+# --------------------------------------------------
+# Testcase: Although publish_generated_columns is true, publisher publishes
+# only the data of the columns specified in column list, skipping other
+# generated/non-generated columns.
+# --------------------------------------------------

versus

+# --------------------------------------------------
+# Testcase: Publisher publishes only the data of the columns specified in
+# column list skipping other generated/non-generated columns.
+# --------------------------------------------------

Again, I did not understand how these test cases differ from each
other. Surely, those can be combined easily enough just by adding
another table with a different kind of column list.

~~~

9.
+# --------------------------------------------------
+# Testcase: Publisher replicates all columns if publish_generated_columns is
+# enabled and there is no column list
+# --------------------------------------------------
+

Here is yet another test case that AFAICT can just be combined with
other test cases that were using publish_generated_columns=true. It
seems all you need is one extra table with no column list. You don't
need all the extra create/drop pub/sub overheads to test this.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Oct 9, 2024 at 11:00 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > planning to merge the 0001 and 0003 patches once they are fixed.
> > The attached patches contain the required changes.
>
> There is an inconsistency in replication when a generated column is
> specified in the column list. The generated column data is not
> replicated during initial sync, whereas it is replicated during
> incremental sync:
> -- publisher
> CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
> INSERT INTO t1 VALUES (1);
> CREATE PUBLICATION pub1 for table t1(c1, c2);
>
> -- subscriber
> CREATE TABLE t1(c1 int, c2 int);
> CREATE SUBSCRIPTION sub1 connection 'dbname=postgres host=localhost port=5432' PUBLICATION pub1;
>
> -- Generated column data is not synced during initial sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
> (1 row)
>
> -- publisher
> INSERT INTO t1 VALUES (2);
>
> -- Whereas generated column data is synced during incremental sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
>   2 |  4
> (2 rows)
>

There was an issue with this scenario:
CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
CREATE PUBLICATION pub1 FOR TABLE t1(c1, c2);

In this case included_cols was being set to NULL. I changed the code to
keep included_cols as it is instead of replacing it with NULL, and
changed the condition to:
    if (server_version >= 180000)
    {
      remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
      /*
       * If the column is generated and neither the generated column
       * option is specified nor it appears in the column list, we will
       * skip it.
       */
      if (remotegenlist[natt] && !has_pub_with_pubgencols && !included_cols)
      {
        ExecClearTuple(slot);
        continue;
      }
    }
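
To make the interaction with the earlier column-list check easier to
follow, the two skip conditions can be modelled as a small standalone
function. This is only a sketch: skip_remote_column and its boolean
parameters are hypothetical, standing in for remotegenlist[natt],
has_pub_with_pubgencols, included_cols != NULL and
bms_is_member(attnum, included_cols).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the per-column decision in
 * fetch_remote_table_info(): returns true when the remote column should
 * be skipped during initial table sync.
 */
static bool
skip_remote_column(bool is_generated,
                   bool has_pub_with_pubgencols,
                   bool has_column_list,
                   bool in_column_list)
{
    /* A column cannot be a member of a column list that does not exist. */
    assert(has_column_list || !in_column_list);

    /* Earlier check: a column list, when present, filters every column. */
    if (has_column_list && !in_column_list)
        return true;

    /*
     * New check: with no column list, a generated column is copied only if
     * some publication sets publish_generated_columns.
     */
    if (is_generated && !has_pub_with_pubgencols && !has_column_list)
        return true;

    return false;
}
```

Under this model, a generated column that is named in a column list is kept
even when no publication sets publish_generated_columns, which is the case
from the scenario above.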

I will think further about whether there is a better solution for this.
Please refer to the updated v39 patches in [1], which include these changes.

[1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Oct 9, 2024 at 11:13 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are my review comments for patch v37-0001.
>
> ======
> Commit message
>
> 1.
> Example usage of subscription option:
> CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns = true);
>
> ~
>
> This is wrong -- it's not a "subscription option". Better to just say
> "Example usage:"
>
> ~~~
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true...
>
> ~
>
> By only mentioning the "when ... is true" case this description does
> not cover the scenario when 'publish_generated_columns' is false when
> the publication column list has a generated column.
>
> ~~~
>
> 3.
> typo - /replication of generated column/replication of generated columns/
> typo - /filed/filled/
> typo - 'pg_publicataion' catalog
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> 4.
> nit - missing word in a comment
>
> ~~~
>
> fetch_remote_table_info:
> 5.
> + appendStringInfo(&cmd,
>   "  FROM pg_catalog.pg_attribute a"
>   "  LEFT JOIN pg_catalog.pg_index i"
>   "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>   " WHERE a.attnum > 0::pg_catalog.int2"
> - "   AND NOT a.attisdropped %s"
> + "   AND NOT a.attisdropped", lrel->remoteid);
> +
> + appendStringInfo(&cmd,
>   "   AND a.attrelid = %u"
>   " ORDER BY a.attnum",
> - lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
> -   "AND a.attgenerated = ''" : ""),
>   lrel->remoteid);
>
> Version v37-0001 has removed a condition previously between these two
> appendStringInfo's. But, that now means there is no reason to keep
> these statements separated. These should be combined now to use one
> appendStringInfo.
>
> ~
>
> 6.
> + if (server_version >= 120000)
> + remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
>
> Are you sure the version check for 120000 is correct? IIUC, this 5
> matches the 'attgenerated' column, but the SQL for that was
> constructed using a different condition:
> if (server_version >= 180000)
>   appendStringInfo(&cmd, ", a.attgenerated != ''");
>
> It is this 120000 versus 180000 difference that makes me suspicious of
> a potential mistake.
>
> ~~~
>
> 7.
> + /*
> + * If the column is generated and neither the generated column option
> + * is specified nor it appears in the column list, we will skip it.
> + */
> + if (remotegenlist[natt] && !has_pub_with_pubgencols &&
> + !bms_is_member(attnum, included_cols))
> + {
> + ExecClearTuple(slot);
> + continue;
> + }
>
> 7b.
> I am also suspicious about how this condition interacts with the other
> condition (shown below) that came earlier:
> /* If the column is not in the column list, skip it. */
> if (included_cols != NULL && !bms_is_member(attnum, included_cols))
>
> Something doesn't seem right. e.g. If we can only get here by passing
> the earlier condition, then it means we already know the generated
> column was *not* a member of a column list... in which case that
> should affect this new condition and the new comment too.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_column_list_init:
>
> 8.
>   /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * Process potential column lists for the following cases: a. Any
> + * publication that is not FOR ALL TABLES. b. When the publication is
> + * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL
> + * TABLES publication doesn't have user-defined column lists, so all
> + * columns will be replicated by default. However, if
> + * 'publish_generated_columns' is set to false, column lists must
> + * still be created to exclude any generated columns from being
> + * published.
>   */
>
> nit - please reformat this comment so the bullets are readable
>

I have fixed all the comments and posted the v39 patches.
Please refer to the updated v39 patches in [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Wed, Oct 9, 2024 at 11:52 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > planning to merge the 0001 and 0003 patches once they are fixed.
> > The attached patches contain the required changes.
>
> Few comments:
> 1) I feel this change need not be part of this patch; if required, it
> can be proposed as a separate patch:
> +       if (server_version >= 150000)
>         {
>                 WalRcvExecResult *pubres;
>                 TupleTableSlot *tslot;
>                 Oid                     attrsRow[] = {INT2VECTOROID};
> -               StringInfoData pub_names;
> -
> -               initStringInfo(&pub_names);
> -               foreach(lc, MySubscription->publications)
> -               {
> -                       if (foreach_current_index(lc) > 0)
> -                               appendStringInfoString(&pub_names, ", ");
> -                               appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
> -               }
> +               StringInfo      pub_names = makeStringInfo();
>
> 2) These two statements can be combined into a single appendStringInfo:
> +       appendStringInfo(&cmd,
>                                          "  FROM pg_catalog.pg_attribute a"
>                                          "  LEFT JOIN pg_catalog.pg_index i"
>                                          "       ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>                                          " WHERE a.attnum > 0::pg_catalog.int2"
> -                                        "   AND NOT a.attisdropped %s"
> +                                        "   AND NOT a.attisdropped", lrel->remoteid);
> +
> +       appendStringInfo(&cmd,
>                                          "   AND a.attrelid = %u"
>                                          " ORDER BY a.attnum",
> -                                        lrel->remoteid,
> -                                        (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
> -                                         "AND a.attgenerated = ''" : ""),
>                                          lrel->remoteid);
>
> 3) In which scenario will this be hit?
> +       /*
> +        * Construct column list for COPY, excluding columns that are subscription
> +        * table generated columns.
> +        */
> +       for (int i = 0; i < rel->remoterel.natts; i++)
> +       {
> +               if (!localgenlist[i])
> +                       attnamelist = lappend(attnamelist,
> +                                             makeString(rel->remoterel.attnames[i]));
> +       }
>
> As in case of publisher having non generated columns:
> CREATE TABLE t1(c1 int, c2 int)
> and subscriber having generated columns:
> CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED)
>
> We throw an error much earlier at
> logicalrep_rel_open->logicalrep_report_missing_attrs saying:
> ERROR: logical replication target relation "public.t1" is missing
> replicated column: "c2"
>
> 4) To simplify the code and reduce complexity, we can refactor the
> error checks to be included within the fetch_remote_table_info
> function. This way, the remotegenlist will not need to be prepared and
> passed to make_copy_attnamelist:
> +       /*
> +        * This loop checks for generated columns of the subscription table.
> +        */
> +       for (int i = 0; i < desc->natts; i++)
>         {
> -               attnamelist = lappend(attnamelist,
> -                                     makeString(rel->remoterel.attnames[i]));
> +               int                     remote_attnum;
> +               Form_pg_attribute attr = TupleDescAttr(desc, i);
> +
> +               if (!attr->attgenerated)
> +                       continue;
> +
> +               remote_attnum = logicalrep_rel_att_by_name(&rel->remoterel,
> +                                                          NameStr(attr->attname));
> +
> +               if (remote_attnum >= 0)
> +               {
> +                       /*
> +                        * Check if the subscription table generated column has same name
> +                        * as a non-generated column in the corresponding publication
> +                        * table.
> +                        */
> +                       if (!remotegenlist[remote_attnum])
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                                errmsg("logical replication target relation \"%s.%s\" has a generated column \"%s\" "
> +                                                               "but corresponding column on source relation is not a generated column",
> +                                                               rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
> +
> +                       /*
> +                        * 'localgenlist' records that this is a generated column in the
> +                        * subscription table. Later, we use this information to skip
> +                        * adding this column to the column list for COPY.
> +                        */
> +                       localgenlist[remote_attnum] = true;
> +               }
>         }
>

I have fixed all the comments and posted the v39 patches for them.
Please see [1] for the updated patches and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Oct 10, 2024 at 10:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some comments for TAP test patch v37-0003.
>
> I’m not in favour of the removal of such a large number of
> 'combination' and other 'misc' tests. In the commit message, please
> delete me as a "co-author" of this patch.
>
> ======
>
> 1.
> Any description or comment that still mentions "all combinations" is
> no longer valid:
>
> (e.g. in the commit message)
> Add tests for all combinations of generated column replication.
>
> (e.g. in the test file)
> # The following test cases exercise logical replication for all combinations
> # where there is a generated column on one or both sides of pub/sub:
>
> and
>
> # Furthermore, all combinations are tested using:
>
> ======
> 2.
> +# --------------------------------------------------
> +# Testcase: generated -> normal
> +# Publisher table has generated column 'b'.
> +# Subscriber table has normal column 'b'.
> +# --------------------------------------------------
> +
>
> Now that COPY for generated columns is already implemented in patch
> 0001, shouldn't this test be using 'copy_data' enabled, so it can test
> replication both for initial tablesync as well as normal replication?
>
> That was the whole point of having the "# XXX copy_data=false for now.
> This will be changed later." reminder comment in this file.
>
> ======
>
> 3.
> Previously there were some misc tests to ensure that a generated
> column which was then altered using DROP EXPRESSION would work as
> expected. The test scenario was commented like:
>
> +# =============================================================================
> +# Misc test.
> +#
> +# A "normal -> generated" replication fails, reporting an error that the
> +# subscriber side column is missing.
> +#
> +# In this test case we use DROP EXPRESSION to change the subscriber generated
> +# column into a normal column, then verify replication works ok.
> +# =============================================================================
>
> Now in patch v37 this test no longer exists. Why?
>
> ======
> 4.
> +# =============================================================================
> +# The following test cases demonstrate behavior of generated column replication
> +# when publish_generated_colums=false/true:
> +#
> +# Test: column list includes gencols, when publish_generated_columns=false
> +# Test: column list does not include gencols, when publish_generated_columns=false
> +#
> +# Test: column list includes gencols, when publish_generated_columns=true
> +# Test: column list does not include gencols, when publish_generated_columns=true
> +# Test: no column list, when publish_generated_columns=true
> +# =============================================================================
>
> These tests are currently only testing the initial tablesync
> replication. Since the COPY logic is different from the normal
> replication logic, I think it would be better to test some normal
> replication records as well, to make sure both parts work
> consistently. This comment applies to all of the following test cases.
>
> ~~~
>
> 5.
> +# Create table and publications.
> +$node_publisher->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE nogen_to_gen3 (a int, b int, gen1 int GENERATED ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
> + CREATE TABLE nogen_to_gen4 (c int, d int, gen1 int GENERATED ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
> + INSERT INTO nogen_to_gen3 VALUES (1, 1);
> + INSERT INTO nogen_to_gen4 VALUES (1, 1);
> + CREATE PUBLICATION pub1 FOR table nogen_to_gen3, nogen_to_gen4(gen1) WITH (publish_generated_columns=true);
> +));
> +
>
> 5a.
> The code should do only what the comments say it does. So, the INSERTS
> should be done separately after the CREATE PUBLICATION, but before the
> CREATE SUBSCRIPTION. A similar change should be made for all of these
> test cases.
>
> # Insert some initial data
> INSERT INTO nogen_to_gen3 VALUES (1, 1);
> INSERT INTO nogen_to_gen4 VALUES (1, 1);
>
> ~
>
> 5b.
> The tables are badly named. Why are they 'nogen_to_gen', when the
> publisher side has generated cols and the subscriber side does not?
> This problem seems repeated in multiple subsequent test cases.
>
> ~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres',
> + "SELECT * FROM gen_to_nogen ORDER BY a");
> +is($result, qq(1|1||2),
> + 'gen_to_nogen initial sync, when publish_generated_columns=false');
> +
> +$result = $node_subscriber->safe_psql('postgres',
> + "SELECT * FROM gen_to_nogen2 ORDER BY c");
> +is($result, qq(1|1||),
> + 'gen_to_nogen2 initial sync, when publish_generated_columns=false');
>
> IMO all the "result" queries like these ought to have a comment
> which explains the reason for the expected results. This review
> comment applies to multiple places. Please add comments to all
> of them.
>
> ~~~
>
> 7.
> +# --------------------------------------------------
> +# Testcase: Publisher replicates the column list data excluding generated
> +# columns even though publish_generated_columns option is false.
> +# --------------------------------------------------
> +
>
> 7a.
> This is the 2nd test case, but AFAICT it would be far easier to test
> this scenario just by making another table (with an appropriate column
> list) for the 1st test case.
>
> ~
>
> 7b.
> BTW, I don't understand this test at all. I thought according to the
> comment that it intended to use a publication column list with only
> normal columns in it. But that is not what the publication looks like
> here:
> + CREATE PUBLICATION pub1 FOR table nogen_to_gen, nogen_to_gen2(gen1) WITH (publish_generated_columns=false);
>
> Indeed, the way it is currently written I didn't see what this test is
> doing that is any different from the prior test (???)
>
> ~~~
>
> 8.
> +# --------------------------------------------------
> +# Testcase: Although publish_generated_columns is true, publisher publishes
> +# only the data of the columns specified in column list, skipping other
> +# generated/non-generated columns.
> +# --------------------------------------------------
>
> versus
>
> +# --------------------------------------------------
> +# Testcase: Publisher publishes only the data of the columns specified in
> +# column list skipping other generated/non-generated columns.
> +# --------------------------------------------------
>
> Again, I did not understand how these test cases differ from each
> other. Surely, those can be combined easily enough just by adding
> another table with a different kind of column list.
>
> ~~~
>
> 9.
> +# --------------------------------------------------
> +# Testcase: Publisher replicates all columns if publish_generated_columns is
> +# enabled and there is no column list
> +# --------------------------------------------------
> +
>
> Here is yet another test case that AFAICT can just be combined with
> other test cases that were using publish_generated_columns=true. It
> seems all you need is one extra table with no column list. You don't
> need all the extra create/drop pub/sub overheads to test this.
>
> ======

I have fixed all the comments and posted the v39 patches for them.
Please see [1] for the updated patches and the changes made.

[1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 9:08 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> > >
> > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > Hi Shubham, here are my review comments for v36-0001.
> > > >
> > > > ======
> > > > 1. General  - merge patches
> > > >
> > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > AFAIK the split was only because historically these parts had
> > > > different authors. But, keeping them separated is not helpful anymore.
> > > >
> > > > ======
> > > > src/backend/catalog/pg_publication.c
> > > >
> > > > 2.
> > > >  Bitmapset *
> > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > > >
> > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > so it should also be removed.
> > > >
> > > > ======
> > > > src/backend/replication/pgoutput/pgoutput.c
> > > >
> > > > 3.
> > > >   /*
> > > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > > - * if there are no column lists (even if other publications have a
> > > > - * list).
> > > > + * To handle cases where the publish_generated_columns option isn't
> > > > + * specified for all tables in a publication, we must create a column
> > > > + * list that excludes generated columns. So, the publisher will not
> > > > + * replicate the generated columns.
> > > >   */
> > > > - if (!pub->alltables)
> > > > + if (!(pub->alltables && pub->pubgencols))
> > > >
> > > > I still found that comment hard to understand. Does this mean to say
> > > > something like:
> > > >
> > > > ------
> > > > Process potential column lists for the following cases:
> > > >
> > > > a. Any publication that is not FOR ALL TABLES.
> > > >
> > > > b. When the publication is FOR ALL TABLES and
> > > > 'publish_generated_columns' is false.
> > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > so all columns will be replicated by default. However, if
> > > > 'publish_generated_columns' is set to false, column lists must still
> > > > be created to exclude any generated columns from being published
> > > > ------
> > > >
> > > > ======
> > > > src/test/regress/sql/publication.sql
> > > >
> > > > 4.
> > > > +SET client_min_messages = 'WARNING';
> > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > >
> > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > particularly now that you've removed the WARNING message that was
> > > > previously emitted.
> > > >
> > > > ~
> > > >
> > > > 5.
> > > > nit - minor comment changes.
> > > >
> > > > ======
> > > > Please refer to the attachment which implements any nits from above.
> > > >
> > >
> > > I have fixed all the given comments. Also, I have created a new 0003
> > > patch for the TAP tests related to the '011_generated.pl' file. I am
> > > planning to merge the 0001 and 0003 patches once they are fixed.
> > > The attached patches contain the required changes.
> >
> > Few comments:
> > 1) Since we are no longer throwing an error for generated columns, the
> > function header comments also need to be updated accordingly " Checks
> > for and raises an ERROR for any; unknown columns, system columns,
> > duplicate columns or generated columns."
> > -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > -                       ereport(ERROR,
> > -                                       errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > -                                       errmsg("cannot use generated column \"%s\" in publication column list",
> > -                                                  colname));
> > -
> >
> > 2) Tab completion missing for "PUBLISH_GENERATED_COLUMNS" option in
> > ALTER PUBLICATION ... SET (
> > postgres=# alter publication pub2 set (PUBLISH
> > PUBLISH                     PUBLISH_VIA_PARTITION_ROOT
> >
> > 3) I was able to compile without this include, so maybe it is not required:
> > --- a/src/backend/replication/logical/tablesync.c
> > +++ b/src/backend/replication/logical/tablesync.c
> > @@ -118,6 +118,7 @@
> >  #include "utils/builtins.h"
> >  #include "utils/lsyscache.h"
> >  #include "utils/memutils.h"
> > +#include "utils/rel.h"
> >
> > 4) You can include "\dRp+ pubname" after each of the create/alter
> > publication to verify the columns that will be published:
> > +-- Test the 'publish_generated_columns' parameter enabled or disabled for
> > +-- different scenarios with/without generated columns in column lists.
> > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > +
> > +-- Generated columns in column list, when 'publish_generated_columns'=false
> > +CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH (publish_generated_columns=false);
> >
> > +-- Generated columns in column list, when 'publish_generated_columns'=true
> > +CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH (publish_generated_columns=true);
> > +
> > +-- Generated columns in column list, then set 'publication_generate_columns'=false
> > +ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
> > +
> > +-- Remove generate columns from column list, when 'publish_generated_columns'=false
> > +ALTER PUBLICATION pub2 SET TABLE gencols(a);
> > +
> > +-- Add generated columns in column list, when 'publish_generated_columns'=false
> > +ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);
> >
>
> I have fixed all the given comments. The attached patches contain the
> required changes.

Few comments:
1) This change is not required:
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9efc9159f2..fcfbf86c0b 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -551,3 +551,34 @@ GetSubscriptionRelations(Oid subid, bool not_ready)

        return res;
 }
+
+/*
+ * Add publication names from the list to a string.
+ */
+void
+get_publications_str(List *publications, StringInfo dest, bool quote_literal)
+{
+       ListCell   *lc;
+       bool            first = true;
+
+       Assert(publications != NIL);
+
+       foreach(lc, publications)
+       {
+               char       *pubname = strVal(lfirst(lc));
+
+               if (first)
+                       first = false;
+               else
+                       appendStringInfoString(dest, ", ");
+
+               if (quote_literal)
+                       appendStringInfoString(dest, quote_literal_cstr(pubname));
+               else
+               {
+                       appendStringInfoChar(dest, '"');
+                       appendStringInfoString(dest, pubname);
+                       appendStringInfoChar(dest, '"');
+               }
+       }
+}

It can be moved back to the subscriptioncmds.c file, where it was earlier.

2) This line change is not required:
  *             Process and validate the 'columns' list and ensure the columns are all
- *             valid to use for a publication.  Checks for and raises an ERROR for
- *             any; unknown columns, system columns, duplicate columns or generated
- *             columns.
+ *             valid to use for a publication. Checks for and raises an ERROR for

3) Can we store this information in LogicalRepRelation, where the column
information is already being stored, instead of using a local variable?
That way remotegenlist and remotegenlist_res can be removed and the code
will be simpler:
+               if (server_version >= 180000)
+               {
+                       remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
+
+                       /*
+                        * If the column is generated and neither the generated column
+                        * option is specified nor it appears in the column list, we will
+                        * skip it.
+                        */
+                       if (remotegenlist[natt] && !has_pub_with_pubgencols && !included_cols)
+                       {
+                               ExecClearTuple(slot);
+                               continue;
+                       }
+               }
+
                rel_colname = TextDatumGetCString(slot_getattr(slot, 2, &isnull));
                Assert(!isnull);
                Assert(!isnull);

@@ -1015,7 +1112,7 @@ fetch_remote_table_info(char *nspname, char *relname,
        ExecDropSingleTupleTableSlot(slot);

        lrel->natts = natt;
-
+       *remotegenlist_res = remotegenlist;
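
To illustrate the idea in point 3, here is a minimal standalone sketch, not
PostgreSQL code. It assumes a hypothetical struct RemoteRel standing in for
LogicalRepRelation and a hypothetical helper add_remote_column(); the flag
names are also made up for the example. The point is only that the
per-column "generated" flag lives next to the column names, so no separate
array has to be returned to the caller. The skip rule follows the wording of
the patch comment quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTS 8

/* Hypothetical stand-in for LogicalRepRelation; not the real struct. */
typedef struct RemoteRel
{
    int         natts;                  /* number of columns kept so far */
    const char *attnames[MAX_ATTS];     /* column names */
    bool        attgenerated[MAX_ATTS]; /* generated flag, stored per column */
} RemoteRel;

/*
 * Record one remote column in 'rel'.
 *
 * A generated column is skipped when neither the publication option asks
 * for generated columns nor the column appears in a column list.
 * Returns true if the column was kept, false if it was skipped.
 */
static bool
add_remote_column(RemoteRel *rel, const char *name, bool generated,
                  bool pub_publishes_gencols, bool in_column_list)
{
    assert(rel != NULL && name != NULL);

    if (generated && !pub_publishes_gencols && !in_column_list)
        return false;

    assert(rel->natts < MAX_ATTS);
    rel->attnames[rel->natts] = name;
    rel->attgenerated[rel->natts] = generated;
    rel->natts++;
    return true;
}
```

With this layout a later step, such as building the COPY column list, can
read rel->attgenerated[i] directly instead of receiving a second array
through an output parameter.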

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> I have fixed all the given comments. The attached patches contain the
> required changes.

Few comments:
1) File mode change is not required:
 src/test/subscription/t/011_generated.pl | 354 +++++++++++++++++++++++
 1 file changed, 354 insertions(+)
 mode change 100644 => 100755 src/test/subscription/t/011_generated.pl

diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl
old mode 100644
new mode 100755
index 8b2e5f4708..d1f2718078
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl

2) Here copy_data=true is obvious; there is no need to mention it again
and again in the comments:
+# Create table and subscription with copy_data=true.
+$node_subscriber->safe_psql(
+       'postgres', qq(
+       CREATE TABLE tab_gen_to_nogen (a int, b int);
+       CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH (copy_data = true);
+));
+
+# Create table and subscription with copy_data=true.
+$node_subscriber->safe_psql(
+       'test_pgc_true', qq(
+       CREATE TABLE tab_gen_to_nogen (a int, b int);
+       CREATE SUBSCRIPTION regress_sub2_gen_to_nogen CONNECTION '$publisher_connstr' PUBLICATION regress_pub2_gen_to_nogen WITH (copy_data = true);
+));
+
+# Wait for initial sync.
+$node_subscriber->wait_for_subscription_sync($node_publisher,
+       'regress_sub1_gen_to_nogen', 'postgres');
+$node_subscriber->wait_for_subscription_sync($node_publisher,
+       'regress_sub2_gen_to_nogen', 'test_pgc_true');
+
+# Initial sync test when publish_generated_columns=false and copy_data=true.
+# Verify that column 'b' is not replicated.
+$result = $node_subscriber->safe_psql('postgres',
+       "SELECT a, b FROM tab_gen_to_nogen");
+is( $result, qq(1|
+2|
+3|), 'tab_gen_to_nogen initial sync, when publish_generated_columns=false');
+
+# Initial sync test when publish_generated_columns=true and copy_data=true.
+$result = $node_subscriber->safe_psql('test_pgc_true',
+       "SELECT a, b FROM tab_gen_to_nogen");
+is( $result, qq(1|2
+2|4
+3|6),
+       'tab_gen_to_nogen initial sync, when publish_generated_columns=true');

3) The database test_pgc_true can also be dropped, as it is not
required after this:
+# cleanup
+$node_subscriber->safe_psql('postgres',
+       "DROP SUBSCRIPTION regress_sub1_gen_to_nogen");
+$node_subscriber->safe_psql('test_pgc_true',
+       "DROP SUBSCRIPTION regress_sub2_gen_to_nogen");
+$node_publisher->safe_psql(
+       'postgres', qq(
+       DROP PUBLICATION regress_pub1_gen_to_nogen;
+       DROP PUBLICATION regress_pub2_gen_to_nogen;
+));

4) There is no error message verification in this test, let's add the
error verification:
+# =============================================================================
+# Misc test.
+#
+# A "normal -> generated" replication fails, reporting an error that the
+# subscriber side column is missing.
+#
+# In this test case we use DROP EXPRESSION to change the subscriber generated
+# column into a normal column, then verify replication works ok.
+# =============================================================================

5)
5.a) If possible have one regular column and one generated column in the tables
+# --------------------------------------------------
+# Testcase: Publisher replicates the column list data including generated
+# columns even though publish_generated_columns option is false.
+# --------------------------------------------------
+
+# Create table and publications.
+$node_publisher->safe_psql(
+       'postgres', qq(
+       CREATE TABLE gen_to_nogen (a int, b int, gen1 int GENERATED ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
+       CREATE TABLE gen_to_nogen2 (c int, d int, gen1 int GENERATED ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
+       CREATE TABLE nogen_to_gen2 (c int, d int, gen1 int GENERATED ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
+       CREATE PUBLICATION pub1 FOR table gen_to_nogen(a, b, gen2), gen_to_nogen2, nogen_to_gen2(gen1) WITH (publish_generated_columns=false);
+));

5.b) Try to have same columns in all the tables

6) These are inserting two records:
+# Insert data to verify incremental replication
+$node_publisher->safe_psql(
+       'postgres', qq(
+       INSERT INTO gen_to_nogen VALUES (2), (3);
+       INSERT INTO gen_to_nogen2 VALUES (2), (3);
+       INSERT INTO nogen_to_gen2 VALUES (2), (3);
+));

I felt you wanted this to be:
+# Insert data to verify incremental replication
+$node_publisher->safe_psql(
+       'postgres', qq(
+       INSERT INTO gen_to_nogen VALUES (2, 3);
+       INSERT INTO gen_to_nogen2 VALUES (2, 3);
+       INSERT INTO nogen_to_gen2 VALUES (2, 3);
+));
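
To make the difference in point 6 concrete: each parenthesized group in a
VALUES list is one row, so VALUES (2), (3) supplies two rows with only the
first column set, while VALUES (2, 3) supplies one row with the first two
columns set. The toy C model below is only a sketch, not PostgreSQL code;
Table, Row and insert_row() are hypothetical names, and the table is reduced
to (a, b, gen1) with gen1 computed as a * 2, as in gen_to_nogen above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROWS 8

/* Toy row for a table (a int, b int, gen1 int GENERATED ALWAYS AS (a * 2)). */
typedef struct Row
{
    int  a;
    bool b_is_null; /* true when no value was supplied for b */
    int  b;
    int  gen1;      /* always derived from a, never supplied by the caller */
} Row;

typedef struct Table
{
    size_t nrows;
    Row    rows[MAX_ROWS];
} Table;

/*
 * Models one parenthesized VALUES group.  Passing b == NULL models a group
 * that supplies only the first column, so b is stored as NULL.
 */
static void
insert_row(Table *t, int a, const int *b)
{
    Row *r;

    assert(t != NULL && t->nrows < MAX_ROWS);

    r = &t->rows[t->nrows++];
    r->a = a;
    r->b_is_null = (b == NULL);
    r->b = (b != NULL) ? *b : 0;
    r->gen1 = a * 2;
}
```

In this model, VALUES (2), (3) corresponds to two calls, insert_row(&t, 2,
NULL) and insert_row(&t, 3, NULL), whereas VALUES (2, 3) corresponds to a
single call that passes a = 2 and a pointer to the value 3 for b.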

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Thu, Oct 17, 2024 at 3:59 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, Oct 9, 2024 at 9:08 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > Hi Shubham, here are my review comments for v36-0001.
> > > > >
> > > > > ======
> > > > > 1. General  - merge patches
> > > > >
> > > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > > AFAIK the split was only because historically these parts had
> > > > > different authors. But, keeping them separated is not helpful anymore.
> > > > >
> > > > > ======
> > > > > src/backend/catalog/pg_publication.c
> > > > >
> > > > > 2.
> > > > >  Bitmapset *
> > > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > > > >
> > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > > so it should also be removed.
> > > > >
> > > > > ======
> > > > > src/backend/replication/pgoutput/pgoutput.c
> > > > >
> > > > > 3.
> > > > >   /*
> > > > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > > > - * if there are no column lists (even if other publications have a
> > > > > - * list).
> > > > > + * To handle cases where the publish_generated_columns option isn't
> > > > > + * specified for all tables in a publication, we must create a column
> > > > > + * list that excludes generated columns. So, the publisher will not
> > > > > + * replicate the generated columns.
> > > > >   */
> > > > > - if (!pub->alltables)
> > > > > + if (!(pub->alltables && pub->pubgencols))
> > > > >
> > > > > I still found that comment hard to understand. Does this mean to say
> > > > > something like:
> > > > >
> > > > > ------
> > > > > Process potential column lists for the following cases:
> > > > >
> > > > > a. Any publication that is not FOR ALL TABLES.
> > > > >
> > > > > b. When the publication is FOR ALL TABLES and
> > > > > 'publish_generated_columns' is false.
> > > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > > so all columns will be replicated by default. However, if
> > > > > 'publish_generated_columns' is set to false, column lists must still
> > > > > be created to exclude any generated columns from being published
> > > > > ------
> > > > >
> > > > > ======
> > > > > src/test/regress/sql/publication.sql
> > > > >
> > > > > 4.
> > > > > +SET client_min_messages = 'WARNING';
> > > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > >
> > > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > > particularly now that you've removed the WARNING message that was
> > > > > previously emitted.
> > > > >
> > > > > ~
> > > > >
> > > > > 5.
> > > > > nit - minor comment changes.
> > > > >
> > > > > ======
> > > > > Please refer to the attachment which implements any nits from above.
> > > > >
> > > >
> > > > I have fixed all the given comments. Also, I have created a new 0003
> > > > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > > > planning to merge 0001 and 0003 patches once they will get fixed.
> > > > The attached patches contain the required changes.
> > >
> > > Few comments:
> > > 1) Since we are no longer throwing an error for generated columns, the
> > > function header comments also need to be updated accordingly " Checks
> > > for and raises an ERROR for any; unknown columns, system columns,
> > > duplicate columns or generated columns."
> > > -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > > -                       ereport(ERROR,
> > > -
> > > errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > -                                       errmsg("cannot use generated
> > > column \"%s\" in publication column list",
> > > -                                                  colname));
> > > -
> > >
> > > 2) Tab completion missing for "PUBLISH_GENERATED_COLUMNS" option in
> > > ALTER PUBLICATION ... SET (
> > > postgres=# alter publication pub2 set (PUBLISH
> > > PUBLISH                     PUBLISH_VIA_PARTITION_ROOT
> > >
> > > 3) I was able to compile without this include, may be this is not required:
> > > --- a/src/backend/replication/logical/tablesync.c
> > > +++ b/src/backend/replication/logical/tablesync.c
> > > @@ -118,6 +118,7 @@
> > >  #include "utils/builtins.h"
> > >  #include "utils/lsyscache.h"
> > >  #include "utils/memutils.h"
> > > +#include "utils/rel.h"
> > >
> > > 4) You can include "\dRp+ pubname" after each of the create/alter
> > > publication to verify the columns that will be published:
> > > +-- Test the 'publish_generated_columns' parameter enabled or disabled for
> > > +-- different scenarios with/without generated columns in column lists.
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > +
> > > +-- Generated columns in column list, when 'publish_generated_columns'=false
> > > +CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH
> > > (publish_generated_columns=false);
> > >
> > > +-- Generated columns in column list, when 'publish_generated_columns'=true
> > > +CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH
> > > (publish_generated_columns=true);
> > > +
> > > +-- Generated columns in column list, then set
> > > 'publication_generate_columns'=false
> > > +ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
> > > +
> > > +-- Remove generate columns from column list, when
> > > 'publish_generated_columns'=false
> > > +ALTER PUBLICATION pub2 SET TABLE gencols(a);
> > > +
> > > +-- Add generated columns in column list, when 'publish_generated_columns'=false
> > > +ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);
> > >
> >
> > I have fixed all the given comments. The attached patches contain the
> > required changes.
>
> Few comments:
> 1) File mode change is not required:
>  src/test/subscription/t/011_generated.pl | 354 +++++++++++++++++++++++
>  1 file changed, 354 insertions(+)
>  mode change 100644 => 100755 src/test/subscription/t/011_generated.pl
>
> diff --git a/src/test/subscription/t/011_generated.pl
> b/src/test/subscription/t/011_generated.pl
> old mode 100644
> new mode 100755
> index 8b2e5f4708..d1f2718078
> --- a/src/test/subscription/t/011_generated.pl
> +++ b/src/test/subscription/t/011_generated.pl
>
> 2) Here copy_data=true looks obvious no need to mention again and
> again in comments:
> +# Create table and subscription with copy_data=true.
> +$node_subscriber->safe_psql(
> +       'postgres', qq(
> +       CREATE TABLE tab_gen_to_nogen (a int, b int);
> +       CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION
> '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH
> (copy_data = true);
> +));
> +
> +# Create table and subscription with copy_data=true.
> +$node_subscriber->safe_psql(
> +       'test_pgc_true', qq(
> +       CREATE TABLE tab_gen_to_nogen (a int, b int);
> +       CREATE SUBSCRIPTION regress_sub2_gen_to_nogen CONNECTION
> '$publisher_connstr' PUBLICATION regress_pub2_gen_to_nogen WITH
> (copy_data = true);
> +));
> +
> +# Wait for initial sync.
> +$node_subscriber->wait_for_subscription_sync($node_publisher,
> +       'regress_sub1_gen_to_nogen', 'postgres');
> +$node_subscriber->wait_for_subscription_sync($node_publisher,
> +       'regress_sub2_gen_to_nogen', 'test_pgc_true');
> +
> +# Initial sync test when publish_generated_columns=false and copy_data=true.
> +# Verify that column 'b' is not replicated.
> +$result = $node_subscriber->safe_psql('postgres',
> +       "SELECT a, b FROM tab_gen_to_nogen");
> +is( $result, qq(1|
> +2|
> +3|), 'tab_gen_to_nogen initial sync, when publish_generated_columns=false');
> +
> +# Initial sync test when publish_generated_columns=true and copy_data=true.
> +$result = $node_subscriber->safe_psql('test_pgc_true',
> +       "SELECT a, b FROM tab_gen_to_nogen");
> +is( $result, qq(1|2
> +2|4
> +3|6),
> +       'tab_gen_to_nogen initial sync, when publish_generated_columns=true');
>
> 3) The database test_pgc_true can also be cleaned up, as it is not
> required after this:
> +# cleanup
> +$node_subscriber->safe_psql('postgres',
> +       "DROP SUBSCRIPTION regress_sub1_gen_to_nogen");
> +$node_subscriber->safe_psql('test_pgc_true',
> +       "DROP SUBSCRIPTION regress_sub2_gen_to_nogen");
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       DROP PUBLICATION regress_pub1_gen_to_nogen;
> +       DROP PUBLICATION regress_pub2_gen_to_nogen;
> +));
>
> 4) There is no error message verification in this test, let's add the
> error verification:
> +# =============================================================================
> +# Misc test.
> +#
> +# A "normal -> generated" replication fails, reporting an error that the
> +# subscriber side column is missing.
> +#
> +# In this test case we use DROP EXPRESSION to change the subscriber generated
> +# column into a normal column, then verify replication works ok.
> +# =============================================================================
>
> 5)
> 5.a) If possible have one regular column and one generated column in the tables
> +# --------------------------------------------------
> +# Testcase: Publisher replicates the column list data including generated
> +# columns even though publish_generated_columns option is false.
> +# --------------------------------------------------
> +
> +# Create table and publications.
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       CREATE TABLE gen_to_nogen (a int, b int, gen1 int GENERATED
> ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2)
> STORED);
> +       CREATE TABLE gen_to_nogen2 (c int, d int, gen1 int GENERATED
> ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2)
> STORED);
> +       CREATE TABLE nogen_to_gen2 (c int, d int, gen1 int GENERATED
> ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2)
> STORED);
> +       CREATE PUBLICATION pub1 FOR table gen_to_nogen(a, b, gen2),
> gen_to_nogen2, nogen_to_gen2(gen1) WITH
> (publish_generated_columns=false);
> +));
>
> 5.b) Try to have same columns in all the tables
>
> 6) These are inserting two records:
> +# Insert data to verify incremental replication
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       INSERT INTO gen_to_nogen VALUES (2), (3);
> +       INSERT INTO gen_to_nogen2 VALUES (2), (3);
> +       INSERT INTO nogen_to_gen2 VALUES (2), (3);
> +));
>
> I felt you wanted this to be:
> +# Insert data to verify incremental replication
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       INSERT INTO gen_to_nogen VALUES (2, 3);
> +       INSERT INTO gen_to_nogen2 VALUES (2, 3);
> +       INSERT INTO nogen_to_gen2 VALUES (2, 3);
> +));

I have fixed all the comments and posted the updated v40 patches.
Please see [1] for the changes.

[1] https://www.postgresql.org/message-id/CAHv8RjLviXAWtB3Kcn1A1jPpqORpkNay1y2U%2B55K64sqwCdrGw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 18 Oct 2024 at 17:42, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
>
> I have fixed all the given comments. The attached v40-0001 patch
> contains the required changes.

1) The recent patch removed the function header comment change where
generated columns are mentioned; that change is required:
@@ -511,7 +511,6 @@ pub_collist_validate(Relation targetrel, List *columns)
 {
        Bitmapset  *set = NULL;
        ListCell   *lc;
-       TupleDesc       tupdesc = RelationGetDescr(targetrel);

        foreach(lc, columns)
        {
@@ -530,12 +529,6 @@ pub_collist_validate(Relation targetrel, List *columns)
                                        errmsg("cannot use system
column \"%s\" in publication column list",
                                                   colname));

-               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
-
errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated
column \"%s\" in publication column list",
-                                                  colname));
-

2) This change is no longer required, as the get_publications_str
changes have now been removed:
diff --git a/src/include/catalog/pg_subscription.h
b/src/include/catalog/pg_subscription.h
index 0aa14ec4a2..6657186317 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -20,6 +20,7 @@
 #include "access/xlogdefs.h"
 #include "catalog/genbki.h"
 #include "catalog/pg_subscription_d.h"
+#include "lib/stringinfo.h"

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 18 Oct 2024 at 17:42, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Oct 17, 2024 at 12:58 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
> > <khannashubham1197@gmail.com> wrote:
> > >
> > > On Wed, Oct 9, 2024 at 9:08 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > >
> > > > > > Hi Shubham, here are my review comments for v36-0001.
> > > > > >
> > > > > > ======
> > > > > > 1. General  - merge patches
> > > > > >
> > > > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > > > AFAIK the split was only because historically these parts had
> > > > > > different authors. But, keeping them separated is not helpful anymore.
> > > > > >
> > > > > > ======
> > > > > > src/backend/catalog/pg_publication.c
> > > > > >
> > > > > > 2.
> > > > > >  Bitmapset *
> > > > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > > > > >
> > > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > > > so it should also be removed.
> > > > > >
> > > > > > ======
> > > > > > src/backend/replication/pgoutput/pgoutput.c
> > > > > >
> > > > > > 3.
> > > > > >   /*
> > > > > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > > > > - * if there are no column lists (even if other publications have a
> > > > > > - * list).
> > > > > > + * To handle cases where the publish_generated_columns option isn't
> > > > > > + * specified for all tables in a publication, we must create a column
> > > > > > + * list that excludes generated columns. So, the publisher will not
> > > > > > + * replicate the generated columns.
> > > > > >   */
> > > > > > - if (!pub->alltables)
> > > > > > + if (!(pub->alltables && pub->pubgencols))
> > > > > >
> > > > > > I still found that comment hard to understand. Does this mean to say
> > > > > > something like:
> > > > > >
> > > > > > ------
> > > > > > Process potential column lists for the following cases:
> > > > > >
> > > > > > a. Any publication that is not FOR ALL TABLES.
> > > > > >
> > > > > > b. When the publication is FOR ALL TABLES and
> > > > > > 'publish_generated_columns' is false.
> > > > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > > > so all columns will be replicated by default. However, if
> > > > > > 'publish_generated_columns' is set to false, column lists must still
> > > > > > be created to exclude any generated columns from being published
> > > > > > ------
> > > > > >
> > > > > > ======
> > > > > > src/test/regress/sql/publication.sql
> > > > > >
> > > > > > 4.
> > > > > > +SET client_min_messages = 'WARNING';
> > > > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > > >
> > > > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > > > particularly now that you've removed the WARNING message that was
> > > > > > previously emitted.
> > > > > >
> > > > > > ~
> > > > > >
> > > > > > 5.
> > > > > > nit - minor comment changes.
> > > > > >
> > > > > > ======
> > > > > > Please refer to the attachment which implements any nits from above.
> > > > > >
> > > > >
> > > > > I have fixed all the given comments. Also, I have created a new 0003
> > > > > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > > > > planning to merge 0001 and 0003 patches once they will get fixed.
> > > > > The attached patches contain the required changes.
> > > >
> > > > Few comments:
> > > > 1) Since we are no longer throwing an error for generated columns, the
> > > > function header comments also need to be updated accordingly " Checks
> > > > for and raises an ERROR for any; unknown columns, system columns,
> > > > duplicate columns or generated columns."
> > > > -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > > > -                       ereport(ERROR,
> > > > -
> > > > errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > > -                                       errmsg("cannot use generated
> > > > column \"%s\" in publication column list",
> > > > -                                                  colname));
> > > > -
> > > >
> > > > 2) Tab completion missing for "PUBLISH_GENERATED_COLUMNS" option in
> > > > ALTER PUBLICATION ... SET (
> > > > postgres=# alter publication pub2 set (PUBLISH
> > > > PUBLISH                     PUBLISH_VIA_PARTITION_ROOT
> > > >
> > > > 3) I was able to compile without this include, may be this is not required:
> > > > --- a/src/backend/replication/logical/tablesync.c
> > > > +++ b/src/backend/replication/logical/tablesync.c
> > > > @@ -118,6 +118,7 @@
> > > >  #include "utils/builtins.h"
> > > >  #include "utils/lsyscache.h"
> > > >  #include "utils/memutils.h"
> > > > +#include "utils/rel.h"
> > > >
> > > > 4) You can include "\dRp+ pubname" after each of the create/alter
> > > > publication to verify the columns that will be published:
> > > > +-- Test the 'publish_generated_columns' parameter enabled or disabled for
> > > > +-- different scenarios with/without generated columns in column lists.
> > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > +
> > > > +-- Generated columns in column list, when 'publish_generated_columns'=false
> > > > +CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH
> > > > (publish_generated_columns=false);
> > > >
> > > > +-- Generated columns in column list, when 'publish_generated_columns'=true
> > > > +CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH
> > > > (publish_generated_columns=true);
> > > > +
> > > > +-- Generated columns in column list, then set
> > > > 'publication_generate_columns'=false
> > > > +ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
> > > > +
> > > > +-- Remove generate columns from column list, when
> > > > 'publish_generated_columns'=false
> > > > +ALTER PUBLICATION pub2 SET TABLE gencols(a);
> > > > +
> > > > +-- Add generated columns in column list, when 'publish_generated_columns'=false
> > > > +ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);
> > > >
> > >
> > > I have fixed all the given comments. The attached patches contain the
> > > required changes.
> >
> > Few comments:
> > 1) This change is not required:
> > diff --git a/src/backend/catalog/pg_subscription.c
> > b/src/backend/catalog/pg_subscription.c
> > index 9efc9159f2..fcfbf86c0b 100644
> > --- a/src/backend/catalog/pg_subscription.c
> > +++ b/src/backend/catalog/pg_subscription.c
> > @@ -551,3 +551,34 @@ GetSubscriptionRelations(Oid subid, bool not_ready)
> >
> >         return res;
> >  }
> > +
> > +/*
> > + * Add publication names from the list to a string.
> > + */
> > +void
> > +get_publications_str(List *publications, StringInfo dest, bool quote_literal)
> > +{
> > +       ListCell   *lc;
> > +       bool            first = true;
> > +
> > +       Assert(publications != NIL);
> > +
> > +       foreach(lc, publications)
> > +       {
> > +               char       *pubname = strVal(lfirst(lc));
> > +
> > +               if (first)
> > +                       first = false;
> > +               else
> > +                       appendStringInfoString(dest, ", ");
> > +
> > +               if (quote_literal)
> > +                       appendStringInfoString(dest,
> > quote_literal_cstr(pubname));
> > +               else
> > +               {
> > +                       appendStringInfoChar(dest, '"');
> > +                       appendStringInfoString(dest, pubname);
> > +                       appendStringInfoChar(dest, '"');
> > +               }
> > +       }
> > +}
> >
> > It can be moved to subscriptioncmds.c file as earlier.
> >
> > 2) This line change is not required:
> >   *             Process and validate the 'columns' list and ensure the
> > columns are all
> > - *             valid to use for a publication.  Checks for and raises
> > an ERROR for
> > - *             any; unknown columns, system columns, duplicate
> > columns or generated
> > - *             columns.
> > + *             valid to use for a publication. Checks for and raises
> > an ERROR for
> >
> > 3) Can we store this information in LogicalRepRelation instead of
> > having a local variable as column information is being stored, that
> > way remotegenlist and remotegenlist_res can be removed and code will
> > be more simpler:
> > +               if (server_version >= 180000)
> > +               {
> > +                       remotegenlist[natt] =
> > DatumGetBool(slot_getattr(slot, 5, &isnull));
> > +
> > +                       /*
> > +                        * If the column is generated and neither the
> > generated column
> > +                        * option is specified nor it appears in the
> > column list, we will
> > +                        * skip it.
> > +                        */
> > +                       if (remotegenlist[natt] &&
> > !has_pub_with_pubgencols && !included_cols)
> > +                       {
> > +                               ExecClearTuple(slot);
> > +                               continue;
> > +                       }
> > +               }
> > +
> >                 rel_colname = TextDatumGetCString(slot_getattr(slot,
> > 2, &isnull));
> >                 Assert(!isnull);
> >
> > @@ -1015,7 +1112,7 @@ fetch_remote_table_info(char *nspname, char *relname,
> >         ExecDropSingleTupleTableSlot(slot);
> >
> >         lrel->natts = natt;
> > -
> > +       *remotegenlist_res = remotegenlist;
>
> I have fixed all the given comments. The attached v40-0001 patch
> contains the required changes.

Few comments:
1) Add a test case to ensure that an error is properly raised for the
issue reported by Sawada-san in [1]:
create table t (a int not null, b int generated always as (a + 1)
stored not null);
create unique index t_idx on t (b);
alter table t replica identity using index t_idx;
insert into t values (1);
update t set a = 100 where a = 1;

2) The existing comments only reference the column list, so we should
revise them to include an important point about generated columns. If
publish_generated_columns is set to false, some columns may not be
replicated:
+               if (!isnull)
+               {
+                       /* With REPLICA IDENTITY FULL, no column list
is allowed. */
+                       if (relation->rd_rel->relreplident ==
REPLICA_IDENTITY_FULL)
+                               result = true;
+
+                       /* Transform the column list datum to a bitmapset. */
+                       columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+               }
+               else
+               {
+                       TupleDesc       desc = RelationGetDescr(relation);
+                       int                     nliveatts = 0;
+
+                       for (int i = 0; i < desc->natts; i++)
+                       {
+                               Form_pg_attribute att = TupleDescAttr(desc, i);
+
+                               /* Skip if the attribute is dropped or
generated */
+                               if (att->attisdropped)
+                                       continue;
+
+                               nliveatts++;
+
+                               if (att->attgenerated)
+                                       continue;
+
+                               columns = bms_add_member(columns, i + 1);
+                       }

3) Now that we are sending generated columns from the publisher, the
comment in the tuples_equal function is no longer accurate. We need to
update the comment to reflect the new behavior of the function.
Specifically, it should clarify how generated columns are considered
in the equality check:
/*
 * Compare the tuples in the slots by checking if they have equal values.
 */
static bool
tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
TypeCacheEntry **eq)
{
....

/*
* Ignore dropped and generated columns as the publisher doesn't send
* those
*/
if (att->attisdropped || att->attgenerated)
continue;

4) This change is not required:
@@ -1015,7 +1110,6 @@ fetch_remote_table_info(char *nspname, char *relname,
        ExecDropSingleTupleTableSlot(slot);

        lrel->natts = natt;
-
        walrcv_clear_result(res);

[1] - https://www.postgresql.org/message-id/CAD21AoB%3DDBVDNCGBja%2BsDa2-w9tsM7_E%3DZgyw2qYMR1R0FwDsg%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Fri, Oct 18, 2024 at 5:42 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
> > >
> > > I have fixed all the given comments. The attached patches contain the
> > > required changes.

Review comments:
===============
1.
>
B. when generated columns are not published

* Publisher not-generated column => subscriber not-generated column:
  This is just normal logical replication (not changed by this patch).

* Publisher not-generated column => subscriber generated column:
  This will give ERROR.
>

Is the second behavior introduced by the patch? If so, why?

2.
@@ -1213,7 +1207,10 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
{
...
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
+ continue;
+
+ if (att->attgenerated && !pub->pubgencols)
  continue;

It is better to combine the above conditions and write a comment on it.
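
As a minimal standalone sketch of what a combined condition could look
like: the struct and function names below (SketchAttr,
skip_unpublished_attr) are hypothetical stand-ins and not the patch's
code. Only the field names attisdropped and attgenerated and the
pubgencols flag come from the hunk above, and attgenerated is
simplified to a bool here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pg_attribute fields used in the hunk. */
typedef struct SketchAttr
{
    bool        attisdropped;
    bool        attgenerated;   /* simplified to bool for this sketch */
} SketchAttr;

/*
 * Return true when the attribute must be skipped: dropped columns are never
 * published, and generated columns are published only when the publication's
 * pubgencols flag is set.
 */
static bool
skip_unpublished_attr(const SketchAttr *att, bool pubgencols)
{
    assert(att != NULL);

    return att->attisdropped || (att->attgenerated && !pubgencols);
}
```

With this shape, the loop body reduces to a single "if (skip...)
continue;" and the comment above the helper explains both cases in
one place.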

3.
@@ -368,18 +379,50 @@ pub_collist_contains_invalid_column(Oid pubid,
Relation relation, List *ancestor
  Anum_pg_publication_rel_prattrs,
  &isnull);

- if (!isnull)
+ if (!isnull || !pubgencols)
  {
  int x;
  Bitmapset  *idattrs;
  Bitmapset  *columns = NULL;

- /* With REPLICA IDENTITY FULL, no column list is allowed. */
- if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
- result = true;
+ if (!isnull)
+ {
+ /* With REPLICA IDENTITY FULL, no column list is allowed. */
+ if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
+ result = true;
+
+ /* Transform the column list datum to a bitmapset. */
+ columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ }
+ else
+ {
+ TupleDesc desc = RelationGetDescr(relation);
+ int nliveatts = 0;
+
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /* Skip if the attribute is dropped or generated */
+ if (att->attisdropped)
+ continue;
+
+ nliveatts++;
+
+ if (att->attgenerated)
+ continue;
+
+ columns = bms_add_member(columns, i + 1);
+ }

- /* Transform the column list datum to a bitmapset. */
- columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ /* Return if all columns of the table will be replicated */
+ if (bms_num_members(columns) == nliveatts)
+ {
+ bms_free(columns);
+ ReleaseSysCache(tuple);
+ return false;
+ }

Won't this lead to traversing the entire column list in the default
case, where publish_generated_columns is false, which could hurt
UPDATE/DELETE performance? Irrespective of that, it is better to
write some comments to explain this logic.
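
To make the logic in question easier to follow, here is a rough
standalone sketch of the "else" branch above. It is only an
illustration under stated assumptions: SketchColumn and
all_live_columns_published are hypothetical names, and a plain
uint64_t mask replaces Bitmapset (so at most 64 attributes). The loop
visits every attribute of the relation, which is the cost the
question above is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pg_attribute fields the hunk inspects. */
typedef struct SketchColumn
{
    bool        attisdropped;
    bool        attgenerated;   /* simplified to bool for this sketch */
} SketchColumn;

/*
 * Build the set of columns published when there is no column list and
 * generated columns are not published.  Bit i of *columns stands for
 * attribute number i + 1.
 *
 * Returns true when every live (non-dropped) column is published, which is
 * the case where the hunk returns early because nothing can be missing.
 */
static bool
all_live_columns_published(const SketchColumn *atts, int natts,
                           uint64_t *columns)
{
    int         nliveatts = 0;
    int         npublished = 0;

    assert(natts >= 0 && natts <= 64);
    *columns = 0;

    for (int i = 0; i < natts; i++)
    {
        /* Dropped columns are neither live nor published. */
        if (atts[i].attisdropped)
            continue;

        nliveatts++;

        /* Generated columns are live but excluded from the set. */
        if (atts[i].attgenerated)
            continue;

        *columns |= UINT64_C(1) << i;
        npublished++;
    }

    return npublished == nliveatts;
}
```

A table without generated columns takes the early-return path, but
only after the whole descriptor has been scanned.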

4. Some minimum parts of 0002 like the changes in
/doc/src/sgml/ref/create_publication.sgml should be part of 0001
patch. We can always add examples or more details in the docs as a
later patch.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Oct 9, 2024 at 10:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Regarding the 0001 patch, it seems to me that UPDATE and DELETE are
> allowed on the table even if its replica identity is set to generated
> columns that are not published. For example, consider the following
> scenario:
>
> create table t (a int not null, b int generated always as (a + 1)
> stored not null);
> create unique index t_idx on t (b);
> alter table t replica identity using index t_idx;
> create publication pub for table t with (publish_generated_columns = false);
> insert into t values (1);
> update t set a = 100 where a = 1;
>
> The publication pub doesn't include the generated column 'b' which is
> the replica identity of the table 't'. Therefore, the update message
> generated by the last UPDATE would have NULL for the column 'b'. I
> think we should not allow UPDATE and DELETE on such a table.
>

I see the same behavior even without a patch on the HEAD. See the
following example executed on HEAD:

postgres=# create table t (a int not null, b int generated always as (a + 1)
postgres(# stored not null);
CREATE TABLE
postgres=# create unique index t_idx on t (b);
CREATE INDEX
postgres=# alter table t replica identity using index t_idx;
ALTER TABLE
postgres=# create publication pub for table t;
CREATE PUBLICATION
postgres=# insert into t values (1);
INSERT 0 1
postgres=# update t set a = 100 where a = 1;
UPDATE 1

So, the update is allowed even when we don't publish generated
columns. If so, why do we need to handle it in this patch when the
user gave publish_generated_columns=false?

Also, on the subscriber side, I see the ERROR: "publisher did not send
replica identity column expected by the logical replication target
relation "public.t"".

Considering this, I feel that if we find this behavior buggy then we
should fix it separately rather than as part of this patch. What do
you think?
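
For reference, the kind of check being discussed could be sketched as
below. This is only an illustration, not code from the patch or from
HEAD: the function name is hypothetical, and plain uint64_t masks
stand in for Bitmapset (bit i means attribute number i + 1).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the 1-based number of the first replica identity column that is
 * absent from the published set, or 0 when every identity column is
 * published.
 */
static int
first_unpublished_identity_attr(uint64_t idattrs, uint64_t published,
                                int natts)
{
    assert(natts >= 0 && natts <= 64);

    for (int i = 0; i < natts; i++)
    {
        uint64_t    bit = UINT64_C(1) << i;

        if ((idattrs & bit) && !(published & bit))
            return i + 1;
    }

    return 0;
}
```

For table t above, the identity set is {b} and the published set is
{a}, so the sketch reports attribute 2 (column b) as missing.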

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Masahiko Sawada
Date:
On Tue, Oct 22, 2024 at 3:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 10:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Regarding the 0001 patch, it seems to me that UPDATE and DELETE are
> > allowed on the table even if its replica identity is set to generated
> > columns that are not published. For example, consider the following
> > scenario:
> >
> > create table t (a int not null, b int generated always as (a + 1)
> > stored not null);
> > create unique index t_idx on t (b);
> > alter table t replica identity using index t_idx;
> > create publication pub for table t with (publish_generated_columns = false);
> > insert into t values (1);
> > update t set a = 100 where a = 1;
> >
> > The publication pub doesn't include the generated column 'b' which is
> > the replica identity of the table 't'. Therefore, the update message
> > generated by the last UPDATE would have NULL for the column 'b'. I
> > think we should not allow UPDATE and DELETE on such a table.
> >
>
> I see the same behavior even without a patch on the HEAD. See the
> following example executed on HEAD:
>
> postgres=# create table t (a int not null, b int generated always as (a + 1)
> postgres(# stored not null);
> CREATE TABLE
> postgres=# create unique index t_idx on t (b);
> CREATE INDEX
> postgres=# alter table t replica identity using index t_idx;
> ALTER TABLE
> postgres=# create publication pub for table t;
> CREATE PUBLICATION
> postgres=# insert into t values (1);
> INSERT 0 1
> postgres=# update t set a = 100 where a = 1;
> UPDATE 1
>
> So, the update is allowed even when we don't publish generated
> columns, if so, why do we need to handle it in this patch when the
> user gave publish_generated_columns=false?
>
> Also, on the subscriber side, I see the ERROR: "publisher did not send
> replica identity column expected by the logical replication target
> relation "public.t"".

Good point.

> Considering this, I feel that if we find this behavior buggy then we
> should fix it separately rather than as part of this patch. What do
> you think?

Agreed. It's better to fix it separately.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Recently (~ version v39/v40) some changes to 'get_publications_str'
calls got removed from this patchset because it was decided it was
really a separate problem, unrelated to this generated columns
feature.

FYI - I've started a new thread "Refactor to use common function
'get_publications_str'" [1] to address it separately.

======
[1] https://www.postgresql.org/message-id/CAHut+PtJMk4bKXqtpvqVy9ckknCgK9P6=FeG8zHF=6+Em_Snpw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Oct 22, 2024 at 9:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 22, 2024 at 3:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > Considering this, I feel if we find this behavior buggy then we should
> > fix this separately rather than as part of this patch. What do you think?
>
> Agreed. It's better to fix it separately.
>

Thanks. One more thing that I didn't like about the patch is that it
used column_list to address the "publish_generated_columns = false"
case such that we build column_list without generated columns for the
same. The first problem is that it will add overhead to always probe
column_list during proto.c calls (for example during
logicalrep_write_attrs()), then it makes the column_list code complex
especially the handling in pgoutput_column_list_init(), and finally
this appears to be a misuse of column_list.

So, I suggest remembering this information in RelationSyncEntry and
then using it at the required places. We discussed above that
contradictory values of "publish_generated_columns" across
publications for the same relations are not accepted, so we can detect
that during get_rel_sync_entry() and give an ERROR for the same.
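
To illustrate the suggestion, here is a minimal standalone sketch, not code
from the patch. The types PubInfo and SyncEntry and the function
init_pubgencols() are hypothetical stand-ins for Publication,
RelationSyncEntry and the check that would live in get_rel_sync_entry(); the
real code would raise an ERROR via ereport() instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a publication that includes the relation. */
typedef struct PubInfo
{
    const char *name;
    bool        pubgencols;     /* publish_generated_columns value */
} PubInfo;

/* Hypothetical stand-in for the per-relation cache entry. */
typedef struct SyncEntry
{
    bool        pubgencols;     /* value shared by all publications */
} SyncEntry;

/*
 * Remember the publish_generated_columns setting in the entry.  Returns
 * false if two publications disagree; the real code would raise an ERROR.
 */
static bool
init_pubgencols(SyncEntry *entry, const PubInfo *pubs, size_t npubs)
{
    assert(entry != NULL);

    entry->pubgencols = false;

    for (size_t i = 0; i < npubs; i++)
    {
        if (i == 0)
            entry->pubgencols = pubs[i].pubgencols;
        else if (entry->pubgencols != pubs[i].pubgencols)
            return false;       /* contradictory values */
    }

    return true;
}
```

With the value cached once per relation, later code can test
entry->pubgencols directly instead of probing a column list.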

Additional comment on the 0003 patch
+# =============================================================================
+# Misc test.
+#
+# A "normal -> generated" replication.
+#
+# In this test case we use DROP EXPRESSION to change the subscriber generated
+# column into a normal column, then verify replication works ok.
+# =============================================================================

In patch 0003, why do we have the above test? This doesn't seem to be
directly related to this patch.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Wed, Oct 23, 2024 at 5:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Additional comment on the 0003 patch
> +# =============================================================================
> +# Misc test.
> +#
> +# A "normal -> generated" replication.
> +#
> +# In this test case we use DROP EXPRESSION to change the subscriber generated
> +# column into a normal column, then verify replication works ok.
> +# =============================================================================
>
> In patch 0003, why do we have the above test? This doesn't seem to be
> directly related to this patch.
>
> --

Perhaps the test should be turned around, to test this feature more directly...

e.g. Replication of table tab(a int, b int) ==>  tab(a int, b int, c int)

test_pub=# create table tab(a int, b int);

then, dynamically add a generated column "c" to the publisher table
test_pub=# alter table tab add column c int GENERATED ALWAYS AS (a + b) STORED;
test_pub=# insert into tab values (1,2);

then, verify that replication works from the newly added generated
column "c" to the existing normal column "c" at the subscriber.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Oct 23, 2024 at 12:26 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Oct 23, 2024 at 5:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Additional comment on the 0003 patch
> > +# =============================================================================
> > +# Misc test.
> > +#
> > +# A "normal -> generated" replication.
> > +#
> > +# In this test case we use DROP EXPRESSION to change the subscriber generated
> > +# column into a normal column, then verify replication works ok.
> > +# =============================================================================
> >
> > In patch 0003, why do we have the above test? This doesn't seem to be
> > directly related to this patch.
> >
> > --
>
> Perhaps the test should be turned around, to test this feature more directly...
>
> e.g. Replication of table tab(a int, b int) ==>  tab(a int, b int, c int)
>
> test_pub=# create table tab(a int, b int);
>
> then, dynamically add a generated column "c" to the publisher table
> test_pub=# alter table tab add column c int GENERATED ALWAYS AS (a + b) STORED;
> test_pub=# insert into tab values (1,2);
>
> then, verify that replication works from the newly added generated
> column "c" to the existing normal column "c" at the subscriber.
>

This is testing whether the invalidation mechanism works for this case,
which I see no reason not to work as this patch hasn't changed
anything in this regard. We should verify this, but I am not sure there
is much value in keeping it in the regression tests.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Oct 23, 2024 at 11:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Thanks. One more thing that I didn't like about the patch is that it
> used column_list to address the "publish_generated_columns = false"
> case such that we build column_list without generated columns for the
> same. The first problem is that it will add overhead to always probe
> column_list during proto.c calls (for example during
> logicalrep_write_attrs()), then it makes the column_list code complex
> especially the handling in pgoutput_column_list_init(), and finally
> this appears to be a misuse of column_list.
>
> So, I suggest remembering this information in RelationSyncEntry and
> then using it at the required places. We discussed above that
> contradictory values of "publish_generated_columns" across
> publications for the same relations are not accepted, so we can detect
> that during get_rel_sync_entry() and give an ERROR for the same.
>

The changes in tablesync look complicated and I am not sure whether it
handles the conflicting publish_generated_columns option correctly. I
have a few thoughts on the same.
* The handling of contradictory options in multiple publications needs
to be the same as for column lists. I think it is handled (a) during
subscription creation, (b) during copy in fetch_remote_table_info(),
and (c) during replication. See Peter's email
(https://www.postgresql.org/message-id/CAHut%2BPs985rc95cB2x5yMF56p6Lf192AmCJOpAtK_%2BC5YGUF2A%40mail.gmail.com)
to understand why this new option has to be handled in the same way as
the column list.

* While fetching column list via pg_get_publication_tables(), we
should detect contradictory publish_generated_columns options similar
to column lists, and then after we get publish_generated_columns as
return value, we can even use that while fetching attribute
information.

A few additional comments:
1.
- /* Regular table with no row filter */
- if (lrel.relkind == RELKIND_RELATION && qual == NIL)
+ /*
+ * Check if the remote table has any generated columns that should be
+ * copied.
+ */
+ for (int i = 0; i < relmapentry->remoterel.natts; i++)
+ {
+ if (lrel.attremotegen[i])
+ {
+ gencol_copy_needed = true;
+ break;
+ }
+ }

Can't we get this information from fetch_remote_table_info() instead
of traversing the entire column list again?

2.
@@ -1015,7 +1110,6 @@ fetch_remote_table_info(char *nspname, char *relname,
  ExecDropSingleTupleTableSlot(slot);

  lrel->natts = natt;
-
  walrcv_clear_result(res);

Spurious line removal.

3. Why do we have to specifically exclude generated columns of a
subscriber-side table in make_copy_attnamelist()? Can't we rely on
fetch_remote_table_info() and logicalrep_rel_open() that the final
remote attrlist will contain the generated column only if the
subscriber doesn't have a generated column otherwise it would have
given an error in logicalrep_rel_open()?
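
For comment 1, a rough standalone sketch of the idea, not the patch's code.
RemoteAttr, RemoteRel and fill_remote_rel() are hypothetical stand-ins for
the query result, LogicalRepRelation and fetch_remote_table_info(); the only
point is that the flag can be set in the same loop that records each
attribute, so no second pass over the columns is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTS 16

/* Hypothetical row describing one remote attribute. */
typedef struct RemoteAttr
{
    const char *name;
    bool        generated;
} RemoteAttr;

/* Hypothetical stand-in for the remote relation description. */
typedef struct RemoteRel
{
    int         natts;
    const char *attnames[MAX_ATTS];
    bool        attremotegen[MAX_ATTS];
    bool        gencol_copy_needed; /* any generated column to copy? */
} RemoteRel;

/* Record the attributes and note generated columns in a single pass. */
static void
fill_remote_rel(RemoteRel *rel, const RemoteAttr *rows, int nrows)
{
    assert(nrows >= 0 && nrows <= MAX_ATTS);

    rel->natts = 0;
    rel->gencol_copy_needed = false;

    for (int i = 0; i < nrows; i++)
    {
        rel->attnames[i] = rows[i].name;
        rel->attremotegen[i] = rows[i].generated;

        if (rows[i].generated)
            rel->gencol_copy_needed = true;

        rel->natts++;
    }
}
```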

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Oct 24, 2024 at 12:15 PM vignesh C <vignesh21@gmail.com> wrote:
>
> The attached v41 version patch has the changes for the same.
>

Please find comments for the new version as follows:
1.
+      Generated columns may be skipped during logical replication according to the
+      <command>CREATE PUBLICATION</command> option
+      <link linkend="sql-createpublication-params-with-publish-generated-columns">
+      <literal>include_generated_columns</literal></link>.

The above statement doesn't sound clear. Can we change it to:
"Generated columns are allowed to be replicated during logical
replication according to the <command>CREATE PUBLICATION</command>
option .."?

2.
 static void publication_invalidation_cb(Datum arg, int cacheid,
  uint32 hashvalue);
-static void send_relation_and_attrs(Relation relation, TransactionId xid,
- LogicalDecodingContext *ctx,
- Bitmapset *columns);
 static void send_repl_origin(LogicalDecodingContext *ctx,
...
...
 static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data,
  Relation relation);
+static void send_relation_and_attrs(Relation relation, TransactionId xid,
+ LogicalDecodingContext *ctx,
+ RelationSyncEntry *relentry);

Why is the declaration of this function changed?

3.
+ /*
+ * Skip publishing generated columns if the option is not specified or
+ * if they are not included in the column list.
+ */
+ if (att->attgenerated && !relentry->pubgencols && !columns)

In the comment above, shouldn't "specified or" be "specified and"?

4.
+pgoutput_pubgencol_init(PGOutputData *data, List *publications,
+ RelationSyncEntry *entry)

{
...
+ foreach(lc, publications)
+ {
+ Publication *pub = lfirst(lc);
+
+ /* No need to check column list publications */
+ if (is_column_list_publication(pub, entry->publish_as_relid))

Are we ignoring column_list publications because for such publications
the value of column_list prevails and we ignore
'publish_generated_columns' value? If so, it is not clear from the
comments.

5.
  /* Initialize the column list */
  pgoutput_column_list_init(data, rel_publications, entry);
+
+ /* Initialize publish generated columns value */
+ pgoutput_pubgencol_init(data, rel_publications, entry);
+
+ /*
+ * Check if there is conflict with the columns selected for the
+ * publication.
+ */
+ check_conflicting_columns(rel_publications, entry);
  }

It looks odd to check conflicting column lists among publications
twice once in pgoutput_column_list_init() and then in
check_conflicting_columns(). Can we merge those?
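
For comment 3, a small standalone sketch of why "and" is the wording that
matches the quoted condition. Both functions are hypothetical and simplify
"not included in the column list" to "there is no column list", which is
what the !columns test in the quoted hunk checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the quoted condition: all three parts must hold to skip. */
static bool
skip_generated_column(bool attgenerated, bool pubgencols, bool has_column_list)
{
    return attgenerated && !pubgencols && !has_column_list;
}

/* What the comment's "or" wording would describe instead. */
static bool
skip_per_or_wording(bool attgenerated, bool pubgencols, bool has_column_list)
{
    return attgenerated && (!pubgencols || !has_column_list);
}

/* The two readings differ when the option is set but there is no list. */
static void
demo_wording_difference(void)
{
    assert(!skip_generated_column(true, true, false));
    assert(skip_per_or_wording(true, true, false));
}
```

When a column list exists, the generated column is not skipped here and the
later membership check decides whether it is published.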

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Thu, 24 Oct 2024 at 16:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 24, 2024 at 12:15 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > The attached v41 version patch has the changes for the same.
> >
>
> Please find comments for the new version as follows:
> 1.
> +      Generated columns may be skipped during logical replication according to the
> +      <command>CREATE PUBLICATION</command> option
> +      <link linkend="sql-createpublication-params-with-publish-generated-columns">
> +      <literal>include_generated_columns</literal></link>.
>
> The above statement doesn't sound clear. Can we change it to:
> "Generated columns are allowed to be replicated during logical
> replication according to the <command>CREATE PUBLICATION</command>
> option .."?

Modified

> 2.
>  static void publication_invalidation_cb(Datum arg, int cacheid,
>   uint32 hashvalue);
> -static void send_relation_and_attrs(Relation relation, TransactionId xid,
> - LogicalDecodingContext *ctx,
> - Bitmapset *columns);
>  static void send_repl_origin(LogicalDecodingContext *ctx,
> ...
> ...
>  static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data,
>   Relation relation);
> +static void send_relation_and_attrs(Relation relation, TransactionId xid,
> + LogicalDecodingContext *ctx,
> + RelationSyncEntry *relentry);
>
> Why is the declaration of this function changed?

Two changes were made: a) The function declaration needed to be moved
down as the RelationSyncEntry structure is defined below. b) Bitmapset
was replaced with RelationSyncEntry to give send_relation_and_attrs
access to RelationSyncEntry.pubgencols and RelationSyncEntry.columns.
Instead of adding a new parameter to the function, RelationSyncEntry
was utilized, as it contains both pubgencols and columns members.

> 3.
> + /*
> + * Skip publishing generated columns if the option is not specified or
> + * if they are not included in the column list.
> + */
> + if (att->attgenerated && !relentry->pubgencols && !columns)
>
> In the comment above, shouldn't "specified or" be "specified and"?

Modified

> 4.
> +pgoutput_pubgencol_init(PGOutputData *data, List *publications,
> + RelationSyncEntry *entry)
>
> {
> ...
> + foreach(lc, publications)
> + {
> + Publication *pub = lfirst(lc);
> +
> + /* No need to check column list publications */
> + if (is_column_list_publication(pub, entry->publish_as_relid))
>
> Are we ignoring column_list publications because for such publications
> the value of column_list prevails and we ignore
> 'publish_generated_columns' value? If so, it is not clear from the
> comments.

Yes, the column list takes precedence over the publish_generated_columns
value, so column list publications are skipped. Modified the comments
accordingly.

> 5.
>   /* Initialize the column list */
>   pgoutput_column_list_init(data, rel_publications, entry);
> +
> + /* Initialize publish generated columns value */
> + pgoutput_pubgencol_init(data, rel_publications, entry);
> +
> + /*
> + * Check if there is conflict with the columns selected for the
> + * publication.
> + */
> + check_conflicting_columns(rel_publications, entry);
>   }
>
> It looks odd to check conflicting column lists among publications
> twice once in pgoutput_column_list_init() and then in
> check_conflicting_columns(). Can we merge those?

Modified it to check from pgoutput_column_list_init

The v42 version patch attached at [1] has the changes for the same.
[1] - https://www.postgresql.org/message-id/CALDaNm2wFZRzSJLcNi_uMZcSUWuZ8%2Bkktc0n3Nfw9Fdti9WbVA%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Oct 24, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
>
> The v42 version patch attached at [1] has the changes for the same.
>

Some more comments:
1.
@@ -1017,7 +1089,31 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 {
  ListCell   *lc;
  bool first = true;
+ Bitmapset  *relcols = NULL;
  Relation relation = RelationIdGetRelation(entry->publish_as_relid);
+ TupleDesc desc = RelationGetDescr(relation);
+ MemoryContext oldcxt = NULL;
+ bool collistpubexist = false;
+
+ pgoutput_ensure_entry_cxt(data, entry);
+
+ oldcxt = MemoryContextSwitchTo(entry->entry_cxt);
+
+ /*
+ * Prepare the columns that will be published for FOR ALL TABLES and
+ * FOR TABLES IN SCHEMA publication.
+ */
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ if (att->attisdropped || (att->attgenerated && !entry->pubgencols))
+ continue;
+
+ relcols = bms_add_member(relcols, att->attnum);
+ }
+
+ MemoryContextSwitchTo(oldcxt);

This code is unnecessary for cases when the table's publication has a
column list. So, I suggest forming this list only when required. Also,
add an assertion that the pubgencols values for the entry and the
publication match.

2.
@@ -1115,10 +1186,17 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
  ereport(ERROR,
  errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
-    get_namespace_name(RelationGetNamespace(relation)),
-    RelationGetRelationName(relation)));
+ get_namespace_name(RelationGetNamespace(relation)),
+ RelationGetRelationName(relation)));

Is there a reason to make the above change? It appears to be a spurious change.

3.
+ /* Check if there is any generated column present */
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+ if (att->attgenerated)

Add one empty line between the above two lines.

4.
+ else if (entry->pubgencols != pub->pubgencols)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different values of publish_generated_columns for table \"%s.%s\" in different publications",
+ get_namespace_name(RelationGetNamespace(relation)),
+ RelationGetRelationName(relation)));

The last two lines are not aligned.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Fri, Oct 25, 2024 at 12:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 24, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > The v42 version patch attached at [1] has the changes for the same.
> >
>
> Some more comments:
>

1.
+pgoutput_pubgencol_init(PGOutputData *data, List *publications,
+ RelationSyncEntry *entry)

Can we name it as check_and_init_gencol? I don't know if it is a good
idea to add the prefix pgoutput to local functions. It is primarily
used for exposed functions from pgoutput.c. I see that in a few cases
we do that for local functions as well but that is not the norm.

A related point:
+ /* Initialize publish generated columns value */
+ pgoutput_pubgencol_init(data, rel_publications, entry);

Accordingly change this comment to something like: "Check whether to
publish generated columns.".

2.
+/*
+ * Returns true if the relation has column list associated with the
+ * publication, false if the relation has no column list associated with the
+ * publication.
+ */
+bool
+is_column_list_publication(Publication *pub, Oid relid)
...
...

How about naming the above function as has_column_list_defined()?
Also, you can write the above comment as: "Returns true if the
relation has column list associated with the publication, false
otherwise."

3.
+ /*
+ * The column list takes precedence over pubgencols, so skip checking
+ * column list publications.
+ */
+ if (is_column_list_publication(pub, entry->publish_as_relid))

Let's change this comment to: "The column list takes precedence over
publish_generated_columns option. Those will be checked later, see
pgoutput_column_list_init."

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Fri, Oct 25, 2024 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 25, 2024 at 12:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > The v42 version patch attached at [1] has the changes for the same.
> > >
> >
> > Some more comments:
> >
>
> 1.
> +pgoutput_pubgencol_init(PGOutputData *data, List *publications,
> + RelationSyncEntry *entry)
>
> Can we name it as check_and_init_gencol? I don't know if it is a good
> idea to add the prefix pgoutput to local functions. It is primarily
> used for exposed functions from pgoutput.c. I see that in a few cases
> we do that for local functions as well but that is not the norm.
>
> A related point:
> + /* Initialize publish generated columns value */
> + pgoutput_pubgencol_init(data, rel_publications, entry);
>
> Accordingly change this comment to something like: "Check whether to
> publish generated columns.".
>

Fixed.

> 2.
> +/*
> + * Returns true if the relation has column list associated with the
> + * publication, false if the relation has no column list associated with the
> + * publication.
> + */
> +bool
> +is_column_list_publication(Publication *pub, Oid relid)
> ...
> ...
>
> How about naming the above function as has_column_list_defined()?
> Also, you can write the above comment as: "Returns true if the
> relation has column list associated with the publication, false
> otherwise."
>

Fixed.

> 3.
> + /*
> + * The column list takes precedence over pubgencols, so skip checking
> + * column list publications.
> + */
> + if (is_column_list_publication(pub, entry->publish_as_relid))
>
> Let's change this comment to: "The column list takes precedence over
> publish_generated_columns option. Those will be checked later, see
> pgoutput_column_list_init."
>

Fixed.

The v43 version patch attached at [1] has the changes for the same.
[1] - https://www.postgresql.org/message-id/CAHv8RjJJJRzy83tG0nB90ivYcp7sFKTU%3D_BcQ-nUZ7VbHFwceA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi, here are my review comments for patch v43-0001.

======

1. Missing docs update?

The CREATE PUBLICATION docs currently say:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all columns of the table
are replicated through this publication, including any columns added
later.

~

For this patch, should that be updated to say "... all columns (except
generated columns) of the table are replicated..."

======
src/backend/replication/logical/proto.c

2.
+static bool
+should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ return false;
+
+ if (!column_in_column_list(att->attnum, columns))
+ return false;
+
+ return true;
+}

Here, I wanted to suggest that the whole "Skip publishing generated
columns" if-part is unnecessary because the next check
(!column_in_column_list) is going to return false for the same
scenario anyhow.

But, unfortunately, the "column_in_column_list" function has some
special NULL handling logic in it; this means none of this code is
quite what it seems to be (e.g. the function name
column_in_column_list is somewhat misleading)

IMO it would be better to change the column_in_column_list signature
-- add another boolean param to say if a NULL column list is allowed
or not. That will remove any subtle behaviour and then you can remove
the "if (att->attgenerated && !columns)" part.

======
src/backend/replication/pgoutput/pgoutput.c

3. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (att->atttypid < FirstGenbkiObjectId)
  continue;

+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ continue;
+
  /* Skip this attribute if it's not present in the column list */
  if (columns != NULL && !bms_is_member(att->attnum, columns))
  continue;
~

Most of that code above looks to be doing the very same thing as the
new 'should_publish_column' in proto.c. Won't it be better to expose
the other function and share the common logic?

~~~

4. pgoutput_column_list_init

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
+ continue;
+ }
+
  nliveatts++;
  }

  /*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
  */
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
  {

Something seems not quite right (or maybe redundant) with this logic.
For example, because you unconditionally 'continue' for generated
columns, then AFAICT it is just not possible for bms_num_members(cols)
== nliveatts and at the same time 'gencolpresent' to be true. So you
could've just Asserted (!gencolpresent) instead of checking it in the
condition and mentioning it in the comment.

======
src/test/regress/expected/publication.out

5.
--- error: generated column "d" can't be in list
+-- ok: generated column "d" can be in the list too
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
-ERROR:  cannot use generated column "d" in publication column list

By allowing the above to work without giving ERROR, I think you've
broken many subsequent test expected results. e.g. I don't trust these
"expected" results anymore because I didn't think these next test
errors should have been affected, right?

 -- error: system attributes "ctid" not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
-ERROR:  cannot use system column "ctid" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication "testpub_fortable"

Hmm - looks like a wrong expected result to me.

~

 -- error: duplicates not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
-ERROR:  duplicate column "a" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication "testpub_fortable"

Hmm - looks like a wrong expected result to me.

probably more like this...

======
src/test/subscription/t/031_column_list.pl

6.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE test_gen (a int PRIMARY KEY, b int);
+));
+
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE SUBSCRIPTION sub_gen CONNECTION '$publisher_connstr' PUBLICATION pub_gen;
+));

Should combine these.

~~~

7.
+$node_publisher->wait_for_catchup('sub_gen');
+
+is( $node_subscriber->safe_psql(
+ 'postgres', "SELECT * FROM test_gen ORDER BY a"),
+ qq(1|2),
+ 'replication with generated columns in column list');
+

But, this is only testing normal replication. You should also include
some initial table data so you can test that the initial table
synchronization works too. Otherwise, I think this patch currently has
no proof that the initial 'copy_data' even works at all.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 7.
> +$node_publisher->wait_for_catchup('sub_gen');
> +
> +is( $node_subscriber->safe_psql(
> + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> + qq(1|2),
> + 'replication with generated columns in column list');
> +
>
> But, this is only testing normal replication. You should also include
> some initial table data so you can test that the initial table
> synchronization works too. Otherwise, I think this patch currently has
> no proof that the initial 'copy_data' even works at all.
>

Per my tests, the initial copy doesn't work with 0001 alone. It needs
changes in tablesync.c from the 0002 patch. Now, we can commit 0001
after fixing comments and mentioning in the commit message that this
patch supports only the replication of generated columns when
specified in the column list. The initial sync and replication of
generated columns when not specified in the column list will be
supported in future commits. OTOH, if the change to make table sync
work is simple, we can even combine that change.

--
With Regards,
Amit Kapila.



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

More comments for v43-0001.

01. publication.out and publication.sql

I think your fix is not sufficient, even if it passes the tests.

```
-- error: system attributes "ctid" not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
-ERROR:  cannot use system column "ctid" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication "testpub_fortable"
 ALTER PUBLICATION testpub_fortable SET TABLE testpub_tbl1 (id, ctid);
 ERROR:  cannot use system column "ctid" in publication column list
 -- error: duplicates not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
-ERROR:  duplicate column "a" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication "testpub_fortable"
```

The error message does not match the comment. The error message says that the table
has already been added to the publication. I think the first statement [1] succeeded due to your change
and testpub_tbl5 became a member at that time, so the subsequent ALTER statements failed
due to the duplicate registration.

```
-- ok
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c);
+ERROR:  relation "testpub_tbl5" is already member of publication "testpub_fortable"
```

The comment says ok, but the same error happened.

```
ALTER TABLE testpub_tbl5 DROP COLUMN c;     -- no dice
-ERROR:  cannot drop column c of table testpub_tbl5 because other objects depend on it
-DETAIL:  publication of table testpub_tbl5 in publication testpub_fortable depends on column c of table testpub_tbl5
-HINT:  Use DROP ... CASCADE to drop the dependent objects too.
```

This statement should have failed because c was included in the column list.
However, it succeeded because the previous ALTER PUBLICATION failed.
Subsequent SQL statements wrongly threw ERRORs because of this.

Please look at all of the differences before doing copy-and-paste.

02. 031_column_list.pl

```
-# TEST: Generated and dropped columns are not considered for the column list.
+# TEST: Dropped columns are not considered for the column list.
 # So, the publication having a column list except for those columns and a
 # publication without any column (aka all columns as part of the columns
 # list) are considered to have the same column list.
```

Based on the comment, this case does not test the behavior of generated columns
anymore. So, I felt column 'd' could be removed from the case.

03. 031_column_list.pl

Can we test that generated columns won't be replicated if they are not included in
the column list?

[1]:
```
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Oct 28, 2024 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are my review comments for patch v43-0001.
> >
> > ======
> > src/backend/replication/logical/proto.c
> >
> > 2.
> > +static bool
> > +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> > +{
> > + if (att->attisdropped)
> > + return false;
> > +
> > + /*
> > + * Skip publishing generated columns if they are not included in the
> > + * column list.
> > + */
> > + if (att->attgenerated && !columns)
> > + return false;
> > +
> > + if (!column_in_column_list(att->attnum, columns))
> > + return false;
> > +
> > + return true;
> > +}
> >
> > Here, I wanted to suggest that the whole "Skip publishing generated
> > columns" if-part is unnecessary because the next check
> > (!column_in_column_list) is going to return false for the same
> > scenario anyhow.
> >
> > But, unfortunately, the "column_in_column_list" function has some
> > special NULL handling logic in it; this means none of this code is
> > quite what it seems to be (e.g. the function name
> > column_in_column_list is somewhat misleading)
> >
> > IMO it would be better to change the column_in_column_list signature
> > -- add another boolean param to say if a NULL column list is allowed
> > or not. That will remove any subtle behaviour and then you can remove
> > the "if (att->attgenerated && !columns)" part.
> >
>
> The NULL column list still means all columns, so changing the behavior
> as you are proposing doesn't make sense and would make the code
> difficult to understand.
>

My point was that the function 'column_in_column_list' would return
true even when there is no publication column list at all, so that
function name is misleading.

And, because in patch 0001 the generated columns only work when
specified via a column list it means now there is a difference
between:
- NULL (all columns specified in the column list) and
- NULL (no column list at all).

which seems strange and likely to cause confusion.

On closer inspection, this function 'column_in_column_list' is only
called from one place -- the new 'should_publish_column()'. I think
the function column_in_column_list should be thrown away and just
absorbed into the calling function 'should_publish_column'. Then the
misleading function name is eliminated, and the special NULL handling
can be commented on properly.

======
Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Oct 28, 2024 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > 7.
> > +$node_publisher->wait_for_catchup('sub_gen');
> > +
> > +is( $node_subscriber->safe_psql(
> > + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> > + qq(1|2),
> > + 'replication with generated columns in column list');
> > +
> >
> > But, this is only testing normal replication. You should also include
> > some initial table data so you can test that the initial table
> > synchronization works too. Otherwise, I think current this patch has
> > no proof that the initial 'copy_data' even works at all.
> >
>
> Per my tests, the initial copy doesn't work with 0001 alone. It needs
> changes in table sync.c from the 0002 patch. Now, we can commit 0001
> after fixing comments and mentioning in the commit message that this
> patch supports only the replication of generated columns when
> specified in the column list. The initial sync and replication of
> generated columns when not specified in the column list will be
> supported in future commits. OTOH, if the change to make table sync
> work is simple, we can even combine that change.
>

If this comes to a vote, then my vote is to refactor the necessary
tablesync COPY code back into patch 0001 so that patch 0001 can
replicate initial data properly stand alone.

Otherwise, (if we accept patch 0001 only partly works, like now) users
would have to jump through hoops to get any benefit from this patch by
itself. This is particularly true because the CREATE SUBSCRIPTION
'copy_data' parameter default is true, so patch 0001 is going to be
broken by default if there is any pre-existing table data when
publishing generated columns to default subscriptions.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, Oct 28, 2024 at 12:27 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Oct 28, 2024 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Hi, here are my review comments for patch v43-0001.
> > >
> > > ======
> > > src/backend/replication/logical/proto.c
> > >
> > > 2.
> > > +static bool
> > > +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> > > +{
> > > + if (att->attisdropped)
> > > + return false;
> > > +
> > > + /*
> > > + * Skip publishing generated columns if they are not included in the
> > > + * column list.
> > > + */
> > > + if (att->attgenerated && !columns)
> > > + return false;
> > > +
> > > + if (!column_in_column_list(att->attnum, columns))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > >
> > > Here, I wanted to suggest that the whole "Skip publishing generated
> > > columns" if-part is unnecessary because the next check
> > > (!column_in_column_list) is going to return false for the same
> > > scenario anyhow.
> > >
> > > But, unfortunately, the "column_in_column_list" function has some
> > > special NULL handling logic in it; this means none of this code is
> > > quite what it seems to be (e.g. the function name
> > > column_in_column_list is somewhat misleading)
> > >
> > > IMO it would be better to change the column_in_column_list signature
> > > -- add another boolean param to say if a NULL column list is allowed
> > > or not. That will remove any subtle behaviour and then you can remove
> > > the "if (att->attgenerated && !columns)" part.
> > >
> >
> > The NULL column list still means all columns, so changing the behavior
> > as you are proposing doesn't make sense and would make the code
> > difficult to understand.
> >
>
> My point was that the function 'column_in_column_list' would return
> true even when there is no publication column list at all, so that
> function name is misleading.
>
> And, because in patch 0001 the generated columns only work when
> specified via a column list it means now there is a difference
> between:
> - NULL (all columns specified in the column list) and
> - NULL (no column list at all).
>
> which seems strange and likely to cause confusion.
>

This is no more strange than it was before the 0001 patch. Also, the
comment atop the function clarifies the special condition of the
function. OTOH, I am fine with pulling the check out of the function, as
you are proposing, especially because it is now called from just one
place.

--
With Regards,
Amit Kapila.



RE: Pgoutput not capturing the generated columns

From
"Zhijie Hou (Fujitsu)"
Date:
On Monday, October 28, 2024 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> 
> >
> > 4. pgoutput_column_list_init
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > + if (att->attgenerated)
> > + {
> > + if (bms_is_member(att->attnum, cols)) gencolpresent = true;
> > +
> > + continue;
> > + }
> > +
> >   nliveatts++;
> >   }
> >
> >   /*
> > - * If column list includes all the columns of the table,
> > - * set it to NULL.
> > + * If column list includes all the columns of the table
> > + * and there are no generated columns, set it to NULL.
> >   */
> > - if (bms_num_members(cols) == nliveatts)
> > + if (bms_num_members(cols) == nliveatts && !gencolpresent)
> >   {
> >
> > Something seems not quite right (or maybe redundant) with this logic.
> > For example, because you unconditionally 'continue' for generated
> > columns, then AFAICT it is just not possible for bms_num_members(cols)
> > == nliveatts and at the same time 'gencolpresent' to be true. So you
> > could've just Asserted (!gencolpresent) instead of checking it in the
> > condition and mentioning it in the comment.

I think it's possible for the condition you mentioned to happen.

For example:
 
CREATE TABLE test_mix_4 (a int primary key, b int, d int GENERATED ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub FOR TABLE test_mix_4(a, d);

> >
> 
> It seems part of the logic is redundant. I propose to change something along the
> lines of the attached. I haven't tested the attached change as it shows how we
> can improve this part of code.

Thanks for the changes. I tried them and hit unexpected behavior:
the walsender reports the error "cannot use different column lists fo.."
in the following case:

Pub:
    CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED ALWAYS AS (a + 1) STORED);
    ALTER TABLE test_mix_4 DROP COLUMN c;
    CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
    CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
Sub:
    CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_7, pub_mix_8;

pub_mix_7 publishes columns a and b, so its column list should be converted
to NULL in pgoutput, but it was not because of the att_gen_present check.

Based on above, I feel we can keep the original code as it is.
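
To make the two examples above concrete, here is a minimal standalone sketch
of the counting logic in the loop quoted from the patch. It is not the
pgoutput code: MockAttr, ColumnListSummary, and the bitmask column list are
hypothetical stand-ins for Form_pg_attribute and Bitmapset, and the helper
names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pg_attribute fields the loop looks at. */
typedef struct MockAttr
{
	int			attnum;			/* 1-based column number */
	bool		attisdropped;
	bool		attgenerated;
} MockAttr;

/* What the loop computes before deciding whether to reset cols to NULL. */
typedef struct ColumnListSummary
{
	int			nliveatts;		/* non-dropped, non-generated columns */
	int			nlisted;		/* members of the column list */
	bool		gencolpresent;	/* a generated column is in the list */
} ColumnListSummary;

/* Column list as a bitmask (assumption): bit (attnum - 1) marks a member. */
static bool
is_listed(unsigned int cols, int attnum)
{
	assert(attnum >= 1 && attnum <= 16);
	return (cols & (1u << (attnum - 1))) != 0;
}

static ColumnListSummary
summarize_column_list(const MockAttr *atts, int natts, unsigned int cols)
{
	ColumnListSummary s = {0, 0, false};

	/* Count list members, playing the role of bms_num_members(cols). */
	for (unsigned int c = cols; c != 0; c >>= 1)
		s.nlisted += (int) (c & 1u);

	for (int i = 0; i < natts; i++)
	{
		if (atts[i].attisdropped)
			continue;

		/* Generated columns are never counted as live, only flagged. */
		if (atts[i].attgenerated)
		{
			if (is_listed(cols, atts[i].attnum))
				s.gencolpresent = true;
			continue;
		}

		s.nliveatts++;
	}
	return s;
}

/* The condition under which the column list is treated as "all columns". */
static bool
collapses_to_null(ColumnListSummary s)
{
	return s.nlisted == s.nliveatts && !s.gencolpresent;
}

/* Table (a, b, d GENERATED) published with column list (a, d). */
static ColumnListSummary
example_gencol_in_list(void)
{
	static const MockAttr atts[] = {
		{1, false, false},		/* a */
		{2, false, false},		/* b */
		{3, false, true},		/* d, generated */
	};

	return summarize_column_list(atts, 3, (1u << 0) | (1u << 2));
}

/* Table (a, b, c dropped, d GENERATED) published with column list (a, b). */
static ColumnListSummary
example_plain_list(void)
{
	static const MockAttr atts[] = {
		{1, false, false},		/* a */
		{2, false, false},		/* b */
		{3, true, false},		/* c, dropped */
		{4, false, true},		/* d, generated */
	};

	return summarize_column_list(atts, 4, (1u << 0) | (1u << 1));
}
```

In the first example the member count equals nliveatts while gencolpresent is
also true, so the extra condition is what keeps the list from being reset. In
the second example the list (a, b) still collapses to NULL, which is what lets
it match a publication that has no column list.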

Best Regards,
Hou zj

RE: Pgoutput not capturing the generated columns

From
"Zhijie Hou (Fujitsu)"
Date:
On Monday, October 28, 2024 2:54 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
> 
> 
> 02. 031_column_list.pl
> 
> ```
> -# TEST: Generated and dropped columns are not considered for the column
> list.
> +# TEST: Dropped columns are not considered for the column list.
>  # So, the publication having a column list except for those columns and a  #
> publication without any column (aka all columns as part of the columns  #
> list) are considered to have the same column list.
> ```
> 
> Based on the comment, this case does not test the behavior of generated
> columns anymore. So, I felt column 'd' could be removed from the case.

I think keeping the generated column can test the cases you mentioned
in comment #03, so we can modify the comments here to make that clear.

> 
> 03. 031_column_list.pl
> 
> Can we test that generated columns won't be replaced if it does not included in
> the column list?

As stated above, it can be covered in existing tests.

Best Regards,
Hou zj

Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Mon, Oct 28, 2024 at 8:47 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I resumed reviewing the patch set.
> Here are only cosmetic comments as my rehabilitation.
>
> 01. getPublications()
>
> I feel we could follow the notation like getSubscriptions(), because number of
> parameters became larger. How do you feel like attached?
>

I will handle this comment in a later set of patches.

> 02. fetch_remote_table_info()
>
> ```
>                           "SELECT DISTINCT"
> -                         "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
> -                         "   THEN NULL ELSE gpt.attrs END)"
> +                         "  (gpt.attrs)"
> ```
>
> I think no need to separate lines and add bracket. How about like below?
>
> ```
>                                                  "SELECT DISTINCT gpt.attrs"
> ```
>
Fixed this.

The v44 version patches attached at [1] have the changes for the same.
[1] - https://www.postgresql.org/message-id/CAHv8RjLvr8ZxX-1TcaxrZns1nwgrVUTO_2jhDdOPys0WgrDyKQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are my review comments for v44-0001.

======
doc/src/sgml/ref/create_publication.sgml

1.
-      When a column list is specified, only the named columns are replicated.
+      When a column list is specified, all columns (except generated columns)
+      of the table are replicated.
       If no column list is specified, all columns of the table are replicated
       through this publication, including any columns added later. It has no

Huh? This seems very wrong.

I think it should have been like:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all table columns (except
generated columns) are replicated...

======
src/backend/replication/logical/proto.c

2.
+bool
+logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (!columns && att->attgenerated)
+ return false;
+
+ /*
+ * Check if a column is covered by a column list.
+ */
+ if (columns && !bms_is_member(att->attnum, columns))
+ return false;
+
+ return true;
+}

I thought this could be more simply written as:

{
if (att->attisdropped)
  return false;

/* If a column list was specified only publish the specified columns. */
if (columns)
  return bms_is_member(att->attnum, columns);

/* If a column list was not specified publish everything except
generated columns. */
return !att->attgenerated;
}

======
src/backend/replication/pgoutput/pgoutput.c

3.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
+ continue;
+
+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
  continue;
+ }
+

  nliveatts++;
  }

  /*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
  */
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
  {
  bms_free(cols);
  cols = NULL;
~

That code still looks strange to me. I think that unconditional
'continue' for attgenerated is breaking the meaning of 'nliveatts'
(which I take as meaning 'count-of-the-attrs-to-be-published').

AFAICT the code should be more like this:

if (att->attgenerated)
{
  /* Generated cols are skipped unless they are present in a column list. */
  if (!bms_is_member(att->attnum, cols))
    continue;

  gencolpresent = true;
}

======
src/test/regress/sql/publication.sql

4.
 ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

+-- ok: generated column "d" can be in the list too
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
instead of ADD TABLE, so then you can remove the earlier DROP and DROP
the table only once.

======
src/test/subscription/t/031_column_list.pl

5.
+# TEST: Dropped columns are not considered for the column list, and generated
+# columns are not replicated if they are not explicitly included in the column
+# list. So, the publication having a column list except for those columns and a
+# publication without any column (aka all columns as part of the columns list)
+# are considered to have the same column list.

Hmm. I don't think the wording "without any column" is quite right.
AFAIK the original intent of this test was to prove only that
dropped/generated columns were ignored for the NULL column list logic.

That last sentence maybe should say more like:

So a publication with a column list specifying all table columns
(excluding only dropped and generated columns) is considered to be the
same as a publication that has no column list at all for that table.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Oct 29, 2024 at 7:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> ======
> src/backend/replication/logical/proto.c
>
> 2.
> +bool
> +logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (!columns && att->attgenerated)
> + return false;
> +
> + /*
> + * Check if a column is covered by a column list.
> + */
> + if (columns && !bms_is_member(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> I thought this could be more simply written as:
>
> {
> if (att->attisdropped)
>   return false;
>
> /* If a column list was specified only publish the specified columns. */
> if (columns)
>   return bms_is_member(att->attnum, columns);
>
> /* If a column list was not specified publish everything except
> generated columns. */
> return !att->attgenerated;
> }
>

Your version is difficult to follow compared to what is proposed in
the current patch. It is a matter of personal choice, so I leave it to
the author (or others) which one they prefer. However, I suggest that
we add extra comments in the current patch where we return true at the
end of the function and also at the top of the function.
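
Since the choice between the two forms is about readability only, a small
standalone sketch can confirm that they return the same answer for every
input. This is not the proto.c code: MockAttr and MockColumnList are
hypothetical stand-ins for Form_pg_attribute and Bitmapset, and all function
names below are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of Form_pg_attribute used here. */
typedef struct MockAttr
{
	int			attnum;			/* 1-based column number */
	bool		attisdropped;
	bool		attgenerated;	/* a bool is enough for this sketch */
} MockAttr;

/* Hypothetical stand-in for Bitmapset: bit (attnum - 1) marks a member. */
typedef struct MockColumnList
{
	unsigned int bits;
} MockColumnList;

static bool
mock_is_member(int attnum, const MockColumnList *columns)
{
	assert(attnum >= 1 && attnum <= 16);
	return (columns->bits & (1u << (attnum - 1))) != 0;
}

/* Shape of the function in the patch: three independent early exits. */
static bool
should_publish_patch(const MockAttr *att, const MockColumnList *columns)
{
	if (att->attisdropped)
		return false;

	if (!columns && att->attgenerated)
		return false;

	if (columns && !mock_is_member(att->attnum, columns))
		return false;

	return true;
}

/* Shape of the suggested rewrite: branch once on "is there a column list?" */
static bool
should_publish_alt(const MockAttr *att, const MockColumnList *columns)
{
	if (att->attisdropped)
		return false;

	if (columns)
		return mock_is_member(att->attnum, columns);

	return !att->attgenerated;
}

/* Number of input combinations on which the two forms disagree. */
static int
count_mismatches(void)
{
	static const MockColumnList has_col = {1u};		/* contains attnum 1 */
	static const MockColumnList lacks_col = {2u};	/* contains attnum 2 only */
	const MockColumnList *lists[] = {NULL, &has_col, &lacks_col};
	int			mismatches = 0;

	for (int dropped = 0; dropped <= 1; dropped++)
		for (int generated = 0; generated <= 1; generated++)
			for (size_t i = 0; i < 3; i++)
			{
				MockAttr	att = {1, dropped != 0, generated != 0};

				if (should_publish_patch(&att, lists[i]) !=
					should_publish_alt(&att, lists[i]))
					mismatches++;
			}

	return mismatches;
}
```

The loop covers dropped or not, generated or not, and three column-list
states (no list, list containing the column, list without it), which is every
case either form distinguishes.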

>
> ======
> src/test/regress/sql/publication.sql
>
> 4.
>  ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> +-- ok: generated column "d" can be in the list too
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
> +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
> instead of ADD TABLE, so then you can remove the earlier DROP and DROP
> the table only once.
>

Yeah, we can do that if we want, but let's not add the dependency of
the previous test. Separate tests make it easier to extend the tests
in the future. Now, if it would have saved a noticeable amount of
time, then we could have considered it. Having said that, we can keep
both columns a and d in the column list.

> ======
> src/test/subscription/t/031_column_list.pl
>
> 5.
> +# TEST: Dropped columns are not considered for the column list, and generated
> +# columns are not replicated if they are not explicitly included in the column
> +# list. So, the publication having a column list except for those columns and a
> +# publication without any column (aka all columns as part of the columns list)
> +# are considered to have the same column list.
>
> Hmm. I don't think this wording is quite right "without any column".
> AFAIK the original intent of this test was to prove only that
> dropped/generated columns were ignored for the NULL column list logic.
>
> That last sentence maybe should say more like:
>
> So a publication with a column list specifying all table columns
> (excluding only dropped and generated columns) is considered to be the
> same as a publication that has no column list at all for that table.
>

I think you are saying the same thing in slightly different words.
Both of those sound correct to me. So not sure if we get any advantage
by changing it.

--
With Regards,
Amit Kapila.



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for updating the patch! Here are my comments for v44.

01. fetch_remote_table_info()

`bool *remotegencolpresent` is accessed unconditionally, which can cause a crash
if NULL is passed to the function. Should we add an Assert to verify it?

02. fetch_remote_table_info()

```
+        if (server_version >= 180000)
+            *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
+
```

Can we add Assert(!isnull) like other parts?

03. fetch_remote_table_info()

Also, we do not have to reach here once *remotegencolpresent becomes true.
Based on 02 and 03, how about below?

```
        if (server_version >= 180000 && !(*remotegencolpresent))
        {
            *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
            Assert(!isnull);
        }
```

04. pgoutput_column_list_init()

+                        if (att->attgenerated)
+                        {
+                            if (bms_is_member(att->attnum, cols))
+                                gencolpresent = true;
+
                             continue;
+                        }

I'm not sure this is correct. Why do you skip the generated column even when it is in
the column list? Also, can you add comments explaining what this code is meant to do?

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Here are my review comments for patch v44-0002.

======
Commit message.

1.
The commit message is missing.

======
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:

2.
+fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
+ List **qual, bool *remotegencolpresent)

The name 'remotegencolpresent' sounds like it means a generated col is
present in the remote table, but don't we only care when it is being
published? So, would a better parameter name be more like
'remote_gencol_published'?

~~~

3.
Would it be better to introduce a new human-readable variable like:
bool check_for_published_gencols = (server_version >= 180000);

because then you could use that instead of having the 180000 check in
multiple places.

~~~

4.
-   lengthof(attrRow), attrRow);
+   server_version >= 180000 ? lengthof(attrRow) : lengthof(attrRow) -
1, attrRow);

If you wish, that length calculation could be written more concisely like:
lengthof(attrRow) - (server_version >= 180000 ? 0 : 1)

~~~

5.
+ if (server_version >= 180000)
+ *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
+

Should this also say Assert(!isnull)?

======
src/test/subscription/t/031_column_list.pl

6.
+ qq(0|1),
+ 'replication with generated columns in column list');

Perhaps this message should be worded slightly differently, to
distinguish it from the "normal" replication message.

/replication with generated columns in column list/initial replication
with generated columns in column list/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Oct 29, 2024 at 11:19 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> 01. fetch_remote_table_info()
>
> `bool *remotegencolpresent` is accessed unconditionally, but it can cause crash
> if NULL is passed to the function. Should we add an Assert to verify it?
>

This is a static function being called from just one place, so I don't
think this is required.

> 02. fetch_remote_table_info()
>
> ```
> +        if (server_version >= 180000)
> +            *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
> ```
>
> Can we add Assert(!isnull) like other parts?
>
> 03. fetch_remote_table_info()
>
> Also, we do not have to reach here once *remotegencolpresent becomes true.
> Based on 02 and 03, how about below?
>
> ```
>                 if (server_version >= 180000 && !(*remotegencolpresent))
>                 {
>                         *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
>                         Assert(!isnull);
>                 }
> ```
>

Yeah, we can follow this suggestion but better to add a comment for the same.
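
A rough standalone sketch of the suggested pattern, with the kind of comment
asked for above. It is not the tablesync.c code: MockRow is a hypothetical
stand-in for the tuple slot, any_gencol_published is an invented name, and
nreads exists only to show how many rows are actually inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one result row; the real code reads a slot. */
typedef struct MockRow
{
	bool		attgenerated;	/* value of the extra column */
	bool		isnull;			/* expected to be false */
} MockRow;

/*
 * Returns true if any row reports a generated column.  *nreads is set to
 * the number of rows whose flag was actually read.
 */
static bool
any_gencol_published(const MockRow *rows, int nrows, int server_version,
					 int *nreads)
{
	bool		gencol_published = false;

	*nreads = 0;

	for (int i = 0; i < nrows; i++)
	{
		/*
		 * Publishers older than v18 do not send the flag at all.  Once one
		 * generated column has been seen the answer cannot change, so the
		 * remaining rows are not inspected.
		 */
		if (server_version >= 180000 && !gencol_published)
		{
			gencol_published = rows[i].attgenerated;
			(*nreads)++;
			assert(!rows[i].isnull);
		}
	}

	return gencol_published;
}
```

Because of the `!gencol_published` guard, the `|=` in the quoted snippet and
the plain assignment used here behave the same.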

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Oct 29, 2024 at 11:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
> ======
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
>
> 2.
> +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
> + List **qual, bool *remotegencolpresent)
>
> The name 'remotegencolpresent' sounds like it means a generated col is
> present in the remote table, but don't we only care when it is being
> published? So, would a better parameter name be more like
> 'remote_gencol_published'?
>

I feel no need to add a 'remote' to this variable name as the function
name itself clarifies the same. Both in the function definition and at
the caller site, we can name it 'gencol_published'.

> ~~~
>
> 3.
> Would it be better to introduce a new human-readable variable like:
> bool check_for_published_gencols = (server_version >= 180000);
>
> because then you could use that instead of having the 180000 check in
> multiple places.
>

It is better to add a comment instead, because such a variable would make
this part of the code difficult to enhance within the same version (18) if
required.

> ~~~
>
> 4.
> -   lengthof(attrRow), attrRow);
> +   server_version >= 180000 ? lengthof(attrRow) : lengthof(attrRow) -
> 1, attrRow);
>
> If you wish, that length calculation could be written more concisely like:
> lengthof(attrow) - (server_version >= 180000 ? 0 : 1)
>

The current way of the patch seems easier to follow.
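
For reference, the two spellings of the length calculation give the same
value, so the choice is purely stylistic. A minimal standalone sketch follows.
The lengthof macro is written the same way as the one in c.h, but attr_row and
its placeholder values are assumptions, as are the function names.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as the lengthof() macro in PostgreSQL's c.h. */
#define lengthof(array) (sizeof (array) / sizeof ((array)[0]))

/*
 * Hypothetical result-column array with placeholder values.  The last entry
 * stands for the extra column that is only requested from v18+ publishers.
 */
static const unsigned int attr_row[] = {1, 2, 3, 4, 5};

/* Spelling used in the patch. */
static size_t
ncols_as_in_patch(int server_version)
{
	return server_version >= 180000 ? lengthof(attr_row) : lengthof(attr_row) - 1;
}

/* More concise spelling from the review comment. */
static size_t
ncols_concise(int server_version)
{
	return lengthof(attr_row) - (server_version >= 180000 ? 0 : 1);
}

/* Both spellings must agree for any server version. */
static void
check_same_length(int server_version)
{
	assert(ncols_as_in_patch(server_version) == ncols_concise(server_version));
}
```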

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 29 Oct 2024 at 07:44, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for v44-0001.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 1.
> -      When a column list is specified, only the named columns are replicated.
> +      When a column list is specified, all columns (except generated columns)
> +      of the table are replicated.
>        If no column list is specified, all columns of the table are replicated
>        through this publication, including any columns added later. It has no
>
> Huh? This seems very wrong.
>
> I think it should have been like:
> When a column list is specified, only the named columns are
> replicated. If no column list is specified, all table columns (except
> generated columns) are replicated...

Modified

> ======
> src/backend/replication/logical/proto.c
>
> 2.
> +bool
> +logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (!columns && att->attgenerated)
> + return false;
> +
> + /*
> + * Check if a column is covered by a column list.
> + */
> + if (columns && !bms_is_member(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> I thought this could be more simply written as:
>
> {
> if (att->attisdropped)
>   return false;
>
> /* If a column list was specified only publish the specified columns. */
> if (columns)
>   return bms_is_member(att->attnum, columns);
>
> /* If a column list was not specified publish everything except
> generated columns. */
> return !att->attgenerated;
> }

I preferred the earlier code as it is simpler, and added a few
comments to it to avoid confusion.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 3.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
> + continue;
> +
> + if (att->attgenerated)
> + {
> + if (bms_is_member(att->attnum, cols))
> + gencolpresent = true;
> +
>   continue;
> + }
> +
>
>   nliveatts++;
>   }
>
>   /*
> - * If column list includes all the columns of the table,
> - * set it to NULL.
> + * If column list includes all the columns of the table
> + * and there are no generated columns, set it to NULL.
>   */
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts && !gencolpresent)
>   {
>   bms_free(cols);
>   cols = NULL;
> ~
>
> That code still looks strange to me. I think that unconditional
> 'continue' for attgenerated is breaking the meaning of 'nliveattrs'
> (which I take as meaning 'count-of-the-attrs-to-be-published').
>
> AFAICT the code should be more like this:
>
> if (att->attgenerated)
> {
>   /* Generated cols are skipped unless they are present in a column list. */
>   if (!bms_is_member(att->attnum, cols))
>     continue;
>
>   gencolpresent = true;
> }

Modified

> ======
> src/test/regress/sql/publication.sql
>
> 4.
>  ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> +-- ok: generated column "d" can be in the list too
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
> +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
> instead of ADD TABLE, so then you can remove the earlier DROP and DROP
> the table only once.

I did not make this change, as Amit also preferred to keep the tests
separate. I added column a to the column list, as he suggested in [1].

> ======
> src/test/subscription/t/031_column_list.pl
>
> 5.
> +# TEST: Dropped columns are not considered for the column list, and generated
> +# columns are not replicated if they are not explicitly included in the column
> +# list. So, the publication having a column list except for those columns and a
> +# publication without any column (aka all columns as part of the columns list)
> +# are considered to have the same column list.
>
> Hmm. I don't think this wording is quite right "without any column".
> AFAIK the original intent of this test was to prove only that
> dropped/generated columns were ignored for the NULL column list logic.
>
> That last sentence maybe should say more like:
>
> So a publication with a column list specifying all table columns
> (excluding only dropped and generated columns) is considered to be the
> same as a publication that has no column list at all for that table.

I have just changed "publication without any column" to "publication
without any column list" as the rest looks ok to me.

The attached v45 version patch has the changes for the same.  I have
also merged the 0002 patch as the patch looks fairly stable now.

 [1] - https://www.postgresql.org/message-id/CAA4eK1K31%3D1draCJE0ng3Drt8C9D65qPppwK%3D-V64YMiDyRziA%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 29 Oct 2024 at 11:30, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for patch v44-0002.
>
> ======
> Commit message.
>
> 1.
> The commit message is missing.

This patch is now merged, so no change required.

> ======
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
>
> 2.
> +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
> + List **qual, bool *remotegencolpresent)
>
> The name 'remotegencolpresent' sounds like it means a generated col is
> present in the remote table, but don't we only care when it is being
> published? So, would a better parameter name be more like
> 'remote_gencol_published'?

I have changed it to gencol_published based on Amit's suggestion at [1].

> ~~~
>
> 3.
> Would it be better to introduce a new human-readable variable like:
> bool check_for_published_gencols = (server_version >= 180000);
>
> because then you could use that instead of having the 180000 check in
> multiple places.

I felt this is not required, so not making any change for this.

> ~~~
>
> 4.
> -   lengthof(attrRow), attrRow);
> +   server_version >= 180000 ? lengthof(attrRow) : lengthof(attrRow) -
> 1, attrRow);
>
> If you wish, that length calculation could be written more concisely like:
> lengthof(attrow) - (server_version >= 180000 ? 0 : 1)

I felt the current one is better, also Amit feels the same way as in
[1]. Not making any change for this.

> ~~~
>
> 5.
> + if (server_version >= 180000)
> + *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
>
> Should this also say Assert(!isnull)?

Added an assert

> ======
> src/test/subscription/t/031_column_list.pl
>
> 6.
> + qq(0|1),
> + 'replication with generated columns in column list');
>
> Perhaps this message should be worded slightly differently, to
> distinguish it from the "normal" replication message.
>
> /replication with generated columns in column list/initial replication
> with generated columns in column list/

Modified

The v45 version patch attached at [2] has the changes for the same.

[1] - https://www.postgresql.org/message-id/CAA4eK1Lpzy3eqd2AOM%2BTXp80SFL1cCfX3cf9thjL-hJxn%2BAYGA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm1oc-%2Buav380Z1k6gCZY5GJn5ZYKRexwM%2BqqGiRinUS-Q%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Shubham Khanna
Date:
On Tue, Oct 29, 2024 at 3:18 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 29 Oct 2024 at 11:30, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are my review comments for patch v44-0002.
> >
> > ======
> > Commit message.
> >
> > 1.
> > The commit message is missing.
>
> This patch is now merged, so no change required.
>
> > ======
> > src/backend/replication/logical/tablesync.c
> >
> > fetch_remote_table_info:
> >
> > 2.
> > +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
> > + List **qual, bool *remotegencolpresent)
> >
> > The name 'remotegencolpresent' sounds like it means a generated col is
> > present in the remote table, but don't we only care when it is being
> > published? So, would a better parameter name be more like
> > 'remote_gencol_published'?
>
> I have changed it to gencol_published based on Amit's suggestion at [1].
>
> > ~~~
> >
> > 3.
> > Would it be better to introduce a new human-readable variable like:
> > bool check_for_published_gencols = (server_version >= 180000);
> >
> > because then you could use that instead of having the 180000 check in
> > multiple places.
>
> I felt this is not required, so not making any change for this.
>
> > ~~~
> >
> > 4.
> > -   lengthof(attrRow), attrRow);
> > +   server_version >= 180000 ? lengthof(attrRow) : lengthof(attrRow) -
> > 1, attrRow);
> >
> > If you wish, that length calculation could be written more concisely like:
> > lengthof(attrow) - (server_version >= 180000 ? 0 : 1)
>
> I felt the current one is better, also Amit feels the same way as in
> [1]. Not making any change for this.
>
> > ~~~
> >
> > 5.
> > + if (server_version >= 180000)
> > + *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
> > +
> >
> > Should this also say Assert(!isnull)?
>
> Added an assert
>
> > ======
> > src/test/subscription/t/031_column_list.pl
> >
> > 6.
> > + qq(0|1),
> > + 'replication with generated columns in column list');
> >
> > Perhaps this message should be worded slightly differently, to
> > distinguish it from the "normal" replication message.
> >
> > /replication with generated columns in column list/initial replication
> > with generated columns in column list/
>
> Modified
>
> The v45 version patch attached at [2] has the changes for the same.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1Lpzy3eqd2AOM%2BTXp80SFL1cCfX3cf9thjL-hJxn%2BAYGA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CALDaNm1oc-%2Buav380Z1k6gCZY5GJn5ZYKRexwM%2BqqGiRinUS-Q%40mail.gmail.com
>

While running backward-compatibility tests, I found that 'tablesync'
does not work when the publisher is on an older version, i.e., versions
12 through 15.
I created two nodes for testing: a PUBLISHER on an old version and a
SUBSCRIBER on HEAD + the v45 patch.
The following was done on the PUBLISHER node:
CREATE TABLE t1 (c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
INSERT INTO t1 (c1) VALUES (1), (2);
CREATE PUBLICATION pub1 for table t1;

The following was done on the SUBSCRIBER node:
CREATE TABLE t1 (c1 int, c2 int);
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1;

The following error occurs repeatedly in the subscriber log files:
ERROR:  could not start initial contents copy for table "public.t1":
ERROR:  column "c2" is a generated column
DETAIL:  Generated columns cannot be used in COPY.

Thanks and Regards,
Shubham Khanna.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Oct 29, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Thank you for reporting this issue. The attached v46 patch addresses
> the problem and includes some adjustments to the comments. Thanks to
> Amit for sharing the comment changes offline.
>

Pushed. Kindly rebase and send the remaining patches.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 30 Oct 2024 at 15:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Thank you for reporting this issue. The attached v46 patch addresses
> > > the problem and includes some adjustments to the comments. Thanks to
> > > Amit for sharing the comment changes offline.
> > >
> >
> > Pushed. Kindly rebase and send the remaining patches.
>
> Thanks for committing this patch, here is a rebased version of the
> remaining patches.
>

Hi,

I found that the docs in doc/src/sgml/ddl.sgml [1] are still saying:

     <para>
      Generated columns are skipped for logical replication and cannot be
      specified in a <command>CREATE PUBLICATION</command> column list.
     </para>

But that is contrary to the new behaviour after the "Replicate
generated columns when specified in the column list." commit yesterday
[2].

It looks like an oversight. I think updating that paragraph should
have been included with yesterday's commit.

======
[1] https://github.com/postgres/postgres/blob/master/doc/src/sgml/ddl.sgml
[2] https://github.com/postgres/postgres/commit/745217a051a9341e8c577ea59a87665d331d4af0

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Thu, Oct 31, 2024 at 1:14 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 31 Oct 2024 at 04:42, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Thank you for reporting this issue. The attached v46 patch addresses
> > > > > the problem and includes some adjustments to the comments. Thanks to
> > > > > Amit for sharing the comment changes offline.
> > > > >
> > > >
> > > > Pushed. Kindly rebase and send the remaining patches.
> > >
> > > Thanks for committing this patch, here is a rebased version of the
> > > remaining patches.
> > >
> >
> > Hi,
> >
> > I found that the docs of src/sgml/ddl.sgml [1] are still saying:
> >
> >      <para>
> >       Generated columns are skipped for logical replication and cannot be
> >       specified in a <command>CREATE PUBLICATION</command> column list.
> >      </para>
> >
> > But that is contrary to the new behaviour after the "Replicate
> > generated columns when specified in the column list." commit yesterday
> > [2].
> >
> > It looks like an oversight. I think updating that paragraph should
> > have been included with yesterday's commit.
>
> Thanks for the findings, the attached patch has the changes for the same.
>

LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Ajin Cherian
Date:
I ran some tests and verified that the patch works with the older versions PG12 and PG17.
1. Verified with publications with generated columns and without generated columns on patched code and subscriptions on PG12 and PG17
Observations:
    a. If publication is created with publish_generated_columns=true or with generated columns mentioned explicitly, then tablesync will not copy generated columns but post tablesync the generated columns are replicated
    b. Column list override (publish_generated_columns=false) behaviour

These seem expected.

2. Publication on PG12 and PG17 with subscription on patched code:
Observation:
Behaves as if without patch.

3. Pg_dump - confirmed that the new version correctly dumps the new syntax
4. Pg_upgrade - confirmed that when upgrading from a previous version to the latest, the "Generated columns" field defaults to false.
5. Verified that publications with different column lists cannot be subscribed to by a single subscription
   a. PUB_A(column list = (a, b)) PUB_B(no column list, with publish_generated_column) - OK
   b. PUB_A(column list = (a, b)) PUB_B(no column list, without publish_generated_column) - FAIL
   c.  PUB_A(no column list, without publish_generated_column) PUB_B(no column list, with publish_generated_column) - FAIL

Tests did not show any unexpected behaviour.

regards,
Ajin Cherian
Fujitsu Australia

Re: Pgoutput not capturing the generated columns

From
Ajin Cherian
Date:


On Thu, Oct 31, 2024 at 9:55 PM Ajin Cherian <itsajin@gmail.com> wrote:
> I ran some tests and verified that the patch works with previous versions of PG12 and PG17
> 1. Verified with publications with generated columns and without generated columns on patched code and subscriptions on PG12 and PG17
> Observations:
>     a. If publication is created with publish_generated_columns=true or with generated columns mentioned explicitly, then tablesync will not copy generated columns but post tablesync the generated columns are replicated
>     b. Column list override (publish_generated_columns=false) behaviour
>
> These seem expected.


Currently the documentation does not talk about this behaviour. I suggest it be added, similar to how such behaviour was documented when the original row-filter version was committed.
Suggestion:
"If a subscriber is a pre-18 version, the initial table synchronization won't publish generated columns even if they are defined in the publisher."

regards,
Ajin Cherian
Fujitsu Australia

Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21@gmail.com> wrote:

> Thanks for committing this patch, here is a rebased version of the
> remaining patches.
>

Hi Vignesh.

Here are my review comments for the docs patch v1-0002.

======
Commit message

1.
This patch updates docs to describe the new feature allowing
replication of generated
columns. This includes addition of a new section "Generated Column
Replication" to the
"Logical Replication" documentation chapter.

~

That first sentence was correct previously when this patch contained
*all* the gencols documentation, but now some of the feature docs are
already handled by previous patches, so the first sentence can be
removed.

Now patch 0002 is only for adding the new chapter, plus the references to it.

~

/This includes addition of a new section/This patch adds a new section/

======
doc/src/sgml/protocol.sgml

2.
      <para>
-      Next, one of the following submessages appears for each column
(except generated columns):
+      Next, one of the following submessages appears for each column:

AFAIK this simply cancels out a change from the v1-0001 patch which
IMO should have not been there in the first place. Please refer to my
v1-0001 review for the same.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Pgoutput not capturing the generated columns

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Thanks for rebasing the patch! Before reviewing deeply, I want to confirm the specification.
I understood like below based on the documentation.

- If publish_generated_columns is false, the publication won't replicate generated columns
- If publish_generated_columns is true, the behavior on the subscriber depends on the table column:
  - If it is a generated column even on the subscriber, it causes an ERROR.
  - If it is a regular column, the generated value is replicated.
  - If the column is missing, it causes an ERROR.

However, below test implies that generated columns can be replicated even if
publish_generated_columns is false. Not sure...

```
# Verify that incremental replication of generated columns occurs
# when they are included in the column list, regardless of the
# publish_generated_columns option.
$result =
  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3 ORDER BY a");
is( $result, qq(|2
|4
|6
|8),
    'tab3 incremental replication, when publish_generated_columns=false');
```

Also, I've tested the case where both pub and sub had the generated column, but the ERROR seemed strange to me.

```
test_pub=# CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
test_pub=# CREATE PUBLICATION pub FOR TABLE gencoltable(a, b) WITH (publish_generated_columns = true);
test_pub=# INSERT INTO gencoltable (a) VALUES (generate_series(1, 10));

test_sub=# CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
test_sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub;

-> ERROR: logical replication target relation "public.gencoltable" is missing replicated column: "b"
```

The attribute existed on the sub, but it was reported as a missing column. I think
we should somehow report something like "generated column on publisher is replicated
to the generated column on the subscriber".

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 1 Nov 2024 at 13:27, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for rebasing the patch! Before reviewing deeply, I want to confirm the specification.
> I understood like below based on the documentation.
>
> - If publish_generated_columns is false, the publication won't replicate generated columns
> - If publish_generated_columns is true, the behavior on the subscriber depends on the table column:
>   - If it is a generated column even on the subscriber, it causes an ERROR.
>   - If it is a regular column, the generated value is replicated.
>   - If the column is missing, it causes an ERROR.

This is correct.
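
The specification confirmed above can be sketched as a small decision function. This is only an illustrative model, assuming a publication with no column list; the enum and function names are hypothetical and do not exist in the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>

/* How the subscriber's table defines the column (hypothetical names). */
typedef enum { SUB_COL_MISSING, SUB_COL_REGULAR, SUB_COL_GENERATED } SubColKind;

/* What happens to one generated column of the publisher's table. */
typedef enum { OUTCOME_NOT_REPLICATED, OUTCOME_REPLICATED, OUTCOME_ERROR } Outcome;

/*
 * Model of the behaviour listed above for a publication without a
 * column list. It is a sketch of the specification, not server code.
 */
static Outcome
gencol_outcome(bool publish_generated_columns, SubColKind sub_col)
{
    /* Option is false: the generated column is never sent. */
    if (!publish_generated_columns)
        return OUTCOME_NOT_REPLICATED;

    switch (sub_col)
    {
        case SUB_COL_REGULAR:
            return OUTCOME_REPLICATED;  /* generated value is stored as-is */
        case SUB_COL_GENERATED:
        case SUB_COL_MISSING:
            return OUTCOME_ERROR;       /* both cases raise an ERROR */
    }

    assert(!"unknown SubColKind");
    return OUTCOME_ERROR;
}
```

For example, gencol_outcome(true, SUB_COL_GENERATED) models the case where both sides define the column as generated, which ends in an ERROR.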

> However, below test implies that generated columns can be replicated even if
> publish_generated_columns is false. Not sure...
>
> ```
> # Verify that incremental replication of generated columns occurs
> # when they are included in the column list, regardless of the
> # publish_generated_columns option.
> $result =
>   $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3 ORDER BY a");
> is( $result, qq(|2
> |4
> |6
> |8),
>         'tab3 incremental replication, when publish_generated_columns=false');
> ```

Yes, this is a special case where the column list will take priority
over the publish_generated_columns option. The same was discussed at
[1].
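
The precedence described here (a column list wins over publish_generated_columns) can be sketched as follows. This is a simplified, self-contained model: the Column struct and the 32-bit mask standing in for the column list are assumptions made for illustration, not the server's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a table column; not the pg_attribute layout. */
typedef struct
{
    int  attnum;        /* 1-based column number */
    bool is_dropped;
    bool is_generated;
} Column;

/*
 * Decide whether a column is published.
 *
 * column_list is a bitmask of listed columns (bit attnum - 1); it is
 * consulted only when has_column_list is true.
 */
static bool
should_publish_column(const Column *col, bool has_column_list,
                      uint32_t column_list, bool publish_generated_columns)
{
    assert(col->attnum >= 1 && col->attnum <= 32);

    if (col->is_dropped)
        return false;

    /* A column list takes priority: only listed columns are sent. */
    if (has_column_list)
        return (column_list & (UINT32_C(1) << (col->attnum - 1))) != 0;

    /* No column list: generated columns follow the publication option. */
    if (col->is_generated)
        return publish_generated_columns;

    return true;
}
```

With tab3 (a int, gen1 generated) published as tab3(gen1) and publish_generated_columns=false, this model sends gen1 but not a, which is consistent with the "|2", "|4", ... rows expected by the test quoted above.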

> Also, I've tested the case both pub and sub had the generated column, but the ERROR was strange for me.
>
> ```
> test_pub=# CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
> test_pub=# CREATE PUBLICATION pub FOR TABLE gencoltable(a, b) WITH (publish_generated_columns = true);
> test_pub=# INSERT INTO gencoltable (a) VALUES (generate_series(1, 10));
>
> test_sub=# CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
> test_sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub;
>
> -> ERROR: logical replication target relation "public.gencoltable" is missing replicated column: "b"
> ```
>
> The attribute existed on the sub but it was reported as missing column. I think
> we should somehow report like "generated column on publisher is replicated the
> generated column on the subscriber".

Agreed. We will include a fix for this in one of the upcoming versions.

[1] - https://www.postgresql.org/message-id/CAA4eK1JgdyLYGo%2BG%3Db90VCqpbtwGMV8Su5Cuafo_hByWNTbkBg%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 30 Oct 2024 at 15:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2024 at 8:50 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Thank you for reporting this issue. The attached v46 patch addresses
> > > the problem and includes some adjustments to the comments. Thanks to
> > > Amit for sharing the comment changes offline.
> > >
> >
> > Pushed. Kindly rebase and send the remaining patches.
>
> Thanks for committing this patch, here is a rebased version of the
> remaining patches.
>

Here are some review comments for the patch v1-0003 (tap tests)

======
src/test/subscription/t/011_generated.pl

1.
+# The following combinations are tested:
+# - Publication pub1 on the 'postgres' database with the option
+#   publish_generated_columns set to false.
+# - Publication pub2 on the 'postgres' database with the option
+#   publish_generated_columns set to true.
+# - Subscription sub1 on the 'postgres' database for publication pub1.
+# - Subscription sub2 on the 'test_pgc_true' database for publication pub2.

Those aren't really "combinations" anymore. That's just describing how
these pub/sub tests are configured.

/The following combinations are tested:/The test environment is set up
as follows:/

~~~

2.
+# Wait for the initial synchronization of the 'regress_sub1_gen_to_nogen'
+# subscription in the 'postgres' database.
+$node_subscriber->wait_for_subscription_sync($node_publisher,
+ 'regress_sub1_gen_to_nogen', 'postgres');
+
+# Wait for the initial synchronization of the 'regress_sub2_gen_to_nogen'
+# subscription in the 'test_pgc_true' database.
+$node_subscriber->wait_for_subscription_sync($node_publisher,
+ 'regress_sub2_gen_to_nogen', 'test_pgc_true');
+

These detailed descriptions are not adding much value here. Just
combining these and saying "Wait for the initial synchronization of
both subscriptions" would have been enough, I think.

~~~

3.
+# =============================================================================
+# The following test cases demonstrate the behavior of generated column
+# replication with publish_generated_columns set to false and true:
+# Test: Publication column list includes generated columns when
+# publish_generated_columns is set to false.
+# Test: Publication column list excludes generated columns when
+# publish_generated_columns is set to false.
+# Test: Publication column list includes generated columns when
+# publish_generated_columns is set to true.
+# Test: Publication column list excludes generated columns when
+# publish_generated_columns is set to true.
+# =============================================================================

Some extra spacing and minor rewording would make this unreadable
comment readable. e.g.

# =============================================================================
# The following test cases demonstrate the behavior of generated column
# replication with publish_generated_columns set to false and true:
#
# When publish_generated_columns is set to false...
# Test: Publication column list includes generated columns
# Test: Publication column list excludes generated columns
#
# When publish_generated_columns is set to true...
# Test: Publication column list includes generated columns
# Test: Publication column list excludes generated columns
# =============================================================================


====

1st test:

4.
+# Create table and publications.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab2 (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
+ CREATE TABLE tab3 (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
+ CREATE PUBLICATION pub1 FOR table tab2, tab3(gen1) WITH
(publish_generated_columns=false);
+));
+

4a.
/Create table/Create tables/

~

4b.
TBH, I am not sure why you are including the table 'tab2' like this,
because the test case for replication without any column list at all
was already tested in your earlier tests. AFAICT 'tab2' should also
have a column list, but one that *excludes* the gencols. After all,
that's what the main comment said you were going to test.

~~~

5.
+# Create table and subscription.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab2 (a int, gen1 int);
+ CREATE TABLE tab3 (a int, gen1 int);
+ CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
pub1 WITH (copy_data = true);
+));

/Create table/Create tables/

~~~

6.
+# Verify that the initial synchronization of generated columns is not
replicated
+# when they are not included in the column list, regardless of the
+# publish_generated_columns option.
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2 ORDER BY a");
+is( $result, qq(1|
+2|),
+ 'tab2 initial sync, when publish_generated_columns=false');
+

This comment doesn't make much sense to me. E.g.
a) IIUC tab2 should have used a column list that "excludes generated
columns". After all, that's what the main comment said you were going
to test.
b) the option is already false, so saying "... regardless of the
publish_generated_columns option" doesn't really mean anything

~~~

7.
+# Verify that incremental replication of generated columns does not occur
+# when they are not included in the column list, regardless of the
+# publish_generated_columns option.
+$node_publisher->wait_for_catchup('sub1');
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2 ORDER BY a");
+is( $result, qq(1|
+2|
+3|
+4|),
+ 'tab2 incremental replication, when publish_generated_columns=false');
+

This comment has the same issues described in the above review comment #6.

====

2nd test:

8.
+# --------------------------------------------------
+# Test Case: Even when publish_generated_columns is set to true, the publisher
+# only publishes the data of columns specified in the column list,
+# skipping other generated and non-generated columns.
+# --------------------------------------------------
+

This 2nd test has lots of the same problems as the first test.

For example:
- /# Create table and publications./# Create tables and publications./
- IMO 'tab4' should also have a publications column list, but one that
does not include gencols.
- /# Create table and subscription./# Create tables and subscription./

~~~

9.
+# Initial sync test when publish_generated_columns=true.

Why did you use a simple comment here, but a much more complicated
comment for the same scenario in the 1st test?

~~~

10.
+# Incremental replication test when publish_generated_columns=true.
+# Verify that column 'gen1' is replicated.
+$node_publisher->wait_for_catchup('sub1');
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
+is( $result, qq(1|2
+2|4
+3|6
+4|8),
+ 'tab4 incremental replication, when publish_generated_columns=true');
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a");
+is( $result, qq(|2
+|4
+|6
+|8),
+ 'tab5 incremental replication, when publish_generated_columns=true');
+

AFAICT the table 'tab4' also should have a column list, but one that
excludes the gencol. After all, that's what the main comment said you
were going to test. So this test comment and the tab4 test part is
currently broken.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Oct 31, 2024 at 4:26 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> 5. Verified that publications with different column list are disallowed to be subscribed by one subscription
>    a. PUB_A(column list = (a, b)) PUB_B(no column list, with publish_generated_column) - OK
>    b. PUB_A(column list = (a, b)) PUB_B(no column list, without publish_generated_column) - FAIL
>    c.  PUB_A(no column list, without publish_generated_column) PUB_B(no column list, with publish_generated_column) - FAIL
>
> Tests did not show any unexpected behaviour.
>

Thanks for the tests, but the results of step 5 do not clearly show
whether they are correct because you haven't shared the table schema.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Ajin Cherian
Date:
On Mon, Nov 4, 2024 at 2:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 31, 2024 at 4:26 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > 5. Verified that publications with different column list are disallowed to be subscribed by one subscription
> >    a. PUB_A(column list = (a, b)) PUB_B(no column list, with publish_generated_column) - OK
> >    b. PUB_A(column list = (a, b)) PUB_B(no column list, without publish_generated_column) - FAIL
> >    c.  PUB_A(no column list, without publish_generated_column) PUB_B(no column list, with publish_generated_column) - FAIL
> >
> > Tests did not show any unexpected behaviour.
> >
>
> Thanks for the tests, but the results of step 5 do not clearly show
> whether they are correct because you haven't shared the table schema.
>

Here are the tests:
5. Verified that publications with different column list are
disallowed to be subscribed by one subscription
   a. PUB_A(column list = (a, b)) PUB_B(no column list, with
publish_generated_column) - OK
PUB:
CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub1 FOR table gencols with (publish_generated_columns=true);
CREATE PUBLICATION pub2 FOR table gencols(a,gen1);

SUB:
postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres
host=localhost port=6972' PUBLICATION pub1, pub2;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION


   b. PUB_A(column list = (a, b)) PUB_B(no column list, without
publish_generated_column) - FAIL
PUB:
CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub1 FOR table gencols with
(publish_generated_columns=false);
CREATE PUBLICATION pub2 FOR table gencols(a,gen1);

SUB:
postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres
host=localhost port=6972' PUBLICATION pub1, pub2;
ERROR:  cannot use different column lists for table "public.gencols"
in different publications

   c.  PUB_A(no column list, without publish_generated_column)
PUB_B(no column list, with publish_generated_column) - FAIL
PUB:
CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub1 FOR table gencols with
(publish_generated_columns=false);
CREATE PUBLICATION pub2 FOR table gencols with (publish_generated_columns=true);

SUB:
postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres
host=localhost port=6972' PUBLICATION pub1, pub2;
ERROR:  cannot use different column lists for table "public.gencols"
in different publications
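
One way to read the three results above is that each publication resolves to an effective set of published columns, and the subscription is accepted only when those sets match. The sketch below models that reading for the gencols table. It is an interpretation of the observed behaviour, not the server's actual check, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i stands for column i + 1. For gencols: a = bit 0, gen1 = bit 1. */
typedef struct
{
    bool     has_column_list;
    uint32_t column_list;               /* used only with a column list */
    bool     publish_generated_columns;
} Publication;

/* Columns a publication would send for a table (illustrative model). */
static uint32_t
effective_columns(const Publication *pub, uint32_t all_cols,
                  uint32_t generated_cols)
{
    assert((generated_cols & ~all_cols) == 0);

    if (pub->has_column_list)
        return pub->column_list;        /* explicit list wins */
    if (pub->publish_generated_columns)
        return all_cols;                /* everything, gencols included */
    return all_cols & ~generated_cols;  /* everything except gencols */
}

/* Two publications of one table combine only if their sets are equal. */
static bool
can_combine(const Publication *p1, const Publication *p2,
            uint32_t all_cols, uint32_t generated_cols)
{
    return effective_columns(p1, all_cols, generated_cols) ==
           effective_columns(p2, all_cols, generated_cols);
}
```

Under this model, case (a) resolves both publications to {a, gen1} and is accepted, while cases (b) and (c) resolve to {a} versus {a, gen1} and are rejected.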

regards,
Ajin Cherian
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Mon, Nov 4, 2024 at 12:28 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 31 Oct 2024 at 16:44, Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> >
> > On Thu, Oct 31, 2024 at 9:55 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >>
> >> I ran some tests and verified that the patch works with previous versions of PG12 and PG17
> >> 1. Verified with publications with generated columns and without generated columns on patched code and subscriptions on PG12 and PG17
> >> Observations:
> >>     a. If publication is created with publish_generated_columns=true or with generated columns mentioned explicitly, then tablesync will not copy generated columns but post tablesync the generated columns are replicated
> >>     b. Column list override (publish_generated_columns=false) behaviour
> >>
> >> These seem expected.
> >>
> >
> > Currently the documentation does not talk about this behaviour, I suggest this be added similar to how such a behaviour was documented when the original row-filter version was committed.
> > Suggestion:
> > "If a subscriber is a pre-18 version, the initial table synchronization won't publish generated columns even if they are defined in the publisher."
>
> The updated patch has the changes for the same.

Hi Vignesh,

Thanks for the latest doc v2 "fix" patch. Here are my review comments about it.

======
doc/src/sgml/logical-replication.sgml

1.
    During initial data synchronization, only the published columns are
    copied.  However, if the subscriber is from a release prior to 15, then
    all the columns in the table are copied during initial data synchronization,
-   ignoring any column lists.
+   ignoring any column lists. If the subscriber is from a release prior to 18,
+   then initial table synchronization won't copy generated columns data even if
+   they are defined in the publisher.

There are some inconsistencies with the markup etc.

a) For publication row filters the text about Initial Synchronization
version differences is using SGML <Note> markup. But, for "Column
Lists" the similar text about Initial Synchronization version
differences is just plain paragraph text. So, shouldn't this also be
using a <Note> markup for better documentation consistency?

b) I also thought the "even if they are defined in the publisher" wording
seems to refer to the table definition, but IMO it
needs to convey something more like "even when they are published".

SUGGESTION
If the subscriber is from a release prior to 18, copy pre-existing
data does not copy generated columns even when they are published.
This is because old releases ignore generated table data during the
copy.

~~

Furthermore, we will have to write something more about this in the
main patch still being developed, because the same initial sync caveat
is true even for publication of generated columns published *without*
column lists.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Mon, Nov 4, 2024 at 10:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 4, 2024 at 12:28 AM vignesh C <vignesh21@gmail.com> wrote:
>
> Thanks for the latest doc v2 "fix" patch. Here are my review comments about it.
>
> ======
> src/sgml/logical-replication.sgml
>
> 1.
>     During initial data synchronization, only the published columns are
>     copied.  However, if the subscriber is from a release prior to 15, then
>     all the columns in the table are copied during initial data synchronization,
> -   ignoring any column lists.
> +   ignoring any column lists. If the subscriber is from a release prior to 18,
> +   then initial table synchronization won't copy generated columns data even if
> +   they are defined in the publisher.
>
> There are some inconsistencies with the markup etc.
>
> a) For publication row filters the text about Initial Synchronization
> version differences is using SGML <Note> markup. But, for "Column
> Lists" the similar text about Initial Synchronization version
> differences is just plain paragraph text. So, shouldn't this also be
> using a <Note> markup for better documentation consistency?
>

I don't think both are comparable as the row filters section has a
separate sub-section for Initial Data Synchronization. In general, I
find the way things are described in the Column Lists sub-section more
like other parts of the documentation. Moreover, this patch has just
extended the existing docs.

> b) I also thought "even if they are defined in the publisher" wording
> seems like it is referring about the table definition, but IMO it
> needs to convey something more like "even when they are published"
>
> SUGGESTION
> If the subscriber is from a release prior to 18, copy pre-existing
> data does not copy generated columns even when they are published.
> This is because old releases ignore generated table data during the
> copy.
>

The second line says something obvious and doesn't seem to be
required. The change "even when they are published" is debatable, as I
didn't read Vignesh's proposed wording the way you did; to me it was
clear what the doc is saying. I have already pushed Vignesh's version
with a minor modification.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Fri, Nov 1, 2024 at 7:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> ======
> doc/src/sgml/protocol.sgml
>
> 3.
>       <para>
> -      Next, one of the following submessages appears for each column:
> +      Next, one of the following submessages appears for each column
> (except generated columns):
>
> Hmm. Now that generated column replication is supported is this change
> still required?
>

How about changing it to: "Next, one of the following submessages
appears for each published column:"? This is because a column may
not be sent either because it is not in the column list or because it
is a generated one (with publish_generated_columns set to false for the
respective publication).

> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 4.
> +
> +       <varlistentry
> id="sql-createpublication-params-with-publish-generated-columns">
> +        <term><literal>publish_generated_columns</literal>
> (<type>boolean</type>)</term>
> +        <listitem>
> +         <para>
> +          Specifies whether the generated columns present in the tables
> +          associated with the publication should be replicated.
> +          The default is <literal>false</literal>.
> +         </para>
> +        </listitem>
> +       </varlistentry>
> +
>
> I know that the subsequent DOCS patch V1-0002 will explain more about
> this, but as a stand-alone patch 0001 maybe you need to clarify that a
> publication column list will override this 'publish_generated_columns'
> parameter.
>

It is better to leave it to 0002 patch. But note in that patch, we
should add some reference link for the column_list behavior in the
create publication page as well.


> ======
> src/backend/catalog/pg_publication.c
>
>
> pub_getallcol_bitmapset:
>
> 6.
> +/*
> + * Return a column list bitmap for the specified table.
> + *
> + * Generated columns are included if pubgencols is true.
> + *
> + * If mcxt isn't NULL, build the bitmapset in that context.
> + */
> +Bitmapset *
> +pub_getallcol_bitmapset(Relation relation, bool pubgencols,
> + MemoryContext mcxt)
>
> IIUC this is a BMS of the table columns to be published. The function
> comment seems confusing to me when it says "column list bitmap"
> because IIUC this function is not really anything to do with a
> publication "column list", which is an entirely different thing.
>

We can probably name it pub_form_cols_map() and change the comments accordingly.

> ======
> src/backend/replication/logical/proto.c
>
> 7.
>  static void logicalrep_write_attrs(StringInfo out, Relation rel,
> -    Bitmapset *columns);
> +    Bitmapset *columns, bool pubgencols);
>  static void logicalrep_write_tuple(StringInfo out, Relation rel,
>      TupleTableSlot *slot,
> -    bool binary, Bitmapset *columns);
> +    bool binary, Bitmapset *columns,
> +    bool pubgencols);
>
> The meaning of all these new 'pubgencols' are ambiguous. e.g. Are they
> (a) The value of the CREATE PUBLICATION 'publish_generated_columns'
> parameter, or does it mean (b) Just some generated column is being
> published (maybe via column list or maybe not).
>
> I think it means (a) but, if true, that could be made much more clear
> by changing all of these names to 'pubgencols_option' or something
> similar.  Actually, now I have doubts about that also -- I think this
> might be magically assigned to false if no generated columns exist in
> the table. Anyway, please do whatever you can to disambiguate this.
>

To make it clear, we can name this parameter include_gencols.
Similarly, change the name of RelationSyncEntry's new member.

> ~~~
>
> 9.
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols)
> {
> if (att->attisdropped)
> return false;
>
> /*
> * Skip publishing generated columns if they are not included in the
> * column list or if the option is not specified.
> */
> if (!columns && !pubgencols && att->attgenerated)
> return false;
>
> /*
> * Check if a column is covered by a column list.
> */
> if (columns && !bms_is_member(att->attnum, columns))
> return false;
>
> return true;
> }
>
> Same as mentioned before in my previous v46-0001 review comments, I
> feel that the conditionals of this function are over-complicated and
> that there are more 'return' points than necessary. The alternative
> code below looks simpler to me.
>
> SUGGESTION
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols_option)
> {
>   if (att->attisdropped)
>     return false;
>
>   if (columns)
>   {
>     /*
> * Has a column list:
> * Publish only cols named in that list.
> */
>     return bms_is_member(att->attnum, columns);
>   }
>   else
>   {
>     /*
>      * Has no column list:
> * Publish generated cols only if 'publish_generated_cols' is true.
>      * Publish all non-generated cols.
>     */
>     return att->attgenerated ? pubgencols_option : true;
>   }
> }
>

Fair enough, but do we need the else in the above code?
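
For illustration, a minimal sketch of the suggested function with the
else branch dropped. It is not the patch code: MockAttr and MockColumns
are hypothetical stand-ins for Form_pg_attribute and Bitmapset so that
the logic can be compiled on its own, and the parameter uses the
include_gencols name proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Form_pg_attribute; only the fields used here. */
typedef struct
{
    int  attnum;        /* 1-based column number */
    bool attisdropped;
    bool attgenerated;
} MockAttr;

/*
 * Hypothetical stand-in for Bitmapset: bit N set means column N is in the
 * column list. A NULL pointer means "no column list".
 */
typedef uint64_t MockColumns;

static bool
should_publish_column(const MockAttr *att, const MockColumns *columns,
                      bool include_gencols)
{
    /* The mock bitmask only has room for column numbers 1..63. */
    assert(att->attnum > 0 && att->attnum < 64);

    if (att->attisdropped)
        return false;

    /* Has a column list: publish only the columns named in that list. */
    if (columns)
        return ((*columns >> att->attnum) & 1) != 0;

    /* No column list: publish generated columns only if requested. */
    return att->attgenerated ? include_gencols : true;
}
```

Because the column-list branch returns unconditionally, the code after
it is only reached when there is no column list, so the behavior is the
same as the if/else version quoted above.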

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 10.
> + /* Include publishing generated columns */
> + bool pubgencols;
> +
>
> There is similar ambiguity here with this field-name as was mentioned
> about for other 'pubgencols' function params. I had initially thought
> that this just carries around the same value as the publication option
> 'publish_generated_columns' but now (after looking at function
> check_and_init_gencol) I think that might not be the case because I
> saw it can be assigned false (overriding the publication option?).
>
> Anyway, the comment needs to be made much clearer about what is the
> true meaning of this field. Or, rename it if there is a better name.
>

As suggested above, we can name it as include_gencols.

>
> send_relation_and_attrs:
>
> 12.
> - if (!logicalrep_should_publish_column(att, columns))
> + if (!logicalrep_should_publish_column(att, columns, relentry->pubgencols))
>   continue;
> It seemed a bit strange/inconsistent that 'columns' was assigned to a
> local var, but 'pubgencols' was not, given they are both fields of the
> same struct. Maybe removing this 'columns' var would be consistent
> with other code in this patch.
>

I think the other way would be better. I mean, add another local
variable in this function. We don't always need to do the same in
such cases.

> ~~~
>
> 13.
> check_and_init_gencol:
>
> nit - missing periods for comments.
>
> ~~~
>
> 14.
> + /* There is no generated columns to be published *
>
> /There is no generated columns/There are no generated columns/
>
> ~~~
>
> 15.
> + foreach(lc, publications)
> + {
> + Publication *pub = lfirst(lc);
>
> AFAIK this can be re-written using a different macro to avoid needing
> the 'lc' var.
>
> ~~~
>
> pgoutput_column_list_init:
>
> 16.
> + bool collistpubexist = false;
>
> This did not seem like a very good name. How about 'found_pub_with_collist'?
>
> ~~~
>
> 17.
> bool pub_no_list = true;
>
> nit - Not caused by this patch, but it's closely related; In passing
> we should declare this variable at a lower scope, and rename it to
> 'isnull' which is more in keeping with the comments around it.
>

Moving it to a local scope is okay, but doing more than that in this
patch is not advisable, even if your suggestion is a good idea, which I
am not sure about.


> ~
>
> 20b.
> There is a GENERAL problem that applies for lots of comments of this
> patch (including this comment) because the new publication option is
> referred to inconsistently in many different ways:
>
> e.g.
> - the generated columns option.
> - if the option is not specified
> - publish_generated_columns option.
> - the pubgencols option
> - 'publish_generated_columns' option
>
> All these references should be made the same. My personal preference
> is the last one ('publish_generated_columns' option).
>

I have responded with a better name for other places. Here, the
proposed name seems okay to me.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Oct 30, 2024 at 9:46 PM vignesh C <vignesh21@gmail.com> wrote:
>
...
+ /*
+ * For non-column list publications—such as TABLE (without a column
+ * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider
+ * all columns of the table, including generated columns, based on the
+ * pubgencols option.
+ */
+ if (!cols)
+ {
+ Assert(pub->pubgencols == entry->pubgencols);
+
+ /*
+ * Retrieve the columns if they haven't been prepared yet, or if
+ * there are multiple publications.
+ */
+ if (!relcols && (list_length(publications) > 1))
+ {
+ pgoutput_ensure_entry_cxt(data, entry);
+ relcols = pub_getallcol_bitmapset(relation, entry->pubgencols,
+   entry->entry_cxt);
+ }
+
+ cols = relcols;

Don't we need this only when generated column(s) are present? If so,
we can get that as an input to pgoutput_column_list_init(). We have
already computed that in the function check_and_init_gencol(), which is
invoked just before pgoutput_column_list_init().

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Mon, 4 Nov 2024 at 16:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 30, 2024 at 9:46 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> ...
> + /*
> + * For non-column list publications—such as TABLE (without a column
> + * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider
> + * all columns of the table, including generated columns, based on the
> + * pubgencols option.
> + */
> + if (!cols)
> + {
> + Assert(pub->pubgencols == entry->pubgencols);
> +
> + /*
> + * Retrieve the columns if they haven't been prepared yet, or if
> + * there are multiple publications.
> + */
> + if (!relcols && (list_length(publications) > 1))
> + {
> + pgoutput_ensure_entry_cxt(data, entry);
> + relcols = pub_getallcol_bitmapset(relation, entry->pubgencols,
> +   entry->entry_cxt);
> + }
> +
> + cols = relcols;
>
> Don't we need this only when generated column(s) are present? If so,
> we can get that as an input to pgoutput_column_list_init().

We will use this in all cases, i.e. irrespective of whether generated
columns are present. For example:
CREATE TABLE t1(c1 int, c2 int);
create publication pub1 for table t1(c1);
create publication pub2 for table t1;

Create subscription ... publication pub1,pub2;

Even in this case we have to detect that the column lists do not
match and throw:
2024-11-04 20:35:58.199 IST [492190] 492190 sub1 ERROR:  cannot use
different column lists for table "public.t1" in different publications
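
As a rough sketch of the check described above (not the pgoutput code):
each publication is resolved to a set of columns, where a publication
without a column list resolves to all columns of the table, and the
sets are then compared. ColMask, MockPub and column_sets_match are
hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of attnums; bit N = column N. */
typedef uint64_t ColMask;

/* Hypothetical, simplified publication; has_list is false for "FOR TABLE t1". */
typedef struct
{
    bool    has_list;
    ColMask list;
} MockPub;

/*
 * Returns true if every publication resolves to the same set of columns.
 * all_cols plays the role of 'relcols': the columns that a publication
 * without a column list publishes.
 */
static bool
column_sets_match(const MockPub *pubs, size_t npubs, ColMask all_cols)
{
    ColMask first = 0;

    for (size_t i = 0; i < npubs; i++)
    {
        ColMask cols = pubs[i].has_list ? pubs[i].list : all_cols;

        if (i == 0)
            first = cols;
        else if (cols != first)
            return false;   /* the real code reports an ERROR here */
    }
    return true;
}
```

With t1(c1, c2), pub1 resolves to {c1} and pub2 to {c1, c2}, so the
comparison fails even though no generated column is involved.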

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Fri, 1 Nov 2024 at 09:23, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21@gmail.com> wrote:
>
> > Thanks for committing this patch, here is a rebased version of the
> > remaining patches.
> >
>
> Hi Vignesh.
>
> Here are my review comments for the docs patch v1-0002.
>
> ======
> Commit message
>
> 1.
> This patch updates docs to describe the new feature allowing
> replication of generated
> columns. This includes addition of a new section "Generated Column
> Replication" to the
> "Logical Replication" documentation chapter.
>
> ~
>
> That first sentence was correct previously when this patch contained
> *all* the gencols documentation, but now some of the feature docs are
> already handled by previous patches, so the first sentence can be
> removed.
>
> Now patch 0002 is only for adding the new chapter, plus the references to it.
>
> ~
>
> /This includes addition of a new section/This patch adds a new section/

Modified

> ======
> doc/src/sgml/protocol.sgml
>
> 2.
>       <para>
> -      Next, one of the following submessages appears for each column
> (except generated columns):
> +      Next, one of the following submessages appears for each column:
>
> AFAIK this simply cancels out a change from the v1-0001 patch which
> IMO should have not been there in the first place. Please refer to my
> v1-0001 review for the same.

Removed it.

The changes for the same are available in the v47 version patch attached
at [1]. I have not included the 0003 patch for now; I will include it
once these two patches stabilize.
[1] - https://www.postgresql.org/message-id/CALDaNm2sNfZoFfqOKq9GAjQZd3isqosij9iHaJjn7oQVmLLNYw%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

Here are my review comments for your latest patch v47-0001.

======
doc/src/sgml/ddl.sgml

1.
      <para>
-      Generated columns can be replicated during logical replication by
-      including them in the column list of the
-      <command>CREATE PUBLICATION</command> command.
+      Generated columns are allowed to be replicated during logical replication
+      according to the <command>CREATE PUBLICATION</command> option
+      <link linkend="sql-createpublication-params-with-publish-generated-columns">
+      <literal>include_generated_columns</literal></link> or by including them
+      in the column list of the <command>CREATE PUBLICATION</command> command.
      </para>

1a.
This text gives the wrong name for the new parameter.
/include_generated_columns/publish_generated_columns/

~

1b.
Everywhere in this patch (except here), this is called the
'publish_generated_columns' parameter (not "option") so it should be
called a parameter here also. Anyway, apparently that is the docs rule
-- see [1].

BTW, the same applies for the commit message 1st line of this patch:
[PATCH v47 1/2] Enable support for 'publish_generated_columns' option.
Should be
[PATCH v47 1/2] Enable support for 'publish_generated_columns' parameter.

======
doc/src/sgml/protocol.sgml

2.
-      Next, one of the following submessages appears for each column:
+      Next, one of the following submessages appears for each published column:

The change is OK. But, note that there are other descriptions just
like this one on the same page, so if you are going to say "published"
here, then to be consistent you probably want to consider updating the
other places as well.

======
src/backend/catalog/pg_publication.c

3.
+bool
+has_column_list_defined(Publication *pub, Oid relid)
+{
+ HeapTuple cftuple = NULL;
+ bool isnull = true;

Since you chose not to rearrange the HeapTupleIsValid check, this
'isnull' declaration should be relocated within the if-block.

======
src/backend/replication/logical/proto.c

4.
 /*
  * Check if the column 'att' of a table should be published.
  *
- * 'columns' represents the column list specified for that table in the
- * publication.
+ * 'columns' represents the publication column list (if any) for that table.
  *
- * Note that generated columns can be present only in 'columns' list.
+ * Note that generated columns can be published only when present in a
+ * publication column list, or when include_gencols is true.
  */
 bool
-logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
+ bool include_gencols)

The function comment describes 'columns' but it doesn't describe
'include_gencols'. I think knowing more about that parameter would be
helpful.

SUGGESTION:
The 'include_gencols' flag indicates whether generated columns should
be published when there is no column list. Typically, this will have
the same value as the 'publish_generated_columns' publication
parameter.

======
src/backend/replication/logical/relation.c

5.
@@ -421,7 +421,7 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)
  int attnum;
  Form_pg_attribute attr = TupleDescAttr(desc, i);

- if (attr->attisdropped || attr->attgenerated)
+ if (attr->attisdropped)
  {
  entry->attrmap->attnums[i] = -1;
  continue;
@@ -432,7 +432,15 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)

  entry->attrmap->attnums[i] = attnum;
  if (attnum >= 0)
+ {
+ if (attr->attgenerated)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("replicating to a target relation's generated column \"%s\"
for \"%s.%s\" is not supported",
+    NameStr(attr->attname), remoterel->nspname, remoterel->relname));
+
  missingatts = bms_del_member(missingatts, attnum);
+ }

Hmm. I think this more descriptive error is a good improvement over
the previous "missing" error, but I just don't think it belongs in
this patch. This is impacting the existing "regular" ==> "generated"
replication as well, which seems out-of-scope for this gencols patch.

IMO this ought to be made as a separate patch that can be pushed to
master separately/independently *before* any of this new gencols
stuff.

Also, you already said in the commit message:
* Publisher not-generated column => subscriber generated column:
  This will give ERROR (not changed by this patch).

So the "not changed by this patch" part is not true if these changes
are included.

======
src/backend/replication/pgoutput/pgoutput.c

6.
+ /*
+ * Include publishing generated columns if 'publish_generated_columns'
+ * parameter is set to true, this will be set only if the relation
+ * contains any generated column.
+ */
+ bool include_gencols;
+

Minor rewording.

SUGGESTION:
Include generated columns for publication is set true if
'publish_generated_columns' parameter is true, and the relation
contains generated columns.

~~~

7.
+ /*
+ * Retrieve the columns if they haven't been prepared yet, and
+ * only if multiple publications exist.
+ */
+ if (!relcols && (list_length(publications) > 1))
+ {
+ pgoutput_ensure_entry_cxt(data, entry);
+ relcols = pub_form_cols_map(relation, entry->include_gencols,
+ entry->entry_cxt);
+ }

IIUC the purpose of this is for ensuring that the column lists are
consistent across all publications. That is why we only do this when
there are > 1 publications. For the 1st publication with no column
list we cache all the columns (in 'relcols') so later the cols of the
*current* publication (in 'cols') can be checked to see if they are
the same.

TBH, I think this part needs to have more explanation because it's a
bit too subtle; you have to read between the lines to figure out what
it is doing instead of just having a comment to clearly describe the
logic up-front.

======
[1] option versus parameter -
https://www.postgresql.org/message-id/CAKFQuwZVJ%2B_Z0pMX%3DBBKF9A6skVqiv89gxEgFOX7cwtWJj-Ccw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

Here are my review comments for the v47-0002 (DOCS) patch.

======
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 577bcb4b71..a13f19bdbe 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -517,7 +517,8 @@ CREATE TABLE people (
       Generated columns are allowed to be replicated during logical replication
       according to the <command>CREATE PUBLICATION</command> option
       <link linkend="sql-createpublication-params-with-publish-generated-columns">
-      <literal>include_generated_columns</literal></link>.
+      <literal>include_generated_columns</literal></link>. See
+      <xref linkend="logical-replication-gencols"/> for details.
      </para>
     </listitem>
    </itemizedlist>

Previously (in v1-0002) above there was a link to the new gencols
section ("See XXX for details"), but in v47 that link is no longer
included. Why not?

======
doc/src/sgml/ref/create_publication.sgml

-      lists.
+      lists. See <xref linkend="logical-replication-gencols-howto"/> for more
+      information on the logical replication of generated columns using a
+      column list publication.
      </para>

I don't really think this change is necessary.

The existing paragraph already says "When a column list is specified,
only the named columns are replicated.", so there is nothing special
more than that which we really need to say for generated columns.

Also, this paragraph already has a link to the "Column List" chapter
for more details, so if the user really wants to learn about column
lists which happen to have generated columns in them, then that's
where they should look, and there is a link to the new chapter 29.6
from there.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Tue, Nov 5, 2024 at 7:00 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> ~
>
> 1b.
> Everywhere in this patch (except here), this is called the
> 'publish_generated_columns' parameter (not "option") so it should be
> called a parameter here also. Anyway, apparently that is the docs rule
> -- see [1].
>

In the thread you linked, we have decided to name 'failover' an
option. I feel the same should be followed here but I agree that we
should spell it consistently throughout the patch.

>
> ======
> doc/src/sgml/protocol.sgml
>
> 2.
> -      Next, one of the following submessages appears for each column:
> +      Next, one of the following submessages appears for each published column:
>
> The change is OK. But, note that there are other descriptions just
> like this one on the same page, so if you are going to say "published"
> here, then to be consistent you probably want to consider updating the
> other places as well.
>

Are you referring to the existing message: "Next, the following
message part appears for each column included in the publication:"? If
so, we can change it to make it the same but the current one also
looks okay. We can consider changing it separately if required after
this patch.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 6.
> + /*
> + * Include publishing generated columns if 'publish_generated_columns'
> + * parameter is set to true, this will be set only if the relation
> + * contains any generated column.
> + */
> + bool include_gencols;
> +
>
> Minor rewording.
>
> SUGGESTION:
> Include generated columns for publication is set true if
>

/set true/set to true

>
> ======
> [1] option versus parameter -
> https://www.postgresql.org/message-id/CAKFQuwZVJ%2B_Z0pMX%3DBBKF9A6skVqiv89gxEgFOX7cwtWJj-Ccw%40mail.gmail.com
>

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 5 Nov 2024 at 07:55, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Vignesh,
>
> Here are my review comments for the v47-0002 (DOCS) patch.
>
> ======
> diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
> index 577bcb4b71..a13f19bdbe 100644
> --- a/doc/src/sgml/ddl.sgml
> +++ b/doc/src/sgml/ddl.sgml
> @@ -517,7 +517,8 @@ CREATE TABLE people (
>        Generated columns are allowed to be replicated during logical replication
>        according to the <command>CREATE PUBLICATION</command> option
>        <link linkend="sql-createpublication-params-with-publish-generated-columns">
> -      <literal>include_generated_columns</literal></link>.
> +      <literal>include_generated_columns</literal></link>. See
> +      <xref linkend="logical-replication-gencols"/> for details.
>       </para>
>      </listitem>
>     </itemizedlist>
>
> Previously (in v1-0002) above there was a link to the new gencols
> section ("See XXX for details"), but in v47 that link is no longer
> included. Why not?

Included it now.

> ======
> doc/src/sgml/ref/create_publication.sgml
>
> -      lists.
> +      lists. See <xref linkend="logical-replication-gencols-howto"/> for more
> +      information on the logical replication of generated columns using a
> +      column list publication.
>       </para>
>
> I don't really think this change is necessary.
>
> The existing paragraph already says "When a column list is specified,
> only the named columns are replicated.", so there is nothing special
> more than that which we really need to say for generated columns.
>
> Also, this paragraph already has a link to the "Column List" chapter
> for more details, so if the user really wants to learn about column
> lists which happen to have generated columns in them, then that's
> where they should look, and there is a link to the new chapter 29.6
> from there.

Removed it.

The v48 version patch attached at [1] has the changes for the same.

[1] - https://www.postgresql.org/message-id/CALDaNm3Ha5t9bOLJ7OBnaMRgYHX_Q4j9k3EbRsX%3D%2B1mxUo5BZw%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

Here are my review comments for patch v48-0001.

======
src/backend/catalog/pg_publication.c

has_column_list_defined:

1.
+ if (HeapTupleIsValid(cftuple))
+ {
+ bool isnull = true;
+
+ /* Lookup the column list attribute. */
+ (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
+    Anum_pg_publication_rel_prattrs,
+    &isnull);

AFAIK it is not necessary to assign a default value to 'isnull' here.
e.g. most of the other 100s of calls to SysCacheGetAttr elsewhere in
PostgreSQL source don't bother to do this.
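
A small sketch of why the initializer is redundant, assuming (as the
comment above implies) that SysCacheGetAttr() writes *isnull on every
call. mock_get_attr and mock_has_column_list are hypothetical stand-ins
with that same contract, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical lookup with the assumed contract: *isnull is written on
 * every call, so the caller's initial value is never read.
 */
static int
mock_get_attr(const int *stored, bool *isnull)
{
    *isnull = (stored == NULL);
    return stored ? *stored : 0;
}

/* Returns true when a (mock) column list attribute is present. */
static bool
mock_has_column_list(const int *stored)
{
    bool isnull;    /* no initializer needed: assigned by the callee */

    (void) mock_get_attr(stored, &isnull);
    return !isnull;
}
```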

//////////

I also checked the docs patch v48-0002. That now looks good to me.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
vignesh C
Date:
On Tue, 5 Nov 2024 at 12:32, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Vignesh,
>
> Here are my review comments for patch v48-0001.
>
> ======
> src/backend/catalog/pg_publication.c
>
> has_column_list_defined:
>
> 1.
> + if (HeapTupleIsValid(cftuple))
> + {
> + bool isnull = true;
> +
> + /* Lookup the column list attribute. */
> + (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
> +    Anum_pg_publication_rel_prattrs,
> +    &isnull);
>
> AFAIK it is not necessary to assign a default value to 'isnull' here.
> e.g. most of the other 100s of calls to SysCacheGetAttr elsewhere in
> PostgreSQL source don't bother to do this.

This is fixed in the v49 version patch attached at [1].

[1] - https://www.postgresql.org/message-id/CALDaNm3XV5mAeZzZMkOPSPieANMaxOH8xAydLqf8X5PQn%2Ba5EA%40mail.gmail.com

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

Here are my review comments for patch v49-0001.

======
src/backend/catalog/pg_publication.c

1. check_fetch_column_list

+bool
+check_fetch_column_list(Publication *pub, Oid relid, MemoryContext mcxt,
+ Bitmapset **cols)
+{
+ HeapTuple cftuple = NULL;
+ Datum cfdatum = 0;
+ bool found = false;
+

1a.
The 'cftuple' is unconditionally assigned; the default assignment
seems unnecessary.

~

1b.
The 'cfdatum' can be declared in a lower scope (in the if-block).
The 'cfdatum' is unconditionally assigned; the default assignment
seems unnecessary.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Nov 6, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Vignesh,
>
> Here are my review comments for patch v49-0001.
>

I have a question on the display of this new parameter.

postgres=# \dRp+
                                      Publication pub_gen
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
----------+------------+---------+---------+---------+-----------+----------+-------------------
 KapilaAm | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.test_gen"

One rationale for where the "Generated columns" column is displayed
could be that new parameters are simply added at the end, which sounds
reasonable. The other way to look at it is what would be easiest for
users to interpret. I think the "Via root" column should be either right
after "All tables" or at the end, as it is higher-level table
information than the operations or column-level information. As it is
currently at the end, "Generated columns" should be added before it.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
On Wed, Nov 6, 2024 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 6, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Vignesh,
> >
> > Here are my review comments for patch v49-0001.
> >
>
> I have a question on the display of this new parameter.
>
> postgres=# \dRp+
>                                       Publication pub_gen
>   Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
> ----------+------------+---------+---------+---------+-----------+----------+-------------------
>  KapilaAm | f          | t       | t       | t       | t         | f        | t
> Tables:
>     "public.test_gen"
>
> One rationale for where the "Generated columns" column is displayed
> could be that new parameters are simply added at the end, which sounds
> reasonable. The other way to look at it is what would be easiest for
> users to interpret. I think the "Via root" column should be either right
> after "All tables" or at the end, as it is higher-level table
> information than the operations or column-level information. As it is
> currently at the end, "Generated columns" should be added before it.
>
> Thoughts?
>

FWIW, I've always felt the CREATE PUBLICATION parameters
publish
publish_via_partition_root
publish_generated_columns

should be documented (e.g. on the CREATE PUBLICATION page) in alphabetical order:
publish
publish_generated_columns
publish_via_partition_root

~

Following on from that. IMO it will make sense for the describe
(\dRp+) columns for those parameters to be in the same order as the
parameters in the documentation. So the end result would be the same
order as what you are wanting, even though the reason might be
different.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Peter Smith
Date:
Hi Vignesh,

I am observing some unexpected errors with the following scenario.

======
Tables:

Publisher table:
test_pub=# create table t1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
test_pub=# insert into t1 values (1);
INSERT 0 1

~

And Subscriber table:
test_sub=# create table t1(a int, b int);
CREATE TABLE

======
TEST PART 1.

I create 2 publications, having different parameter values.

test_pub=# create publication pub1 for table t1 with
(publish_generated_columns=true);
CREATE PUBLICATION
test_pub=# create publication pub2 for table t1 with
(publish_generated_columns=false);
CREATE PUBLICATION

~

And I try creating a subscription simultaneously subscribing to both
of these publications. This fails with an expected error.

test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1, pub2;
ERROR:  cannot use different column lists for table "public.t1" in
different publications

======
TEST PART 2.

Now on the publisher, set the parameter for pub2 to true:

test_pub=# alter publication pub2 set (publish_generated_columns);
ALTER PUBLICATION
test_pub=# \dRp+
                                        Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
----------+------------+---------+---------+---------+-----------+----------+-------------------
 postgres | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.t1"

                                        Publication pub2
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
----------+------------+---------+---------+---------+-----------+----------+-------------------
 postgres | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.t1"

~

Now CREATE SUBSCRIPTION works OK.

test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1,pub2;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION

======
TEST PART 3.

Now on Publisher let's alter that parameter back to false again...

test_pub=# alter publication pub2 set (publish_generated_columns=false);
ALTER PUBLICATION

And insert some data.

test_pub=# insert into t1 values (2);
INSERT 0 1

~

Now the subscriber starts failing again...

ERROR:  cannot use different values of publish_generated_columns for
table "public.t1" in different publications
etc...

======
TEST PART 4.

Finally, on the Publisher alter that parameter back to true again!

test_pub=# alter publication pub2 set (publish_generated_columns);
ALTER PUBLICATION
test_pub=# \dRp+
                                        Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
----------+------------+---------+---------+---------+-----------+----------+-------------------
 postgres | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.t1"

                                        Publication pub2
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
----------+------------+---------+---------+---------+-----------+----------+-------------------
 postgres | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.t1"


~~

Unfortunately, even though the publication parameters are the same
again, the subscription seems to keep failing forever...

ERROR:  cannot use different values of publish_generated_columns for
table "public.t1" in different publications

~~

I didn't think a REFRESH PUBLICATION was necessary for this case, but
anyway that does not seem to make any difference.

test_sub=# alter subscription sub1 refresh publication;
ALTER SUBSCRIPTION

... still getting repeating error
2024-11-06 16:54:44.839 AEDT [5659] ERROR:  could not receive data
from WAL stream: ERROR:  cannot use different values of
publish_generated_columns for table "public.t1" in different
publications

======

To summarize -- altering the publication parameter combination from
good to bad immediately breaks the subscription, but altering it back
again from bad to good seemed to do nothing at all (the subscription
just remains broken).

======
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Nov 6, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I am observing some unexpected errors with the following scenario.
>

You are getting an expected ERROR. It is because of the design of
logical decoding which relies on historic snapshots.

> ======
> Tables:
>
> Publisher table:
> test_pub=# create table t1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> CREATE TABLE
> test_pub=# insert into t1 values (1);
> INSERT 0 1
>
> ~
>
> And Subscriber table:
> test_sub=# create table t1(a int, b int);
> CREATE TABLE
>
> ======
> TEST PART 1.
>
> I create 2 publications, having different parameter values.
>
> test_pub=# create publication pub1 for table t1 with
> (publish_generated_columns=true);
> CREATE PUBLICATION
> test_pub=# create publication pub2 for table t1 with
> (publish_generated_columns=false);
> CREATE PUBLICATION
>
> ~
>
> And I try creating a subscription simultaneously subscribing to both
> of these publications. This fails with an expected error.
>
> test_sub=# create subscription sub1 connection 'dbname=test_pub'
> publication pub1, pub2;
> ERROR:  cannot use different column lists for table "public.t1" in
> different publications
>
> ======
> TEST PART 2.
>
> Now on publisher set parameter for pub2 to be true;
>
> test_pub=# alter publication pub2 set (publish_generated_columns);
> ALTER PUBLICATION
> test_pub=# \dRp+
>                                         Publication pub1
>   Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
> ----------+------------+---------+---------+---------+-----------+----------+-------------------
>  postgres | f          | t       | t       | t       | t         | f        | t
> Tables:
>     "public.t1"
>
>                                         Publication pub2
>   Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Generated columns
> ----------+------------+---------+---------+---------+-----------+----------+-------------------
>  postgres | f          | t       | t       | t       | t         | f        | t
> Tables:
>     "public.t1"
>
> ~
>
> Now the create subscriber works OK.
>
> test_sub=# create subscription sub1 connection 'dbname=test_pub'
> publication pub1,pub2;
> NOTICE:  created replication slot "sub1" on publisher
> CREATE SUBSCRIPTION
>
> ======
> TEST PART 3.
>
> Now on Publisher let's alter that parameter back to false again...
>
> test_pub=# alter publication pub2 set (publish_generated_columns=false);
> ALTER PUBLICATION
>
> And insert some data.
>
> test_pub=# insert into t1 values (2);
> INSERT 0 1
>
> ~
>
> Now the subscriber starts failing again...
>
> ERROR:  cannot use different values of publish_generated_columns for
> table "public.t1" in different publications
> etc...
>
> ======
> TEST PART 4.
>
> Finally, on the Publisher alter that parameter back to true again!
>
> test_pub=# alter publication pub2 set (publish_generated_columns);
> ALTER PUBLICATION
...
>
>
> ~~
>
> Unfortunately, even though the publication parameters are the same
> again, the subscription seems to continue forever failing....
>
> ERROR:  cannot use different values of publish_generated_columns for
> table "public.t1" in different publications
>

The reason is that the failing 'insert' is decoded using a historic
snapshot, whose catalog state still has 'publish_generated_columns' as
false. So, you are seeing that error repeatedly. This behavior has
existed since the very beginning of logical replication, and another
issue with the same cause was reported recently [1], which is actually
a setup issue. We should improve this situation some day, but it is not
the responsibility of this patch.

[1] - https://www.postgresql.org/message-id/18683-a98f79c0673be358%40postgresql.org
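
To make the historic-snapshot point concrete, here is a toy Python model of the scenario above. It is only a sketch under simplifying assumptions: the names AlterPublication, Insert, DecodeError and decode are hypothetical, the "WAL" is a plain list, and real logical decoding is far more involved. It illustrates just one thing: every retry restarts before the failing insert and sees the catalog as it was at that point, so a later ALTER PUBLICATION cannot help.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlterPublication:
    """Toy WAL record: ALTER PUBLICATION name SET (publish_generated_columns = value)."""
    name: str
    publish_generated_columns: bool


@dataclass(frozen=True)
class Insert:
    """Toy WAL record: an INSERT into a published table."""
    table: str
    row: tuple


class DecodeError(Exception):
    """Stands in for the ERROR raised while decoding."""


def decode(wal, initial_catalog, subscribed, confirmed_pos=0):
    """Replay the toy WAL in order.

    Each INSERT at or after confirmed_pos is checked against the catalog
    as it stood at that position in the WAL (the "historic" view),
    not against the latest catalog.
    """
    catalog = dict(initial_catalog)
    sent = []
    for pos, record in enumerate(wal):
        if isinstance(record, AlterPublication):
            catalog[record.name] = record.publish_generated_columns
        elif pos >= confirmed_pos:
            if len({catalog[name] for name in subscribed}) > 1:
                raise DecodeError(
                    "cannot use different values of publish_generated_columns "
                    f'for table "{record.table}" in different publications'
                )
            sent.append(record.row)
    return sent


# pub1 and pub2 both start as true (end of TEST PART 2).
initial_catalog = {"pub1": True, "pub2": True}
wal = [
    AlterPublication("pub2", False),  # TEST PART 3
    Insert("public.t1", (2,)),        # decoded while pub2 is still false
    AlterPublication("pub2", True),   # TEST PART 4
]

# The confirmed position never advances past the failing insert, so every
# retry decodes it again with the same historic catalog state.
failures = 0
for _ in range(3):
    try:
        decode(wal, initial_catalog, ["pub1", "pub2"])
    except DecodeError:
        failures += 1

# An insert made after the parameters match again would decode fine,
# provided decoding could ever get past the earlier one.
later_wal = [
    AlterPublication("pub2", False),
    AlterPublication("pub2", True),
    Insert("public.t1", (3,)),
]
later_rows = decode(later_wal, initial_catalog, ["pub1", "pub2"])
```

In this model all three retries fail in the same way, while the insert made after the second ALTER decodes without error.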

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Wed, Nov 6, 2024 at 4:18 PM vignesh C <vignesh21@gmail.com> wrote:
>
> The attached v50 version patch has the changes for the same.
>

Pushed.

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Peter Eisentraut
Date:
On 07.11.24 05:13, Amit Kapila wrote:
> On Wed, Nov 6, 2024 at 4:18 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>> The attached v50 version patch has the changes for the same.

Could you (everybody on this thread) please provide guidance on how this
feature is supposed to interact with virtual generated columns [0]? I
don't think it's reasonably possible to replicate virtual generated 
columns.  I had previously suggested to make it more explicit that this 
feature only works for stored generated columns (e.g., name the option 
like that), but I don't see that this was considered.

[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Nov 7, 2024 at 12:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 07.11.24 05:13, Amit Kapila wrote:
> > On Wed, Nov 6, 2024 at 4:18 PM vignesh C <vignesh21@gmail.com> wrote:
> >>
> >> The attached v50 version patch has the changes for the same.
>
> Could you (everybody on this thread) please provide guidance on how this
> feature is supposed to interact with virtual generated columns [0]? I
> don't think it's reasonably possible to replicate virtual generated
> columns.
>

I haven't studied the patch but can't we think of a way where we can
compute the value of the virtual generated column on the fly (say by
evaluating the required expression) before sending it to the client?
We do evaluate the expressions during the row filter, so can't we do
it for virtual-generated columns? I think we need some more work
similar to row filter/column list where we need to ensure that the
columns used in expressions for virtual generated columns must be part
of replica identity. I haven't thought about all the details so I may
be missing something.
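
As a rough illustration of the idea (not of any existing patch), the toy Python sketch below evaluates a virtual generated column's expression just before a tuple would be sent. The function materialize_virtual and the table layout are hypothetical. It also shows why the replica identity matters: with the default replica identity, the old tuple of an UPDATE/DELETE carries only the identity columns, so the expression cannot be evaluated unless its inputs are among them.

```python
def materialize_virtual(tuple_values, virtual_columns):
    """Return a copy of the tuple with virtual generated columns filled in.

    virtual_columns maps a column name to (input column names, expression).
    """
    out = dict(tuple_values)
    for name, (inputs, expression) in virtual_columns.items():
        missing = [col for col in inputs if col not in tuple_values]
        if missing:
            raise KeyError(f"cannot compute {name}: missing {missing}")
        out[name] = expression(*(tuple_values[col] for col in inputs))
    return out


# Hypothetical table: t(id int primary key, a int, b int generated as (a * 2) virtual)
virtual_columns = {"b": (["a"], lambda a: a * 2)}

# New tuple of an INSERT: all stored columns are present.
new_tuple = materialize_virtual({"id": 1, "a": 5}, virtual_columns)

# Old tuple of an UPDATE/DELETE: only replica identity columns (here: id).
try:
    materialize_virtual({"id": 1}, virtual_columns)
    old_tuple_ok = True
except KeyError:
    old_tuple_ok = False
```

Here the new tuple gains the computed column, while the old tuple cannot be completed because "a" is not part of the replica identity.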

>
> I had previously suggested to make it more explicit that this
> feature only works for stored generated columns (e.g., name the option
> like that), but I don't see that this was considered.
>

It was considered in earlier versions of the patch like [1] but later
we focussed more on getting key parts of the feature ready. Sorry for
missing that part but we can do it now. The idea is that we explicitly
mention in docs that the new option 'publish_generated_columns' will
replicate only STORED generated columns and also explicitly compare
'attgenerated' as ATTRIBUTE_GENERATED_STORED during decoding and
adjust comments. I suggest we do that for now. We could also consider
naming the option as publish_stored_generated_columns but that way the
name would be too long. The other idea could be to make the new option
a string, but that would be useful only if we decide to replicate
virtual generated columns.

[1] - https://www.postgresql.org/message-id/CAHv8RjJsGWETA9U53iRiV2%2BVGtnHamEJ5PKMHUcfat269kQaSQ%40mail.gmail.com
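
The explicit 'attgenerated' comparison could look roughly like the toy Python sketch below. It is not the pgoutput code: Attribute and columns_to_publish are hypothetical names, column lists are ignored, and the 'v' marker for virtual columns is an assumption taken from the virtual generated columns proposal. The only point is that the check tests for STORED specifically rather than for "any generated column".

```python
from dataclasses import dataclass

ATTRIBUTE_GENERATED_STORED = "s"   # pg_attribute.attgenerated value for STORED columns
ATTRIBUTE_GENERATED_VIRTUAL = "v"  # assumption: marker for virtual columns


@dataclass(frozen=True)
class Attribute:
    name: str
    attgenerated: str = ""  # empty string: not a generated column


def columns_to_publish(attributes, publish_generated_columns):
    """Pick the columns to send for one table under the given option value."""
    published = []
    for att in attributes:
        if att.attgenerated == "":
            published.append(att.name)
        elif (att.attgenerated == ATTRIBUTE_GENERATED_STORED
              and publish_generated_columns):
            published.append(att.name)
        # Any other kind of generated column is skipped, whatever the option says.
    return published


t1 = [
    Attribute("a"),
    Attribute("b", ATTRIBUTE_GENERATED_STORED),
    Attribute("c", ATTRIBUTE_GENERATED_VIRTUAL),  # hypothetical virtual column
]
with_option = columns_to_publish(t1, publish_generated_columns=True)
without_option = columns_to_publish(t1, publish_generated_columns=False)
```

With the option enabled the sketch publishes "a" and "b"; with it disabled only "a"; the virtual column "c" is never published.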

--
With Regards,
Amit Kapila.



Re: Pgoutput not capturing the generated columns

From
Amit Kapila
Date:
On Thu, Nov 7, 2024 at 2:45 PM Shinoda, Noriyoshi (SXD Japan FSIP)
<noriyoshi.shinoda@hpe.com> wrote:
>
> Hi, Hackers.
>
> Thanks for developing this great feature.
> There seems to be a missing description of the "pubgencols" column added to the "pg_publication" catalog. The
> attached patch adds the description to the catalog.sgml file.
> Please fix the patch if you have a better explanation.
>

Can we slightly modify it as: "If true, this publication replicates
the generated columns in the tables associated with the publication."?
BTW, we might want to say: "If true, this publication replicates the
stored generated columns in the tables associated with the
publication." depending on the point Peter E. has raised, so, let's
wait for the conclusion.

--
With Regards,
Amit Kapila.