Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Shubham Khanna <khannashubham1197@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
List pgsql-hackers
Here are some review comments for the patch v1-0001.

======
GENERAL

G.1. Use consistent names

It seems to add unnecessary complications by having different names
for all the new options, fields and API parameters.

e.g. sometimes 'include_generated_columns'
e.g. sometimes 'publish_generated_columns'

Won't it be better to just use identical names everywhere for
everything? I don't mind which one you choose; I just felt you only
need one name, not two. This comment overrides everything else in this
post so whatever name you choose, make adjustments for all my other
review comments as necessary.

======

G.2. Is it possible to just use the existing bms?

A very large part of this patch is adding more API parameters to
delegate the 'publish_generated_columns' flag value down to when it is
finally checked and used. e.g.

The functions:
- logicalrep_write_insert(), logicalrep_write_update(),
logicalrep_write_delete()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_tuple

The functions:
- logicalrep_write_rel()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_attrs

AFAICT in all these places the API is already passing a "Bitmapset
*columns". I was wondering if it might be possible to modify the
"Bitmapset *columns" BEFORE any of those functions get called so that
the "columns" BMS either does or doesn't include generated cols (as
appropriate according to the option).

Well, it might not be so simple because there are some NULL BMS
considerations also, but I think it would be worth investigating at
least, because if indeed you can find some common place (somewhere
like pgoutput_change()?) where the columns BMS can be filtered to
remove bits for generated cols then it could mean none of those other
patch API changes are needed at all -- then the patch would only be
1/2 the size.

======
Commit message

1.
Now if include_generated_columns option is specified, the generated
column information and generated column data also will be sent.

Usage from pgoutput plugin:
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

~

I think there needs to be more background information given here. This
commit message doesn't seem to describe anything about what is the
problem and how this patch fixes it. It just jumps straight into
giving usages of a 'include_generated_columns' option.

It also doesn't say that this is an option that was newly *introduced*
by the patch -- it refers to it as though the reader should already
know about it.

Furthermore, your hacker's post says "Currently it is not supported as
a subscription option because table sync for the generated column is
not possible as copy command does not support getting data for the
generated column. If this feature is required we can remove this
limitation from the copy command and then add it as a subscription
option later." IMO that all seems like the kind of information that
ought to also be mentioned in this commit message.

======
contrib/test_decoding/sql/ddl.sql

2.
+-- check include_generated_columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');
+DROP TABLE gencoltable;
+

2a.
Perhaps you should include both option values to demonstrate the
difference in behaviour:

'include_generated_columns', '0'
'include_generated_columns', '1'

~

2b.
I think you maybe need to include more some test combinations where
there is and isn't a COLUMN LIST, because I am not 100% sure I
understand the current logic/expectations for all combinations.

e.g. When the generated column is in a column list but
'publish_generated_columns' is false then what should happen? etc.
Also if there are any special rules then those should be mentioned in
the commit message.

======
src/backend/replication/logical/proto.c

3.
For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

~~~

4. logical_rep_write_tuple:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;
That code seems confusing. Shouldn't the logic be exactly as also in
logicalrep_write_attrs()?

e.g. Shouldn't they both look like this:

if (att->attisdropped)
  continue;

if (att->attgenerated && !publish_generated_column)
  continue;

if (!column_in_column_list(att->attnum, columns))
  continue;
======
src/backend/replication/pgoutput/pgoutput.c

5.
 static void send_relation_and_attrs(Relation relation, TransactionId xid,
  LogicalDecodingContext *ctx,
- Bitmapset *columns);
+ Bitmapset *columns,
+ bool publish_generated_column);

Use plural. /publish_generated_column/publish_generated_columns/

~~~

6. parse_output_parameters

  bool origin_option_given = false;
+ bool generate_column_option_given = false;

  data->binary = false;
  data->streaming = LOGICALREP_STREAM_OFF;
  data->messages = false;
  data->two_phase = false;
+ data->publish_generated_column = false;

I think the 1st var should be 'include_generated_columns_option_given'
for consistency with the name of the actual option that was given.

======
src/include/replication/logicalproto.h

7.
(Same as a previous review comment)

For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

======
src/include/replication/pgoutput.h

8.
  bool publish_no_origin;
+ bool publish_generated_column;
 } PGOutputData;

/publish_generated_column/publish_generated_columns/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "Andrey M. Borodin"
Date:
Subject: Pre-Commitfest Party on StHighload conf
Next
From: Daniel Gustafsson
Date:
Subject: Re: [UNVERIFIED SENDER] pg_upgrade can result in early wraparound on databases with high transaction load