Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Shubham Khanna
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHv8RjKh12nqTB9H2Ww6hUAJ5ncd43sQarYs3vJvVm03GNe+wQ@mail.gmail.com
Whole thread Raw
In response to Pgoutput not capturing the generated columns  (Rajendra Kumar Dangwal <dangwalrajendra888@gmail.com>)
List pgsql-hackers
On Tue, Sep 24, 2024 at 5:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi. Here are my review comments for v32-0001
>
> You wrote: "I have addressed all the comments in the v32-0001 Patch.",
> however, I found multiple old review comments not addressed. Please
> give a reason if a comment is deliberately left out, otherwise, I will
> assume they are omitted by accident and so keep repeating them.
>
> There were also still some unanswered questions from previous reviews,
> so I have reminded you about those again here.
>
> ======
> Commit message
>
> 1.
> This commit enables support for the 'publish_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. The option
> 'publish_generated_columns' is a PUBLICATION parameter.
>
> ~
>
> That PUBLICATION info in the 2nd sentence would be easier to say in
> the 1st sentence.
> SUGGESTION:
> This commit supports the transmission of generated column information
> and data alongside regular table changes. This behaviour is controlled
> by a new PUBLICATION parameter ('publish_generated_columns').
>
> ~~~
>
> 2.
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> Hm. This contradicts the behaviour that Amit wanted, (e.g.
> "column-list takes precedence"). So I am not sure if this patch is
> already catering for the behaviour suggested by Amit or if that is yet
> to come in v33. For now, I am assuming that 32* has not caught up with
> the latest behaviour requirements, but that might be a wrong
> assumption; perhaps it is only this commit message that is bogus.
>
> ~~~
>
> 3. General.
>
> On the same subject, there is lots of code, like:
>
> if (att->attgenerated && !pub->pubgencols)
> continue;
>
> I suspect that might not be quite what you want for the "column-list
> takes precedence" behaviour, but I am not going to identify all those
> during this review. It needs lots of combinations of column list tests
> to verify it.
>
> ======
> doc/src/sgml/ddl.sgml
>
> 4ab.
> nit - Huh?? Not changed the linkend as told in a previous review [1-#3a]
> nit - Huh?? Not changed to call this a "parameter" instead of an
> "option" as told in a previous review [1-#3b]
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> -     <para>
> -      Next, the following message part appears for each column included in
> -      the publication (except generated columns):
> -     </para>
> -
>
> nit -- Huh?? I don't think you can just remove this whole paragraph.
> But, probably you can just remove the "except generated columns" part.
> I posted this same comment [4 #11] 20 patch versions back.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 6abc.
> nit - Huh?? Not changed the parameter ID as told in a previous review [1-#6]
> nit - Huh?? Not removed paragraph "This option is only available..."
> as told in a previous review. See [1-#7]
> nit - Huh?? Not removed paragraph "This parameter can only be set" as
> told in a previous review. See [1-#8]
>
> ======
> src/backend/catalog/pg_publication.c
>
> 7.
>   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> - ereport(ERROR,
> + ereport(WARNING,
>   errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> - errmsg("cannot use generated column \"%s\" in publication column list",
> + errmsg("specified generated column \"%s\" in publication column list
> for publication with publish_generated_columns as false",
>      colname));
>
> I did not understand how this WARNING can know
> "publish_generated_columns as false"? Should the code be checking the
> function parameter 'pubgencols'?
>
> The errmsg also seemed a bit verbose. How about:
> "specified generated column \"%s\" in publication column list when
> publish_generated_columns = false"
>
> ======
> src/backend/replication/logical/proto.c
>
> 8.
> logicalrep_write_tuple:
> logicalrep_write_attrs:
>
> Reminder. I think I have multiple questions about this code from
> previous reviews that may be still unanswered. See [2 #4]. Maybe when
> you implement Amit's "column list takes precedence" behaviour then
> this code is fine as-is (because the replication message might include
> gencols or not-gecols regardless of the 'publish_generated_columns'
> value). But I don't think that is the current implementation, so
> something did not quite seem right. I am not sure. If you say it is
> fine then I will believe it, but the question [2 #4] remains
> unanswered.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 9.
> send_relation_and_attrs:
>
> Reminder: Here is another question that was answered from [2 #5]. I
> did not really trust it for the current implementation, but for the
> "column list takes precedence" behaviour probably it will be ok.
>
> ~~~
>
> 10.
> +/*
> + * Prepare new column list bitmap. This includes all the columns of the table.
> + */
> +static Bitmapset *
> +prepare_all_columns_bms(PGOutputData *data, RelationSyncEntry *entry,
> + TupleDesc desc)
> +{
>
> This function needs a better comment with more explanation about what
> this is REALLY doing. e.g. it says "includes all columns of the
> table", but tthe implementation is skipping generated cols, so clearly
> it is not "all columns of the table".
>
> ~~~
>
> 11. pgoutput_column_list_init
>
> TBH, I struggle to read the logic of this function. Rewriting some
> parts, inverting some variables, and adding more commentary might help
> a lot.
>
> 11a.
> There are too many "negatives" (with ! operator and with the word "no"
> in the variable).
>
> e.g. code is written in a backward way like:
> if (!pub_no_list)
> cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
> else
> cols = prepare_all_columns_bms(data, entry, desc);
>
> instead of what could have been said:
> if (pub_rel_has_collist)
> cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
> else
> cols = prepare_all_columns_bms(data, entry, desc);
>
> ~
>
> 11b.
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * If the publication is FOR ALL TABLES and include generated columns
> + * then it is treated the same as if there are no column lists (even
> + * if other publications have a list).
>   */
> - if (!pub->alltables)
> + if (!pub->alltables || !pub->pubgencols)
>
> The code does not appear to match the comment ("If the publication is
> FOR ALL TABLES and include generated columns"). If it did it should
> look like "if (pub->alltables && pub->pubgencols)".
>
> Also, should "and include generated column" be properly referring to
> the new PUBLICATION parameter name?
>
> Also, the comment is somewhat confusing. I saw in the thread Vignesh
> wrote an explanation like "To handle cases where the
> publish_generated_columns option isn't specified for all tables in a
> publication, the pubgencolumns check needs to be performed. In such
> cases, we must create a column list that excludes generated columns"
> [3]. IMO that was clearer information so something similar should be
> written in this code comment.
> ~
>
> 11c.
> + /* Build the column list bitmap in the per-entry context. */
> + if (!pub_no_list || !pub->pubgencols) /* when not null */
>
> I don't know what "when not null" means here. Aren't those both
> booleans? How can it be "null"?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 12. getPublications:
>
> Huh?? The code has not changed to address an old review comment I had
> posted to say there seem multiple code fragments missing that should
> say "false AS pubgencols". Refer to [2 #7].
>
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
>
> 13.
> 'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
> + create_order => 51,
> + create_sql =>
> +   'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_table WHERE (col1 > 0);',
> + regexp => qr/^
> + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_table WHERE
> ((col1 > 0));\E
> + /xm,
> + like => { %full_runs, section_post_data => 1, },
> + unlike => {
> + exclude_dump_test_schema => 1,
> + exclude_test_table => 1,
> + },
> + },
> +
> + 'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE (col2 = \'test\');'
> +   => {
> + create_order => 52,
> + create_sql =>
> +   'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_second_table
> WHERE (col2 = \'test\');',
> + regexp => qr/^
> + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_second_table
> WHERE ((col2 = 'test'::text));\E
> + /xm,
> + like => { %full_runs, section_post_data => 1, },
> + unlike => { exclude_dump_test_schema => 1, },
> +   },
> +
>
> It wasn't clear to me how these tests are related to the patch.
> Shouldn't there instead be some ALTER tests for trying to modify the
> 'publish_generate_columns' parameter?
>
> ======
> src/test/regress/expected/publication.out
> src/test/regress/sql/publication.sql
>
> 14.
> --- error: generated column "d" can't be in list
> +-- ok: generated columns can be in the list too
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR:  cannot use generated column "d" in publication column list
> +WARNING:  specified generated column "d" in publication column list
> for publication with publish_generated_columns as false
>
> I think these tests for the WARNING scenario need to be a bit more
> deliberate. This seems to have happened as a side-effect. For example,
> I was expecting more testing like:
>
> Comments about various combinations to say what you are doing and what
> you are expecting:
> - gencols in column list with publish_generated_columns=false, expecting WARNING
> - gencols in column list with publish_generated_columns=true, NOT
> expecting WARNING
> - gencols in column list with publish_generated_columns=true, then
> ALTER PUBLICATION setting publication_generate_columns=false,
> expecting WARNING
> - NO gencols in column list with publish_generated_columns=false, then
> ALTER PUBLICATION to add gencols to column list, expecting WARNING
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 15.
> -# TEST: Generated and dropped columns are not considered for the column list.
> +# TEST: Dropped columns are not considered for the column list.
>  # So, the publication having a column list except for those columns and a
> -# publication without any column (aka all columns as part of the columns
> +# publication without any column list (aka all columns as part of the column
>  # list) are considered to have the same column list.
>  $node_publisher->safe_psql(
>   'postgres', qq(
>   CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int
> GENERATED ALWAYS AS (a + 1) STORED);
>   ALTER TABLE test_mix_4 DROP COLUMN c;
>
> - CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
> - CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
> + CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = true);
> + CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = false);
>
> I felt the comment for this test ought to be saying something more
> about what you are doing with the 'publish_generated_columns'
> parameters and what behaviour it was expecting.
>
> ======
> Please refer to the attachment which addresses some of the nit
> comments mentioned above.
>
> ======
> [1] my review of v31-0001:
> https://www.postgresql.org/message-id/CAHut%2BPsv-neEP_ftvBUBahh%2BKCWw%2BqQMF9N3sGU3YHWPEzFH-Q%40mail.gmail.com
> [2] my review of v30-0001:
> https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH%3DDHK0iXmZuXKPnxZ3Q%40mail.gmail.com
> [4] https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com
>

I have addressed all the comments in the v34-0001 Patch. Please refer
to the updated v34-0001 Patch here in [1]. See [1] for the changes
added.

[1] https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



pgsql-hackers by date:

Previous
From: Shubham Khanna
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Shubham Khanna
Date:
Subject: Re: Pgoutput not capturing the generated columns