Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Shubham Khanna |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHv8RjKh12nqTB9H2Ww6hUAJ5ncd43sQarYs3vJvVm03GNe+wQ@mail.gmail.com Whole thread Raw |
In response to | Pgoutput not capturing the generated columns (Rajendra Kumar Dangwal <dangwalrajendra888@gmail.com>) |
List | pgsql-hackers |
On Tue, Sep 24, 2024 at 5:16 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi. Here are my review comments for v32-0001 > > You wrote: "I have addressed all the comments in the v32-0001 Patch.", > however, I found multiple old review comments not addressed. Please > give a reason if a comment is deliberately left out, otherwise, I will > assume they are omitted by accident and so keep repeating them. > > There were also still some unanswered questions from previous reviews, > so I have reminded you about those again here. > > ====== > Commit message > > 1. > This commit enables support for the 'publish_generated_columns' option > in logical replication, allowing the transmission of generated column > information and data alongside regular table changes. The option > 'publish_generated_columns' is a PUBLICATION parameter. > > ~ > > That PUBLICATION info in the 2nd sentence would be easier to say in > the 1st sentence. > SUGGESTION: > This commit supports the transmission of generated column information > and data alongside regular table changes. This behaviour is controlled > by a new PUBLICATION parameter ('publish_generated_columns'). > > ~~~ > > 2. > When 'publish_generated_columns' is false, generated columns are not > replicated, even when present in a PUBLICATION col-list. > > Hm. This contradicts the behaviour that Amit wanted, (e.g. > "column-list takes precedence"). So I am not sure if this patch is > already catering for the behaviour suggested by Amit or if that is yet > to come in v33. For now, I am assuming that 32* has not caught up with > the latest behaviour requirements, but that might be a wrong > assumption; perhaps it is only this commit message that is bogus. > > ~~~ > > 3. General. > > On the same subject, there is lots of code, like: > > if (att->attgenerated && !pub->pubgencols) > continue; > > I suspect that might not be quite what you want for the "column-list > takes precedence" behaviour, but I am not going to identify all those > during this review. It needs lots of combinations of column list tests > to verify it. > > ====== > doc/src/sgml/ddl.sgml > > 4ab. > nit - Huh?? Not changed the linkend as told in a previous review [1-#3a] > nit - Huh?? Not changed to call this a "parameter" instead of an > "option" as told in a previous review [1-#3b] > > ====== > doc/src/sgml/protocol.sgml > > 5. > - <para> > - Next, the following message part appears for each column included in > - the publication (except generated columns): > - </para> > - > > nit -- Huh?? I don't think you can just remove this whole paragraph. > But, probably you can just remove the "except generated columns" part. > I posted this same comment [4 #11] 20 patch versions back. > > ====== > doc/src/sgml/ref/create_publication.sgml > > 6abc. > nit - Huh?? Not changed the parameter ID as told in a previous review [1-#6] > nit - Huh?? Not removed paragraph "This option is only available..." > as told in a previous review. See [1-#7] > nit - Huh?? Not removed paragraph "This parameter can only be set" as > told in a previous review. See [1-#8] > > ====== > src/backend/catalog/pg_publication.c > > 7. > if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) > - ereport(ERROR, > + ereport(WARNING, > errcode(ERRCODE_INVALID_COLUMN_REFERENCE), > - errmsg("cannot use generated column \"%s\" in publication column list", > + errmsg("specified generated column \"%s\" in publication column list > for publication with publish_generated_columns as false", > colname)); > > I did not understand how this WARNING can know > "publish_generated_columns as false"? Should the code be checking the > function parameter 'pubgencols'? > > The errmsg also seemed a bit verbose. How about: > "specified generated column \"%s\" in publication column list when > publish_generated_columns = false" > > ====== > src/backend/replication/logical/proto.c > > 8. > logicalrep_write_tuple: > logicalrep_write_attrs: > > Reminder. I think I have multiple questions about this code from > previous reviews that may be still unanswered. See [2 #4]. Maybe when > you implement Amit's "column list takes precedence" behaviour then > this code is fine as-is (because the replication message might include > gencols or not-gecols regardless of the 'publish_generated_columns' > value). But I don't think that is the current implementation, so > something did not quite seem right. I am not sure. If you say it is > fine then I will believe it, but the question [2 #4] remains > unanswered. > > ====== > src/backend/replication/pgoutput/pgoutput.c > > 9. > send_relation_and_attrs: > > Reminder: Here is another question that was answered from [2 #5]. I > did not really trust it for the current implementation, but for the > "column list takes precedence" behaviour probably it will be ok. > > ~~~ > > 10. > +/* > + * Prepare new column list bitmap. This includes all the columns of the table. > + */ > +static Bitmapset * > +prepare_all_columns_bms(PGOutputData *data, RelationSyncEntry *entry, > + TupleDesc desc) > +{ > > This function needs a better comment with more explanation about what > this is REALLY doing. e.g. it says "includes all columns of the > table", but tthe implementation is skipping generated cols, so clearly > it is not "all columns of the table". > > ~~~ > > 11. pgoutput_column_list_init > > TBH, I struggle to read the logic of this function. Rewriting some > parts, inverting some variables, and adding more commentary might help > a lot. > > 11a. > There are too many "negatives" (with ! operator and with the word "no" > in the variable). > > e.g. code is written in a backward way like: > if (!pub_no_list) > cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt); > else > cols = prepare_all_columns_bms(data, entry, desc); > > instead of what could have been said: > if (pub_rel_has_collist) > cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt); > else > cols = prepare_all_columns_bms(data, entry, desc); > > ~ > > 11b. > - * If the publication is FOR ALL TABLES then it is treated the same as > - * if there are no column lists (even if other publications have a > - * list). > + * If the publication is FOR ALL TABLES and include generated columns > + * then it is treated the same as if there are no column lists (even > + * if other publications have a list). > */ > - if (!pub->alltables) > + if (!pub->alltables || !pub->pubgencols) > > The code does not appear to match the comment ("If the publication is > FOR ALL TABLES and include generated columns"). If it did it should > look like "if (pub->alltables && pub->pubgencols)". > > Also, should "and include generated column" be properly referring to > the new PUBLICATION parameter name? > > Also, the comment is somewhat confusing. I saw in the thread Vignesh > wrote an explanation like "To handle cases where the > publish_generated_columns option isn't specified for all tables in a > publication, the pubgencolumns check needs to be performed. In such > cases, we must create a column list that excludes generated columns" > [3]. IMO that was clearer information so something similar should be > written in this code comment. > ~ > > 11c. > + /* Build the column list bitmap in the per-entry context. */ > + if (!pub_no_list || !pub->pubgencols) /* when not null */ > > I don't know what "when not null" means here. Aren't those both > booleans? How can it be "null"? > > ====== > src/bin/pg_dump/pg_dump.c > > 12. getPublications: > > Huh?? The code has not changed to address an old review comment I had > posted to say there seem multiple code fragments missing that should > say "false AS pubgencols". Refer to [2 #7]. > > ====== > src/bin/pg_dump/t/002_pg_dump.pl > > 13. > 'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => { > + create_order => 51, > + create_sql => > + 'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_table WHERE (col1 > 0);', > + regexp => qr/^ > + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_table WHERE > ((col1 > 0));\E > + /xm, > + like => { %full_runs, section_post_data => 1, }, > + unlike => { > + exclude_dump_test_schema => 1, > + exclude_test_table => 1, > + }, > + }, > + > + 'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE (col2 = \'test\');' > + => { > + create_order => 52, > + create_sql => > + 'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_second_table > WHERE (col2 = \'test\');', > + regexp => qr/^ > + \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_second_table > WHERE ((col2 = 'test'::text));\E > + /xm, > + like => { %full_runs, section_post_data => 1, }, > + unlike => { exclude_dump_test_schema => 1, }, > + }, > + > > It wasn't clear to me how these tests are related to the patch. > Shouldn't there instead be some ALTER tests for trying to modify the > 'publish_generate_columns' parameter? > > ====== > src/test/regress/expected/publication.out > src/test/regress/sql/publication.sql > > 14. > --- error: generated column "d" can't be in list > +-- ok: generated columns can be in the list too > ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d); > -ERROR: cannot use generated column "d" in publication column list > +WARNING: specified generated column "d" in publication column list > for publication with publish_generated_columns as false > > I think these tests for the WARNING scenario need to be a bit more > deliberate. This seems to have happened as a side-effect. For example, > I was expecting more testing like: > > Comments about various combinations to say what you are doing and what > you are expecting: > - gencols in column list with publish_generated_columns=false, expecting WARNING > - gencols in column list with publish_generated_columns=true, NOT > expecting WARNING > - gencols in column list with publish_generated_columns=true, then > ALTER PUBLICATION setting publication_generate_columns=false, > expecting WARNING > - NO gencols in column list with publish_generated_columns=false, then > ALTER PUBLICATION to add gencols to column list, expecting WARNING > > ====== > src/test/subscription/t/031_column_list.pl > > 15. > -# TEST: Generated and dropped columns are not considered for the column list. > +# TEST: Dropped columns are not considered for the column list. > # So, the publication having a column list except for those columns and a > -# publication without any column (aka all columns as part of the columns > +# publication without any column list (aka all columns as part of the column > # list) are considered to have the same column list. > $node_publisher->safe_psql( > 'postgres', qq( > CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int > GENERATED ALWAYS AS (a + 1) STORED); > ALTER TABLE test_mix_4 DROP COLUMN c; > > - CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b); > - CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4; > + CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 WITH > (publish_generated_columns = true); > + CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4 WITH > (publish_generated_columns = false); > > I felt the comment for this test ought to be saying something more > about what you are doing with the 'publish_generated_columns' > parameters and what behaviour it was expecting. > > ====== > Please refer to the attachment which addresses some of the nit > comments mentioned above. > > ====== > [1] my review of v31-0001: > https://www.postgresql.org/message-id/CAHut%2BPsv-neEP_ftvBUBahh%2BKCWw%2BqQMF9N3sGU3YHWPEzFH-Q%40mail.gmail.com > [2] my review of v30-0001: > https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com > [3] https://www.postgresql.org/message-id/CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH%3DDHK0iXmZuXKPnxZ3Q%40mail.gmail.com > [4] https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com > I have addressed all the comments in the v34-0001 Patch. Please refer to the updated v34-0001 Patch here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com Thanks and Regards, Shubham Khanna.
pgsql-hackers by date: