Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHut+PsXOCK3NpSo7qfQzc4jfgPCub9fOMBnRRm94Zj0ZsGF+g@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
|
List | pgsql-hackers |
Hi Shlok. Here are my review comments for patch v10-0003 ====== General. 1. The patch has lots of conditions like: if (att->attgenerated && (att->attgenerated != ATTRIBUTE_GENERATED_STORED || !include_generated_columns)) continue; IMO these are hard to read. Although more verbose, please consider if all those (for the sake of readability) would be better re-written like below : if (att->generated) { if (!include_generated_columns) continue; if (att->attgenerated != ATTRIBUTE_GENERATED_STORED) continue; } ====== contrib/test_decoding/test_decoding.c tuple_to_stringinfo: NITPICK = refactored the code and comments a bit here to make it easier NITPICK - there is no need to mention "virtual". Instead, say we only support STORED ====== src/backend/catalog/pg_publication.c publication_translate_columns: NITPICK - introduced variable 'att' to simplify this code ~ 2. + ereport(ERROR, + errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot use virtual generated column \"%s\" in publication column list", + colname)); Is it better to avoid referring to "virtual" at all? Instead, consider rearranging the wording to say something like "generated column \"%s\" is not STORED so cannot be used in a publication column list" ~~~ pg_get_publication_tables: NITPICK - split the condition code for readability ====== src/backend/replication/logical/relation.c 3. logicalrep_rel_open + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED) + continue; + Isn't this missing some code to say "entry->attrmap->attnums[i] = -1;", same as all the other nearby code is doing? ~~~ 4. I felt all the "generated column" logic should be kept together, so this new condition should be combined with the other generated column condition, like: if (attr->attgenerated) { /* comment... */ if (!MySubscription->includegencols) { entry->attrmap->attnums[i] = -1; continue; } /* comment... */ if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED) { entry->attrmap->attnums[i] = -1; continue; } } ====== src/backend/replication/logical/tablesync.c 5. + if (gencols_allowed) + { + /* Replication of generated cols is supported, but not VIRTUAL cols. */ + appendStringInfo(&cmd, " AND a.attgenerated != 'v'"); + } Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead of the hardwired 'v'? (Maybe add another TODO comment to revisit this). Alternatively, consider if it is more future-proof to rearrange so it just says what *is* supported instead of what *isn't* supported: e.g. "AND a.attgenerated IN ('', 's')" ====== src/test/subscription/t/011_generated.pl NITPICK - some comments are missing the word "stored" NITPICK - sometimes "col" should be plural "cols" NITPICK = typo "GNERATED" ====== 6. In a previous review [1, comment #3] I mentioned that there should be some docs updates on the "Logical Replication Message Formats" section 53.9. So, I expected patch 0001 would make some changes and then patch 0003 would have to update it again to say something about "STORED". But all that is missing from the v10* patches. ====== 99. See also my nitpicks diff which is a top-up patch addressing all the nitpick comments mentioned above. Please apply all of these that you agree with. ====== [1] https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi, here are some review comments for patch v9-0002. > > > > ====== > > src/backend/replication/logical/relation.c > > > > 1. logicalrep_rel_open > > > > - if (attr->attisdropped) > > + if (attr->attisdropped || > > + (!MySubscription->includegencols && attr->attgenerated)) > > > > You replied to my question from the previous review [1, #2] as follows: > > In case 'include_generated_columns' is 'true'. column list in > > remoterel will have an entry for generated columns. So, in this case > > if we skip every attr->attgenerated, we will get a missing column > > error. > > > > ~ > > > > TBH, the reason seems very subtle to me. Perhaps that > > "(!MySubscription->includegencols && attr->attgenerated))" condition > > should be coded as a separate "if", so then you can include a comment > > similar to your answer, to explain it. > Fixed > > > ====== > > src/backend/replication/logical/tablesync.c > > > > make_copy_attnamelist: > > > > NITPICK - punctuation in function comment > > NITPICK - add/reword some more comments > > NITPICK - rearrange comments to be closer to the code they are commenting > Applied the changes > > > ~ > > > > 2. make_copy_attnamelist. > > > > + /* > > + * Construct column list for COPY. > > + */ > > + for (int i = 0; i < rel->remoterel.natts; i++) > > + { > > + if(!gencollist[i]) > > + attnamelist = lappend(attnamelist, > > + makeString(rel->remoterel.attnames[i])); > > + } > > > > IIUC isn't this assuming that the attribute number (aka column order) > > is the same on the subscriber side (e.g. gencollist idx) and on the > > publisher side (e.g. remoterel.attnames[i]). AFAIK logical > > replication does not require this ordering must be match like that, > > therefore I am suspicious this new logic is accidentally imposing new > > unwanted assumptions/restrictions. I had asked the same question > > before [1-#4] about this code, but there was no reply. > > > > Ideally, there would be more test cases for when the columns > > (including the generated ones) are all in different orders on the > > pub/sub tables. > 'gencollist' is set according to the remoterel > + gencollist[attnum] = true; > where attnum is the attribute number of the corresponding column on remote rel. > > I have also added the tests to confirm the behaviour > > > ~~~ > > > > 3. General - varnames. > > > > It would help with understanding if the 'attgenlist' variables in all > > these functions are re-named to make it very clear that this is > > referring to the *remote* (publisher-side) table genlist, not the > > subscriber table genlist. > Fixed > > > ~~~ > > > > 4. > > + int i = 0; > > + > > appendStringInfoString(&cmd, "COPY (SELECT "); > > - for (int i = 0; i < lrel.natts; i++) > > + foreach_ptr(ListCell, l, attnamelist) > > { > > - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i])); > > - if (i < lrel.natts - 1) > > + appendStringInfoString(&cmd, quote_identifier(strVal(l))); > > + if (i < attnamelist->length - 1) > > appendStringInfoString(&cmd, ", "); > > + i++; > > } > > > > 4a. > > I think the purpose of the new macros is to avoid using ListCell, and > > also 'l' is an unhelpful variable name. Shouldn't this code be more > > like: > > foreach_node(String, att_name, attnamelist) > > > > ~ > > > > 4b. > > The code can be far simpler if you just put the comma (", ") always > > up-front except the *first* iteration, so you can avoid checking the > > list length every time. For example: > > > > if (i++) > > appendStringInfoString(&cmd, ", "); > Fixed > > > ====== > > src/test/subscription/t/011_generated.pl > > > > 5. General. > > > > Hmm. This patch 0002 included many formatting changes to tables tab2 > > and tab3 and subscriptions sub2 and sub3 but they do not belong here. > > The proper formatting for those needs to be done back in patch 0001 > > where they were introduced. Patch 0002 should just concentrate only on > > the new stuff for patch 0002. > Fixed > > > ~ > > > > 6. CREATE TABLES would be better in pairs > > > > IMO it will be better if the matching CREATE TABLE for pub and sub are > > kept together, instead of separating them by doing all pub then all > > sub. I previously made the same comment for patch 0001, so maybe it > > will be addressed next time... > Fixed > > > ~ > > > > 7. SELECT * > > > > +$result = > > + $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a"); > > > > It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT > > *", for readability and so there are no surprises. > Fixed > > > ====== > > > > 99. > > Please also refer to my attached nitpicks diff for numerous cosmetic > > changes, and apply if you agree. > Applied the changes. > > > ====== > > [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com > > I have attached a v10 patch to address the comments: > v10-0001 - Not Modified > v10-0002 - Support replication of generated columns during initial sync. > v10-0003 - Fix behaviour for Virtual Generated Columns. > > Thanks and Regards, > Shlok Kyal
Attachment
pgsql-hackers by date: