Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Shubham Khanna |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHv8RjJ0gAUd62PvBRXCPYy2oTNZWEY-Qe8cBNzQaJPVMZCeGA@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns Re: Pgoutput not capturing the generated columns |
List | pgsql-hackers |
On Mon, Jul 8, 2024 at 10:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are review comments for v15-0001 > > ====== > doc/src/sgml/ddl.sgml > > nitpick - there was a comma (,) which should be a period (.) > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 1. > + if (options->proto.logical.include_generated_columns && > + PQserverVersion(conn->streamConn) >= 170000) > + appendStringInfoString(&cmd, ", include_generated_columns 'true'"); > + > > Should now say >= 180000 > > ====== > src/backend/replication/pgoutput/pgoutput.c > > nitpick - comment wording for RelationSyncEntry.collist. > > ~~ > > 2. > pgoutput_column_list_init: > > I found the current logic to be quite confusing. I assume the code is > working OK, because AFAIK there are plenty of tests and they are all > passing, but the logic seems somewhat repetitive and there are also no > comments to explain it adding to my confusion. > > IIUC, PRIOR TO THIS PATCH: > > BMS field 'columns' represented the "columns of the column list" or it > was NULL if there was no publication column list (and it was also NULL > if the column list contained every column). > > IIUC NOW, WITH THIS PATCH: > > The BMS field 'columns' meaning is changed slightly to be something > like "columns to be replicated" or NULL if all columns are to be > replicated. This is almost the same thing except we are now handing > the generated columns up-front, so generated columns will or won't > appear in the BMS according to the "include_generated_columns" > parameter. See how this is all a bit subtle which is why copious new > comments are required to explain it... > > So, although the test result evidence suggests this is working OK, I > have many questions/issues about it. Here are some to start with: > > 2a. It needs a lot more (summary and detailed) comments explaining the > logic now that the meaning is slightly different. > > 2b. What is the story with the FOR ALL TABLES case now? Previously, > there would always be NULL 'columns' for "FOR ALL TABLES" case -- the > comment still says so. But now you've tacked on a 2nd pass of > iterations to build the BMS outside of the "if (!pub->alltables)" > check. Is that OK? > > 2c. The following logic seemed unexpected: > - if (bms_num_members(cols) == nliveatts) > + if (bms_num_members(cols) == nliveatts && > + data->include_generated_columns) > { > bms_free(cols); > cols = NULL; > ` > I had thought the above code would look different -- more like: > if (att->attgenerated && !data->include_generated_columns) > continue; > > nliveatts++; > ... > > 2d. Was so much duplicated code necessary? It feels like the whole > "Get the number of live attributes." and assignment of cols to NULL > might be made common to both code paths. > > 2e. I'm beginning to question the pros/cons of the new BMS logic; I > had suggested trying this way (processing the generated columns > up-front in the BMS 'columns' list) to reduce patch code and simplify > all the subsequent API delegation of "include_generated_cloumns" > everywhere like it was in v14-0001. Indeed, that part was a success > and the patch is now smaller. But I don't like much that we've traded > reduced code overall for increased confusing code in that BMS > function. If all this BMS code can be refactored and commented to be > easier to understand then maybe all will be well, but if it can't then > maybe this BMS change was a bridge too far. I haven't given up on it > just yet, but I wonder what was your opinion about it, and do other > people have thoughts about whether this was the good direction to > take? I have created a separate patch(v17-0004) for this idea. Will address this comment in the next version of patches. > ====== > src/bin/pg_dump/pg_dump.c > > 3. > + if (fout->remoteVersion >= 170000) > + appendPQExpBufferStr(query, > + " s.subincludegencols\n"); > + else > + appendPQExpBufferStr(query, > + " false AS subincludegencols\n"); > > Should now say >= 180000 > > ====== > src/bin/psql/describe.c > > 4. > + /* include_generated_columns is only supported in v18 and higher */ > + if (pset.sversion >= 170000) > + appendPQExpBuffer(&buf, > + ", subincludegencols AS \"%s\"\n", > + gettext_noop("Include generated columns")); > + > > Should now say >= 180000 > > ====== > src/include/catalog/pg_subscription.h > > nitpick - let's make the comment the same as in WalRcvStreamOptions > > ====== > src/include/replication/logicalproto.h > > nitpick - extern for logicalrep_write_update should be unchanged by this patch > > ====== > src/test/regress/sql/subscription.sql > > nitpick = the comment "include_generated_columns and copy_data = true > are mutually exclusive" is not necessary because this all falls under > the existing comment "fail - invalid option combinations" > > nitpick - let's explicitly put "copy_data = true" in the CREATE > SUBSCRIPTION to make it more obvious > > ====== > 99. Please also refer to the attached 'diffs' patch which implements > all of my nitpicks issues mentioned above. The attached Patches contain all the suggested changes. Here, v17-0001 is modified to fix the comments, v17-0002 and v17-0003 are modified according to the changes in v17-0001 patch and v17-0004 patch contains the changes related to Bitmapset(BMS) idea that changes the 'columns' meaning to include generated cols also where necessary. Thanks and Regards, Shubham Khanna.
Attachment
pgsql-hackers by date: