Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Shubham Khanna
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHv8RjJ0gAUd62PvBRXCPYy2oTNZWEY-Qe8cBNzQaJPVMZCeGA@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns
List pgsql-hackers
On Mon, Jul 8, 2024 at 10:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are review comments for v15-0001
>
> ======
> doc/src/sgml/ddl.sgml
>
> nitpick - there was a comma (,) which should be a period (.)
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 1.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'true'");
> +
>
> Should now say >= 180000
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> nitpick - comment wording for RelationSyncEntry.collist.
>
> ~~
>
> 2.
> pgoutput_column_list_init:
>
> I found the current logic to be quite confusing. I assume the code is
> working OK, because AFAIK there are plenty of tests and they are all
> passing, but the logic seems somewhat repetitive and there are also no
> comments to explain it adding to my confusion.
>
> IIUC, PRIOR TO THIS PATCH:
>
> BMS field 'columns' represented the "columns of the column list" or it
> was NULL if there was no publication column list (and it was also NULL
> if the column list contained every column).
>
> IIUC NOW, WITH THIS PATCH:
>
> The BMS field 'columns' meaning is changed slightly to be something
> like "columns to be replicated" or NULL if all columns are to be
> replicated. This is almost the same thing except we are now handing
> the generated columns up-front, so generated columns will or won't
> appear in the BMS according to the "include_generated_columns"
> parameter. See how this is all a bit subtle which is why copious new
> comments are required to explain it...
>
> So, although the test result evidence suggests this is working OK, I
> have many questions/issues about it. Here are some to start with:
>
> 2a. It needs a lot more (summary and detailed) comments explaining the
> logic now that the meaning is slightly different.
>
> 2b. What is the story with the FOR ALL TABLES case now? Previously,
> there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
> comment still says so. But now you've tacked on a 2nd pass of
> iterations to build the BMS outside of the "if (!pub->alltables)"
> check. Is that OK?
>
> 2c. The following logic seemed unexpected:
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts &&
> + data->include_generated_columns)
>   {
>   bms_free(cols);
>   cols = NULL;
> `
> I had thought the above code would look different -- more like:
> if (att->attgenerated && !data->include_generated_columns)
>   continue;
>
> nliveatts++;
> ...
>
> 2d. Was so much duplicated code necessary? It feels like the whole
> "Get the number of live attributes." and assignment of cols to NULL
> might be made common to both code paths.
>
> 2e. I'm beginning to question the pros/cons of the new BMS logic; I
> had suggested trying this way (processing the generated columns
> up-front in the BMS 'columns' list) to reduce patch code and simplify
> all the subsequent API delegation of "include_generated_cloumns"
> everywhere like it was in v14-0001. Indeed, that part was a success
> and the patch is now smaller. But I don't like much that we've traded
> reduced code overall for increased confusing code in that BMS
> function. If all this BMS code can be refactored and commented to be
> easier to understand then maybe all will be well, but if it can't then
> maybe this BMS change was a bridge too far. I haven't given up on it
> just yet, but I wonder what was your opinion about it, and do other
> people have thoughts about whether this was the good direction to
> take?

I have created a separate patch(v17-0004) for this idea. Will address
this comment in the next version of patches.

> ======
> src/bin/pg_dump/pg_dump.c
>
> 3.
> + if (fout->remoteVersion >= 170000)
> + appendPQExpBufferStr(query,
> + " s.subincludegencols\n");
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols\n");
>
> Should now say >= 180000
>
> ======
> src/bin/psql/describe.c
>
> 4.
> + /* include_generated_columns is only supported in v18 and higher */
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> +   ", subincludegencols AS \"%s\"\n",
> +   gettext_noop("Include generated columns"));
> +
>
> Should now say >= 180000
>
> ======
> src/include/catalog/pg_subscription.h
>
> nitpick - let's make the comment the same as in WalRcvStreamOptions
>
> ======
> src/include/replication/logicalproto.h
>
> nitpick - extern for logicalrep_write_update should be unchanged by this patch
>
> ======
> src/test/regress/sql/subscription.sql
>
> nitpick = the comment "include_generated_columns and copy_data = true
> are mutually exclusive" is not necessary because this all falls under
> the existing comment "fail - invalid option combinations"
>
> nitpick - let's explicitly put "copy_data = true" in the CREATE
> SUBSCRIPTION to make it more obvious
>
> ======
> 99. Please also refer to the attached 'diffs' patch which implements
> all of my nitpicks issues mentioned above.

The attached Patches contain all the suggested changes. Here, v17-0001
is modified to fix the comments, v17-0002 and v17-0003 are modified
according to the changes in v17-0001 patch and v17-0004 patch contains
the changes related to Bitmapset(BMS) idea that changes the 'columns'
meaning to include generated cols also where necessary.

Thanks and Regards,
Shubham Khanna.

Attachment

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: relfilenode statistics
Next
From: Shubham Khanna
Date:
Subject: Re: Pgoutput not capturing the generated columns