Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CANhcyEUMCk6cCbw0vVZWo8FRd6ae9CmKG=gKP-9Q67jLn8HqtQ@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Pgoutput not capturing the generated columns
RE: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns
List pgsql-hackers
On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, here are some review comments for patch v9-0002.
>
> ======
> src/backend/replication/logical/relation.c
>
> 1. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencols && attr->attgenerated))
>
> You replied to my question from the previous review [1, #2] as follows:
> In case 'include_generated_columns' is 'true'. column list in
> remoterel will have an entry for generated columns. So, in this case
> if we skip every attr->attgenerated, we will get a missing column
> error.
>
> ~
>
> TBH, the reason seems very subtle to me. Perhaps that
> "(!MySubscription->includegencols && attr->attgenerated))" condition
> should be coded as a separate "if", so then you can include a comment
> similar to your answer, to explain it.
Fixed

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> NITPICK - punctuation in function comment
> NITPICK - add/reword some more comments
> NITPICK - rearrange comments to be closer to the code they are commenting
Applied the changes

> ~
>
> 2. make_copy_attnamelist.
>
> + /*
> + * Construct column list for COPY.
> + */
> + for (int i = 0; i < rel->remoterel.natts; i++)
> + {
> + if(!gencollist[i])
> + attnamelist = lappend(attnamelist,
> +   makeString(rel->remoterel.attnames[i]));
> + }
>
> IIUC isn't this assuming that the attribute number (aka column order)
> is the same on the subscriber side (e.g. gencollist idx) and on the
> publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> replication does not require this ordering must be match like that,
> therefore I am suspicious this new logic is accidentally imposing new
> unwanted assumptions/restrictions. I had asked the same question
> before [1-#4] about this code, but there was no reply.
>
> Ideally, there would be more test cases for when the columns
> (including the generated ones) are all in different orders on the
> pub/sub tables.
'gencollist' is set according to the remoterel
+           gencollist[attnum] = true;
where attnum is the attribute number of the corresponding column on remote rel.

I have also added the tests to confirm the behaviour

> ~~~
>
> 3. General - varnames.
>
> It would help with understanding if the 'attgenlist' variables in all
> these functions are re-named to make it very clear that this is
> referring to the *remote* (publisher-side) table genlist, not the
> subscriber table genlist.
Fixed

> ~~~
>
> 4.
> + int i = 0;
> +
>   appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(ListCell, l, attnamelist)
>   {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> + if (i < attnamelist->length - 1)
>   appendStringInfoString(&cmd, ", ");
> + i++;
>   }
>
> 4a.
> I think the purpose of the new macros is to avoid using ListCell, and
> also 'l' is an unhelpful variable name. Shouldn't this code be more
> like:
> foreach_node(String, att_name, attnamelist)
>
> ~
>
> 4b.
> The code can be far simpler if you just put the comma (", ") always
> up-front except the *first* iteration, so you can avoid checking the
> list length every time. For example:
>
> if (i++)
>   appendStringInfoString(&cmd, ", ");
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 5. General.
>
> Hmm. This patch 0002 included many formatting changes to tables tab2
> and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> The proper formatting for those needs to be done back in patch 0001
> where they were introduced. Patch 0002 should just concentrate only on
> the new stuff for patch 0002.
Fixed

> ~
>
> 6. CREATE TABLES would be better in pairs
>
> IMO it will be better if the matching CREATE TABLE for pub and sub are
> kept together, instead of separating them by doing all pub then all
> sub. I previously made the same comment for patch 0001, so maybe it
> will be addressed next time...
Fixed

> ~
>
> 7. SELECT *
>
> +$result =
> +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
>
> It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> *", for readability and so there are no surprises.
Fixed

> ======
>
> 99.
> Please also refer to my attached nitpicks diff for numerous cosmetic
> changes, and apply if you agree.
Applied the changes.

> ======
> [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

I have attached a v10 patch to address the comments:
v10-0001 - Not Modified
v10-0002 - Support replication of generated columns during initial sync.
v10-0003 - Fix behaviour for Virtual Generated Columns.

Thanks and Regards,
Shlok Kyal

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Improve EXPLAIN output for multicolumn B-Tree Index
Next
From: Peter Eisentraut
Date:
Subject: Re: replace strtok()