Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHut+PsXOCK3NpSo7qfQzc4jfgPCub9fOMBnRRm94Zj0ZsGF+g@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List pgsql-hackers
Hi Shlok. Here are my review comments for patch v10-0003

======
General.

1.
The patch has lots of conditions like:
if (att->attgenerated && (att->attgenerated !=
ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
 continue;

IMO these are hard to read. Although more verbose, please consider if
all those (for the sake of readability) would be better re-written
like below :

if (att->generated)
{
  if (!include_generated_columns)
    continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
    continue;
}

======
contrib/test_decoding/test_decoding.c

tuple_to_stringinfo:

NITPICK = refactored the code and comments a bit here to make it easier
NITPICK - there is no need to mention "virtual". Instead, say we only
support STORED

======
src/backend/catalog/pg_publication.c

publication_translate_columns:

NITPICK - introduced variable 'att' to simplify this code

~

2.
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot use virtual generated column \"%s\" in publication
column list",
+    colname));

Is it better to avoid referring to "virtual" at all? Instead, consider
rearranging the wording to say something like "generated column \"%s\"
is not STORED so cannot be used in a publication column list"

~~~

pg_get_publication_tables:

NITPICK - split the condition code for readability

======
src/backend/replication/logical/relation.c

3. logicalrep_rel_open

+ if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

Isn't this missing some code to say "entry->attrmap->attnums[i] =
-1;", same as all the other nearby code is doing?

~~~

4.
I felt all the "generated column" logic should be kept together, so
this new condition should be combined with the other generated column
condition, like:

if (attr->attgenerated)
{
  /* comment... */
  if (!MySubscription->includegencols)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }

  /* comment... */
  if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  {
    entry->attrmap->attnums[i] = -1;
    continue;
  }
}

======
src/backend/replication/logical/tablesync.c

5.
+ if (gencols_allowed)
+ {
+ /* Replication of generated cols is supported, but not VIRTUAL cols. */
+ appendStringInfo(&cmd, " AND a.attgenerated != 'v'");
+ }

Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
of the hardwired 'v'? (Maybe add another TODO comment to revisit
this).

Alternatively, consider if it is more future-proof to rearrange so it
just says what *is* supported instead of what *isn't* supported:
e.g. "AND a.attgenerated IN ('', 's')"

======
src/test/subscription/t/011_generated.pl

NITPICK - some comments are missing the word "stored"
NITPICK - sometimes "col" should be plural "cols"
NITPICK = typo "GNERATED"

======

6.
In a previous review [1, comment #3] I mentioned that there should be
some docs updates on the "Logical Replication Message Formats" section
53.9. So, I expected patch 0001 would make some changes and then patch
0003 would have to update it again to say something about "STORED".
But all that is missing from the v10* patches.

======

99.
See also my nitpicks diff which is a top-up patch addressing all the
nitpick comments mentioned above. Please apply all of these that you
agree with.

======
[1] https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Fri, 21 Jun 2024 at 09:03, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi, here are some review comments for patch v9-0002.
> >
> > ======
> > src/backend/replication/logical/relation.c
> >
> > 1. logicalrep_rel_open
> >
> > - if (attr->attisdropped)
> > + if (attr->attisdropped ||
> > + (!MySubscription->includegencols && attr->attgenerated))
> >
> > You replied to my question from the previous review [1, #2] as follows:
> > In case 'include_generated_columns' is 'true'. column list in
> > remoterel will have an entry for generated columns. So, in this case
> > if we skip every attr->attgenerated, we will get a missing column
> > error.
> >
> > ~
> >
> > TBH, the reason seems very subtle to me. Perhaps that
> > "(!MySubscription->includegencols && attr->attgenerated))" condition
> > should be coded as a separate "if", so then you can include a comment
> > similar to your answer, to explain it.
> Fixed
>
> > ======
> > src/backend/replication/logical/tablesync.c
> >
> > make_copy_attnamelist:
> >
> > NITPICK - punctuation in function comment
> > NITPICK - add/reword some more comments
> > NITPICK - rearrange comments to be closer to the code they are commenting
> Applied the changes
>
> > ~
> >
> > 2. make_copy_attnamelist.
> >
> > + /*
> > + * Construct column list for COPY.
> > + */
> > + for (int i = 0; i < rel->remoterel.natts; i++)
> > + {
> > + if(!gencollist[i])
> > + attnamelist = lappend(attnamelist,
> > +   makeString(rel->remoterel.attnames[i]));
> > + }
> >
> > IIUC isn't this assuming that the attribute number (aka column order)
> > is the same on the subscriber side (e.g. gencollist idx) and on the
> > publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
> > replication does not require this ordering must be match like that,
> > therefore I am suspicious this new logic is accidentally imposing new
> > unwanted assumptions/restrictions. I had asked the same question
> > before [1-#4] about this code, but there was no reply.
> >
> > Ideally, there would be more test cases for when the columns
> > (including the generated ones) are all in different orders on the
> > pub/sub tables.
> 'gencollist' is set according to the remoterel
> +           gencollist[attnum] = true;
> where attnum is the attribute number of the corresponding column on remote rel.
>
> I have also added the tests to confirm the behaviour
>
> > ~~~
> >
> > 3. General - varnames.
> >
> > It would help with understanding if the 'attgenlist' variables in all
> > these functions are re-named to make it very clear that this is
> > referring to the *remote* (publisher-side) table genlist, not the
> > subscriber table genlist.
> Fixed
>
> > ~~~
> >
> > 4.
> > + int i = 0;
> > +
> >   appendStringInfoString(&cmd, "COPY (SELECT ");
> > - for (int i = 0; i < lrel.natts; i++)
> > + foreach_ptr(ListCell, l, attnamelist)
> >   {
> > - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> > - if (i < lrel.natts - 1)
> > + appendStringInfoString(&cmd, quote_identifier(strVal(l)));
> > + if (i < attnamelist->length - 1)
> >   appendStringInfoString(&cmd, ", ");
> > + i++;
> >   }
> >
> > 4a.
> > I think the purpose of the new macros is to avoid using ListCell, and
> > also 'l' is an unhelpful variable name. Shouldn't this code be more
> > like:
> > foreach_node(String, att_name, attnamelist)
> >
> > ~
> >
> > 4b.
> > The code can be far simpler if you just put the comma (", ") always
> > up-front except the *first* iteration, so you can avoid checking the
> > list length every time. For example:
> >
> > if (i++)
> >   appendStringInfoString(&cmd, ", ");
> Fixed
>
> > ======
> > src/test/subscription/t/011_generated.pl
> >
> > 5. General.
> >
> > Hmm. This patch 0002 included many formatting changes to tables tab2
> > and tab3 and subscriptions sub2 and sub3 but they do not belong here.
> > The proper formatting for those needs to be done back in patch 0001
> > where they were introduced. Patch 0002 should just concentrate only on
> > the new stuff for patch 0002.
> Fixed
>
> > ~
> >
> > 6. CREATE TABLES would be better in pairs
> >
> > IMO it will be better if the matching CREATE TABLE for pub and sub are
> > kept together, instead of separating them by doing all pub then all
> > sub. I previously made the same comment for patch 0001, so maybe it
> > will be addressed next time...
> Fixed
>
> > ~
> >
> > 7. SELECT *
> >
> > +$result =
> > +  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");
> >
> > It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
> > *", for readability and so there are no surprises.
> Fixed
>
> > ======
> >
> > 99.
> > Please also refer to my attached nitpicks diff for numerous cosmetic
> > changes, and apply if you agree.
> Applied the changes.
>
> > ======
> > [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com
>
> I have attached a v10 patch to address the comments:
> v10-0001 - Not Modified
> v10-0002 - Support replication of generated columns during initial sync.
> v10-0003 - Fix behaviour for Virtual Generated Columns.
>
> Thanks and Regards,
> Shlok Kyal

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Injection point locking
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Conflict detection and logging in logical replication