Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers

From Shubham Khanna
Subject Re: Pgoutput not capturing the generated columns
Date
Msg-id CAHv8RjKgUQweSr+OqnTO7S5a1kHV2bAZ7+9RfoN9aWbKoyouyg@mail.gmail.com
Whole thread Raw
In response to Re: Pgoutput not capturing the generated columns  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Thu, Oct 17, 2024 at 3:59 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, Oct 9, 2024 at 9:08 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > Hi Shubham, here are my review comments for v36-0001.
> > > > >
> > > > > ======
> > > > > 1. General  - merge patches
> > > > >
> > > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > > AFAIK the split was only because historically these parts had
> > > > > different authors. But, keeping them separated is not helpful anymore.
> > > > >
> > > > > ======
> > > > > src/backend/catalog/pg_publication.c
> > > > >
> > > > > 2.
> > > > >  Bitmapset *
> > > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > > > >
> > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > > so it should also be removed.
> > > > >
> > > > > ======
> > > > > src/backend/replication/pgoutput/pgoutput.c
> > > > >
> > > > > 3.
> > > > >   /*
> > > > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > > > - * if there are no column lists (even if other publications have a
> > > > > - * list).
> > > > > + * To handle cases where the publish_generated_columns option isn't
> > > > > + * specified for all tables in a publication, we must create a column
> > > > > + * list that excludes generated columns. So, the publisher will not
> > > > > + * replicate the generated columns.
> > > > >   */
> > > > > - if (!pub->alltables)
> > > > > + if (!(pub->alltables && pub->pubgencols))
> > > > >
> > > > > I still found that comment hard to understand. Does this mean to say
> > > > > something like:
> > > > >
> > > > > ------
> > > > > Process potential column lists for the following cases:
> > > > >
> > > > > a. Any publication that is not FOR ALL TABLES.
> > > > >
> > > > > b. When the publication is FOR ALL TABLES and
> > > > > 'publish_generated_columns' is false.
> > > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > > so all columns will be replicated by default. However, if
> > > > > 'publish_generated_columns' is set to false, column lists must still
> > > > > be created to exclude any generated columns from being published
> > > > > ------
> > > > >
> > > > > ======
> > > > > src/test/regress/sql/publication.sql
> > > > >
> > > > > 4.
> > > > > +SET client_min_messages = 'WARNING';
> > > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > >
> > > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > > particularly now that you've removed the WARNING message that was
> > > > > previously emitted.
> > > > >
> > > > > ~
> > > > >
> > > > > 5.
> > > > > nit - minor comment changes.
> > > > >
> > > > > ======
> > > > > Please refer to the attachment which implements any nits from above.
> > > > >
> > > >
> > > > I have fixed all the given comments. Also, I have created a new 0003
> > > > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > > > planning to merge 0001 and 0003 patches once they will get fixed.
> > > > The attached patches contain the required changes.
> > >
> > > Few comments:
> > > 1) Since we are no longer throwing an error for generated columns, the
> > > function header comments also need to be updated accordingly " Checks
> > > for and raises an ERROR for any; unknown columns, system columns,
> > > duplicate columns or generated columns."
> > > -               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > > -                       ereport(ERROR,
> > > -
> > > errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > -                                       errmsg("cannot use generated
> > > column \"%s\" in publication column list",
> > > -                                                  colname));
> > > -
> > >
> > > 2) Tab completion missing for "PUBLISH_GENERATED_COLUMNS" option in
> > > ALTER PUBLICATION ... SET (
> > > postgres=# alter publication pub2 set (PUBLISH
> > > PUBLISH                     PUBLISH_VIA_PARTITION_ROOT
> > >
> > > 3) I was able to compile without this include, may be this is not required:
> > > --- a/src/backend/replication/logical/tablesync.c
> > > +++ b/src/backend/replication/logical/tablesync.c
> > > @@ -118,6 +118,7 @@
> > >  #include "utils/builtins.h"
> > >  #include "utils/lsyscache.h"
> > >  #include "utils/memutils.h"
> > > +#include "utils/rel.h"
> > >
> > > 4) You can include "\dRp+ pubname" after each of the create/alter
> > > publication to verify the columns that will be published:
> > > +-- Test the 'publish_generated_columns' parameter enabled or disabled for
> > > +-- different scenarios with/without generated columns in column lists.
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > +
> > > +-- Generated columns in column list, when 'publish_generated_columns'=false
> > > +CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH
> > > (publish_generated_columns=false);
> > >
> > > +-- Generated columns in column list, when 'publish_generated_columns'=true
> > > +CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH
> > > (publish_generated_columns=true);
> > > +
> > > +-- Generated columns in column list, then set
> > > 'publication_generate_columns'=false
> > > +ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
> > > +
> > > +-- Remove generate columns from column list, when
> > > 'publish_generated_columns'=false
> > > +ALTER PUBLICATION pub2 SET TABLE gencols(a);
> > > +
> > > +-- Add generated columns in column list, when 'publish_generated_columns'=false
> > > +ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);
> > >
> >
> > I have fixed all the given comments. The attached patches contain the
> > required changes.
>
> Few comments:
> 1) File mode change is not required:
>  src/test/subscription/t/011_generated.pl | 354 +++++++++++++++++++++++
>  1 file changed, 354 insertions(+)
>  mode change 100644 => 100755 src/test/subscription/t/011_generated.pl
>
> diff --git a/src/test/subscription/t/011_generated.pl
> b/src/test/subscription/t/011_generated.pl
> old mode 100644
> new mode 100755
> index 8b2e5f4708..d1f2718078
> --- a/src/test/subscription/t/011_generated.pl
> +++ b/src/test/subscription/t/011_generated.pl
>
> 2) Here copy_data=true looks obvious no need to mention again and
> again in comments:
> +# Create table and subscription with copy_data=true.
> +$node_subscriber->safe_psql(
> +       'postgres', qq(
> +       CREATE TABLE tab_gen_to_nogen (a int, b int);
> +       CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION
> '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH
> (copy_data = true);
> +));
> +
> +# Create table and subscription with copy_data=true.
> +$node_subscriber->safe_psql(
> +       'test_pgc_true', qq(
> +       CREATE TABLE tab_gen_to_nogen (a int, b int);
> +       CREATE SUBSCRIPTION regress_sub2_gen_to_nogen CONNECTION
> '$publisher_connstr' PUBLICATION regress_pub2_gen_to_nogen WITH
> (copy_data = true);
> +));
> +
> +# Wait for initial sync.
> +$node_subscriber->wait_for_subscription_sync($node_publisher,
> +       'regress_sub1_gen_to_nogen', 'postgres');
> +$node_subscriber->wait_for_subscription_sync($node_publisher,
> +       'regress_sub2_gen_to_nogen', 'test_pgc_true');
> +
> +# Initial sync test when publish_generated_columns=false and copy_data=true.
> +# Verify that column 'b' is not replicated.
> +$result = $node_subscriber->safe_psql('postgres',
> +       "SELECT a, b FROM tab_gen_to_nogen");
> +is( $result, qq(1|
> +2|
> +3|), 'tab_gen_to_nogen initial sync, when publish_generated_columns=false');
> +
> +# Initial sync test when publish_generated_columns=true and copy_data=true.
> +$result = $node_subscriber->safe_psql('test_pgc_true',
> +       "SELECT a, b FROM tab_gen_to_nogen");
> +is( $result, qq(1|2
> +2|4
> +3|6),
> +       'tab_gen_to_nogen initial sync, when publish_generated_columns=true');
>
> 3) The database test_pgc_true and also can be cleaned as it is not
> required after this:
> +# cleanup
> +$node_subscriber->safe_psql('postgres',
> +       "DROP SUBSCRIPTION regress_sub1_gen_to_nogen");
> +$node_subscriber->safe_psql('test_pgc_true',
> +       "DROP SUBSCRIPTION regress_sub2_gen_to_nogen");
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       DROP PUBLICATION regress_pub1_gen_to_nogen;
> +       DROP PUBLICATION regress_pub2_gen_to_nogen;
> +));
>
> 4) There is no error message verification in this test, let's add the
> error verification:
> +# =============================================================================
> +# Misc test.
> +#
> +# A "normal -> generated" replication fails, reporting an error that the
> +# subscriber side column is missing.
> +#
> +# In this test case we use DROP EXPRESSION to change the subscriber generated
> +# column into a normal column, then verify replication works ok.
> +# =============================================================================
>
> 5)
> 5.a) If possible have one regular column and one generated column in the tables
> +# --------------------------------------------------
> +# Testcase: Publisher replicates the column list data including generated
> +# columns even though publish_generated_columns option is false.
> +# --------------------------------------------------
> +
> +# Create table and publications.
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       CREATE TABLE gen_to_nogen (a int, b int, gen1 int GENERATED
> ALWAYS AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2)
> STORED);
> +       CREATE TABLE gen_to_nogen2 (c int, d int, gen1 int GENERATED
> ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2)
> STORED);
> +       CREATE TABLE nogen_to_gen2 (c int, d int, gen1 int GENERATED
> ALWAYS AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2)
> STORED);
> +       CREATE PUBLICATION pub1 FOR table gen_to_nogen(a, b, gen2),
> gen_to_nogen2, nogen_to_gen2(gen1) WITH
> (publish_generated_columns=false);
> +));
>
> 5.b) Try to have same columns in all the tables
>
> 6) These are inserting two records:
> +# Insert data to verify incremental replication
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       INSERT INTO gen_to_nogen VALUES (2), (3);
> +       INSERT INTO gen_to_nogen2 VALUES (2), (3);
> +       INSERT INTO nogen_to_gen2 VALUES (2), (3);
> +));
>
> I felt you wanted this to be:
> +# Insert data to verify incremental replication
> +$node_publisher->safe_psql(
> +       'postgres', qq(
> +       INSERT INTO gen_to_nogen VALUES (2, 3);
> +       INSERT INTO gen_to_nogen2 VALUES (2, 3);
> +       INSERT INTO nogen_to_gen2 VALUES (2, 3);
> +));

I have fixed all the comments and posted the v40 patches for them.
Please refer to the updated v40 Patches here in [1]. See [1] for the
changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjLviXAWtB3Kcn1A1jPpqORpkNay1y2U%2B55K64sqwCdrGw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Incorrect comment on pg_shadow view
Next
From: Stefan Fercot
Date:
Subject: Re: Recovery of .partial WAL segments