Thread: Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> To avoid the issue, we can disallow UPDATE/DELETE on table with
> unpublished generated column as REPLICA IDENTITY. I have attached a
> patch for the same.
>

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

--
With Regards,
Amit Kapila.



On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> 
> Hi Amit,
> 
> On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>
> wrote:
> > >
> > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > unpublished generated column as REPLICA IDENTITY. I have attached a
> > > patch for the same.
> > >
> >
> > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > +testpub_gencol SET a = 100 WHERE a = 1;
> > +ERROR:  cannot update table "testpub_gencol"
> > +DETAIL:  Column list used by the publication does not cover the
> > replica identity.
> >
> > This is not a correct ERROR message as the publication doesn't have
> > any column list associated with it. You have added the code to detect
> > this in the column list code path which I think is not required. BTW,
> > you also need to consider the latest commit 7054186c4e for this. I
> > guess you need to keep another flag in PublicationDesc to detect this
> > and then give an appropriate ERROR.
> 
> I have addressed the comments and provided an updated patch. Also, I am
> currently working to fix this issue in back branches.

Thanks for the patch. I am reviewing it and have some initial comments:


1.
+            char attgenerated = get_attgenerated(relid, attnum);
+

I think it's unnecessary to initialize attgenerated here because the value will
be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
is not cheap.

2.

I think the patch missed to check the case when table is marked REPLICA
IDENTITY FULL, and generated column is not published:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case, but it can still pass after applying the patch.

3.

+         * If the publication is FOR ALL TABLES we can skip the validation.
+         */

This comment seems not clear to me, could you elaborate a bit more on this ?

4.

Also, I think the patch does not handle the FOR ALL TABLE case correctly:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case as well.

5.

+    else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                 errmsg("cannot update table \"%s\"",
+                        RelationGetRelationName(rel)),
+                 errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));

I think it would be better to use lower case "replica identity" to consistent
with other existing messages.

Best Regards,
Hou zj

On Wed, 13 Nov 2024 at 11:15, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks for providing the comments.
>
> On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Hi Amit,
> > >
> > > On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>
> > > wrote:
> > > > >
> > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > > > unpublished generated column as REPLICA IDENTITY. I have attached a
> > > > > patch for the same.
> > > > >
> > > >
> > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > > > +testpub_gencol SET a = 100 WHERE a = 1;
> > > > +ERROR:  cannot update table "testpub_gencol"
> > > > +DETAIL:  Column list used by the publication does not cover the
> > > > replica identity.
> > > >
> > > > This is not a correct ERROR message as the publication doesn't have
> > > > any column list associated with it. You have added the code to detect
> > > > this in the column list code path which I think is not required. BTW,
> > > > you also need to consider the latest commit 7054186c4e for this. I
> > > > guess you need to keep another flag in PublicationDesc to detect this
> > > > and then give an appropriate ERROR.
> > >
> > > I have addressed the comments and provided an updated patch. Also, I am
> > > currently working to fix this issue in back branches.
> >
> > Thanks for the patch. I am reviewing it and have some initial comments:
> >
> >
> > 1.
> > +                       char attgenerated = get_attgenerated(relid, attnum);
> > +
> >
> > I think it's unnecessary to initialize attgenerated here because the value will
> > be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
> > is not cheap.
> >
> Fixed
>
> > 2.
> >
> > I think the patch missed to check the case when table is marked REPLICA
> > IDENTITY FULL, and generated column is not published:
> >
> > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
> > UPDATE testpub_gencol SET a = 2;
> >
> > I expected the UPDATE to fail in above case, but it can still pass after applying the patch.
> >
> Fixed
>
> > 3.
> >
> > +                * If the publication is FOR ALL TABLES we can skip the validation.
> > +                */
> >
> > This comment seems not clear to me, could you elaborate a bit more on this ?
> >
> I missed to handle the case FOR ALL TABLES. Have removed the comment.
>
> > 4.
> >
> > Also, I think the patch does not handle the FOR ALL TABLE case correctly:
> >
> > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
> > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
> > CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
> > UPDATE testpub_gencol SET a = 2;
> >
> > I expected the UPDATE to fail in above case as well.
> >
> Fixed
>
> > 5.
> >
> > +       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
> > +               ereport(ERROR,
> > +                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > +                                errmsg("cannot update table \"%s\"",
> > +                                               RelationGetRelationName(rel)),
> > +                                errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));
> >
> > I think it would be better to use lower case "replica identity" to consistent
> > with other existing messages.
> >
> Fixed
>
> I have attached the updated patch here.

Few comments:
1) In the first check relation->rd_rel->relispartition also is checked
whereas in the below it is not checked, shouldn't the same check be
there below to avoid few of the function calls which are not required:
+       if (pubviaroot && relation->rd_rel->relispartition)
+       {
+               publish_as_relid =
GetTopMostAncestorInPublication(pubid, ancestors, NULL);
+
+               if (!OidIsValid(publish_as_relid))
+                       publish_as_relid = relid;
+       }
+

+                       if (pubviaroot)
+                       {
+                               /* attribute name in the child table */
+                               char       *colname =
get_attname(relid, attnum, false);
+
+                               /*
+                                * Determine the attnum for the
attribute name in parent (we
+                                * are using the column list defined
on the parent).
+                                */
+                               attnum = get_attnum(publish_as_relid, colname);
+                               attgenerated =
get_attgenerated(publish_as_relid, attnum);
+                       }
+                       else
+                               attgenerated = get_attgenerated(relid, attnum);

2) I think we could use check_and_fetch_column_list to see that it is
not a column list publication instead of below code:
+       if (!puballtables)
+       {
+               tuple = SearchSysCache2(PUBLICATIONRELMAP,
+
ObjectIdGetDatum(publish_as_relid),
+
ObjectIdGetDatum(pubid));
+
+               if (!HeapTupleIsValid(tuple))
+                       return false;
+
+               (void) SysCacheGetAttr(PUBLICATIONRELMAP, tuple,
+
Anum_pg_publication_rel_prattrs,
+                                                          &isnull);
+
+               ReleaseSysCache(tuple);
+       }
+
+       if(puballtables || isnull)

3) Since there is only a single statement, remove the enclosing parenthisis:
+               if (!pubform->pubgencols &&
+                       (pubform->pubupdate || pubform->pubdelete) &&
+                       replident_has_unpublished_gen_col(pubid,
relation, ancestors,
+
                   pubform->pubviaroot, pubform->puballtables))
+               {
+                       pubdesc->replident_has_valid_gen_cols = false;
+               }

4) Pgindent should be run there are few issues:
4.a)
+extern bool replident_has_unpublished_gen_col(Oid pubid, Relation relation,
+
                         List *ancestors, bool pubviaroot, bool
puballtables);
4.b)
+       }
+
+       if(puballtables || isnull)
+       {
+               int                     x;
+               Bitmapset  *idattrs = NULL;
4.c)
+                * generated column we should error out.
+                */
+               if(relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL &&
+                  relation->rd_att->constr &&
relation->rd_att->constr->has_generated_stored)
+                       result = true;
4.d)
+               while ((x = bms_next_member(idattrs, x)) >= 0)
+               {
+                       AttrNumber      attnum = (x +
FirstLowInvalidHeapAttributeNumber);
+                       char attgenerated;

5) You could do this in a single line comment:
+               /*
+                * Check if any REPLICA IDENTITY column is an generated column.
+                */
+               while ((x = bms_next_member(idattrs, x)) >= 0)

6) I felt one of update or delete is enough in this case as the code
path is same:
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;
+
+-- error - generated column "b" is not published and REPLICA IDENTITY
is set FULL
+ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;
+DROP PUBLICATION pub_gencol;
+
+-- ok - generated column "b" is published and is part of REPLICA IDENTITY
+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol with
(publish_generated_columns = true);
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;

Regards,
Vignesh



On Thu, 14 Nov 2024 at 15:51, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks for providing the comments.
>
> On Thu, 14 Nov 2024 at 12:22, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 13 Nov 2024 at 11:15, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Thanks for providing the comments.
> > >
> > > On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > Hi Amit,
> > > > >
> > > > > On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > > > > > unpublished generated column as REPLICA IDENTITY. I have attached a
> > > > > > > patch for the same.
> > > > > > >
> > > > > >
> > > > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > > > > > +testpub_gencol SET a = 100 WHERE a = 1;
> > > > > > +ERROR:  cannot update table "testpub_gencol"
> > > > > > +DETAIL:  Column list used by the publication does not cover the
> > > > > > replica identity.
> > > > > >
> > > > > > This is not a correct ERROR message as the publication doesn't have
> > > > > > any column list associated with it. You have added the code to detect
> > > > > > this in the column list code path which I think is not required. BTW,
> > > > > > you also need to consider the latest commit 7054186c4e for this. I
> > > > > > guess you need to keep another flag in PublicationDesc to detect this
> > > > > > and then give an appropriate ERROR.
> > > > >
> > > > > I have addressed the comments and provided an updated patch. Also, I am
> > > > > currently working to fix this issue in back branches.
> > > >
> > > > Thanks for the patch. I am reviewing it and have some initial comments:
> > > >
> > > >
> > > > 1.
> > > > +                       char attgenerated = get_attgenerated(relid, attnum);
> > > > +
> > > >
> > > > I think it's unnecessary to initialize attgenerated here because the value will
> > > > be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
> > > > is not cheap.
> > > >
> > > Fixed
> > >
> > > > 2.
> > > >
> > > > I think the patch missed to check the case when table is marked REPLICA
> > > > IDENTITY FULL, and generated column is not published:
> > > >
> > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > > > ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > > > CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
> > > > UPDATE testpub_gencol SET a = 2;
> > > >
> > > > I expected the UPDATE to fail in above case, but it can still pass after applying the patch.
> > > >
> > > Fixed
> > >
> > > > 3.
> > > >
> > > > +                * If the publication is FOR ALL TABLES we can skip the validation.
> > > > +                */
> > > >
> > > > This comment seems not clear to me, could you elaborate a bit more on this ?
> > > >
> > > I missed to handle the case FOR ALL TABLES. Have removed the comment.
> > >
> > > > 4.
> > > >
> > > > Also, I think the patch does not handle the FOR ALL TABLE case correctly:
> > > >
> > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
> > > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
> > > > CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
> > > > UPDATE testpub_gencol SET a = 2;
> > > >
> > > > I expected the UPDATE to fail in above case as well.
> > > >
> > > Fixed
> > >
> > > > 5.
> > > >
> > > > +       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
> > > > +               ereport(ERROR,
> > > > +                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > > +                                errmsg("cannot update table \"%s\"",
> > > > +                                               RelationGetRelationName(rel)),
> > > > +                                errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));
> > > >
> > > > I think it would be better to use lower case "replica identity" to consistent
> > > > with other existing messages.
> > > >
> > > Fixed
> > >
> > > I have attached the updated patch here.
> >
> > Few comments:
> > 1) In the first check relation->rd_rel->relispartition also is checked
> > whereas in the below it is not checked, shouldn't the same check be
> > there below to avoid few of the function calls which are not required:
> > +       if (pubviaroot && relation->rd_rel->relispartition)
> > +       {
> > +               publish_as_relid =
> > GetTopMostAncestorInPublication(pubid, ancestors, NULL);
> > +
> > +               if (!OidIsValid(publish_as_relid))
> > +                       publish_as_relid = relid;
> > +       }
> > +
> >
> > +                       if (pubviaroot)
> > +                       {
> > +                               /* attribute name in the child table */
> > +                               char       *colname =
> > get_attname(relid, attnum, false);
> > +
> > +                               /*
> > +                                * Determine the attnum for the
> > attribute name in parent (we
> > +                                * are using the column list defined
> > on the parent).
> > +                                */
> > +                               attnum = get_attnum(publish_as_relid, colname);
> > +                               attgenerated =
> > get_attgenerated(publish_as_relid, attnum);
> > +                       }
> > +                       else
> > +                               attgenerated = get_attgenerated(relid, attnum);
> I have updated the if condititon
>
> > 2) I think we could use check_and_fetch_column_list to see that it is
> > not a column list publication instead of below code:
> > +       if (!puballtables)
> > +       {
> > +               tuple = SearchSysCache2(PUBLICATIONRELMAP,
> > +
> > ObjectIdGetDatum(publish_as_relid),
> > +
> > ObjectIdGetDatum(pubid));
> > +
> > +               if (!HeapTupleIsValid(tuple))
> > +                       return false;
> > +
> > +               (void) SysCacheGetAttr(PUBLICATIONRELMAP, tuple,
> > +
> > Anum_pg_publication_rel_prattrs,
> > +                                                          &isnull);
> > +
> > +               ReleaseSysCache(tuple);
> > +       }
> > +
> > +       if(puballtables || isnull)
>
> Yes we can use it. I have updated the patch.
>
> > 3) Since there is only a single statement, remove the enclosing parenthisis:
> > +               if (!pubform->pubgencols &&
> > +                       (pubform->pubupdate || pubform->pubdelete) &&
> > +                       replident_has_unpublished_gen_col(pubid,
> > relation, ancestors,
> > +
> >                    pubform->pubviaroot, pubform->puballtables))
> > +               {
> > +                       pubdesc->replident_has_valid_gen_cols = false;
> > +               }
> >
> Fixed
>
> > 4) Pgindent should be run there are few issues:
> > 4.a)
> > +extern bool replident_has_unpublished_gen_col(Oid pubid, Relation relation,
> > +
> >                          List *ancestors, bool pubviaroot, bool
> > puballtables);
> > 4.b)
> > +       }
> > +
> > +       if(puballtables || isnull)
> > +       {
> > +               int                     x;
> > +               Bitmapset  *idattrs = NULL;
> > 4.c)
> > +                * generated column we should error out.
> > +                */
> > +               if(relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL &&
> > +                  relation->rd_att->constr &&
> > relation->rd_att->constr->has_generated_stored)
> > +                       result = true;
> > 4.d)
> > +               while ((x = bms_next_member(idattrs, x)) >= 0)
> > +               {
> > +                       AttrNumber      attnum = (x +
> > FirstLowInvalidHeapAttributeNumber);
> > +                       char attgenerated;
> >
> Fixed
>
> > 5) You could do this in a single line comment:
> > +               /*
> > +                * Check if any REPLICA IDENTITY column is an generated column.
> > +                */
> > +               while ((x = bms_next_member(idattrs, x)) >= 0)
> >
>
> Fixed
>
> > 6) I felt one of update or delete is enough in this case as the code
> > path is same:
> > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > +DELETE FROM testpub_gencol WHERE a = 100;
> > +
> > +-- error - generated column "b" is not published and REPLICA IDENTITY
> > is set FULL
> > +ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > +DELETE FROM testpub_gencol WHERE a = 100;
> > +DROP PUBLICATION pub_gencol;
> > +
> > +-- ok - generated column "b" is published and is part of REPLICA IDENTITY
> > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol with
> > (publish_generated_columns = true);
> > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > +DELETE FROM testpub_gencol WHERE a = 100;
>
> Removed the 'DELETE' case.
>
> I have addressed the comments and updated the patch.

Few comments:
1) Current patch will not handle this scenario where subset of columns
are specified in the replica identity index:
CREATE TABLE t1 (a INT not null, a1 int not null, a2 int not null, b
INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
create unique index idx1_t1 on t1(a, a1);

-- Replica identity will have subset of table columns
alter table t1 replica identity using index idx1_t1 ;
insert into t1 values(1,1,1);
create publication pub1 for table t1;

postgres=# update t1 set a = 2;
UPDATE 1

I felt we should throw an error in this case too.

2) Instead of checking if replica identity has a generated column, can
we check if the columns that will be published and the columns in the
replica identity matches:
+                       if (pubviaroot && relation->rd_rel->relispartition)
+                       {
+                               /* attribute name in the child table */
+                               char       *colname =
get_attname(relid, attnum, false);
+
+                               /*
+                                * Determine the attnum for the
attribute name in parent (we
+                                * are using the column list defined
on the parent).
+                                */
+                               attnum = get_attnum(publish_as_relid, colname);
+                               attgenerated =
get_attgenerated(publish_as_relid, attnum);
+                       }
+                       else
+                               attgenerated = get_attgenerated(relid, attnum);

3) publish_as_relid will be set accordingly based on pubviaroot, so it
need not be initialized:
+       Oid                     relid = RelationGetRelid(relation);
+       Oid                     publish_as_relid = RelationGetRelid(relation);
+       bool            result = false;

Regards,
Vignesh



On Fri, 15 Nov 2024 at 16:45, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks for providing the comments
>
> On Fri, 15 Nov 2024 at 10:59, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Thu, 14 Nov 2024 at 15:51, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Thanks for providing the comments.
> > >
> > > On Thu, 14 Nov 2024 at 12:22, vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Wed, 13 Nov 2024 at 11:15, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > Thanks for providing the comments.
> > > > >
> > > > > On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
> > > > > <houzj.fnst@fujitsu.com> wrote:
> > > > > >
> > > > > > On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Amit,
> > > > > > >
> > > > > > > On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > > > > > > > unpublished generated column as REPLICA IDENTITY. I have attached a
> > > > > > > > > patch for the same.
> > > > > > > > >
> > > > > > > >
> > > > > > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > > > > > > > +testpub_gencol SET a = 100 WHERE a = 1;
> > > > > > > > +ERROR:  cannot update table "testpub_gencol"
> > > > > > > > +DETAIL:  Column list used by the publication does not cover the
> > > > > > > > replica identity.
> > > > > > > >
> > > > > > > > This is not a correct ERROR message as the publication doesn't have
> > > > > > > > any column list associated with it. You have added the code to detect
> > > > > > > > this in the column list code path which I think is not required. BTW,
> > > > > > > > you also need to consider the latest commit 7054186c4e for this. I
> > > > > > > > guess you need to keep another flag in PublicationDesc to detect this
> > > > > > > > and then give an appropriate ERROR.
> > > > > > >
> > > > > > > I have addressed the comments and provided an updated patch. Also, I am
> > > > > > > currently working to fix this issue in back branches.
> > > > > >
> > > > > > Thanks for the patch. I am reviewing it and have some initial comments:
> > > > > >
> > > > > >
> > > > > > 1.
> > > > > > +                       char attgenerated = get_attgenerated(relid, attnum);
> > > > > > +
> > > > > >
> > > > > > I think it's unnecessary to initialize attgenerated here because the value will
> > > > > > be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
> > > > > > is not cheap.
> > > > > >
> > > > > Fixed
> > > > >
> > > > > > 2.
> > > > > >
> > > > > > I think the patch missed to check the case when table is marked REPLICA
> > > > > > IDENTITY FULL, and generated column is not published:
> > > > > >
> > > > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > > > > > ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > > > > > CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
> > > > > > UPDATE testpub_gencol SET a = 2;
> > > > > >
> > > > > > I expected the UPDATE to fail in above case, but it can still pass after applying the patch.
> > > > > >
> > > > > Fixed
> > > > >
> > > > > > 3.
> > > > > >
> > > > > > +                * If the publication is FOR ALL TABLES we can skip the validation.
> > > > > > +                */
> > > > > >
> > > > > > This comment seems not clear to me, could you elaborate a bit more on this ?
> > > > > >
> > > > > I missed to handle the case FOR ALL TABLES. Have removed the comment.
> > > > >
> > > > > > 4.
> > > > > >
> > > > > > Also, I think the patch does not handle the FOR ALL TABLE case correctly:
> > > > > >
> > > > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > > > > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
> > > > > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
> > > > > > CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
> > > > > > UPDATE testpub_gencol SET a = 2;
> > > > > >
> > > > > > I expected the UPDATE to fail in above case as well.
> > > > > >
> > > > > Fixed
> > > > >
> > > > > > 5.
> > > > > >
> > > > > > +       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
> > > > > > +               ereport(ERROR,
> > > > > > +                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > > > > +                                errmsg("cannot update table \"%s\"",
> > > > > > +                                               RelationGetRelationName(rel)),
> > > > > > +                                errdetail("REPLICA IDENTITY consists of an unpublished generated
column.")));
> > > > > >
> > > > > > I think it would be better to use lower case "replica identity" to consistent
> > > > > > with other existing messages.
> > > > > >
> > > > > Fixed
> > > > >
> > > > > I have attached the updated patch here.
> > > >
> > > > Few comments:
> > > > 1) In the first check relation->rd_rel->relispartition also is checked
> > > > whereas in the below it is not checked, shouldn't the same check be
> > > > there below to avoid few of the function calls which are not required:
> > > > +       if (pubviaroot && relation->rd_rel->relispartition)
> > > > +       {
> > > > +               publish_as_relid =
> > > > GetTopMostAncestorInPublication(pubid, ancestors, NULL);
> > > > +
> > > > +               if (!OidIsValid(publish_as_relid))
> > > > +                       publish_as_relid = relid;
> > > > +       }
> > > > +
> > > >
> > > > +                       if (pubviaroot)
> > > > +                       {
> > > > +                               /* attribute name in the child table */
> > > > +                               char       *colname =
> > > > get_attname(relid, attnum, false);
> > > > +
> > > > +                               /*
> > > > +                                * Determine the attnum for the
> > > > attribute name in parent (we
> > > > +                                * are using the column list defined
> > > > on the parent).
> > > > +                                */
> > > > +                               attnum = get_attnum(publish_as_relid, colname);
> > > > +                               attgenerated =
> > > > get_attgenerated(publish_as_relid, attnum);
> > > > +                       }
> > > > +                       else
> > > > +                               attgenerated = get_attgenerated(relid, attnum);
> > > I have updated the if condititon
> > >
> > > > 2) I think we could use check_and_fetch_column_list to see that it is
> > > > not a column list publication instead of below code:
> > > > +       if (!puballtables)
> > > > +       {
> > > > +               tuple = SearchSysCache2(PUBLICATIONRELMAP,
> > > > +
> > > > ObjectIdGetDatum(publish_as_relid),
> > > > +
> > > > ObjectIdGetDatum(pubid));
> > > > +
> > > > +               if (!HeapTupleIsValid(tuple))
> > > > +                       return false;
> > > > +
> > > > +               (void) SysCacheGetAttr(PUBLICATIONRELMAP, tuple,
> > > > +
> > > > Anum_pg_publication_rel_prattrs,
> > > > +                                                          &isnull);
> > > > +
> > > > +               ReleaseSysCache(tuple);
> > > > +       }
> > > > +
> > > > +       if(puballtables || isnull)
> > >
> > > Yes we can use it. I have updated the patch.
> > >
> > > > 3) Since there is only a single statement, remove the enclosing parenthisis:
> > > > +               if (!pubform->pubgencols &&
> > > > +                       (pubform->pubupdate || pubform->pubdelete) &&
> > > > +                       replident_has_unpublished_gen_col(pubid,
> > > > relation, ancestors,
> > > > +
> > > >                    pubform->pubviaroot, pubform->puballtables))
> > > > +               {
> > > > +                       pubdesc->replident_has_valid_gen_cols = false;
> > > > +               }
> > > >
> > > Fixed
> > >
> > > > 4) Pgindent should be run there are few issues:
> > > > 4.a)
> > > > +extern bool replident_has_unpublished_gen_col(Oid pubid, Relation relation,
> > > > +
> > > >                          List *ancestors, bool pubviaroot, bool
> > > > puballtables);
> > > > 4.b)
> > > > +       }
> > > > +
> > > > +       if(puballtables || isnull)
> > > > +       {
> > > > +               int                     x;
> > > > +               Bitmapset  *idattrs = NULL;
> > > > 4.c)
> > > > +                * generated column we should error out.
> > > > +                */
> > > > +               if(relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL &&
> > > > +                  relation->rd_att->constr &&
> > > > relation->rd_att->constr->has_generated_stored)
> > > > +                       result = true;
> > > > 4.d)
> > > > +               while ((x = bms_next_member(idattrs, x)) >= 0)
> > > > +               {
> > > > +                       AttrNumber      attnum = (x +
> > > > FirstLowInvalidHeapAttributeNumber);
> > > > +                       char attgenerated;
> > > >
> > > Fixed
> > >
> > > > 5) You could do this in a single line comment:
> > > > +               /*
> > > > +                * Check if any REPLICA IDENTITY column is an generated column.
> > > > +                */
> > > > +               while ((x = bms_next_member(idattrs, x)) >= 0)
> > > >
> > >
> > > Fixed
> > >
> > > > 6) I felt one of update or delete is enough in this case as the code
> > > > path is same:
> > > > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > > > +DELETE FROM testpub_gencol WHERE a = 100;
> > > > +
> > > > +-- error - generated column "b" is not published and REPLICA IDENTITY
> > > > is set FULL
> > > > +ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > > > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > > > +DELETE FROM testpub_gencol WHERE a = 100;
> > > > +DROP PUBLICATION pub_gencol;
> > > > +
> > > > +-- ok - generated column "b" is published and is part of REPLICA IDENTITY
> > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol with
> > > > (publish_generated_columns = true);
> > > > +UPDATE testpub_gencol SET a = 100 WHERE a = 1;
> > > > +DELETE FROM testpub_gencol WHERE a = 100;
> > >
> > > Removed the 'DELETE' case.
> > >
> > > I have addressed the comments and updated the patch.
> >
> > Few comments:
> > 1) Current patch will not handle this scenario where subset of columns
> > are specified in the replica identity index:
> > CREATE TABLE t1 (a INT not null, a1 int not null, a2 int not null, b
> > INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
> > create unique index idx1_t1 on t1(a, a1);
> >
> > -- Replica identity will have subset of table columns
> > alter table t1 replica identity using index idx1_t1 ;
> > insert into t1 values(1,1,1);
> > create publication pub1 for table t1;
> >
> > postgres=# update t1 set a = 2;
> > UPDATE 1
> >
> > I felt we should throw an error in this case too.
> >
> I feel the above behaviour is expected. I think we can specify a
> subset of columns in the replica identity index as per documentation
> [1]. Thoughts?
>
> > 2) Instead of checking if replica identity has a generated column, can
> > we check if the columns that will be published and the columns in the
> > replica identity matches:
> > +                       if (pubviaroot && relation->rd_rel->relispartition)
> > +                       {
> > +                               /* attribute name in the child table */
> > +                               char       *colname =
> > get_attname(relid, attnum, false);
> > +
> > +                               /*
> > +                                * Determine the attnum for the
> > attribute name in parent (we
> > +                                * are using the column list defined
> > on the parent).
> > +                                */
> > +                               attnum = get_attnum(publish_as_relid, colname);
> > +                               attgenerated =
> > get_attgenerated(publish_as_relid, attnum);
> > +                       }
> > +                       else
> > +                               attgenerated = get_attgenerated(relid, attnum);
> >
> Fixed
>
> > 3) publish_as_relid will be set accordingly based on pubviaroot, so it
> > need not be initialized:
> > +       Oid                     relid = RelationGetRelid(relation);
> > +       Oid                     publish_as_relid = RelationGetRelid(relation);
> > +       bool            result = false;
> >
> Fixed
>
> I have addressed the comments and attached the updated patch.

Few comments:
1) I felt we can return from here after identifying it is replica
identity full instead of processing further:
+               /*
+                * REPLICA IDENTITY can be FULL only if there is no
column list for
+                * publication. If REPLICA IDENTITY is set as FULL and
relation has a
+                * generated column we should error out.
+                */
+               if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL &&
+                       relation->rd_att->constr &&
+                       relation->rd_att->constr->has_generated_stored)
+                       result = true;

2) columns bms also should be freed here:
                     /* replica identity column, not covered by the
column list */
+                       if (!bms_is_member(attnum, columns))
+                       {
+                               result = true;
+                               break;
+                       }
+               }
+
+               bms_free(idattrs);

3) Error detail message should begin with upper case:
3.a)
@@ -809,6 +809,12 @@ CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
                                 errmsg("cannot update table \"%s\"",
                                                RelationGetRelationName(rel)),
                                 errdetail("Column list used by the
publication does not cover the replica identity.")));
+       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                                errmsg("cannot update table \"%s\"",
+                                               RelationGetRelationName(rel)),
+                                errdetail("replica identity consists
of an unpublished generated column.")));
        else if (cmd == CMD_DELETE && !pubdesc.rf_valid_for_delete)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),

3.b) Similarly here too:
                                 errdetail("Column list used by the
publication does not cover the replica identity.")));
+       else if (cmd == CMD_DELETE && !pubdesc.replident_has_valid_gen_cols)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                                errmsg("cannot delete from table \"%s\"",
+                                               RelationGetRelationName(rel)),
+                                errdetail("replica identity consists
of an unpublished generated column.")));

Regards,
Vignesh



On Sat, 16 Nov 2024 at 00:10, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks for providing the comments. I have fixed all the comments and
> attached the updated patch.

Few comments:
1) The replident_has_valid_gen_cols flag is set when either an update
or delete operation is published by the publication.
+               /*
+                * Check if all columns which are part of the REPLICA
IDENTITY is
+                * published.
+                */
+               if (!pubform->pubgencols &&
+                       (pubform->pubupdate || pubform->pubdelete) &&
+                       replident_has_unpublished_gen_col(pubid,
relation, ancestors,
+
                   pubform->pubviaroot))
+                       pubdesc->replident_has_valid_gen_cols = false;

You should create two separate flags—one for updates and one for
deletes—and set them accordingly, based on the operation being
published. This is similar to how the cols_valid_for_update and
cols_valid_for_delete flags are handled for column lists.

As shown in the test below, the delete operation fails even though the
delete operation is not published by the pub_gencol publication:
CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;

CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol with ( PUBLISH
= 'update');

-- This should be successful, since the publication is not publishing delete:
postgres=# delete from testpub_gencol ;
ERROR:  cannot delete from table "testpub_gencol"
DETAIL:  Replica identity consists of an unpublished generated column.

2) Since the code in replident_has_unpublished_gen_col and
pub_collist_contains_invalid_column is largely identical, we can
consolidate them into a single function that handles both column lists
and relation columns. The function name, header comments, and internal
comments should be updated accordingly.

Regards,
Vignesh



On Saturday, November 16, 2024 2:41 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> 
> >
> Thanks for providing the comments. I have fixed all the comments and attached
> the updated patch.

Thanks for the patch. I have one comment for the following codes:

+        /*
+         * Bitmap of published columns when publish_generated_columns is
+         * 'false' and no column list is specified.
+         */
+        columns = pub_form_cols_map(relation, false);
+
+        /*
+         * Attnums in the bitmap returned by RelationGetIndexAttrBitmap are
+         * offset (to handle system columns the usual way), while column list
+         * does not use offset, so we can't do bms_is_subset(). Instead, we
+         * have to loop over the idattrs and check all of them are in the
+         * list.
+         */
+        x = -1;
+        while ((x = bms_next_member(idattrs, x)) >= 0)
+        {
...
+        }


It doesn't seem necessary to build a bitmap and then iterator the replica
identity bitmap. Instead, we can efficiently traverse the columns as follows:

for (int i = 0; i < desc->natts; i++)
{
    Form_pg_attribute att = TupleDescAttr(desc, i);

    if (!att->attisdropped && att->attgenerated &&
        bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
        idattrs))
    {
        result = true;
        break;
    }
}

Best Regards,
Hou zj

On Mon, 18 Nov 2024 at 08:57, Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Saturday, November 16, 2024 2:41 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > >
> > Thanks for providing the comments. I have fixed all the comments and attached
> > the updated patch.
>
> Thanks for the patch. I have one comment for the following codes:
>
> +               /*
> +                * Bitmap of published columns when publish_generated_columns is
> +                * 'false' and no column list is specified.
> +                */
> +               columns = pub_form_cols_map(relation, false);
> +
> +               /*
> +                * Attnums in the bitmap returned by RelationGetIndexAttrBitmap are
> +                * offset (to handle system columns the usual way), while column list
> +                * does not use offset, so we can't do bms_is_subset(). Instead, we
> +                * have to loop over the idattrs and check all of them are in the
> +                * list.
> +                */
> +               x = -1;
> +               while ((x = bms_next_member(idattrs, x)) >= 0)
> +               {
> ...
> +               }
>
>
> It doesn't seem necessary to build a bitmap and then iterator the replica
> identity bitmap. Instead, we can efficiently traverse the columns as follows:
>
> for (int i = 0; i < desc->natts; i++)
> {
>         Form_pg_attribute att = TupleDescAttr(desc, i);
>
>         if (!att->attisdropped && att->attgenerated &&
>                 bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
>         idattrs))
>         {
>                 result = true;
>                 break;
>         }
> }
>
> Best Regards,
> Hou zj

Thanks for providing the comments.
I agree with your approach and updated the same in the v7 patch [1].

[1]: https://www.postgresql.org/message-id/CANhcyEUi6T%2B0O83LEsG6jOJFL3BY_WD%3DvZ73bt0FRUcJHRt%3DsQ%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



On Mon, 18 Nov 2024 at 13:07, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks for providing the comments.
>
> On Sat, 16 Nov 2024 at 17:29, vignesh C <vignesh21@gmail.com> wrote:
>
> I have attached the updated version of the patch.

Few comments:
1) We have the following check for cols validation and rf validation:
    /*
     * If we know everything is replicated and the column list is invalid
     * for update and delete, there is no point to check for other
     * publications.
     */
    if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
      pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
      !pubdesc->cols_valid_for_update && !pubdesc->cols_valid_for_delete)
      break;

Should we do this for replident_valid_for_update and
replident_valid_for_delete also?

2) This variable is not required, there is a warning:
publicationcmds.c: In function ‘replident_has_unpublished_gen_col’:
publicationcmds.c:486:41: warning: unused variable ‘x’ [-Wunused-variable]

Regards,
Vignesh



On Tue, 19 Nov 2024 at 00:36, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 18 Nov 2024 at 19:19, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 18 Nov 2024 at 13:07, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Thanks for providing the comments.
> > >
> > > On Sat, 16 Nov 2024 at 17:29, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > I have attached the updated version of the patch.
> >
> > Few comments:
> > 1) We have the following check for cols validation and rf validation:
> >     /*
> >      * If we know everything is replicated and the column list is invalid
> >      * for update and delete, there is no point to check for other
> >      * publications.
> >      */
> >     if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> >       pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> >       !pubdesc->cols_valid_for_update && !pubdesc->cols_valid_for_delete)
> >       break;
> >
> > Should we do this for replident_valid_for_update and
> > replident_valid_for_delete also?
> >
> Yes, we can add this check.
>
> > 2) This variable is not required, there is a warning:
> > publicationcmds.c: In function ‘replident_has_unpublished_gen_col’:
> > publicationcmds.c:486:41: warning: unused variable ‘x’ [-Wunused-variable]
> Fixed
>
> I have fixed the comments and attached an updated patch.

To ensure easy backtracking after the patch is committed, we should
include a brief explanation for the test removal in the commit
message:
diff --git a/src/test/subscription/t/100_bugs.pl
b/src/test/subscription/t/100_bugs.pl
index cb36ca7b16..794b928f50 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -377,8 +377,8 @@ $node_publisher->safe_psql('postgres', "DROP
PUBLICATION tap_pub_sch");
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');

-# The bug was that when the REPLICA IDENTITY FULL is used with dropped or
-# generated columns, we fail to apply updates and deletes
+# The bug was that when the REPLICA IDENTITY FULL is used with dropped
+# we fail to apply updates and deletes
 $node_publisher->rotate_logfile();
 $node_publisher->start();

@@ -389,18 +389,14 @@ $node_publisher->safe_psql(
  'postgres', qq(
  CREATE TABLE dropped_cols (a int, b_drop int, c int);
  ALTER TABLE dropped_cols REPLICA IDENTITY FULL;
- CREATE TABLE generated_cols (a int, b_gen int GENERATED ALWAYS AS (5
* a) STORED, c int);
- ALTER TABLE generated_cols REPLICA IDENTITY FULL;
- CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols, generated_cols;
+ CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols;

Regards,
Vignesh



On Tue, 19 Nov 2024 at 10:22, vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 19 Nov 2024 at 00:36, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Mon, 18 Nov 2024 at 19:19, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Mon, 18 Nov 2024 at 13:07, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >
> > > > Thanks for providing the comments.
> > > >
> > > > On Sat, 16 Nov 2024 at 17:29, vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > I have attached the updated version of the patch.
> > >
> > > Few comments:
> > > 1) We have the following check for cols validation and rf validation:
> > >     /*
> > >      * If we know everything is replicated and the column list is invalid
> > >      * for update and delete, there is no point to check for other
> > >      * publications.
> > >      */
> > >     if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> > >       pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> > >       !pubdesc->cols_valid_for_update && !pubdesc->cols_valid_for_delete)
> > >       break;
> > >
> > > Should we do this for replident_valid_for_update and
> > > replident_valid_for_delete also?
> > >
> > Yes, we can add this check.
> >
> > > 2) This variable is not required, there is a warning:
> > > publicationcmds.c: In function ‘replident_has_unpublished_gen_col’:
> > > publicationcmds.c:486:41: warning: unused variable ‘x’ [-Wunused-variable]
> > Fixed
> >
> > I have fixed the comments and attached an updated patch.
>
> To ensure easy backtracking after the patch is committed, we should
> include a brief explanation for the test removal in the commit
> message:
> diff --git a/src/test/subscription/t/100_bugs.pl
> b/src/test/subscription/t/100_bugs.pl
> index cb36ca7b16..794b928f50 100644
> --- a/src/test/subscription/t/100_bugs.pl
> +++ b/src/test/subscription/t/100_bugs.pl
> @@ -377,8 +377,8 @@ $node_publisher->safe_psql('postgres', "DROP
> PUBLICATION tap_pub_sch");
>  $node_publisher->stop('fast');
>  $node_subscriber->stop('fast');
>
> -# The bug was that when the REPLICA IDENTITY FULL is used with dropped or
> -# generated columns, we fail to apply updates and deletes
> +# The bug was that when the REPLICA IDENTITY FULL is used with dropped
> +# we fail to apply updates and deletes
>  $node_publisher->rotate_logfile();
>  $node_publisher->start();
>
> @@ -389,18 +389,14 @@ $node_publisher->safe_psql(
>   'postgres', qq(
>   CREATE TABLE dropped_cols (a int, b_drop int, c int);
>   ALTER TABLE dropped_cols REPLICA IDENTITY FULL;
> - CREATE TABLE generated_cols (a int, b_gen int GENERATED ALWAYS AS (5
> * a) STORED, c int);
> - ALTER TABLE generated_cols REPLICA IDENTITY FULL;
> - CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols, generated_cols;
> + CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols;
>

I noticed that we can add 'publish_generated_columns = true' for the
case of generated column. So we won't need to remove the test. I have
made the changes in v9 patch [1].

[1]: https://www.postgresql.org/message-id/CANhcyEVCqrSYxAg_s99VYevUc4F-Lb9XowWUC2E5RG0i8RtZwA%40mail.gmail.com

Thanks and Regards,
Shlok Kyal



On Tuesday, November 19, 2024 3:15 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

> 
> I noticed that we can add 'publish_generated_columns = true' for the case of
> generated column. So we won't need to remove the test. I have made the
> changes in v9 patch [1].

I think this would unexpectedly change the original purpose of that testcase,
which is to test the bug mentioned in commit b797def.

Basically, I expected the new testcase to fail if we remove the codes fix added in
b797def, but the new testcase can pass even after that.

If we confirmed that that bug will never be triggered after applying the fix in
the thread, it would be better Tt remove that testcase and mention it in the
commit message.

> 
> [1]:
> https://www.postgresql.org/message-id/CANhcyEVCqrSYxAg_s99VYevUc4F
> -Lb9XowWUC2E5RG0i8RtZwA%40mail.gmail.com

Best Regards,
Hou zj

On Tuesday, November 19, 2024 5:10 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> On Tuesday, November 19, 2024 3:15 PM Shlok Kyal
> <shlok.kyal.oss@gmail.com> wrote:
> 
> >
> > I noticed that we can add 'publish_generated_columns = true' for the case of
> > generated column. So we won't need to remove the test. I have made the
> > changes in v9 patch [1].
> 
> I think this would unexpectedly change the original purpose of that testcase,
> which is to test the bug mentioned in commit b797def.
> 
> Basically, I expected the new testcase to fail if we remove the codes fix added in
> b797def, but the new testcase can pass even after that.

Sorry, a typo here. I meant the commit adedf54 instead of b797def.

> 
> If we confirmed that that bug will never be triggered after applying the fix in
> the thread, it would be better Tt remove that testcase and mention it in the
> commit message.

Best Regards,
Hou zj


On Tue, 19 Nov 2024 at 19:12, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Tue, 19 Nov 2024 at 14:39, Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tuesday, November 19, 2024 3:15 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > >
> > > I noticed that we can add 'publish_generated_columns = true' for the case of
> > > generated column. So we won't need to remove the test. I have made the
> > > changes in v9 patch [1].
> >
> > I think this would unexpectedly change the original purpose of that testcase,
> > which is to test the bug mentioned in commit b797def.
> >
> > Basically, I expected the new testcase to fail if we remove the codes fix added in
> > b797def, but the new testcase can pass even after that.
> >
> > If we confirmed that that bug will never be triggered after applying the fix in
> > the thread, it would be better Tt remove that testcase and mention it in the
> > commit message.
> >
> I agree that we can remove the test. I debugged and found the test
> modified in above patch does not hit the condition added in commit
> adedf54.
> Also, according to me we cannot trigger the bug after the fix in this
> thread. So, I think we can remove the testcase.
>
> I have attached the latest patch with an updated commit message and
> also removed the testcase.

Few comments:
1) This seems like a copy paste from
pub_collist_contains_invalid_column, the comments should be updated
according to this function:
+       /*
+        * For a partition, if pubviaroot is true, find the topmost
ancestor that
+        * is published via this publication as we need to use its
column list for
+        * the changes.
+        *
+        * Note that even though the column list used is for an ancestor, the
+        * REPLICA IDENTITY used will be for the actual child table.
+        */
+       if (pubviaroot && relation->rd_rel->relispartition)

2) Here drop index is not required as the drop table will take care of
dropping the index too:
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DROP PUBLICATION pub_gencol;
+DROP INDEX testpub_gencol_idx;
+DROP TABLE testpub_gencol;
+RESET client_min_messages;

Regards,
Vignesh



On Thu, Nov 21, 2024 at 5:30 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>

Review comments:
===============
1.
+
+ /*
+ * true if all generated columns which are part of replica identity are
+ * published or the publication actions do not include UPDATE or DELETE.
+ */
+ bool replident_valid_for_update;
+ bool replident_valid_for_delete;

These are too generic names for the purpose they are used. How about
instead name them as gencols_valid_for_update and
gencols_valid_for_delete?

2. The comments atop RelationBuildPublicationDesc() is only about row
filter. We should update it for column list and generated columns as
well.

3. It is better to merge the functionality of the invalid column list
and unpublished generated columns as proposed by Hou-San above.



--
With Regards,
Amit Kapila.



On Fri, 29 Nov 2024 at 13:38, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 28 Nov 2024 at 16:38, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Nov 21, 2024 at 5:30 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> >
> > Review comments:
> > ===============
> > 1.
> > +
> > + /*
> > + * true if all generated columns which are part of replica identity are
> > + * published or the publication actions do not include UPDATE or DELETE.
> > + */
> > + bool replident_valid_for_update;
> > + bool replident_valid_for_delete;
> >
> > These are too generic names for the purpose they are used. How about
> > instead name them as gencols_valid_for_update and
> > gencols_valid_for_delete?
> >
> > 2. The comments atop RelationBuildPublicationDesc() is only about row
> > filter. We should update it for column list and generated columns as
> > well.
> >
> > 3. It is better to merge the functionality of the invalid column list
> > and unpublished generated columns as proposed by Hou-San above.
> >
>
> Thanks for reviewing the patch. I have addressed the comments and
> updated the patch.

Shouldn't unpublished_gen_col be set only if the column list is absent?
-               /* Transform the column list datum to a bitmapset. */
-               columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+               /* Check if generated column is part of REPLICA IDENTITY */
+               *unpublished_gen_col |= att->attgenerated;

-               /* Remember columns that are part of the REPLICA IDENTITY */
-               idattrs = RelationGetIndexAttrBitmap(relation,
-
                  INDEX_ATTR_BITMAP_IDENTITY_KEY);
+               if (columns == NULL)
+               {
+                       /* Break the loop if unpublished generated
columns exist. */
+                       if (*unpublished_gen_col)
+                               break;
+
+                       /* Skip validating the column list since it is
not defined */
+                       continue;
+               }


This scenario worked in v11 but fails in v12:
CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol(b);
UPDATE testpub_gencol SET a = 100 WHERE a = 1;

Regards,
Vignesh



Hi,

I was looking at the recently pushed code [1]. IMO the wording of some
of those new error messages of function CheckCmdReplicaIdentity() is
not quite correct.

According to my understanding, and according also to Chat-GPT:
------
The sentence "Replica identity consists of an unpublished generated
column." implies that the entire replica identity is made up of an
unpublished generated column and nothing else.

This is because the phrase "consists of" typically indicates a
complete composition, meaning that the replica identity is exclusively
composed of the unpublished generated column in this context.
------

IIUC, these errors are intended for when there is *any* unpublished
generated column found in the RI, and the RI might also have other
columns in it generated or otherwise. So, I think those error messages
saying "consists of" should be reworded like below, or similar:
* errdetail("Replica identity includes an unpublished generated column.")));
* errdetail("Replica identity has one or more unpublished generated
columns.")));
* errdetail("One or more unpublished generated columns are in the
Replica identity.")));
* ...

======
[]1 https://github.com/postgres/postgres/commit/87ce27de6963091f4a365f80bcdb06b9da098f00

Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Dec 5, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> IIUC, these errors are intended for when there is *any* unpublished
> generated column found in the RI, and the RI might also have other
> columns in it generated or otherwise. So, I think those error messages
> saying "consists of" should be reworded like below, or similar:
> * errdetail("Replica identity includes an unpublished generated column.")));
> * errdetail("Replica identity has one or more unpublished generated
> columns.")));
> * errdetail("One or more unpublished generated columns are in the
> Replica identity.")));
> * ...
>

How about a bit clearer: "Replica identity must not contain any
unpublished generated column."?

--
With Regards,
Amit Kapila.



On Thu, Dec 5, 2024 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > IIUC, these errors are intended for when there is *any* unpublished
> > generated column found in the RI, and the RI might also have other
> > columns in it generated or otherwise. So, I think those error messages
> > saying "consists of" should be reworded like below, or similar:
> > * errdetail("Replica identity includes an unpublished generated column.")));
> > * errdetail("Replica identity has one or more unpublished generated
> > columns.")));
> > * errdetail("One or more unpublished generated columns are in the
> > Replica identity.")));
> > * ...
> >
>
> How about a bit clearer: "Replica identity must not contain any
> unpublished generated column."?
>

Yes, that is better.

Compare:
Replica identity contains unpublished generated columns.
Replica identity must not contain unpublished generated columns.

Maybe it is just my imagination, but the "must not" version feels more
like it implies the Replica Identify is in the wrong, whereas I think
it is most likely that the Replica Identity is correct, and the real
problem is that the user just forgot to publish the generated column.

======
Kind Regards
Peter Smith.
Fujitsu Australia



On Thu, Dec 5, 2024 at 9:32 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Dec 5, 2024 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 5, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > IIUC, these errors are intended for when there is *any* unpublished
> > > generated column found in the RI, and the RI might also have other
> > > columns in it generated or otherwise. So, I think those error messages
> > > saying "consists of" should be reworded like below, or similar:
> > > * errdetail("Replica identity includes an unpublished generated column.")));
> > > * errdetail("Replica identity has one or more unpublished generated
> > > columns.")));
> > > * errdetail("One or more unpublished generated columns are in the
> > > Replica identity.")));
> > > * ...
> > >
> >
> > How about a bit clearer: "Replica identity must not contain any
> > unpublished generated column."?
> >
>
> Yes, that is better.
>
> Compare:
> Replica identity contains unpublished generated columns.
> Replica identity must not contain unpublished generated columns.
>
> Maybe it is just my imagination, but the "must not" version feels more
> like it implies the Replica Identify is in the wrong, whereas I think
> it is most likely that the Replica Identity is correct, and the real
> problem is that the user just forgot to publish the generated column.
>

Either way is possible and I find the message "Replica identity must
not contain unpublished generated columns." clearer as compared to the
other option.

--
With Regards,
Amit Kapila.



On Thu, Dec 5, 2024 at 8:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2024 at 9:32 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Dec 5, 2024 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 5, 2024 at 7:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > IIUC, these errors are intended for when there is *any* unpublished
> > > > generated column found in the RI, and the RI might also have other
> > > > columns in it generated or otherwise. So, I think those error messages
> > > > saying "consists of" should be reworded like below, or similar:
> > > > * errdetail("Replica identity includes an unpublished generated column.")));
> > > > * errdetail("Replica identity has one or more unpublished generated
> > > > columns.")));
> > > > * errdetail("One or more unpublished generated columns are in the
> > > > Replica identity.")));
> > > > * ...
> > > >
> > >
> > > How about a bit clearer: "Replica identity must not contain any
> > > unpublished generated column."?
> > >
> >
> > Yes, that is better.
> >
> > Compare:
> > Replica identity contains unpublished generated columns.
> > Replica identity must not contain unpublished generated columns.
> >
> > Maybe it is just my imagination, but the "must not" version feels more
> > like it implies the Replica Identify is in the wrong, whereas I think
> > it is most likely that the Replica Identity is correct, and the real
> > problem is that the user just forgot to publish the generated column.
> >
>
> Either way is possible and I find the message "Replica identity must
> not contain unpublished generated columns." clearer as compared to the
> other option.
>

OK, let's go with that.

======
Kind Regards
Peter Smith.
Fujitsu Australia.



On Fri, Dec 6, 2024 at 11:10 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Thanks Peter, for pointing this out.
> I also feel that the error message suggested by Amit would be better.
> I have attached a patch for the same.
>

Pushed.

--
With Regards,
Amit Kapila.