Thread: [BUG] Unexpected action when publishing partition tables

[BUG] Unexpected action when publishing partition tables

From
"tanghy.fnst@fujitsu.com"
Date:
Hi

I met a problem when using logical replication. Maybe it's a bug in logical replication.
When publishing a partition table without replica identity, update
or delete operation can be successful in some cases.

For example:
create table tbl1 (a int) partition by range ( a );
create table tbl1_part1 partition of tbl1 for values from (1) to (101);
create table tbl1_part2 partition of tbl1 for values from (101) to (200);
insert into tbl1 select generate_series(1, 10);
delete from tbl1 where a=1;
create publication pub for table tbl1;
delete from tbl1 where a=2;

The last DELETE statement can be executed successfully, but it should report
error message about missing a replica identity.

I found this problem on HEAD and I could reproduce this problem at PG13 and
PG14. (Logical replication of partition table was introduced in PG13.)

Regards
Tang



RE: [BUG] Unexpected action when publishing partition tables

From
"houzj.fnst@fujitsu.com"
Date:
From Mon, Sep 6, 2021 3:59 PM tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:
> I met a problem when using logical replication. Maybe it's a bug in logical
> replication.
> When publishing a partition table without replica identity, update
> or delete operation can be successful in some cases.
>
> For example:
> create table tbl1 (a int) partition by range ( a );
> create table tbl1_part1 partition of tbl1 for values from (1) to (101);
> create table tbl1_part2 partition of tbl1 for values from (101) to (200);
> insert into tbl1 select generate_series(1, 10);
> delete from tbl1 where a=1;
> create publication pub for table tbl1;
> delete from tbl1 where a=2;
>
> The last DELETE statement can be executed successfully, but it should report
> error message about missing a replica identity.
>
> I found this problem on HEAD and I could reproduce this problem at PG13 and
> PG14. (Logical replication of partition table was introduced in PG13.)

I can reproduce this bug.

I think the reason is it didn't invalidate all the leaf partitions' relcache
when add a partitioned table to the publication, so the publication info was
not rebuilt.

The following code only invalidate the target table:
---
PublicationAddTables
    publication_add_relation
        /* Invalidate relcache so that publication info is rebuilt. */
        CacheInvalidateRelcache(targetrel);
---

In addition, this problem can happen in both ADD TABLE, DROP
TABLE, and SET TABLE cases, so we need to invalidate the leaf partitions'
recache in all these cases.

Attach patches to fix this bug. The 0001 patch is borrowed from another
thread[1] which make the existing relation cache invalidation code into a
common function. The 0002 patch is to invalidate the leaf partitions' relcache.

[1] https://www.postgresql.org/message-id/CALDaNm27bs40Rxpy4oKfV97UgsPG%3DvVoZ5bj9pP_4BxnO-6DYA%40mail.gmail.com

Best regards,
Hou zj

Attachment

Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Mon, Sep 6, 2021 at 1:49 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> From Mon, Sep 6, 2021 3:59 PM tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:
> > I met a problem when using logical replication. Maybe it's a bug in logical
> > replication.
> > When publishing a partition table without replica identity, update
> > or delete operation can be successful in some cases.
> >
> > For example:
> > create table tbl1 (a int) partition by range ( a );
> > create table tbl1_part1 partition of tbl1 for values from (1) to (101);
> > create table tbl1_part2 partition of tbl1 for values from (101) to (200);
> > insert into tbl1 select generate_series(1, 10);
> > delete from tbl1 where a=1;
> > create publication pub for table tbl1;
> > delete from tbl1 where a=2;
> >
> > The last DELETE statement can be executed successfully, but it should report
> > error message about missing a replica identity.
> >
> > I found this problem on HEAD and I could reproduce this problem at PG13 and
> > PG14. (Logical replication of partition table was introduced in PG13.)

Adding Amit L and Peter E who were involved in this work (commit:
17b9e7f9) to see if they have opinions on this matter.

>
> I can reproduce this bug.
>
> I think the reason is it didn't invalidate all the leaf partitions' relcache
> when add a partitioned table to the publication, so the publication info was
> not rebuilt.
>
> The following code only invalidate the target table:
> ---
> PublicationAddTables
>         publication_add_relation
>                 /* Invalidate relcache so that publication info is rebuilt. */
>                 CacheInvalidateRelcache(targetrel);
> ---
>
> In addition, this problem can happen in both ADD TABLE, DROP
> TABLE, and SET TABLE cases, so we need to invalidate the leaf partitions'
> recache in all these cases.
>

Few comments:
=============
  {
@@ -664,7 +673,13 @@ PublicationDropTables(Oid pubid, List *rels, bool
missing_ok)

  ObjectAddressSet(obj, PublicationRelRelationId, prid);
  performDeletion(&obj, DROP_CASCADE, 0);
+
+ relids = GetPubPartitionOptionRelations(relids, PUBLICATION_PART_LEAF,
+ relid);
  }
+
+ /* Invalidate relcache so that publication info is rebuilt. */
+ InvalidatePublicationRels(relids);
 }

We already register the invalidation for the main table in
RemovePublicationRelById which is called via performDeletion. I think
it is better to perform invalidation for partitions at that place.
Similarly is there a reason for not doing invalidations of partitions
in publication_add_relation()?

-- 
With Regards,
Amit Kapila.



RE: [BUG] Unexpected action when publishing partition tables

From
"houzj.fnst@fujitsu.com"
Date:
From Tues, Sep 7, 2021 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Sep 6, 2021 at 1:49 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > I can reproduce this bug.
> >
> > I think the reason is it didn't invalidate all the leaf partitions'
> > relcache when add a partitioned table to the publication, so the
> > publication info was not rebuilt.
> >
> > The following code only invalidate the target table:
> > ---
> > PublicationAddTables
> >         publication_add_relation
> >                 /* Invalidate relcache so that publication info is rebuilt. */
> >                 CacheInvalidateRelcache(targetrel);
> > ---
> >
> > In addition, this problem can happen in both ADD TABLE, DROP TABLE,
> > and SET TABLE cases, so we need to invalidate the leaf partitions'
> > recache in all these cases.
> >
> 
> Few comments:
> =============
>   {
> @@ -664,7 +673,13 @@ PublicationDropTables(Oid pubid, List *rels, bool
> missing_ok)
> 
>   ObjectAddressSet(obj, PublicationRelRelationId, prid);
>   performDeletion(&obj, DROP_CASCADE, 0);
> +
> + relids = GetPubPartitionOptionRelations(relids, PUBLICATION_PART_LEAF,
> + relid);
>   }
> +
> + /* Invalidate relcache so that publication info is rebuilt. */
> + InvalidatePublicationRels(relids);
>  }
> 
> We already register the invalidation for the main table in
> RemovePublicationRelById which is called via performDeletion. I think it is
> better to perform invalidation for partitions at that place.
> Similarly is there a reason for not doing invalidations of partitions in
> publication_add_relation()?

Thanks for the comment. I originally intended to reduce the number of invalid
message when add/drop serval tables while each table has lots of partitions which
could exceed the MAX_RELCACHE_INVAL_MSGS. But that seems a rare case, so ,
I changed the code as suggested.

Attach new version patches which addressed the comment.

Best regards,
Hou zj

Attachment

RE: [BUG] Unexpected action when publishing partition tables

From
"tanghy.fnst@fujitsu.com"
Date:
> > > ---
> > > PublicationAddTables
> > >         publication_add_relation
> > >                 /* Invalidate relcache so that publication info is rebuilt. */
> > >                 CacheInvalidateRelcache(targetrel);
> > > ---
> > >
> > > In addition, this problem can happen in both ADD TABLE, DROP TABLE,
> > > and SET TABLE cases, so we need to invalidate the leaf partitions'
> > > recache in all these cases.
> > >
> >
> > Few comments:
> > =============
> >   {
> > @@ -664,7 +673,13 @@ PublicationDropTables(Oid pubid, List *rels, bool
> > missing_ok)
> >
> >   ObjectAddressSet(obj, PublicationRelRelationId, prid);
> >   performDeletion(&obj, DROP_CASCADE, 0);
> > +
> > + relids = GetPubPartitionOptionRelations(relids, PUBLICATION_PART_LEAF,
> > + relid);
> >   }
> > +
> > + /* Invalidate relcache so that publication info is rebuilt. */
> > + InvalidatePublicationRels(relids);
> >  }
> >
> > We already register the invalidation for the main table in
> > RemovePublicationRelById which is called via performDeletion. I think it is
> > better to perform invalidation for partitions at that place.
> > Similarly is there a reason for not doing invalidations of partitions in
> > publication_add_relation()?
> 
> Thanks for the comment. I originally intended to reduce the number of invalid
> message when add/drop serval tables while each table has lots of partitions which
> could exceed the MAX_RELCACHE_INVAL_MSGS. But that seems a rare case, so ,
> I changed the code as suggested.
> 
> Attach new version patches which addressed the comment.

Thanks for your patch. I confirmed that the problem I reported was fixed.

Besides, Your v2 patch also fixed an existing a problem about "DROP PUBLICATION" on HEAD.
(Publication was dropped but it still reported errors about replica identity when trying to
update or delete a partition table.)

Regards
Tang

Re: [BUG] Unexpected action when publishing partition tables

From
vignesh C
Date:
On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> From Tues, Sep 7, 2021 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Sep 6, 2021 at 1:49 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > >
> > > I can reproduce this bug.
> > >
> > > I think the reason is it didn't invalidate all the leaf partitions'
> > > relcache when add a partitioned table to the publication, so the
> > > publication info was not rebuilt.
> > >
> > > The following code only invalidate the target table:
> > > ---
> > > PublicationAddTables
> > >         publication_add_relation
> > >                 /* Invalidate relcache so that publication info is rebuilt. */
> > >                 CacheInvalidateRelcache(targetrel);
> > > ---
> > >
> > > In addition, this problem can happen in both ADD TABLE, DROP TABLE,
> > > and SET TABLE cases, so we need to invalidate the leaf partitions'
> > > recache in all these cases.
> > >
> >
> > Few comments:
> > =============
> >   {
> > @@ -664,7 +673,13 @@ PublicationDropTables(Oid pubid, List *rels, bool
> > missing_ok)
> >
> >   ObjectAddressSet(obj, PublicationRelRelationId, prid);
> >   performDeletion(&obj, DROP_CASCADE, 0);
> > +
> > + relids = GetPubPartitionOptionRelations(relids, PUBLICATION_PART_LEAF,
> > + relid);
> >   }
> > +
> > + /* Invalidate relcache so that publication info is rebuilt. */
> > + InvalidatePublicationRels(relids);
> >  }
> >
> > We already register the invalidation for the main table in
> > RemovePublicationRelById which is called via performDeletion. I think it is
> > better to perform invalidation for partitions at that place.
> > Similarly is there a reason for not doing invalidations of partitions in
> > publication_add_relation()?
>
> Thanks for the comment. I originally intended to reduce the number of invalid
> message when add/drop serval tables while each table has lots of partitions which
> could exceed the MAX_RELCACHE_INVAL_MSGS. But that seems a rare case, so ,
> I changed the code as suggested.
>
> Attach new version patches which addressed the comment.

Thanks for fixing this issue. The bug gets fixed by the patch, I did
not find any issues in my testing.
I just had one minor comment:

We could clean the table at the end by calling DROP TABLE testpub_parted2:
+-- still fail, because parent's publication replicates updates
+UPDATE testpub_parted2 SET a = 2;
+ERROR:  cannot update table "testpub_parted2" because it does not
have a replica identity and publishes updates
+HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
+ALTER PUBLICATION testpub_forparted DROP TABLE testpub_parted;
+-- works again, because update is no longer replicated
+UPDATE testpub_parted2 SET a = 2;

Regards,
Vignesh



RE: [BUG] Unexpected action when publishing partition tables

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> >
> > Attach new version patches which addressed the comment.
> 
> Thanks for fixing this issue. The bug gets fixed by the patch, I did not find any
> issues in my testing.
> I just had one minor comment:
> 
> We could clean the table at the end by calling DROP TABLE testpub_parted2:
> +-- still fail, because parent's publication replicates updates UPDATE
> +testpub_parted2 SET a = 2;
> +ERROR:  cannot update table "testpub_parted2" because it does not
> have a replica identity and publishes updates
> +HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER
> TABLE.
> +ALTER PUBLICATION testpub_forparted DROP TABLE testpub_parted;
> +-- works again, because update is no longer replicated UPDATE
> +testpub_parted2 SET a = 2;

Thanks for the comment.
Attach new version patches which clean the table at the end.

Best regards,
Hou zj

Attachment

Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Thu, Sep 16, 2021 at 7:15 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> > On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>
> Thanks for the comment.
> Attach new version patches which clean the table at the end.
>

+ * For partitioned table contained in the publication, we must
+ * invalidate all partitions contained in the respective partition
+ * trees, not just the one explicitly mentioned in the publication.

Can we slightly change the above comment as: "For the partitioned
tables, we must invalidate all partitions contained in the respective
partition hierarchies, not just the one explicitly mentioned in the
publication. This is required because we implicitly publish the child
tables when the parent table is published."

Apart from this, the patch looks good to me.

I think we need to back-patch this till v13. What do you think? If
yes, then can you please prepare and test the patches for
back-branches? Does anyone else have opinions on back-patching this?

I think this is not a show-stopper bug, so even if we decide to
back-patch, I will do it next week after 14 RC1.

-- 
With Regards,
Amit Kapila.



RE: [BUG] Unexpected action when publishing partition tables

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, September 16, 2021 6:05 PM Amit Kapila <amit.kapila16@gmail.com>
> > On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> > > On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > Thanks for the comment.
> > Attach new version patches which clean the table at the end.
> >
> 
> + * For partitioned table contained in the publication, we must
> + * invalidate all partitions contained in the respective partition
> + * trees, not just the one explicitly mentioned in the publication.
> 
> Can we slightly change the above comment as: "For the partitioned tables, we
> must invalidate all partitions contained in the respective partition hierarchies,
> not just the one explicitly mentioned in the publication. This is required
> because we implicitly publish the child tables when the parent table is
> published."
> 
> Apart from this, the patch looks good to me.
> 
> I think we need to back-patch this till v13. What do you think? 

I agreed.

Attach patches for back-branch, each has passed regression tests and pgindent.

Best regards,
Hou zj


Attachment

Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Fri, Sep 17, 2021 at 11:36 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, September 16, 2021 6:05 PM Amit Kapila <amit.kapila16@gmail.com>
> > > On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > >
> > > Thanks for the comment.
> > > Attach new version patches which clean the table at the end.
> > >
> >
> > + * For partitioned table contained in the publication, we must
> > + * invalidate all partitions contained in the respective partition
> > + * trees, not just the one explicitly mentioned in the publication.
> >
> > Can we slightly change the above comment as: "For the partitioned tables, we
> > must invalidate all partitions contained in the respective partition hierarchies,
> > not just the one explicitly mentioned in the publication. This is required
> > because we implicitly publish the child tables when the parent table is
> > published."
> >
> > Apart from this, the patch looks good to me.
> >
> > I think we need to back-patch this till v13. What do you think?
>
> I agreed.
>
> Attach patches for back-branch, each has passed regression tests and pgindent.
>

Thanks, your patches look good to me. I'll push them sometime next
week after Tuesday unless there are any comments.

-- 
With Regards,
Amit Kapila.



Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Fri, Sep 17, 2021 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Thanks, your patches look good to me. I'll push them sometime next
> week after Tuesday unless there are any comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.



Re: [BUG] Unexpected action when publishing partition tables

From
Amit Langote
Date:
On Fri, Sep 17, 2021 at 7:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Sep 17, 2021 at 11:36 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Thursday, September 16, 2021 6:05 PM Amit Kapila <amit.kapila16@gmail.com>
> > > > On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > > On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Thanks for the comment.
> > > > Attach new version patches which clean the table at the end.
> > > >
> > >
> > > + * For partitioned table contained in the publication, we must
> > > + * invalidate all partitions contained in the respective partition
> > > + * trees, not just the one explicitly mentioned in the publication.
> > >
> > > Can we slightly change the above comment as: "For the partitioned tables, we
> > > must invalidate all partitions contained in the respective partition hierarchies,
> > > not just the one explicitly mentioned in the publication. This is required
> > > because we implicitly publish the child tables when the parent table is
> > > published."
> > >
> > > Apart from this, the patch looks good to me.
> > >
> > > I think we need to back-patch this till v13. What do you think?
> >
> > I agreed.
> >
> > Attach patches for back-branch, each has passed regression tests and pgindent.
>
> Thanks, your patches look good to me. I'll push them sometime next
> week after Tuesday unless there are any comments.

Thanks Amit, Tang, and Hou for this.

Sorry that I didn't comment on this earlier, but I think either
GetPubPartitionOptionRelations() or InvalidatePublicationRels()
introduced in the commit 4548c76738b should lock the partitions, just
like to the parent partitioned table would be, before invalidating
them.  There may be some hazards to invalidating a relation without
locking it.

For example, maybe add a 'lockmode' parameter to
GetPubPartitionOptionRelations() which it passes down to
find_all_inheritors() instead of NoLock as now.  And make all sites
except GetPublicationRelations() pass ShareUpdateExclusiveLock for it.
Maybe like the attached.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Thu, Oct 7, 2021 at 12:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Sep 17, 2021 at 7:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Sep 17, 2021 at 11:36 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > On Thursday, September 16, 2021 6:05 PM Amit Kapila <amit.kapila16@gmail.com>
> > > > > On Tuesday, September 14, 2021 10:41 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > > > On Tue, Sep 7, 2021 at 11:38 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > > > >
> > > > > Thanks for the comment.
> > > > > Attach new version patches which clean the table at the end.
> > > > >
> > > >
> > > > + * For partitioned table contained in the publication, we must
> > > > + * invalidate all partitions contained in the respective partition
> > > > + * trees, not just the one explicitly mentioned in the publication.
> > > >
> > > > Can we slightly change the above comment as: "For the partitioned tables, we
> > > > must invalidate all partitions contained in the respective partition hierarchies,
> > > > not just the one explicitly mentioned in the publication. This is required
> > > > because we implicitly publish the child tables when the parent table is
> > > > published."
> > > >
> > > > Apart from this, the patch looks good to me.
> > > >
> > > > I think we need to back-patch this till v13. What do you think?
> > >
> > > I agreed.
> > >
> > > Attach patches for back-branch, each has passed regression tests and pgindent.
> >
> > Thanks, your patches look good to me. I'll push them sometime next
> > week after Tuesday unless there are any comments.
>
> Thanks Amit, Tang, and Hou for this.
>
> Sorry that I didn't comment on this earlier, but I think either
> GetPubPartitionOptionRelations() or InvalidatePublicationRels()
> introduced in the commit 4548c76738b should lock the partitions, just
> like to the parent partitioned table would be, before invalidating
> them.  There may be some hazards to invalidating a relation without
> locking it.
>

I see your point but then on the same lines didn't the existing code
"for all tables" case (where we call CacheInvalidateRelcacheAll()
without locking all relations) have a similar problem. Also, in your
patch, you are assuming that the callers of GetPublicationRelations()
will lock all the relations but what when it gets called from
AlterPublicationOptions()?

-- 
With Regards,
Amit Kapila.



Re: [BUG] Unexpected action when publishing partition tables

From
Amit Langote
Date:
Hi Amit,

On Fri, Oct 8, 2021 at 12:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 7, 2021 at 12:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Sorry that I didn't comment on this earlier, but I think either
> > GetPubPartitionOptionRelations() or InvalidatePublicationRels()
> > introduced in the commit 4548c76738b should lock the partitions, just
> > like to the parent partitioned table would be, before invalidating
> > them.  There may be some hazards to invalidating a relation without
> > locking it.
>
> I see your point but then on the same lines didn't the existing code
> "for all tables" case (where we call CacheInvalidateRelcacheAll()
> without locking all relations) have a similar problem.

There might be.  I checked to see how other callers/modules use
CacheInvalidateRelcacheAll(), though it seems that only the functions
in publicationcmds.c use it or really was invented in 665d1fad99e for
use by publication commands.

Maybe I need to look harder than I've done for any examples of hazard.

> Also, in your
> patch, you are assuming that the callers of GetPublicationRelations()
> will lock all the relations but what when it gets called from
> AlterPublicationOptions()?

Ah, my bad.  I hadn't noticed that one for some reason.

Now that you mention it, I do find it somewhat concerning (on the
similar grounds as what prompted my previous email) that
AlterPublicationOptions() does away with any locking on the affected
relations.

Anyway, I'll think a bit more about the possible hazards of not doing
the locking and will reply again if there's indeed a problem(s) that
needs to be fixed.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Wed, Oct 13, 2021 at 9:11 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Amit,
>
> On Fri, Oct 8, 2021 at 12:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Oct 7, 2021 at 12:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Sorry that I didn't comment on this earlier, but I think either
> > > GetPubPartitionOptionRelations() or InvalidatePublicationRels()
> > > introduced in the commit 4548c76738b should lock the partitions, just
> > > like to the parent partitioned table would be, before invalidating
> > > them.  There may be some hazards to invalidating a relation without
> > > locking it.
> >
> > I see your point but then on the same lines didn't the existing code
> > "for all tables" case (where we call CacheInvalidateRelcacheAll()
> > without locking all relations) have a similar problem.
>
> There might be.  I checked to see how other callers/modules use
> CacheInvalidateRelcacheAll(), though it seems that only the functions
> in publicationcmds.c use it or really was invented in 665d1fad99e for
> use by publication commands.
>
> Maybe I need to look harder than I've done for any examples of hazard.
>
> > Also, in your
> > patch, you are assuming that the callers of GetPublicationRelations()
> > will lock all the relations but what when it gets called from
> > AlterPublicationOptions()?
>
> Ah, my bad.  I hadn't noticed that one for some reason.
>
> Now that you mention it, I do find it somewhat concerning (on the
> similar grounds as what prompted my previous email) that
> AlterPublicationOptions() does away with any locking on the affected
> relations.
>
> Anyway, I'll think a bit more about the possible hazards of not doing
> the locking and will reply again if there's indeed a problem(s) that
> needs to be fixed.
>

I think you can try to reproduce the problem via the debugger. You can
stop before calling GetPubPartitionOptionRelations in
publication_add_relation()  in session-1 and then from another session
(say session-2) try to delete one of the partition table (without
replica identity).  Then stop in session-2 somewhere after acquiring
lock to the corresponding partition relation. Now, continue in
session-1 and invalidate the rels and let it complete the command. I
think session-2 will complete the update without processing the
invalidations.

If the above is true, then, this breaks the following behavior
specified in the documentation: "The tables added to a publication
that publishes UPDATE and/or DELETE operations must have REPLICA
IDENTITY defined. Otherwise, those operations will be disallowed on
those tables.". Also, I think such updates won't be replicated on
subscribers as there is no replica identity or primary key column.

-- 
With Regards,
Amit Kapila.



Re: [BUG] Unexpected action when publishing partition tables

From
Amit Kapila
Date:
On Mon, Oct 18, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 13, 2021 at 9:11 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Anyway, I'll think a bit more about the possible hazards of not doing
> > the locking and will reply again if there's indeed a problem(s) that
> > needs to be fixed.
> >
>
> I think you can try to reproduce the problem via the debugger. You can
> stop before calling GetPubPartitionOptionRelations in
> publication_add_relation()  in session-1 and then from another session
> (say session-2) try to delete one of the partition table (without
> replica identity).  Then stop in session-2 somewhere after acquiring
> lock to the corresponding partition relation. Now, continue in
> session-1 and invalidate the rels and let it complete the command. I
> think session-2 will complete the update without processing the
> invalidations.
>

In the last sentence, it should be delete rather than update.

-- 
With Regards,
Amit Kapila.