On Wed, Oct 13, 2021 at 9:11 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Amit,
>
> On Fri, Oct 8, 2021 at 12:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Oct 7, 2021 at 12:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Sorry that I didn't comment on this earlier, but I think either
> > > GetPubPartitionOptionRelations() or InvalidatePublicationRels()
> > > introduced in the commit 4548c76738b should lock the partitions, just
> > > like to the parent partitioned table would be, before invalidating
> > > them. There may be some hazards to invalidating a relation without
> > > locking it.
> >
> > I see your point but then on the same lines didn't the existing code
> > "for all tables" case (where we call CacheInvalidateRelcacheAll()
> > without locking all relations) have a similar problem.
>
> There might be. I checked to see how other callers/modules use
> CacheInvalidateRelcacheAll(), though it seems that only the functions
> in publicationcmds.c use it or really was invented in 665d1fad99e for
> use by publication commands.
>
> Maybe I need to look harder than I've done for any examples of hazard.
>
> > Also, in your
> > patch, you are assuming that the callers of GetPublicationRelations()
> > will lock all the relations but what when it gets called from
> > AlterPublicationOptions()?
>
> Ah, my bad. I hadn't noticed that one for some reason.
>
> Now that you mention it, I do find it somewhat concerning (on the
> similar grounds as what prompted my previous email) that
> AlterPublicationOptions() does away with any locking on the affected
> relations.
>
> Anyway, I'll think a bit more about the possible hazards of not doing
> the locking and will reply again if there's indeed a problem(s) that
> needs to be fixed.
>
I think you can try to reproduce the problem via the debugger. You can
stop before calling GetPubPartitionOptionRelations in
publication_add_relation() in session-1 and then from another session
(say session-2) try to delete one of the partition table (without
replica identity). Then stop in session-2 somewhere after acquiring
lock to the corresponding partition relation. Now, continue in
session-1 and invalidate the rels and let it complete the command. I
think session-2 will complete the update without processing the
invalidations.
If the above is true, then, this breaks the following behavior
specified in the documentation: "The tables added to a publication
that publishes UPDATE and/or DELETE operations must have REPLICA
IDENTITY defined. Otherwise, those operations will be disallowed on
those tables.". Also, I think such updates won't be replicated on
subscribers as there is no replica identity or primary key column.
--
With Regards,
Amit Kapila.