On Wed, May 19, 2021 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote:
> > I count three distinct bugs that were exposed by this attempt:
> >
> > 1. In the part of 013_partition.pl that tests firing AFTER
> > triggers on partitioned tables, we have a case of continuing
> > to access a relcache entry that's already been closed.
> > (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
> > hasn't exposed this.) It looks to me like instead we had
> > a relcache reference leak before f3b141c48, but now, the
> > only relcache reference count on a partition child table
> > is dropped by ExecCleanupTupleRouting, which logical/worker.c
> > invokes before it fires triggers on that table. Kaboom.
Oops.
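If I have this right, the broken sequence boils down to the
following. This is only a minimal sketch to restate the problem, not
the literal worker.c code; the function name is made up and the
details are simplified:

#include "postgres.h"

#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "nodes/execnodes.h"

/* Hypothetical distillation of the apply path for a partitioned table. */
static void
apply_to_partitioned_rel_broken(ModifyTableState *mtstate,
                                PartitionTupleRouting *proute,
                                EState *estate)
{
    /* ... the tuple has been routed; AFTER trigger events are queued ... */

    /*
     * Since f3b141c48, this drops the only remaining relcache
     * reference count on the partition child table ...
     */
    ExecCleanupTupleRouting(mtstate, proute);

    /*
     * ... so firing the queued triggers here touches a relcache entry
     * that has already been closed.  Kaboom.
     */
    AfterTriggerEndQuery(estate);
}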
> > This might go away if worker.c weren't so creatively different
> > from the other code paths concerned with executor shutdown.
>
> The tuple routing has made the whole worker logic considerably
> messier than when this code could only apply DML changes to the
> partition leaves. I know that it is not great to be more creative
> here, but we need to make sure that AfterTriggerEndQuery() is moved
> before ExecCleanupTupleRouting(). We could either keep the
> ExecCleanupTupleRouting() calls as they are now and call
> AfterTriggerEndQuery() in more code paths.
Yeah, that's what I was thinking of proposing too. Your patch looks
sufficient in that regard. Thanks for writing it.
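To spell out the ordering your patch establishes (same caveats and
includes as the sketch above; a simplified illustration, not the
patch itself):

static void
apply_to_partitioned_rel_fixed(ModifyTableState *mtstate,
                               PartitionTupleRouting *proute,
                               EState *estate)
{
    /* ... the tuple has been routed; AFTER trigger events are queued ... */

    /*
     * Fire the queued AFTER triggers while the routing state still
     * holds a relcache pin on each partition child.
     */
    AfterTriggerEndQuery(estate);

    /* Only now is it safe to tear down the routing state. */
    ExecCleanupTupleRouting(mtstate, proute);
}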
> Or we could have one PartitionTupleRouting and one ModifyTableState
> created beforehand in create_estate_for_relation() when applying a
> change to a partitioned table, but that means manipulating more
> structures across more layers of this code.
Yeah, that seems like too much change to me too.
> Something like the attached fixes the problem for me, but honestly
> it does not do much to clarify this code. I was not patient enough
> to wait for CLOBBER_CACHE_ALWAYS to initialize the nodes in the TAP
> tests, so I tested with nodes initialized by a non-clobber build and
> then reproduced the problem with CLOBBER_CACHE_ALWAYS binaries on
> those same nodes.
I have checked that the fix works with a CLOBBER_CACHE_ALWAYS build.
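In case anyone wants to replay that, I enabled the option the usual
compile-time way; if I am remembering the mechanics right, either of
these should do:

/*
 * Add to src/include/pg_config_manual.h before building, or
 * equivalently pass -DCLOBBER_CACHE_ALWAYS in CPPFLAGS to configure.
 */
#define CLOBBER_CACHE_ALWAYS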
> You are right that this is not a problem of 14~. I can reproduce the
> problem on 13 as well, and we have no coverage for tuple routing with
> triggers on this branch, so this would never have been stressed in the
> buildfarm. There is a good argument to be made here for
> cherry-picking 2ecfeda3 to REL_13_STABLE.
+1
--
Amit Langote
EDB: http://www.enterprisedb.com