Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS
Date
Msg-id CA+HiwqH+WShxk3Aeh64yYra0oSM1G8ir3B1efG-y0p-v=54Kog@mail.gmail.com
Whole thread Raw
In response to Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, May 19, 2021 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote:
> > I count three distinct bugs that were exposed by this attempt:
> >
> > 1. In the part of 013_partition.pl that tests firing AFTER
> > triggers on partitioned tables, we have a case of continuing
> > to access a relcache entry that's already been closed.
> > (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
> > hasn't exposed this.)  It looks to me like instead we had
> > a relcache reference leak before f3b141c48, but now, the
> > only relcache reference count on a partition child table
> > is dropped by ExecCleanupTupleRouting, which logical/worker.c
> > invokes before it fires triggers on that table.  Kaboom.

Oops.

> > This might go away if worker.c weren't so creatively different
> > from the other code paths concerned with executor shutdown.
>
> The tuple routing has made the whole worker logic messier by a larger
> degree compared to when this stuff was only able to apply DMLs changes
> on the partition leaves.  I know that it is not that great to be more
> creative here, but we need to make sure that AfterTriggerEndQuery() is
> moved before ExecCleanupTupleRouting().  We could either keep the
> ExecCleanupTupleRouting() calls as they are now, and call
> AfterTriggerEndQuery() in more code paths.

Yeah, that's what I thought to propose doing too.  Your patch looks
enough in that regard.  Thanks for writing it.

> Or we could have one
> PartitionTupleRouting and one ModifyTableState created beforehand
> in create_estate_for_relation() if applying the change on a
> partitioned table but that means manipulating more structures across
> more layers of this code.

Yeah, that seems like too much change to me too.

>  Something like the attached fixes the
> problem for me, but honestly it does not help in clarifying this code
> more.  I was not patient enough to wait for CLOBBER_CACHE_ALWAYS to
> initialize the nodes in the TAP tests, so I have tested that with a
> setup initialized with a non-clobber build, and reproduced the problem
> with CLOBBER_CACHE_ALWAYS builds on those same nodes.

I have checked the fix works with a CLOBBER_CACHE_ALWAYS build.

> You are right that this is not a problem of 14~.  I can reproduce the
> problem on 13 as well, and we have no coverage for tuple routing with
> triggers on this branch, so this would never have been stressed in the
> buildfarm.  There is a good argument to be made here in cherry-picking
> 2ecfeda3 to REL_13_STABLE.

+1

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Multiple pg_waldump --rmgr options
Next
From: Tom Lane
Date:
Subject: Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS