Re: Weird test mixup - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Weird test mixup
Date
Msg-id Zjwa4lX6wADRAili@paquier.xyz
Whole thread Raw
In response to Re: Weird test mixup  (Noah Misch <noah@leadboat.com>)
Responses Re: Weird test mixup
List pgsql-hackers
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
> On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
>> Always resetting
>> condition->name when detaching a point is a simpler flow and saner
>> IMO.
>>
>> Overall, this switches from one detach behavior to a different one,
>
> Can you say more about that?  The only behavior change known to me is that a
> given injection point workload uses more of INJ_MAX_CONDITION.  If there's
> another behavior change, it was likely unintended.

As far as I read the previous patch, the conditions stored in
InjectionPointSharedState would never be really gone, even if the
points are removed from InjectionPointHash.

> The problem I'm trying to tackle in this thread is to make
> src/test/modules/gin installcheck-safe.  $SUBJECT's commit 5105c90 started
> that work, having seen the intarray test suite break when run concurrently
> with the injection_points test suite.  That combination still does break at
> the exit-time race condition.  To reproduce, apply this attachment to add
> sleeps, and run:
>
> make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck
USE_MODULE_DB=1

Thanks for that.  I am not really sure how to protect that without a
small cost in flexibility for the cases of detach vs run paths.  This
comes down to the fact that a custom routine could be run while it
could be detached concurrently, removing any stuff a callback could
depend on in the module.

It was mentioned upthread to add to InjectionPointCacheEntry a fixed
area of memory that modules could use to store some "status" data, but
it would not close the run/detach race because a backend could still
hold a pointer to a callback, with concurrent backends playing with
the contents of InjectionPointCacheEntry (concurrent detaches and
attaches that would cause the same entries to be reused).

One way to close entirely the window would be to hold
InjectionPointLock longer in InjectionPointRun() until the callback
finishes or until it triggers an ERROR.  This would mean that the case
you've mentioned in [1] would change, by blocking the detach() of s3
until the callback of s2 finishes.  We don't have tests in the tree
that do any of that, so holding InjectionPointLock longer would not
break anything on HEAD.  A detach being possible while the callback is
run is something I've considered as valid in d86d20f0ba79, but perhaps
that was too cute of me, even more with the use case of local points.

> Separately, I see injection_points_attach() populates InjectionPointCondition
> after InjectionPointAttach().  Shouldn't InjectionPointAttach() come last, to
> avoid the same sort of race?  I've not tried to reproduce that one.

Good point.  You could run into the case of a concurrent backend
running an injection point that should be local if waiting between
InjectionPointAttach() and the condition getting registered in
injection_points_attach().  That should be reversed.

[1] https://www.postgresql.org/message-id/20240501231214.40@rfd.leadboat.com
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Expand applicability of aggregate's sortop optimization
Next
From: Kirk Wolak
Date:
Subject: Re: Idea Feedback: psql \h misses -> Offers Links?