Re: Weird test mixup - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Weird test mixup
Date
Msg-id 20240509031553.47@rfd.leadboat.com
Whole thread Raw
In response to Re: Weird test mixup  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Weird test mixup
List pgsql-hackers
On Thu, May 09, 2024 at 09:37:54AM +0900, Michael Paquier wrote:
> On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
> > The problem I'm trying to tackle in this thread is to make
> > src/test/modules/gin installcheck-safe.  $SUBJECT's commit 5105c90 started
> > that work, having seen the intarray test suite break when run concurrently
> > with the injection_points test suite.  That combination still does break at
> > the exit-time race condition.  To reproduce, apply this attachment to add
> > sleeps, and run:
> > 
> > make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck
USE_MODULE_DB=1
> 
> Thanks for that.  I am not really sure how to protect that without a
> small cost in flexibility for the cases of detach vs run paths.  This
> comes down to the fact that a custom routine could be run while it
> could be detached concurrently, removing any stuff a callback could
> depend on in the module.
> 
> It was mentioned upthread to add to InjectionPointCacheEntry a fixed
> area of memory that modules could use to store some "status" data, but
> it would not close the run/detach race because a backend could still
> hold a pointer to a callback, with concurrent backends playing with
> the contents of InjectionPointCacheEntry (concurrent detaches and
> attaches that would cause the same entries to be reused).

> One way to close entirely the window would be to hold
> InjectionPointLock longer in InjectionPointRun() until the callback
> finishes or until it triggers an ERROR.  This would mean that the case
> you've mentioned in [1] would change, by blocking the detach() of s3
> until the callback of s2 finishes.

> [1] https://www.postgresql.org/message-id/20240501231214.40@rfd.leadboat.com

Yes, that would be a loss for test readability.  Also, I wouldn't be surprised
if some test will want to attach POINT-B while POINT-A is in injection_wait().
Various options avoiding those limitations:

1. The data area formerly called a "status" area is immutable after attach.
   The core code copies it into a stack buffer to pass in a const callback
   argument.

2. InjectionPointAttach() returns an attachment serial number, and the
   callback receives that as an argument.  injection_points.c would maintain
   an InjectionPointCondition for every still-attached serial number,
   including global attachments.  Finding no condition matching the serial
   number argument means detach already happened and callback should do
   nothing.

3. Move the PID check into core code.

4. Separate the concept of "make ineligible to fire" from "detach", with
   stronger locking for detach.  v1 pointed in this direction, though not
   using that terminology.

5. Injection point has multiple callbacks.  At least one runs with the lock
   held and can mutate the "status" data.  At least one runs without the lock.

(1) is, I think, simple and sufficient.  How about that?



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
Next
From: David Rowley
Date:
Subject: Re: Support tid range scan in parallel?