Thread: Injection point locking

Injection point locking

From
Heikki Linnakangas
Date:
InjectionPointRun() acquires InjectionPointLock, looks up the hash 
entry, and releases the lock:

>     LWLockAcquire(InjectionPointLock, LW_SHARED);
>     entry_by_name = (InjectionPointEntry *)
>         hash_search(InjectionPointHash, name,
>                     HASH_FIND, &found);
>     LWLockRelease(InjectionPointLock);

Later, it reads fields from the entry it looked up:

>         /* not found in local cache, so load and register */
>         snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
>                  entry_by_name->library, DLSUFFIX);

Isn't that a straightforward race condition, if the injection point is 
detached in between?


Another thing:

I wanted use injection points to inject an error early at backend 
startup, to write a test case for the scenarios that Jacob point out at 
https://www.postgresql.org/message-id/CAOYmi%2Bnwvu21mJ4DYKUa98HdfM_KZJi7B1MhyXtnsyOO-PB6Ww%40mail.gmail.com. 
But I can't do that, because InjectionPointRun() requires a PGPROC 
entry, because it uses an LWLock. That also makes it impossible to use 
injection points in the postmaster. Any chance we could allow injection 
points to be triggered without a PGPROC entry? Could we use a simple 
spinlock instead? With a fast path for the case that no injection points 
are attached or something?

-- 
Heikki Linnakangas
Neon (https://neon.tech)



Re: Injection point locking

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> ... I can't do that, because InjectionPointRun() requires a PGPROC 
> entry, because it uses an LWLock. That also makes it impossible to use 
> injection points in the postmaster. Any chance we could allow injection 
> points to be triggered without a PGPROC entry? Could we use a simple 
> spinlock instead? With a fast path for the case that no injection points 
> are attached or something?

Even taking a spinlock in the postmaster is contrary to project
policy.  Maybe we could look the other way for debug-only code,
but it seems like a pretty horrible precedent.

Given your point that the existing locking is just a fig leaf
anyway, maybe we could simply not have any?  Then it's on the
test author to be sure that the injection point won't be
getting invoked when it's about to be removed.  Or just rip
out the removal capability, and say that once installed an
injection point is there until postmaster shutdown (or till
shared memory reinitialization, anyway).

            regards, tom lane



Re: Injection point locking

From
Michael Paquier
Date:
On Mon, Jun 24, 2024 at 01:29:38PM +0300, Heikki Linnakangas wrote:
> InjectionPointRun() acquires InjectionPointLock, looks up the hash entry,
> and releases the lock:
>
> >     LWLockAcquire(InjectionPointLock, LW_SHARED);
> >     entry_by_name = (InjectionPointEntry *)
> >         hash_search(InjectionPointHash, name,
> >                     HASH_FIND, &found);
> >     LWLockRelease(InjectionPointLock);
>
> Later, it reads fields from the entry it looked up:
>
> >         /* not found in local cache, so load and register */
> >         snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
> >                  entry_by_name->library, DLSUFFIX);
>
> Isn't that a straightforward race condition, if the injection point is
> detached in between?

This is a feature, not a bug :)

Jokes apart, this is a behavior that Noah was looking for so as it is
possible to detach a point to emulate what a debugger would do with a
breakpoint for some of his tests with concurrent DDL bugs, so not
taking a lock while running a point is important.  It's true, though,
that we could always delay the LWLock release once the local cache is
loaded, but would it really matter?
--
Michael

Attachment

Re: Injection point locking

From
Noah Misch
Date:
On Mon, Jun 24, 2024 at 11:03:09AM -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > ... I can't do that, because InjectionPointRun() requires a PGPROC 
> > entry, because it uses an LWLock. That also makes it impossible to use 
> > injection points in the postmaster. Any chance we could allow injection 
> > points to be triggered without a PGPROC entry? Could we use a simple 
> > spinlock instead?

That sounds fine to me.  If calling hash_search() with a spinlock feels too
awful, a list to linear-search could work.

> > With a fast path for the case that no injection points 
> > are attached or something?
> 
> Even taking a spinlock in the postmaster is contrary to project
> policy.  Maybe we could look the other way for debug-only code,
> but it seems like a pretty horrible precedent.

If you're actually using an injection point in the postmaster, that would be
the least of concerns.  It is something of a concern for running an injection
point build while not attaching any injection point.  One solution could be a
GUC to control whether the postmaster participates in injection points.
Another could be to make the data structure readable with atomics only.

> Given your point that the existing locking is just a fig leaf
> anyway, maybe we could simply not have any?  Then it's on the
> test author to be sure that the injection point won't be
> getting invoked when it's about to be removed.

That's tricky with injection_points_set_local() plus an injection point at a
frequently-reached location.  It's one thing to control when the
injection_points_set_local() process reaches the injection point.  It's too
hard to control all the other processes that just reach the injection point
and conclude it's not for their PID.

> Or just rip
> out the removal capability, and say that once installed an
> injection point is there until postmaster shutdown (or till
> shared memory reinitialization, anyway).

That could work.  Tests do need a way to soft-disable, but it's okay with me
if nothing can reclaim the resources.



Re: Injection point locking

From
Michael Paquier
Date:
On Mon, Jun 24, 2024 at 11:03:09AM -0400, Tom Lane wrote:
> Given your point that the existing locking is just a fig leaf
> anyway, maybe we could simply not have any?  Then it's on the
> test author to be sure that the injection point won't be
> getting invoked when it's about to be removed.

That would work for me to put the responsibility to the test author,
ripping out the LWLock.  I was wondering when somebody would come up
with a case where they'd want to point to the postmaster to do
something, without really coming down to a case, so there was that
from my side originally.

Looking at all the points currently in the tree, nothing cares about
the concurrent locking when attaching or detaching a point, so perhaps
it is a good thing based on these experiences to just let this LWLock
go.  This should not impact the availability of the tests, either.

> Or just rip
> out the removal capability, and say that once installed an
> injection point is there until postmaster shutdown (or till
> shared memory reinitialization, anyway).

But not that.  Being able to remove points on the fly can be important
in some cases, for example where you'd still want to issue an ERROR
(partial write path is one case) in a SQL test, then remove it in a
follow-up SQL query to not trigger the same ERROR.
--
Michael

Attachment

Re: Injection point locking

From
Noah Misch
Date:
On Tue, Jun 25, 2024 at 11:14:57AM +0900, Michael Paquier wrote:
> On Mon, Jun 24, 2024 at 01:29:38PM +0300, Heikki Linnakangas wrote:
> > InjectionPointRun() acquires InjectionPointLock, looks up the hash entry,
> > and releases the lock:
> > 
> > >     LWLockAcquire(InjectionPointLock, LW_SHARED);
> > >     entry_by_name = (InjectionPointEntry *)
> > >         hash_search(InjectionPointHash, name,
> > >                     HASH_FIND, &found);
> > >     LWLockRelease(InjectionPointLock);
> > 
> > Later, it reads fields from the entry it looked up:
> > 
> > >         /* not found in local cache, so load and register */
> > >         snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
> > >                  entry_by_name->library, DLSUFFIX);
> > 
> > Isn't that a straightforward race condition, if the injection point is
> > detached in between?
> 
> This is a feature, not a bug :)
> 
> Jokes apart, this is a behavior that Noah was looking for so as it is
> possible to detach a point to emulate what a debugger would do with a
> breakpoint for some of his tests with concurrent DDL bugs, so not
> taking a lock while running a point is important.  It's true, though,
> that we could always delay the LWLock release once the local cache is
> loaded, but would it really matter?

I think your last sentence is what Heikki is saying should happen, and I
agree.  Yes, it matters.  As written, InjectionPointRun() could cache an
entry_by_name->function belonging to a different injection point.



Re: Injection point locking

From
Michael Paquier
Date:
On Tue, Jun 25, 2024 at 09:10:06AM -0700, Noah Misch wrote:
> I think your last sentence is what Heikki is saying should happen, and I
> agree.  Yes, it matters.  As written, InjectionPointRun() could cache an
> entry_by_name->function belonging to a different injection point.

That's true, we could delay the release of the lock to happen just
before a callback is run.

Now, how much do people wish to see for the postmaster bits mentioned
upthread?  Taking a spinlock for so long is not going to work, so we
could just remove it and let developers deal with that and feed on the
flexibility with the lock removal to allow this stuff in more areas.
All the existing tests are OK with that, and I think that also the
case of what you have proposed for the concurrency issues with
in-place updates of catalogs.  Or we could live with a no-lock path
when going through that with the postmaster, but that's a bit weird.

Note that with the current callbacks in the module, assuming that a
point is added within BackendStartup() in the postmaster like the
attached, an ERROR is promoted to a FATAL, taking down the cluster.  A
NOTICE of course works find.  Waits with conditional variables are not
really OK.  How much are you looking for here?

The shmem state being initialized in the DSM registry is not something
that's going to work in the context of the postmaster, but we could
tweak the module so as it can be loaded, initializing the shared state
with the shmem hooks and falling back to a DSM registry when the
library is not loaded with shared_preload_libraries.  For example, see
the POC attached, where I've played with injection points in
BackendStartup(), which is the area I'm guessing Heikki was looking
at.
--
Michael

Attachment

Re: Injection point locking

From
Michael Paquier
Date:
On Wed, Jun 26, 2024 at 10:56:12AM +0900, Michael Paquier wrote:
> That's true, we could delay the release of the lock to happen just
> before a callback is run.

I am not sure what else we can do for the postmaster case for now, so
I've moved ahead with the concern regarding the existing locking
release delay when running a point, and pushed a patch for it.
--
Michael

Attachment