Re: Adding facility for injection points (or probe points?) for more advanced tests - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Adding facility for injection points (or probe points?) for more advanced tests
Date
Msg-id CAExHW5skzA6UPEYALz4GkCZdJZBt94uWahp5fExu09S90e1iHQ@mail.gmail.com
Whole thread Raw
In response to Re: Adding facility for injection points (or probe points?) for more advanced tests  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Adding facility for injection points (or probe points?) for more advanced tests
List pgsql-hackers
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> >> This facility is hidden behind a specific configure/Meson switch,
> >> making it a no-op by default:
> >> --enable-injection-points
> >> -Dinjection_points={ true | false }
> >
> > That's useful, but we will also see demand to enable those in
> > production (under controlled circumstances). But let the functionality
> > mature under a separate flag and developer builds before used in
> > production.
>
> Hmm.  Okay.  I'd still keep that under a compile switch for now
> anyway to keep a cleaner separation and avoid issues where it would be
> possible to inject code in a production build.  Note that I was
> planning to switch one of my buildfarm animals to use it should this
> stuff make it into the tree.  And people would be free to use it if
> they want.  If in production, that would be a risk, IMO.

makes sense. Just to be clear - by "in production" I mean user
installations - they may be testing environments or production
environments.

>
> > +/*
> > + * Local cache of injection callbacks already loaded, stored in
> > + * TopMemoryContext.
> > + */
> > +typedef struct InjectionPointArrayEntry
> > +{
> > + char name[INJ_NAME_MAXLEN];
> > + InjectionPointCallback callback;
> > +} InjectionPointArrayEntry;
> > +
> > +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL;
> >
> > Initial size of this array is small, but given the size of code in a
> > given path to be tested, I fear that this may not be sufficient. I
> > feel it's better to use hash table whose APIs are already available.
>
> I've never seen in recent years a need for a given test to use more
> than 4~5 points.  But I guess that you've seen more than that wanted
> in a prod environment with a fork of Postgres?

A given case may not require more than 4 -5 points but users may
create scripts with many frequently required injection points and
install handlers for them.

>
> I'm OK to switch that to use a hash table in the initial
> implementation, even for a low number of entries with points that are
> not in hot code paths that should be OK.  At least for my cases
> related to testing that's OK.  Am I right to assume that you mean a
> HTAB in TopMemoryContext?

Yes.

>
> > I am glad that it covers the frequently needed injections error, notice and
> > wait. If someone adds multiple injection points for wait and all of them are
> > triggered (in different backends), test_injection_points_wake() would
> > wake them all. When debugging cascaded functionality it's not easy to
> > follow sequence add, trigger, wake for multiple injections. We should
> > associate a conditional variable with the required injection points. Attach
> > should create conditional variable in the shared memory for that injection
> > point and detach should remove it. I see this being mentioned in the commit
> > message, but I think it's something we need in the first version of "wait" to
> > be useful.
>
> More to the point, I actually disagree with that, because it could be
> possible as well that the same condition variable is used across
> multiple points.  At the end, IMHO, the central hash table should hold
> zero meta-data associated to an injection point like a counter, an
> elog, a condition variable, a sleep time, etc. or any combination of
> all these ones, and should just know about how to load a callback with
> a library path and a routine name.  I understand that this is leaving
> the responsibility of what a callback should do down to the individual
> developer implementing a callback into their own extension, where they
> should be free to have conditions of their own.
>
> Something that I agree would be very useful for the in-core APIs,
> depending on the cases, is to be able to push some information to the
> callback at runtime to let a callback decide what to do depending on a
> running state, including a condition registered when a point was
> attached.  See my argument about an _ARG macro that passes down to the
> callback a (void *).

The injection_run function is called from the place where the
injection point is declared but that place does not know what
injection function is going to be run. So a user can not pass
arguments to injection declaration. What injection to run is decided
by the injection_attach and thus one can pass arguments to it but then
injection_attach stores the information in the shared memory from
where it's picked up by injection_run. So even though you don't want
to store the arguments in the shared memory, you are creating a design
which takes us towards that direction eventually - otherwise users
will end up writing many injection functions - one for each possible
combination of count, sleep, conditional variable etc. But let's see
whether that happens to be the case in practice. We will need to
evolve this feature based on usage.

>
> > Those tests need to also install extension. That's another pain point.
> > So anyone wants to run the tests needs to compile the extension too. I
> > am wondering whether we should have this functionality in the core
> > itself somewhere and will be only useful when built with injection.
>
> That's something that may be discussed on top of the backend APIs,
> though this comes down to how and what kind of meta-data should be
> associated to the central shmem hash table.  Keeping the shmem hash as
> small as possible to keep minimal the traces of this code in core is a
> design choice that I'd rather not change.

shmem hash size won't depend upon the number of functions we write in
the core. Yes it will add to the core code and may add maintenance
burden. So I understand your inclination to keep the core minimal.

>
> > Many a times it's only a single backend which needs to be subjected to
> > an injection. For inducing ERROR and NOTICE, many a times it's also
> > the same backend attached the client session.
>
> Yep.  I've used that across multiple sessions.  For the basic
> facility, I think that's the absolute minimum.
>
> > For WAIT, however we
> > need a way to inject from some other session.
>
> You can do that already with the patch, no?  If you know that a
> different session would cross a given path, you could set a macro in
> it.  If you wish for this session to wait before that, it is possible
> to use a second point to make it do so.  I've used such techniques as
> well for more complex reproducible failures than what I've posted in
> the patch series.  In the last months, I've topped a TAP test to rely
> on 5 deterministic points, I think.  Or perhaps 6.  That was a fun
> exercise, for a TAP test coded while self-complaining about the core
> backend code that does not make this stuff easier.
>
> > We might be able to use
> > current signalling mechanism for that (wake sends SIGUSR1 with
> > reason). Leaving aside WAIT for a moment when the same backend's
> > behaviour is being controlled, do we really need shared memory and
> > also affect all the running backends. I see some discussion about
> > being able to trigger only for a given PID, but when that PID of the
> > current backend itself shared memory is not required.
>
> I am not convinced that there is any need for signalling in most
> cases, as long as you know beforehand the PID of the session you'd
> like to stop, because this would still require a second session to
> register a condition based on the PID known.  Another possibility that
> I can think of is to use a custom wait event with a second point to
> setup a different condition.  At the end, my point is that it is
> possible to control everything in some extension code that holds the
> callbacks, with an extra shmem area in the extension that associates
> some meta-data with a point name, for instance.

If the session which attaches to an injection point is same as the
session where the injection point is triggered (most of the ERROR and
NOTICE injections will see this pattern), we don't need shared memory.
There's a performance penalty to it since injection_run will look up
shared memory. For WAIT we may or may not need shared memory. But
let's see what other think and what usage patterns we see.


--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
Next
From: Tom Lane
Date:
Subject: Re: initdb --no-locale=C doesn't work as specified when the environment is not C