Thread: Adding facility for injection points (or probe points?) for more advanced tests
Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
Hi all, I don't remember how many times in the last few years when I've had to hack the backend to produce a test case that involves a weird race condition across multiple processes running in the backend, to be able to prove a point or just test a fix (one recent case: 2b8e5273e949). Usually, I come to hardcoding stuff for the following situations: - Trigger a PANIC, to force recovery. - A FATAL, to take down a session, or just an ERROR. - palloc() failure injection. - Sleep to slow down a code path. - Pause and release with condition variable. And, while that's helpful to prove a point on a thread, nothing comes out of it in terms of regression test coverage in the tree because these tests are usually too slow and expensive, as they usually rely on hardcoded timeouts. So that's pretty much attempting to emulate what one would do with a debugger in a predictable way, without the manual steps because human hands don't scale well. The reason behind that is of course more advanced testing, to be able to expand coverage when we have weird and non-deterministic race issues to deal with, and the code becoming more complex every year makes that even harder. Fault and failure injection in specific paths comes into mind, additionally, particularly if you manage complex projects based on Postgres. So, please find attached a patch set that introduces an in-core facility to be able to set what I'm calling here an "injection point", that consists of being able to register in shared memory a callback that can be run within a defined location of the code. It means that it is not possible to trigger a callback before shared memory is set, but I've faced far more the case where I wanted to trigger something after shmem is set anyway. Persisting an injection point across restarts is also possible by adding some through an extension's shmem hook, as we do for custom LWLocks for example, as long as a library is loaded. This will remind a bit of what Alexander Korotkov has proposed here: https://www.postgresql.org/message-id/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com Also, this is much closee to what Craig Ringer is mentioning here, where it is named probe points, but I am using a minimal design that allows to achieve the same: https://www.postgresql.org/message-id/CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA%40mail.gmail.com A difference is that I don't really see a point in passing to the callback triggered an area of data coming from the hash table itself, as at the end a callback could just refer to an area in shared memory or a static set of variables depending on what it wants, with one or more injection points (say a location to set a state, and a second to check it). So, at the end, the problem comes down in my opinion to two things: - Possibility to trigger a condition defined by some custom code, in the backend (core code or even out-of-core). - Possibility to define a location in the code where a named point would be checked. 0001 introduces three APIs to create, run, and drop injection points: +extern void InjectionPointCreate(const char *name, + InjectionPointCallback callback); +extern void InjectionPointRun(const char *name); +extern void InjectionPointDrop(const char *name); Then one just needs to add a macro like that to trigger the callback registered in the code to test: INJECTION_POINT_RUN("String"); So the footprint in the core tree is not zero, but it is as minimal as it can be. I have added some documentation to explain that, as well. I am not wedded to the name proposed in the patch, so if you feel there is better, feel free to propose ideas. This facility is hidden behind a specific configure/Meson switch, making it a no-op by default: --enable-injection-points -Dinjection_points={ true | false } 0002 is a test module to test these routines, that I have kept a maximum simple to ease review of the basics proposed here. This could be extended further to propose more default modes with TAP tests on its own, as I don't see a real point in having the SQL bits or some common callbacks (like for the PANIC or the FATAL cases) in core. Thoughts and comments are welcome. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Amul Sul
Date:
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.
+1 for the feature.
TWIMW, here[1] is an interesting talk from pgconf.in 2020 on the similar topic.
Regards,
Amul Sul
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Wed, Oct 25, 2023 at 10:06:17AM +0530, Amul Sul wrote: > +1 for the feature. > > TWIMW, here[1] is an interesting talk from pgconf.in 2020 on the similar > topic. > > 1] https://pgconf.in/conferences/pgconfin2020/program/proposals/101 Right, this uses a shared hash table. There is a patch from 2019 that summarizes this presentation as well: https://www.postgresql.org/message-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com A different idea is that this patch could leverage a bgworker instead of having a footprint in the postmaster. FWIW, I think that my patch is more flexible than the modes added by faultinjector.h (see 0001), because the actions that can be taken should not be limited by the core code: the point registered could just use what it wants as callback, so an extension could register a custom thing as well. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nazir Bilal Yavuz
Date:
Hi, On Wed, 25 Oct 2023 at 07:13, Michael Paquier <michael@paquier.xyz> wrote: > > Hi all, > > I don't remember how many times in the last few years when I've had to > hack the backend to produce a test case that involves a weird race > condition across multiple processes running in the backend, to be able > to prove a point or just test a fix (one recent case: 2b8e5273e949). > Usually, I come to hardcoding stuff for the following situations: > - Trigger a PANIC, to force recovery. > - A FATAL, to take down a session, or just an ERROR. > - palloc() failure injection. > - Sleep to slow down a code path. > - Pause and release with condition variable. I liked the idea; thanks for working on this! What do you think about creating a function for updating the already created injection point's callback or name (mostly callback)? For now, you need to drop and recreate the injection point to change the callback or the name. Here is my code correctness review: diff --git a/meson_options.txt b/meson_options.txt +option('injection_points', type: 'boolean', value: true, + description: 'Enable injection points') + It is enabled by default while building with meson. diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c + LWLockRelease(InjectionPointLock); + + /* If not found, do nothing? */ + if (!found) + return; It would be good to log a warning message here. I tried to compile that with -Dwerror=true -Dinjection_points=false and got some errors (warnings): injection_point.c: In function ‘InjectionPointShmemSize’: injection_point.c:59:1: error: control reaches end of non-void function [-Werror=return-type] injection_point.c: At top level: injection_point.c:32:14: error: ‘InjectionPointHashByName’ defined but not used [-Werror=unused-variable] test_injection_points.c: In function ‘test_injection_points_run’: test_injection_points.c:69:21: error: unused variable ‘name’ [-Werror=unused-variable] The test_injection_points test runs and passes although I set -Dinjection_points=false. That could be misleading, IMO the test should be skipped if Postgres is not compiled with the injection points. Regards, Nazir Bilal Yavuz Microsoft
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Mon, Nov 06, 2023 at 10:28:14PM +0300, Nazir Bilal Yavuz wrote: > I liked the idea; thanks for working on this! Thanks for the review. > What do you think about creating a function for updating the already > created injection point's callback or name (mostly callback)? For now, > you need to drop and recreate the injection point to change the > callback or the name. I am not sure if that's worth the addition. TBH, all the code I've seen that would benefit from these APIs just set up a cluster, register a few injection points with a module, and then run a set of tests. They can also remove points. So I'm just aiming for simplest for the moment. > Here is my code correctness review: > > diff --git a/meson_options.txt b/meson_options.txt > +option('injection_points', type: 'boolean', value: true, > + description: 'Enable injection points') > + > > It is enabled by default while building with meson. Indeed, fixed. > diff --git a/src/backend/utils/misc/injection_point.c > b/src/backend/utils/misc/injection_point.c > + LWLockRelease(InjectionPointLock); > + > + /* If not found, do nothing? */ > + if (!found) > + return; > > It would be good to log a warning message here. I don't think that's a good idea. If a code path defines a INJECTION_POINT_RUN() we'd get spurious warnings except if a point is always defined when the build switch is enabled. > I tried to compile that with -Dwerror=true -Dinjection_points=false > and got some errors (warnings): Right, fixed these three. > The test_injection_points test runs and passes although I set > -Dinjection_points=false. That could be misleading, IMO the test > should be skipped if Postgres is not compiled with the injection > points. The test suite has been using an alternate output, but perhaps you are right that this has little value without the switch enabled anyway. I've made the processing optional when the option is not used for meson and ./configure (requires a variable in Makefile.global.in in the latter case), removing the alternate output. Please find v2. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote: > On Mon, Nov 06, 2023 at 10:28:14PM +0300, Nazir Bilal Yavuz wrote: >> I liked the idea; thanks for working on this! +1, this seems very useful. > +#ifdef USE_INJECTION_POINTS > +#define INJECTION_POINT_RUN(name) InjectionPointRun(name) > +#else > +#define INJECTION_POINT_RUN(name) ((void) name) > +#endif nitpick: Why is the non-injection-point version "(void) name"? I see "(void) true" used elsewhere for this purpose. > + <para> > + Here is an example of callback for > + <literal>InjectionPointCallback</literal>: > +<programlisting> > +static void > +custom_injection_callback(const char *name) > +{ > + elog(NOTICE, "%s: executed custom callback", name); > +} > +</programlisting> Why do we provide the name to the callback functions? Overall, the design looks pretty good to me. I think it's a good idea to keep it simple to start with. Since this is really only intended for special tests that run in special builds, it seems like we ought to be able to change it easily in the future as needed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Andres Freund
Date:
Hi, On 2023-10-25 13:13:38 +0900, Michael Paquier wrote: > So, please find attached a patch set that introduces an in-core > facility to be able to set what I'm calling here an "injection point", > that consists of being able to register in shared memory a callback > that can be run within a defined location of the code. It means that > it is not possible to trigger a callback before shared memory is set, > but I've faced far more the case where I wanted to trigger something > after shmem is set anyway. Persisting an injection point across > restarts is also possible by adding some through an extension's shmem > hook, as we do for custom LWLocks for example, as long as a library is > loaded. I would like to see a few example tests using this facility - without that it's a bit hard to judge how the impact on core code would be and how easy tests are to write. It also seems like there's a few bits and pieces missing to actually be able to write interesting tests. It's one thing to be able to inject code, but what you commonly want to do for tests is to actually wait for such a spot in the code to be reached, then perhaps wait inside the "modified" code, and do something else in the test script. But as-is a decent amount of C code would need to be written to write such a test, from what I can tell? Greetings, Andres Freund
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Nov 10, 2023 at 02:44:25PM -0600, Nathan Bossart wrote: > On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote: >> +#ifdef USE_INJECTION_POINTS >> +#define INJECTION_POINT_RUN(name) InjectionPointRun(name) >> +#else >> +#define INJECTION_POINT_RUN(name) ((void) name) >> +#endif > > nitpick: Why is the non-injection-point version "(void) name"? I see > "(void) true" used elsewhere for this purpose. Or (void) 0. >> + <para> >> + Here is an example of callback for >> + <literal>InjectionPointCallback</literal>: >> +<programlisting> >> +static void >> +custom_injection_callback(const char *name) >> +{ >> + elog(NOTICE, "%s: executed custom callback", name); >> +} >> +</programlisting> > > Why do we provide the name to the callback functions? This is for the use of the same callback across multiple points, and tracking the name of the event happening was making sense to me to know which code path is being taken when a callback is called. One thing that I got in mind as well here is to be able to register custom wait events based on the name of the callback taken, for example on a condition variable, a latch or a named LWLock. > Overall, the design looks pretty good to me. I think it's a good idea to > keep it simple to start with. Since this is really only intended for > special tests that run in special builds, it seems like we ought to be able > to change it easily in the future as needed. Yes, my first idea is to keep the initial design minimal and take the temperature. As far as I can see, there seem to not be any strong objection with this basic design, still I agree that I need to show a bit more code about its usability. I have some SQL and recovery cases where this is handy and these have piled over time, including at least two/three of them with more basic APIs in the test module may make sense in the initial batch of what I am proposing here. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Nov 10, 2023 at 06:32:27PM -0800, Andres Freund wrote: > I would like to see a few example tests using this facility - without that > it's a bit hard to judge how the impact on core code would be and how easy > tests are to write. Sure. I was wondering if people would be interested in that first. > It also seems like there's a few bits and pieces missing to actually be able > to write interesting tests. It's one thing to be able to inject code, but what > you commonly want to do for tests is to actually wait for such a spot in the > code to be reached, then perhaps wait inside the "modified" code, and do > something else in the test script. But as-is a decent amount of C code would > need to be written to write such a test, from what I can tell? Depends on what you'd want to achieve. As I mentioned at the top of the thread, error, fatal, panics, hardcoded waits are the most common cases I've seen in the last years. Conditional waits are not in the main patch but these are simple to support done (I mean, as in the 0003 attached with a TAP example). While on it, I have extended the patch in the hash table a library name and a function name so as the callback is loaded each time an injection point is run. (Perhaps the list of callbacks already loaded in a process should be saved in a session-level static list/array to avoid loading the same callbacks again, not sure if that's worth doing for a test facility assuming that the number of times a callback is called in a single session is usually very limited. Anyway, that would be simple to add if people prefer this addition.) Anyway, here is a short list of commits that could have taken benefit from this facility. There are is much more, but that's a list I grabbed quickly from my notes: 1) 8a4237908c0f 2) cb0cca188072 3) 7863ee4def65 (See https://postgr.es/m/YnT/Y2sEYj7pyOdc@paquier.xyz where an expensive TAP test was included, and I've seen users facing this bug in real life). Revert of the original is clean here as well. The trick is simple: stop a restartpoint during a promotion, and let the restartpoint finish after the promotion. 4) 409f9ca44713, where injecting an error would stress the consistency of the data reset (mentioned an error injected at https://postgr.es/m/YWZk6nmAzQZS4B/z@paquier.xyz). This reverts cleanly even today. 5) b4721f39505b, quite similar (mentioned an error injection exactly here: https://postgr.es/m/20181011033810.GB23570@paquier.xyz). This one requires an error when a transaction is started, something can be achieved if the error is triggered conditionally (note that hard failure would prevent the transaction to begin with the initial snapshot taken in InitPostgres, but the module could just use a static variable to track that). Among these, I have implemented two examples on top of the main patch set in 0002 and 0003: 4) as a TAP test with replication commands and an error injection, and 3) that relies on a custom wait event and a conditional variable to make the test posted on the other thread cheaper, with an injection point waiting for a condition variable in the middle of a restartpoint in the checkpointer. I don't mean to necessarily include all that in the upstream tree, these are just here for reference first. 3) is the most interesting in this set, for sure. That was a nasty problem, and some cheap coverage in the core tree could be really good for it, so I'd like to propose for commit after more polishing. The test of the bug 3) I am referring to takes originally 30~45s to run and it was unstable as it could timeout. With an injection point it takes 1~2s. Note that test_injection_points gains a wait/wake logic to be able to use condition variables to wait on the restartpoint of a promoted standby). Both tests are not shaped for prime day yet, but that's enough for a set of examples IMHO to show what can be done. Does it answer your questions? -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Alvaro Herrera
Date:
Hello, Good stuff here, I also have a bunch of bugfix commits that ended up not having a test because of the need for a debugger or other interaction, so let's move forward. I think the docs (and the macro/function naming) describe things backwards. In my mind, it is INJECTION_POINT_RUN() that creates the injection point; then InjectionPointCreate() attaches something to it. So I would rename the macro to just INJECTION_POINT() and the function to InjectionPointAttach(). This way you're saying "attach function FN from library L to the injection point P"; where P is an entity that is being created by the INJECTION_POINT() call in the code. You named the hash table InjectionPointHashByName, which seems weird. Is there any *other* way to locate an injection point that is not by name? In this patch, injection points are instance-wide (because the hash table is in shmem). As soon as you install a callback to one point, that callback will be fired in every session. Maybe for some tests this is OK (and in particular your TAP tests have them attached in one ->safe_psql call and then they hit a completely different session, which wouldn't work if the attachments were process-local), but maybe one would want them limited to some specific process. Maybe give an optional PID so that if any other process hits that injection point, nothing happens? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote: > Good stuff here, I also have a bunch of bugfix commits that ended up not > having a test because of the need for a debugger or other interaction, > so let's move forward. > > I think the docs (and the macro/function naming) describe things > backwards. In my mind, it is INJECTION_POINT_RUN() that creates the > injection point; then InjectionPointCreate() attaches something to it. > So I would rename the macro to just INJECTION_POINT() and the function > to InjectionPointAttach(). This way you're saying "attach function FN > from library L to the injection point P"; where P is an entity that is > being created by the INJECTION_POINT() call in the code. Okay. I am not strongly attached to the terms used by the patch. The first WIP I wrote used completely different terms. > You named the hash table InjectionPointHashByName, which seems weird. > Is there any *other* way to locate an injection point that is not by > name? I am not sure what you mean here. Names are kind of the most portable and simplest thing I could think of. Is there something else you have in mind that would allow a mapping between a code path and what should be run? Perhaps that's useful in some cases, but you were also thinking about an in-core API where it is possible to retrieve a list of callbacks based on a library name and/or a function name? I didn't see a use for it, but why not. > In this patch, injection points are instance-wide (because the hash > table is in shmem). As soon as you install a callback to one point, > that callback will be fired in every session. Maybe for some tests this > is OK (and in particular your TAP tests have them attached in one > ->safe_psql call and then they hit a completely different session, which > wouldn't work if the attachments were process-local), but maybe one > would want them limited to some specific process. Maybe give an > optional PID so that if any other process hits that injection point, > nothing happens? Yes, still not something that's required in the core APIs or an initial batch. This is something I've seen used and a central place where the callbacks are registered allows that because the callback is triggered based on a global state like a MyProcPid or a getpid(), so it is possible to pass a condition to a callback when it is created (or attached per your wording), with the condition maintained in a shmem area that can be part of an extension module that defines the callbacks (in test_injection_points). One trick sometimes is to know the PID beforehand, which may need a second wait point (for example) to make a test deterministic so as a test script has the time to get the PID of a running session (bgworkers included) before the process has time to do anything critical for the scenario tested. An extra thing is that this design can be extended so as it could be possible to pass down to the callback execution a private pointer of data, though that's bound to the code path running the injection point (not in the initial patch). Then it's up to the callback to decide if it needs to do something or not (say, I don't want to run this callback except if I am manipulating page N in an access method, etc.). The conditional complexity is pushed to the injection callbacks, not the core routines in charge is finding a callback or attaching/creating one. I am not sure that it is a good idea to enforce a specific conditional logic in the backend core code. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Alvaro Herrera
Date:
On 2023-Nov-15, Michael Paquier wrote: > On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote: > > You named the hash table InjectionPointHashByName, which seems weird. > > Is there any *other* way to locate an injection point that is not by > > name? > > I am not sure what you mean here. Names are kind of the most > portable and simplest thing I could think of. Oh, I think you're overthinking what my comment was. I was saying, just name it "InjectionPointsHash". Since there appears to be no room for another hash table for injection points, then there's no need to specify that this one is the ByName hash. I couldn't think of any other way to organize the injection points either. > > In this patch, injection points are instance-wide (because the hash > > table is in shmem). > > Yes, still not something that's required in the core APIs or an > initial batch. I agree that we can do the easy thing first and build it up later. I just hope we don't get too wedded on the current interface because of lack of time in the current release that we get stuck with it. > I am not sure that it is a good idea to enforce a specific conditional > logic in the backend core code. Agreed, let's get more experience on what other types of tests people want to build, and how are things going to interact with each other. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No necesitamos banderas No reconocemos fronteras" (Jorge González)
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Wed, Nov 15, 2023 at 12:21:40PM +0100, Alvaro Herrera wrote: > On 2023-Nov-15, Michael Paquier wrote: > Oh, I think you're overthinking what my comment was. I was saying, just > name it "InjectionPointsHash". Since there appears to be no room for > another hash table for injection points, then there's no need to specify > that this one is the ByName hash. I couldn't think of any other way to > organize the injection points either. Aha, OK. No problem, this was itching me as well but I didn't see an argument with changing these names, so I've renamed things a bit more. >> Yes, still not something that's required in the core APIs or an >> initial batch. > > I agree that we can do the easy thing first and build it up later. I > just hope we don't get too wedded on the current interface because of > lack of time in the current release that we get stuck with it. One thing that I assume we will need with more advanced testing is the possibility to pass down one (for a structure of data) or more arguments to the callback a point is attached to. For that, it is possible to add more macros, like INJECTION_POINT_1ARG, INJECTION_POINT_ARG(), etc. that use some (void *) pointers. It would be possible to add that even in stable branches, as need arises, changing the structure of the shmem hash table if required to control the way a callback is run. At the end, I suspect that it is one of these things where we'll need to adapt depending on what people want to do with this stuff. FWIW, I can do already quite a bit with this minimalistic design and an external extension. Custom wait events are also very handy to monitor a wait. >> I am not sure that it is a good idea to enforce a specific conditional >> logic in the backend core code. > > Agreed, let's get more experience on what other types of tests people > want to build, and how are things going to interact with each other. I've worked more on polishing the core facility today, with 0001 introducing the backend-side facility. One thing that I mentioned lacking is a local cache for processes so as they don't load an external callback more than once if run. So I have added this local cache. When a point is scanned but not found, a previous cache entry is cleared if any (this leaks but we don't have a way to unload stuff, and for testing that's not a big deal). I've renamed the routines to use attach and detach as terms, and adopted the name you've suggested for the macro. The names around the hash table and its entries have been changed to what you've suggested. You are right, that's more intuitive. 0002 is the test module for the basics, split into its own patch, with a couple of tests for the local cache part. 0003 and 0004 have been adjusted with the new SQL functions. At the end, I'd like to propose 0004 as it's been a PITA for me and I don't want to break this case again. It needs more work and can be cheaper. One more argument in favor of it is the addition of condition variables to wait and wake points (perhaps with even more variables?) in the test module. If there is interest for 0003, I'm OK to work more on it as well, but that's less important IMV. Thoughts and comments are welcome, with a v4 series attached. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote: > > Hi all, > > I don't remember how many times in the last few years when I've had to > hack the backend to produce a test case that involves a weird race > condition across multiple processes running in the backend, to be able > to prove a point or just test a fix (one recent case: 2b8e5273e949). > Usually, I come to hardcoding stuff for the following situations: > - Trigger a PANIC, to force recovery. > - A FATAL, to take down a session, or just an ERROR. > - palloc() failure injection. > - Sleep to slow down a code path. > - Pause and release with condition variable. > > And, while that's helpful to prove a point on a thread, nothing comes > out of it in terms of regression test coverage in the tree because > these tests are usually too slow and expensive, as they usually rely > on hardcoded timeouts. So that's pretty much attempting to emulate > what one would do with a debugger in a predictable way, without the > manual steps because human hands don't scale well. > > The reason behind that is of course more advanced testing, to be able > to expand coverage when we have weird and non-deterministic race > issues to deal with, and the code becoming more complex every year > makes that even harder. Fault and failure injection in specific paths > comes into mind, additionally, particularly if you manage complex > projects based on Postgres. > > So, please find attached a patch set that introduces an in-core > facility to be able to set what I'm calling here an "injection point", > that consists of being able to register in shared memory a callback > that can be run within a defined location of the code. It means that > it is not possible to trigger a callback before shared memory is set, > but I've faced far more the case where I wanted to trigger something > after shmem is set anyway. Persisting an injection point across > restarts is also possible by adding some through an extension's shmem > hook, as we do for custom LWLocks for example, as long as a library is > loaded. > > This will remind a bit of what Alexander Korotkov has proposed here: > https://www.postgresql.org/message-id/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com > Also, this is much closee to what Craig Ringer is mentioning here, > where it is named probe points, but I am using a minimal design that > allows to achieve the same: > https://www.postgresql.org/message-id/CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA%40mail.gmail.com > > A difference is that I don't really see a point in passing to the > callback triggered an area of data coming from the hash table itself, > as at the end a callback could just refer to an area in shared memory > or a static set of variables depending on what it wants, with one or > more injection points (say a location to set a state, and a second to > check it). So, at the end, the problem comes down in my opinion to > two things: > - Possibility to trigger a condition defined by some custom code, in > the backend (core code or even out-of-core). > - Possibility to define a location in the code where a named point > would be checked. > > 0001 introduces three APIs to create, run, and drop injection points: > +extern void InjectionPointCreate(const char *name, > + InjectionPointCallback callback); > +extern void InjectionPointRun(const char *name); > +extern void InjectionPointDrop(const char *name); > > Then one just needs to add a macro like that to trigger the callback > registered in the code to test: > INJECTION_POINT_RUN("String"); > So the footprint in the core tree is not zero, but it is as minimal as > it can be. > > I have added some documentation to explain that, as well. I am not > wedded to the name proposed in the patch, so if you feel there is > better, feel free to propose ideas. Actually with Attach and Detach terminology, INJECTION_POINT becomes the place where we "declare" the injection point. So the documentation needs to first explain INJECTION_POINT and then explain the other operations. > > This facility is hidden behind a specific configure/Meson switch, > making it a no-op by default: > --enable-injection-points > -Dinjection_points={ true | false } That's useful, but we will also see demand to enable those in production (under controlled circumstances). But let the functionality mature under a separate flag and developer builds before used in production. > > 0002 is a test module to test these routines, that I have kept a > maximum simple to ease review of the basics proposed here. This could > be extended further to propose more default modes with TAP tests on > its own, as I don't see a real point in having the SQL bits or some > common callbacks (like for the PANIC or the FATAL cases) in core. > > Thoughts and comments are welcome. I think this is super useful functionality. Some comments here. +/* + * Local cache of injection callbacks already loaded, stored in + * TopMemoryContext. + */ +typedef struct InjectionPointArrayEntry +{ + char name[INJ_NAME_MAXLEN]; + InjectionPointCallback callback; +} InjectionPointArrayEntry; + +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL; Initial size of this array is small, but given the size of code in a given path to be tested, I fear that this may not be sufficient. I feel it's better to use hash table whose APIs are already available. + test_injection_points_attach +------------------------------ + +(1 row) + +SELECT test_injection_points_run('TestInjectionBooh'); -- nothing I find it pretty useless to expose that as a SQL API. Also adding it in tests would make it usable as an example. In this patchset INJECTION_POINT should be the only way to trigger an injection point. That also brings another question. The INJECTION_POINT needs to be added in the core code but can only be used through an extension. I think there should be some rudimentary albeit raw test in core for this. Extension does some good things like provides built-in actions when the injection is triggered. So, we should keep those too. I am glad that it covers the frequently needed injections error, notice and wait. If someone adds multiple injection points for wait and all of them are triggered (in different backends), test_injection_points_wake() would wake them all. When debugging cascaded functionality it's not easy to follow sequence add, trigger, wake for multiple injections. We should associate a conditional variable with the required injection points. Attach should create conditional variable in the shared memory for that injection point and detach should remove it. I see this being mentioned in the commit message, but I think it's something we need in the first version of "wait" to be useful. At some point we may want to control how many times an injection is triggered by using a count. Often, I have wanted an ERROR to be thrown in a code path once or twice and then stop triggering it. For example to test WAL sender restart after a certain event. We can't really time Detach correctly to avoid multiple restarts. A count is useful is such a case. +/* + * Attach a new injection point. + */ +void +InjectionPointAttach(const char *name, + const char *library, + const char *function) +{ +#ifdef USE_INJECTION_POINTS In a non-injection-point build this function would be compiled and a call to this function would throw an error. This means that any test we write which uses injection points can not do so optionally. Tests which can be run with and without injection points built will reduce duplication. We should define this function as no-op in non-injection-point build to faciliate such tests. Those tests need to also install extension. That's another pain point. So anyone wants to run the tests needs to compile the extension too. I am wondering whether we should have this functionality in the core itself somewhere and will be only useful when built with injection. Many a times it's only a single backend which needs to be subjected to an injection. For inducing ERROR and NOTICE, many a times it's also the same backend attached the client session. For WAIT, however we need a way to inject from some other session. We might be able to use current signalling mechanism for that (wake sends SIGUSR1 with reason). Leaving aside WAIT for a moment when the same backend's behaviour is being controlled, do we really need shared memory and also affect all the running backends. I see some discussion about being able to trigger only for a given PID, but when that PID of the current backend itself shared memory is not required. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Mon, Nov 20, 2023 at 04:53:45PM +0530, Ashutosh Bapat wrote: > On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote: >> I have added some documentation to explain that, as well. I am not >> wedded to the name proposed in the patch, so if you feel there is >> better, feel free to propose ideas. > > Actually with Attach and Detach terminology, INJECTION_POINT becomes > the place where we "declare" the injection point. So the documentation > needs to first explain INJECTION_POINT and then explain the other > operations. Sure. >> This facility is hidden behind a specific configure/Meson switch, >> making it a no-op by default: >> --enable-injection-points >> -Dinjection_points={ true | false } > > That's useful, but we will also see demand to enable those in > production (under controlled circumstances). But let the functionality > mature under a separate flag and developer builds before used in > production. Hmm. Okay. I'd still keep that under a compile switch for now anyway to keep a cleaner separation and avoid issues where it would be possible to inject code in a production build. Note that I was planning to switch one of my buildfarm animals to use it should this stuff make it into the tree. And people would be free to use it if they want. If in production, that would be a risk, IMO. > +/* > + * Local cache of injection callbacks already loaded, stored in > + * TopMemoryContext. > + */ > +typedef struct InjectionPointArrayEntry > +{ > + char name[INJ_NAME_MAXLEN]; > + InjectionPointCallback callback; > +} InjectionPointArrayEntry; > + > +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL; > > Initial size of this array is small, but given the size of code in a > given path to be tested, I fear that this may not be sufficient. I > feel it's better to use hash table whose APIs are already available. I've never seen in recent years a need for a given test to use more than 4~5 points. But I guess that you've seen more than that wanted in a prod environment with a fork of Postgres? I'm OK to switch that to use a hash table in the initial implementation, even for a low number of entries with points that are not in hot code paths that should be OK. At least for my cases related to testing that's OK. Am I right to assume that you mean a HTAB in TopMemoryContext? > I find it pretty useless to expose that as a SQL API. Also adding it > in tests would make it usable as an example. In this patchset > INJECTION_POINT should be the only way to trigger an injection point. That's useful to test the cache logic while providing simple coverage for the core API, and that's cheap. So I'd rather keep this test module around with these tests. > That also brings another question. The INJECTION_POINT needs to be added in the > core code but can only be used through an extension. I think there should be > some rudimentary albeit raw test in core for this. Extension does some good > things like provides built-in actions when the injection is triggered. So, we > should keep those too. Yeah, I'd like to agree with that but everything I've seen in the recent years is that every setup finishes by having different assumptions about what they want to do, so my intention is to trim down the core of the patch to a bare acceptable minimum and then work on top of that. > I am glad that it covers the frequently needed injections error, notice and > wait. If someone adds multiple injection points for wait and all of them are > triggered (in different backends), test_injection_points_wake() would > wake them all. When debugging cascaded functionality it's not easy to > follow sequence add, trigger, wake for multiple injections. We should > associate a conditional variable with the required injection points. Attach > should create conditional variable in the shared memory for that injection > point and detach should remove it. I see this being mentioned in the commit > message, but I think it's something we need in the first version of "wait" to > be useful. More to the point, I actually disagree with that, because it could be possible as well that the same condition variable is used across multiple points. At the end, IMHO, the central hash table should hold zero meta-data associated to an injection point like a counter, an elog, a condition variable, a sleep time, etc. or any combination of all these ones, and should just know about how to load a callback with a library path and a routine name. I understand that this is leaving the responsibility of what a callback should do down to the individual developer implementing a callback into their own extension, where they should be free to have conditions of their own. Something that I agree would be very useful for the in-core APIs, depending on the cases, is to be able to push some information to the callback at runtime to let a callback decide what to do depending on a running state, including a condition registered when a point was attached. See my argument about an _ARG macro that passes down to the callback a (void *). > At some point we may want to control how many times an injection is > triggered by using a count. Often, I have wanted an ERROR to be thrown > in a code path once or twice and then stop triggering it. For example > to test WAL sender restart after a certain event. We can't really time > Detach correctly to avoid multiple restarts. A count is useful is such > a case. Yeah. That's also something that can be achieved outside the shmem hash table, so this is intentionally not part of InjectionPointHash. > +/* > + * Attach a new injection point. > + */ > +void > +InjectionPointAttach(const char *name, > + const char *library, > + const char *function) > +{ > +#ifdef USE_INJECTION_POINTS > > In a non-injection-point build this function would be compiled and a call to > this function would throw an error. This means that any test we write which > uses injection points can not do so optionally. Tests which can be run with and > without injection points built will reduce duplication. We should define this > function as no-op in non-injection-point build to faciliate such tests. The argument goes both ways, I guess. I'd rather choose a hard failure to know that what I am doing is not silently ignored, which is why I made this choice in the patch. > Those tests need to also install extension. That's another pain point. > So anyone wants to run the tests needs to compile the extension too. I > am wondering whether we should have this functionality in the core > itself somewhere and will be only useful when built with injection. That's something that may be discussed on top of the backend APIs, though this comes down to how and what kind of meta-data should be associated to the central shmem hash table. Keeping the shmem hash as small as possible to keep minimal the traces of this code in core is a design choice that I'd rather not change. > Many a times it's only a single backend which needs to be subjected to > an injection. For inducing ERROR and NOTICE, many a times it's also > the same backend attached the client session. Yep. I've used that across multiple sessions. For the basic facility, I think that's the absolute minimum. > For WAIT, however we > need a way to inject from some other session. You can do that already with the patch, no? If you know that a different session would cross a given path, you could set a macro in it. If you wish for this session to wait before that, it is possible to use a second point to make it do so. I've used such techniques as well for more complex reproducible failures than what I've posted in the patch series. In the last months, I've topped a TAP test to rely on 5 deterministic points, I think. Or perhaps 6. That was a fun exercise, for a TAP test coded while self-complaining about the core backend code that does not make this stuff easier. > We might be able to use > current signalling mechanism for that (wake sends SIGUSR1 with > reason). Leaving aside WAIT for a moment when the same backend's > behaviour is being controlled, do we really need shared memory and > also affect all the running backends. I see some discussion about > being able to trigger only for a given PID, but when that PID of the > current backend itself shared memory is not required. I am not convinced that there is any need for signalling in most cases, as long as you know beforehand the PID of the session you'd like to stop, because this would still require a second session to register a condition based on the PID known. Another possibility that I can think of is to use a custom wait event with a second point to setup a different condition. At the end, my point is that it is possible to control everything in some extension code that holds the callbacks, with an extra shmem area in the extension that associates some meta-data with a point name, for instance. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote: > > >> This facility is hidden behind a specific configure/Meson switch, > >> making it a no-op by default: > >> --enable-injection-points > >> -Dinjection_points={ true | false } > > > > That's useful, but we will also see demand to enable those in > > production (under controlled circumstances). But let the functionality > > mature under a separate flag and developer builds before used in > > production. > > Hmm. Okay. I'd still keep that under a compile switch for now > anyway to keep a cleaner separation and avoid issues where it would be > possible to inject code in a production build. Note that I was > planning to switch one of my buildfarm animals to use it should this > stuff make it into the tree. And people would be free to use it if > they want. If in production, that would be a risk, IMO. makes sense. Just to be clear - by "in production" I mean user installations - they may be testing environments or production environments. > > > +/* > > + * Local cache of injection callbacks already loaded, stored in > > + * TopMemoryContext. > > + */ > > +typedef struct InjectionPointArrayEntry > > +{ > > + char name[INJ_NAME_MAXLEN]; > > + InjectionPointCallback callback; > > +} InjectionPointArrayEntry; > > + > > +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL; > > > > Initial size of this array is small, but given the size of code in a > > given path to be tested, I fear that this may not be sufficient. I > > feel it's better to use hash table whose APIs are already available. > > I've never seen in recent years a need for a given test to use more > than 4~5 points. But I guess that you've seen more than that wanted > in a prod environment with a fork of Postgres? A given case may not require more than 4 -5 points but users may create scripts with many frequently required injection points and install handlers for them. > > I'm OK to switch that to use a hash table in the initial > implementation, even for a low number of entries with points that are > not in hot code paths that should be OK. At least for my cases > related to testing that's OK. Am I right to assume that you mean a > HTAB in TopMemoryContext? Yes. > > > I am glad that it covers the frequently needed injections error, notice and > > wait. If someone adds multiple injection points for wait and all of them are > > triggered (in different backends), test_injection_points_wake() would > > wake them all. When debugging cascaded functionality it's not easy to > > follow sequence add, trigger, wake for multiple injections. We should > > associate a conditional variable with the required injection points. Attach > > should create conditional variable in the shared memory for that injection > > point and detach should remove it. I see this being mentioned in the commit > > message, but I think it's something we need in the first version of "wait" to > > be useful. > > More to the point, I actually disagree with that, because it could be > possible as well that the same condition variable is used across > multiple points. At the end, IMHO, the central hash table should hold > zero meta-data associated to an injection point like a counter, an > elog, a condition variable, a sleep time, etc. or any combination of > all these ones, and should just know about how to load a callback with > a library path and a routine name. I understand that this is leaving > the responsibility of what a callback should do down to the individual > developer implementing a callback into their own extension, where they > should be free to have conditions of their own. > > Something that I agree would be very useful for the in-core APIs, > depending on the cases, is to be able to push some information to the > callback at runtime to let a callback decide what to do depending on a > running state, including a condition registered when a point was > attached. See my argument about an _ARG macro that passes down to the > callback a (void *). The injection_run function is called from the place where the injection point is declared but that place does not know what injection function is going to be run. So a user can not pass arguments to injection declaration. What injection to run is decided by the injection_attach and thus one can pass arguments to it but then injection_attach stores the information in the shared memory from where it's picked up by injection_run. So even though you don't want to store the arguments in the shared memory, you are creating a design which takes us towards that direction eventually - otherwise users will end up writing many injection functions - one for each possible combination of count, sleep, conditional variable etc. But let's see whether that happens to be the case in practice. We will need to evolve this feature based on usage. > > > Those tests need to also install extension. That's another pain point. > > So anyone wants to run the tests needs to compile the extension too. I > > am wondering whether we should have this functionality in the core > > itself somewhere and will be only useful when built with injection. > > That's something that may be discussed on top of the backend APIs, > though this comes down to how and what kind of meta-data should be > associated to the central shmem hash table. Keeping the shmem hash as > small as possible to keep minimal the traces of this code in core is a > design choice that I'd rather not change. shmem hash size won't depend upon the number of functions we write in the core. Yes it will add to the core code and may add maintenance burden. So I understand your inclination to keep the core minimal. > > > Many a times it's only a single backend which needs to be subjected to > > an injection. For inducing ERROR and NOTICE, many a times it's also > > the same backend attached the client session. > > Yep. I've used that across multiple sessions. For the basic > facility, I think that's the absolute minimum. > > > For WAIT, however we > > need a way to inject from some other session. > > You can do that already with the patch, no? If you know that a > different session would cross a given path, you could set a macro in > it. If you wish for this session to wait before that, it is possible > to use a second point to make it do so. I've used such techniques as > well for more complex reproducible failures than what I've posted in > the patch series. In the last months, I've topped a TAP test to rely > on 5 deterministic points, I think. Or perhaps 6. That was a fun > exercise, for a TAP test coded while self-complaining about the core > backend code that does not make this stuff easier. > > > We might be able to use > > current signalling mechanism for that (wake sends SIGUSR1 with > > reason). Leaving aside WAIT for a moment when the same backend's > > behaviour is being controlled, do we really need shared memory and > > also affect all the running backends. I see some discussion about > > being able to trigger only for a given PID, but when that PID of the > > current backend itself shared memory is not required. > > I am not convinced that there is any need for signalling in most > cases, as long as you know beforehand the PID of the session you'd > like to stop, because this would still require a second session to > register a condition based on the PID known. Another possibility that > I can think of is to use a custom wait event with a second point to > setup a different condition. At the end, my point is that it is > possible to control everything in some extension code that holds the > callbacks, with an extra shmem area in the extension that associates > some meta-data with a point name, for instance. If the session which attaches to an injection point is same as the session where the injection point is triggered (most of the ERROR and NOTICE injections will see this pattern), we don't need shared memory. There's a performance penalty to it since injection_run will look up shared memory. For WAIT we may or may not need shared memory. But let's see what other think and what usage patterns we see. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Wed, Nov 22, 2023 at 09:23:21PM +0530, Ashutosh Bapat wrote: > On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote: >> I've never seen in recent years a need for a given test to use more >> than 4~5 points. But I guess that you've seen more than that wanted >> in a prod environment with a fork of Postgres? > > A given case may not require more than 4 -5 points but users may > create scripts with many frequently required injection points and > install handlers for them. Sure, if a callback is generic enough it could be shared across multiple points. > The injection_run function is called from the place where the > injection point is declared but that place does not know what > injection function is going to be run. So a user can not pass > arguments to injection declaration. It is possible to make that predictible but it means that a callback is most likely to be used by one single point. This makes extensions in charge of holding the callbacks more complicated, at the benefit of keeping a minimal footprint in the backend code. > What injection to run is decided > by the injection_attach and thus one can pass arguments to it but then > injection_attach stores the information in the shared memory from > where it's picked up by injection_run. So even though you don't want > to store the arguments in the shared memory, you are creating a design > which takes us towards that direction eventually - otherwise users > will end up writing many injection functions - one for each possible > combination of count, sleep, conditional variable etc. But let's see > whether that happens to be the case in practice. We will need to > evolve this feature based on usage. A one-one mapping between callback and point is not always necessary. If you wish to use a combination of N points with a sleep callback and different sleep times, one can just register a second shmem area in the extension holding the callbacks that links the point names with the sleep time to use. > shmem hash size won't depend upon the number of functions we write in > the core. Yes it will add to the core code and may add maintenance > burden. So I understand your inclination to keep the core minimal. Yeah, without a clear benefit, my point is just to throw the responsibility to extension developers for now, which would mean the addition of tests that depend on test_injection_points/, or just install this extension optionally in other code path that need it. Maybe 0004 should be in src/test/recovery/ and do that, actually.. I'll most likely agree with extending all the backend stuff in a more meaningful way, but I am not sure which method should be enforced. > If the session which attaches to an injection point is same as the > session where the injection point is triggered (most of the ERROR and > NOTICE injections will see this pattern), we don't need shared memory. > There's a performance penalty to it since injection_run will look up > shared memory. For WAIT we may or may not need shared memory. But > let's see what other think and what usage patterns we see. The first POC of the patch that you can find at the top of this thread did that, actually, but this is too limited. IMO, linking things to a central table is just *much* more useful. I've implemented a v5 that switches the cache to use a seconf hash table on TopMemoryContext for the cache instead of an array. This makes the cache handling slightly cleaner, so your suggestion was right. 0003 and 0004 are the same as previously, passing or failing under the same conditions. I'm wondering if folks have other comments about 0001 and 0002? It sounds to me like the consensus is that this stuff is useful and that there are no string objections, so feel free to comment. I don't want to propose 0003 in the tree, just an improved version of 0004 for the test coverage (still need to improve that). -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Fri, Nov 24, 2023 at 7:26 AM Michael Paquier <michael@paquier.xyz> wrote: > If you wish to use a combination of N points with a sleep callback and > different sleep times, one can just register a second shmem area in > the extension holding the callbacks that links the point names with > the sleep time to use. > Interesting idea. For that the callback needs to know the injection point name. At least we should pass that to the callback. It's trivial thing to do. > > shmem hash size won't depend upon the number of functions we write in > > the core. Yes it will add to the core code and may add maintenance > > burden. So I understand your inclination to keep the core minimal. > > Yeah, without a clear benefit, my point is just to throw the > responsibility to extension developers for now, which would mean the > addition of tests that depend on test_injection_points/, or just > install this extension optionally in other code path that need it. > Maybe 0004 should be in src/test/recovery/ and do that, actually.. That might work, but in order to run tests in that directory one has to also install the extension. Do we have precedence for such kind of dependency? > > > If the session which attaches to an injection point is same as the > > session where the injection point is triggered (most of the ERROR and > > NOTICE injections will see this pattern), we don't need shared memory. > > There's a performance penalty to it since injection_run will look up > > shared memory. For WAIT we may or may not need shared memory. But > > let's see what other think and what usage patterns we see. > > The first POC of the patch that you can find at the top of this thread > did that, actually, but this is too limited. IMO, linking things to a > central table is just *much* more useful. > > I've implemented a v5 that switches the cache to use a seconf hash > table on TopMemoryContext for the cache instead of an array. This > makes the cache handling slightly cleaner, so your suggestion was > right. glad that you liked the outcome. > 0003 and 0004 are the same as previously, passing or failing > under the same conditions. I'm wondering if folks have other comments > about 0001 and 0002? It sounds to me like the consensus is that this > stuff is useful I think so. > and that there are no string objections, so feel free > to comment. Let's get some more opinions on the design. I will review the detailed code then. > > I don't want to propose 0003 in the tree, just an improved version of > 0004 for the test coverage (still need to improve that). Are you working on v6 already? -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Nov 24, 2023 at 04:37:58PM +0530, Ashutosh Bapat wrote: > Interesting idea. For that the callback needs to know the injection > point name. At least we should pass that to the callback. It's trivial > thing to do. This is what's done from the beginning, as well as of 0001 in the v5 series: +INJECTION_POINT(name); [...] + injection_callback(name); > That might work, but in order to run tests in that directory one has > to also install the extension. Do we have precedence for such kind of > dependency? Yes, please see EXTRA_INSTALL in some of the Makefiles. This can install stuff from paths different than the location where the tests are run. >> and that there are no string objections, so feel free >> to comment. > > Let's get some more opinions on the design. I will review the detailed > code then. Sure. Thanks. >> I don't want to propose 0003 in the tree, just an improved version of >> 0004 for the test coverage (still need to improve that). > > Are you working on v6 already? No, what would be the point at this stage? I dont have much more to add to 0001 and 0002 at the moment, which focus on the core of the problem. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Fri, Nov 24, 2023 at 7:37 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Nov 24, 2023 at 04:37:58PM +0530, Ashutosh Bapat wrote: > > Interesting idea. For that the callback needs to know the injection > > point name. At least we should pass that to the callback. It's trivial > > thing to do. > > This is what's done from the beginning, as well as of 0001 in the v5 > series: > +INJECTION_POINT(name); > [...] > + injection_callback(name); In my first look I missed the actual call to the injection callback in InjectionPointRun() injection_callback(name); Sorry for that. The way I see it is that an extension using this functionality will create an auxiliary lookup table keyed by the injection point name to obtain the injection point specific arguments (sleep time, count etc.) in the shared memory or local memory. Every time an injection callback is called it will consult this look up table to get the arguments. That looks ok to me. There might be other ways to achieve the same effect. We will learn and absorb whatever benefits core and the users. I like that. > > > That might work, but in order to run tests in that directory one has > > to also install the extension. Do we have precedence for such kind of > > dependency? > > Yes, please see EXTRA_INSTALL in some of the Makefiles. This can > install stuff from paths different than the location where the tests > are run. WFM then. > > >> and that there are no string objections, so feel free > >> to comment. > > > > Let's get some more opinions on the design. I will review the detailed > > code then. > > Sure. Thanks. > > >> I don't want to propose 0003 in the tree, just an improved version of > >> 0004 for the test coverage (still need to improve that). > > > > Are you working on v6 already? > > No, what would be the point at this stage? I dont have much more to > add to 0001 and 0002 at the moment, which focus on the core of the > problem. Since you wroten "(still need to improve ...) I thought you are working on v6. No problem. Sorry for the confusion. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Mon, Nov 27, 2023 at 12:14:05PM +0530, Ashutosh Bapat wrote: > Since you wroten "(still need to improve ...) I thought you are > working on v6. No problem. Sorry for the confusion. I see why my previous message could be confusing. Sorry about that. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Dilip Kumar
Date:
On Tue, Nov 28, 2023 at 4:07 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Nov 27, 2023 at 12:14:05PM +0530, Ashutosh Bapat wrote: > > Since you wroten "(still need to improve ...) I thought you are > > working on v6. No problem. Sorry for the confusion. > > I see why my previous message could be confusing. Sorry about that. I haven't specifically done a review or testing of this patch, but I have used this for testing the CLOG group update code with my SLRU-specific changes and I found it quite helpful to test some of the concurrent areas where you need to stop processing somewhere in the middle of the code and testing that area without this kind of injection point framework is really difficult or may not be even possible. We wanted to test the case of clog group update where we can get multiple processes added to a single group and get the xid status updated by the group leader, you can refer to my test in that thread[1] (the last patch test_group_commit.patch is using this framework for testing). Overall I feel this framework is quite useful and easy to use as well. [1] https://www.postgresql.org/message-id/CAFiTN-udSTGG_t5n9Z3eBbb4_%3DzNoKU%2B8FP-S6zpv-r4Gm-Y%2BQ%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Mon, Dec 11, 2023 at 11:09:45AM +0530, Dilip Kumar wrote: > I haven't specifically done a review or testing of this patch, but I > have used this for testing the CLOG group update code with my > SLRU-specific changes and I found it quite helpful to test some of the > concurrent areas where you need to stop processing somewhere in the > middle of the code and testing that area without this kind of > injection point framework is really difficult or may not be even > possible. We wanted to test the case of clog group update where we > can get multiple processes added to a single group and get the xid > status updated by the group leader, you can refer to my test in that > thread[1] (the last patch test_group_commit.patch is using this > framework for testing). Could you be more specific? test_group_commit.patch includes this line but there is nothing specific about this injection point getting used in a test or a callback assigned to it: ./test_group_commit.patch:+ INJECTION_POINT("ClogGroupCommit"); > Overall I feel this framework is quite useful > and easy to use as well. Cool, thanks. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Dilip Kumar
Date:
On Mon, Dec 11, 2023 at 3:14 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Dec 11, 2023 at 11:09:45AM +0530, Dilip Kumar wrote: > > I haven't specifically done a review or testing of this patch, but I > > have used this for testing the CLOG group update code with my > > SLRU-specific changes and I found it quite helpful to test some of the > > concurrent areas where you need to stop processing somewhere in the > > middle of the code and testing that area without this kind of > > injection point framework is really difficult or may not be even > > possible. We wanted to test the case of clog group update where we > > can get multiple processes added to a single group and get the xid > > status updated by the group leader, you can refer to my test in that > > thread[1] (the last patch test_group_commit.patch is using this > > framework for testing). > > Could you be more specific? test_group_commit.patch includes this > line but there is nothing specific about this injection point getting > used in a test or a callback assigned to it: > ./test_group_commit.patch:+ INJECTION_POINT("ClogGroupCommit"); Oops, I only included the code changes where I am adding injection points and some comments to verify that, but missed the actual test file. Attaching it here. Note: I think the latest patches are conflicting with the head, can you rebase? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Tue, Dec 12, 2023 at 10:27:09AM +0530, Dilip Kumar wrote: > Oops, I only included the code changes where I am adding injection > points and some comments to verify that, but missed the actual test > file. Attaching it here. I see. Interesting that this requires persistent connections to work. That's something I've found clunky to rely on when the scenarios a test needs to deal with are rather complex. That's an area that could be made easier to use outside of this patch.. Something got proposed by Andrew Dunstan to make the libpq routines usable through a perl module, for example. > Note: I think the latest patches are conflicting with the head, can you rebase? Indeed, as per the recent manipulations in ipci.c for the shmem initialization areas. Here goes a v6. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Tue, Dec 12, 2023 at 4:15 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Dec 12, 2023 at 10:27:09AM +0530, Dilip Kumar wrote: > > Oops, I only included the code changes where I am adding injection > > points and some comments to verify that, but missed the actual test > > file. Attaching it here. > > I see. Interesting that this requires persistent connections to work. > That's something I've found clunky to rely on when the scenarios a > test needs to deal with are rather complex. That's an area that could > be made easier to use outside of this patch.. Something got proposed > by Andrew Dunstan to make the libpq routines usable through a perl > module, for example. > > > Note: I think the latest patches are conflicting with the head, can you rebase? > > Indeed, as per the recent manipulations in ipci.c for the shmem > initialization areas. Here goes a v6. Sorry for replying late here. Another minor conflict has risen again. It's minor enough to be ignored for a review. On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Nov 20, 2023 at 04:53:45PM +0530, Ashutosh Bapat wrote: > > On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote: > >> I have added some documentation to explain that, as well. I am not > >> wedded to the name proposed in the patch, so if you feel there is > >> better, feel free to propose ideas. > > > > Actually with Attach and Detach terminology, INJECTION_POINT becomes > > the place where we "declare" the injection point. So the documentation > > needs to first explain INJECTION_POINT and then explain the other > > operations. > > Sure. This discussion has not been addressed in v6. I think the interface needs to be documented in the order below INJECTION_POINT - this declares an injection point - i.e. a place in code where an external code can be injected (and run). InjectionPointAttach() - this is used to associate ("attach") external code to an injection point. InjectionPointDetach() - this is used to disassociate ("detach") external code from an injection point. Specifying that InjectionPointAttach() "defines" an injection point gives an impression that the injection point will be "somehow" added to the code by calling InjectionPointAttach() which is not true. For InjectionPointAttach() to be useful, the first argument to it should be something already "declared" in the code using INJECTION_POINT(). Hence INJECTION_POINT needs to be mentioned in the documentation first, followed by Attach and detach. The documentation needs to be rephrased to use terms "declare", "attach" and "detach" instead of "define", "run". The first set is aligned with the functionality whereas the second set is aligned with the implementation. Even if an INJECTION_POINT is not "declared" attach would succeed but doesn't do anything. I think this needs to be made clear in the documentation. Better if we could somehow make Attach() fail if the specified injection point is not "declared" using INJECTION_POINT. Of course we don't want to bloat the hash table with all "declared" injection points even if they aren't being attached to and hence not used. I think, exposing the current injection point strings as #defines and encouraging users to use these macros instead of string literals will be a good start. With the current implementation it's possible to "declare" injection point with same name at multiple places. It's useful but is it intended? /* Field sizes */ #define INJ_NAME_MAXLEN 64 #define INJ_LIB_MAXLEN 128 #define INJ_FUNC_MAXLEN 128 I think these limits should be either documented or specified in the error messages for users to fix their code in case of errors/unexpected behaviour. Here are some code level comments on 0001 +typedef struct InjectionPointArrayEntry This is not an array entry anymore. I think we should rename InjectionPointEntry as SharedInjectionPointEntry and InjectionPointArrayEntry as LocalInjectionPointEntry. +/* utilities to handle the local array cache */ +static void +injection_point_cache_add(const char *name, + InjectionPointCallback callback) +{ ... snip ... + + entry = (InjectionPointCacheEntry *) + hash_search(InjectionPointCache, name, HASH_ENTER, &found); + + if (!found) The function is called only when the injection point is not found in the local cache. So this condition will always be true. An Assert will help to make it clear and also prevent an unintended callback replacement. +#ifdef USE_INJECTION_POINTS +static bool +file_exists(const char *name) There's similar function in jit.c and dfmgr.c. Can we not reuse that code? + /* Save the entry */ + memcpy(entry_by_name->name, name, sizeof(entry_by_name->name)); + entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0'; + memcpy(entry_by_name->library, library, sizeof(entry_by_name->library)); + entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0'; + memcpy(entry_by_name->function, function, sizeof(entry_by_name->function)); + entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0'; Most of the code is using strncpy instead of memcpy. Why is this code different? + injection_callback = injection_point_cache_get(name); + if (injection_callback == NULL) + { + char path[MAXPGPATH]; + + /* Found, so just run the callback registered */ The condition indicates that the callback was not found. Comment looks wrong. + snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, + entry_by_name->library, DLSUFFIX); + + if (!file_exists(path)) + elog(ERROR, "could not find injection library \"%s\"", path); + + injection_callback = (InjectionPointCallback) + load_external_function(path, entry_by_name->function, true, NULL); + + /* add it to the local cache when found */ + injection_point_cache_add(name, injection_callback); + } + Consider case Backend 2 InjectionPointAttach("xyz", "abc", "pqr"); Backend 1 INJECTION_POINT("xyz"); Backend 2 InjectionPointDetach("xyz"); InjectionPointAttach("xyz", "uvw", "lmn"); Backend 1 INJECTION_POINT("xyz"); IIUC, the last INJECTION_POINT would run abc.pqr instead of uvw.lmn. Am I correct? To fix this, we have to a. either save qualified name of the function in local cache too OR resolve the function name every time INJECTION_POINT is invoked and is found in the shared hash table. The first one option is cheaper I think. But it will be good if we can invalidate the local entry when the global entry changes. To keep code simple, we may choose to ignore close race conditions where INJECTION_POINT is run while InjectionPointAttach or InjectionPointDetach is happening. But this way we don't have to look up shared hash table every time INJECTION_POINT is invoked thus improving performance. I will look at 0002 next. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
I'd like to spend some more time reviewing this one, but here are a couple of comments. On Tue, Dec 12, 2023 at 11:44:57AM +0100, Michael Paquier wrote: > +/* > + * Allocate shmem space for dynamic shared hash. > + */ > +void > +InjectionPointShmemInit(void) > +{ > +#ifdef USE_INJECTION_POINTS > + HASHCTL info; > + > + /* key is a NULL-terminated string */ > + info.keysize = sizeof(char[INJ_NAME_MAXLEN]); > + info.entrysize = sizeof(InjectionPointEntry); > + InjectionPointHash = ShmemInitHash("InjectionPoint hash", > + INJECTION_POINT_HASH_INIT_SIZE, > + INJECTION_POINT_HASH_MAX_SIZE, > + &info, > + HASH_ELEM | HASH_STRINGS); > +#endif > +} Should we specify HASH_FIXED_SIZE, too? This hash table will be in the main shared memory segment and therefore won't be able to expand too far beyond the declared maximum size. > + /* > + * Check if the callback exists in the local cache, to avoid unnecessary > + * external loads. > + */ > + injection_callback = injection_point_cache_get(name); > + if (injection_callback == NULL) > + { > + char path[MAXPGPATH]; > + > + /* Found, so just run the callback registered */ > + snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, > + entry_by_name->library, DLSUFFIX); > + > + if (!file_exists(path)) > + elog(ERROR, "could not find injection library \"%s\"", path); > + > + injection_callback = (InjectionPointCallback) > + load_external_function(path, entry_by_name->function, true, NULL); > + > + /* add it to the local cache when found */ > + injection_point_cache_add(name, injection_callback); > + } I'm wondering how important it is to cache the callbacks locally. load_external_function() won't reload an already-loaded library, so AFAICT this is ultimately just saving a call to dlsym(). > + <literal>name</literal> is the name of the injection point, that > + will execute the <literal>function</literal> loaded from > + <literal>library</library>. > + Injection points are saved in a hash table in shared memory, and > + last until the server is shut down. > + </para> I think </library> is supposed to be </literal> here. > +++ b/src/test/modules/test_injection_points/t/002_invalid_checkpoint_after_promote.pl 0003 and 0004 add tests to the test_injection_points module. Is the idea that we'd add any tests that required injection points here? I think it'd be better if we could move the tests closer to the logic they're testing, but perhaps that is difficult because you also need to define the callback functions somewhere. Hm... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Tue, Jan 02, 2024 at 11:14:56PM -0600, Nathan Bossart wrote: > Should we specify HASH_FIXED_SIZE, too? This hash table will be in the > main shared memory segment and therefore won't be able to expand too far > beyond the declared maximum size. Good point. > I'm wondering how important it is to cache the callbacks locally. > load_external_function() won't reload an already-loaded library, so AFAICT > this is ultimately just saving a call to dlsym(). This keeps a copy to a callback under the same address space, and I guess that it would matter if the code where a callback is added gets very hot because this means less function pointers. At the end I would keep the cache as the code to handle it is neither complex nor long, while being isolated in its own paths. > I think </library> is supposed to be </literal> here. Okay. > 0003 and 0004 add tests to the test_injection_points module. Is the idea > that we'd add any tests that required injection points here? I think it'd > be better if we could move the tests closer to the logic they're testing, > but perhaps that is difficult because you also need to define the callback > functions somewhere. Hm... Yeah. Agreed that the final result should not have these tests in the module test_injection_points. What I was thinking here is to move 002_invalid_checkpoint_after_promote.pl to src/test/recovery/ and pull the module with the callbacks with an EXTRA_INSTALL. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Thu, Jan 4, 2024 at 5:23 AM Michael Paquier <michael@paquier.xyz> wrote: > > > 0003 and 0004 add tests to the test_injection_points module. Is the idea > > that we'd add any tests that required injection points here? I think it'd > > be better if we could move the tests closer to the logic they're testing, > > but perhaps that is difficult because you also need to define the callback > > functions somewhere. Hm... > > Yeah. Agreed that the final result should not have these tests in the > module test_injection_points. What I was thinking here is to move > 002_invalid_checkpoint_after_promote.pl to src/test/recovery/ and pull > the module with the callbacks with an EXTRA_INSTALL. 0003 and 0004 are using the extension in this module for some serious testing. The name of the extension test_injection_point indicates that it's for testing injection points and not for some serious use of injection callbacks it adds. Changes 0003 and 0004 suggest otherwise. I suggest we move test_injection_points from src/test/modules to contrib/ and rename it as "injection_points". The test files may still be named as test_injection_point. The TAP tests in 0003 and 0004 once moved to their appropriate places, will load injection_point extension and use it. That way predefined injection point callbacks will also be available for others to use. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Tue, Jan 2, 2024 at 3:36 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > I will look at 0002 next. One more comment on 0001 InjectionPointAttach() doesn't test whether the given function exists in the given library. Even if InjectionPointAttach() succeeds, INJECTION_POINT might throw error because the function doesn't exist. This can be seen as an unwanted behaviour. I think InjectionPointAttach() should test whether the function exists and possibly load it as well by adding it to the local cache. 0002 comments --- /dev/null +++ b/src/test/modules/test_injection_points/expected/test_injection_points.out When built without injection point support, this test fails. We should add an alternate output file for such a build so that the behaviour with and without injection point support is tested. Or set up things such that the test is not run under make check in that directory. I will prefer the first option. + +SELECT test_injection_points_run('TestInjectionError'); -- error +ERROR: error triggered for injection point TestInjectionError +-- Re-load and run again. What's getting Re-loaded here? \c will create a new connection and thus a new backend. Maybe the comment should say "test in a fresh backend" or something of that sort? + +SELECT test_injection_points_run('TestInjectionError'); -- error +ERROR: error triggered for injection point TestInjectionError +-- Remove one entry and check the other one. Looks confusing to me, we are testing the one removed as well. Am I missing something? +(1 row) + +-- All entries removed, nothing happens We aren't removing all entries TestInjectionLog2 is still there. Am I missing something? 0003 looks mostly OK. 0004 comments + +# after recovery, the server will not start, and log PANIC: could not locate a valid checkpoint record IIUC the comment describes the behaviour with 7863ee4def65 reverted. But the test after this comment is written for the behaviour with 7863ee4def65. That's confusing. Is the intent to describe both behaviours in the comment? + + /* And sleep.. */ + ConditionVariablePrepareToSleep(&inj_state->wait_point); + ConditionVariableSleep(&inj_state->wait_point, test_injection_wait_event); + ConditionVariableCancelSleep(); According to the prologue of ConditionVariableSleep(), that function should be called in a loop checking for the desired condition. All the callers that I examined follow that pattern. I think we need to follow that pattern here as well. Below comment from ConditionVariableTimedSleep() makes me think that the caller of ConditionVariableSleep() can be woken up even if the condition variable was not signaled. That's why the while() loop around ConditionVariableSleep(). * If we're still in the wait list, then the latch must have been set * by something other than ConditionVariableSignal; though we don't * guarantee not to return spuriously, we'll avoid this obvious case. */. That's all I have for now. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
On Thu, Jan 04, 2024 at 08:53:11AM +0900, Michael Paquier wrote: > On Tue, Jan 02, 2024 at 11:14:56PM -0600, Nathan Bossart wrote: >> I'm wondering how important it is to cache the callbacks locally. >> load_external_function() won't reload an already-loaded library, so AFAICT >> this is ultimately just saving a call to dlsym(). > > This keeps a copy to a callback under the same address space, and I > guess that it would matter if the code where a callback is added gets > very hot because this means less function pointers. At the end I > would keep the cache as the code to handle it is neither complex nor > long, while being isolated in its own paths. Fair enough. >> 0003 and 0004 add tests to the test_injection_points module. Is the idea >> that we'd add any tests that required injection points here? I think it'd >> be better if we could move the tests closer to the logic they're testing, >> but perhaps that is difficult because you also need to define the callback >> functions somewhere. Hm... > > Yeah. Agreed that the final result should not have these tests in the > module test_injection_points. What I was thinking here is to move > 002_invalid_checkpoint_after_promote.pl to src/test/recovery/ and pull > the module with the callbacks with an EXTRA_INSTALL. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
On Thu, Jan 04, 2024 at 06:04:20PM +0530, Ashutosh Bapat wrote: > 0003 and 0004 are using the extension in this module for some serious > testing. The name of the extension test_injection_point indicates that > it's for testing injection points and not for some serious use of > injection callbacks it adds. Changes 0003 and 0004 suggest otherwise. Yeah, I think test_injection_point should be reserved for testing the injection point machinery. > I suggest we move test_injection_points from src/test/modules to > contrib/ and rename it as "injection_points". The test files may still > be named as test_injection_point. The TAP tests in 0003 and 0004 once > moved to their appropriate places, will load injection_point extension > and use it. That way predefined injection point callbacks will also be > available for others to use. Rather than defining a module somewhere that tests would need to load, should we just put the common callbacks in the core server? Unless there's a strong reason to define them elsewhere, that could be a nice way to save a step in the tests. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Thu, Jan 04, 2024 at 04:31:02PM -0600, Nathan Bossart wrote: > On Thu, Jan 04, 2024 at 06:04:20PM +0530, Ashutosh Bapat wrote: >> 0003 and 0004 are using the extension in this module for some serious >> testing. The name of the extension test_injection_point indicates that >> it's for testing injection points and not for some serious use of >> injection callbacks it adds. Changes 0003 and 0004 suggest otherwise. > > Yeah, I think test_injection_point should be reserved for testing the > injection point machinery. Sure. FWIW, it makes sense to me to keep the SQL interface and the callbacks in the module, per the reasons below. >> I suggest we move test_injection_points from src/test/modules to >> contrib/ and rename it as "injection_points". The test files may still >> be named as test_injection_point. The TAP tests in 0003 and 0004 once >> moved to their appropriate places, will load injection_point extension >> and use it. That way predefined injection point callbacks will also be >> available for others to use. > > Rather than defining a module somewhere that tests would need to load, > should we just put the common callbacks in the core server? Unless there's > a strong reason to define them elsewhere, that could be a nice way to save > a step in the tests. Nah, having some pre-existing callbacks existing in the backend is against the original minimalistic design spirit. These would also require an SQL interface, and the interface design also depends on the functions registering them when pushing down custom conditions. Pushing that down to extensions to do what they want will lead to less noise, particularly if you consider that we will most likely want to tweak the callback interfaces for backpatched bugs. That's also why I think contrib/ is not a good idea, src/test/modules/ serving the actual testing purpose here. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Fri, Jan 5, 2024 at 5:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > >> I suggest we move test_injection_points from src/test/modules to > >> contrib/ and rename it as "injection_points". The test files may still > >> be named as test_injection_point. The TAP tests in 0003 and 0004 once > >> moved to their appropriate places, will load injection_point extension > >> and use it. That way predefined injection point callbacks will also be > >> available for others to use. > > > > Rather than defining a module somewhere that tests would need to load, > > should we just put the common callbacks in the core server? Unless there's > > a strong reason to define them elsewhere, that could be a nice way to save > > a step in the tests. > > Nah, having some pre-existing callbacks existing in the backend is > against the original minimalistic design spirit. These would also > require an SQL interface, and the interface design also depends on the > functions registering them when pushing down custom conditions. > Pushing that down to extensions to do what they want will lead to less > noise, particularly if you consider that we will most likely want to > tweak the callback interfaces for backpatched bugs. That's also why I > think contrib/ is not a good idea, src/test/modules/ serving the > actual testing purpose here. Well, you have already showed that the SQL interface created for the test module is being used for testing a core feature. The tests for that should stay somewhere near the other tests for those features. Using an extension named "test_injection_point" and which resides in a test module for testing core features doesn't look great. Hence suggestion to move it to contrib. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Jan 05, 2024 at 12:41:33PM +0530, Ashutosh Bapat wrote: > Well, you have already showed that the SQL interface created for the > test module is being used for testing a core feature. The tests for > that should stay somewhere near the other tests for those features. > Using an extension named "test_injection_point" and which resides in a > test module for testing core features doesn't look great. Hence > suggestion to move it to contrib. I mean why? We test a bunch of stuff in src/test/modules/, and this is not intended to be released to the outside world. Putting that in contrib/ has a lot of extra cost. One is documentation and more complexity regarding versioning when it comes to upgrading it to a new version. I don't think that it is a good idea to deal with this extra load of work for something that I'd aim to be used for having improved *test* coverage, and the build switch should stay. Saying that, I'd be OK with renaming the module to injection_points, but I will fight hard about keeping that in src/test/modules/. That's less maintenance headache to think about when having to deal with complex racy bugs. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Fri, Jan 5, 2024 at 12:49 PM Michael Paquier <michael@paquier.xyz> wrote: > > I mean why? We test a bunch of stuff in src/test/modules/, and this > is not intended to be released to the outside world. > > Putting that in contrib/ has a lot of extra cost. One is > documentation and more complexity regarding versioning when it comes > to upgrading it to a new version. I don't think that it is a good > idea to deal with this extra load of work for something that I'd aim > to be used for having improved *test* coverage, and the build switch > should stay. Saying that, I'd be OK with renaming the module to > injection_points, Ok. Thanks. > but I will fight hard about keeping that in > src/test/modules/. That's less maintenance headache to think about > when having to deal with complex racy bugs. For me getting this feature in code in a usable manner is more important than its place in the code. I have no plans to fight over it. :). -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Dilip Kumar
Date:
On Tue, Dec 12, 2023 at 4:15 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Dec 12, 2023 at 10:27:09AM +0530, Dilip Kumar wrote: > > Oops, I only included the code changes where I am adding injection > > points and some comments to verify that, but missed the actual test > > file. Attaching it here. > > I see. Interesting that this requires persistent connections to work. > That's something I've found clunky to rely on when the scenarios a > test needs to deal with are rather complex. That's an area that could > be made easier to use outside of this patch.. Something got proposed > by Andrew Dunstan to make the libpq routines usable through a perl > module, for example. > > > Note: I think the latest patches are conflicting with the head, can you rebase? > > Indeed, as per the recent manipulations in ipci.c for the shmem > initialization areas. Here goes a v6. Some comments in 0001, mostly cosmetics 1. +/* utilities to handle the local array cache */ +static void +injection_point_cache_add(const char *name, + InjectionPointCallback callback) I think the comment for this function should be more specific about adding an entry to the local injection_point_cache_add. And add comments for other functions as well e.g. injection_point_cache_get 2. +typedef struct InjectionPointEntry +{ + char name[INJ_NAME_MAXLEN]; /* hash key */ + char library[INJ_LIB_MAXLEN]; /* library */ + char function[INJ_FUNC_MAXLEN]; /* function */ +} InjectionPointEntry; Some comments would be good for the structure 3. +static bool +file_exists(const char *name) +{ + struct stat st; + + Assert(name != NULL); + if (stat(name, &st) == 0) + return !S_ISDIR(st.st_mode); + else if (!(errno == ENOENT || errno == ENOTDIR)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not access file \"%s\": %m", name))); + return false; +} dfmgr.c has a similar function so can't we reuse it by making that function external? 4. + if (found) + { + LWLockRelease(InjectionPointLock); + elog(ERROR, "injection point \"%s\" already defined", name); + } + ... +#else + elog(ERROR, "Injection points are not supported by this build"); Better to use similar formatting for error output, Injection vs injection (better not to capitalize the first letter for consistency pov) 5. + * Check first the shared hash table, and adapt the local cache + * depending on that as it could be possible that an entry to run + * has been removed. + */ What if the entry is removed after we have released the InjectionPointLock? Or this would not cause any harm? 0004: I think test_injection_points_wake() and test_injection_wait() can be moved as part of 0002 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
On Fri, Jan 05, 2024 at 08:38:22AM +0900, Michael Paquier wrote: > On Thu, Jan 04, 2024 at 04:31:02PM -0600, Nathan Bossart wrote: >> Rather than defining a module somewhere that tests would need to load, >> should we just put the common callbacks in the core server? Unless there's >> a strong reason to define them elsewhere, that could be a nice way to save >> a step in the tests. > > Nah, having some pre-existing callbacks existing in the backend is > against the original minimalistic design spirit. These would also > require an SQL interface, and the interface design also depends on the > functions registering them when pushing down custom conditions. > Pushing that down to extensions to do what they want will lead to less > noise, particularly if you consider that we will most likely want to > tweak the callback interfaces for backpatched bugs. That's also why I > think contrib/ is not a good idea, src/test/modules/ serving the > actual testing purpose here. Ah, so IIUC we'd have to put some functions in pg_proc.dat even though they would only be used for a handful of tests in special builds. I'd agree that's not desirable. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Nathan Bossart
Date:
On Fri, Jan 05, 2024 at 04:18:49PM +0900, Michael Paquier wrote: > Putting that in contrib/ has a lot of extra cost. One is > documentation and more complexity regarding versioning when it comes > to upgrading it to a new version. I don't think that it is a good > idea to deal with this extra load of work for something that I'd aim > to be used for having improved *test* coverage, and the build switch > should stay. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Jan 05, 2024 at 10:28:47AM -0600, Nathan Bossart wrote: > +1 Extra note for this thread: it is possible to add a SQL test case for problems like what's been reported on this thread when facing a partial write failure: https://www.postgresql.org/message-id/18259-6e256429825dd435@postgresql.org -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Jan 05, 2024 at 03:00:25PM +0530, Dilip Kumar wrote: > Some comments in 0001, mostly cosmetics > > 1. > +/* utilities to handle the local array cache */ > +static void > +injection_point_cache_add(const char *name, > + InjectionPointCallback callback) > > I think the comment for this function should be more specific about > adding an entry to the local injection_point_cache_add. And add > comments for other functions as well e.g. injection_point_cache_get And it is not an array anymore. Note InjectionPointArrayEntry that still existed. > 2. > +typedef struct InjectionPointEntry > +{ > + char name[INJ_NAME_MAXLEN]; /* hash key */ > + char library[INJ_LIB_MAXLEN]; /* library */ > + char function[INJ_FUNC_MAXLEN]; /* function */ > +} InjectionPointEntry; > > Some comments would be good for the structure Sure. I've spent more time documenting things in injection_point.c, addressing any inconsistencies. > 3. > > +static bool > +file_exists(const char *name) > +{ > + struct stat st; > + > + Assert(name != NULL); > + if (stat(name, &st) == 0) > + return !S_ISDIR(st.st_mode); > + else if (!(errno == ENOENT || errno == ENOTDIR)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not access file \"%s\": %m", name))); > + return false; > +} > > dfmgr.c has a similar function so can't we reuse it by making that > function external? Yes. Note that jit.c has an extra copy of it. I was holding on the refactoring, but let's bite the bullet and have a single routine. I've moved that into a 0001 that builds on top of the rest. > 4. > + if (found) > + { > + LWLockRelease(InjectionPointLock); > + elog(ERROR, "injection point \"%s\" already defined", name); > + } > + > ... > +#else > + elog(ERROR, "Injection points are not supported by this build"); > > Better to use similar formatting for error output, Injection vs > injection (better not to capitalize the first letter for consistency > pov) Fixed. > 5. > + * Check first the shared hash table, and adapt the local cache > + * depending on that as it could be possible that an entry to run > + * has been removed. > + */ > > What if the entry is removed after we have released the > InjectionPointLock? Or this would not cause any harm? With an entry found in the shmem table? I don't really think that we need to care about such cases, TBH, because the injection point would have been found in the table to start with. This comes down to if we should try to hold InjectionPointLock while calling the callback, and that may not be a good idea in some cases if you'd expect a high concurrency on the callback running. > 0004: > > I think > test_injection_points_wake() and test_injection_wait() can be moved as > part of 0002 Nah. I intend to keep the introduction of this API where it becomes relevant. Perhaps this could also use an isolation test? This could always be polished once we agree on 0001 and 0002. (I'll post a v6 a bit later, there are more comments posted here and there.) -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
(Compiled two separate review emails into a single one) On Tue, Jan 02, 2024 at 03:36:12PM +0530, Ashutosh Bapat wrote: > This discussion has not been addressed in v6. I think the interface > needs to be documented in the order below > INJECTION_POINT - this declares an injection point - i.e. a place in > code where an external code can be injected (and run). > InjectionPointAttach() - this is used to associate ("attach") external > code to an injection point. > InjectionPointDetach() - this is used to disassociate ("detach") > external code from an injection point. > > [arguments about doc organization] > > Even if an INJECTION_POINT is not "declared" attach would succeed but > doesn't do anything. I think this needs to be made clear in the > documentation. Better if we could somehow make Attach() fail if the > specified injection point is not "declared" using INJECTION_POINT. Of > course we don't want to bloat the hash table with all "declared" > injection points even if they aren't being attached to and hence not > used. Okay, I can see your point. I have reorganized the docs in the following order: - INJECTION_POINT - Attach - Detach > I think, exposing the current injection point strings as > #defines and encouraging users to use these macros instead of string > literals will be a good start. Nah, I disagree with this one actually. It is easy to grep for the macro INJECTION_POINT to be able to achieve the same research job, and this would make the code more inconsistent when callbacks are run within extensions which don't care about a #define in a backend header. > With the current implementation it's possible to "declare" injection > point with same name at multiple places. It's useful but is it > intended? Yes. I would recommend not doing that, but I don't see why there would be a point in restricting that, either. > /* Field sizes */ > #define INJ_NAME_MAXLEN 64 > #define INJ_LIB_MAXLEN 128 > #define INJ_FUNC_MAXLEN 128 > I think these limits should be either documented or specified in the > error messages for users to fix their code in case of > errors/unexpected behaviour. Adding them to the error messages when attaching is a good idea. Done. > This is not an array entry anymore. I think we should rename > InjectionPointEntry as SharedInjectionPointEntry and InjectionPointArrayEntry > as LocalInjectionPointEntry. Yep, fixed. > +/* utilities to handle the local array cache */ > +static void > +injection_point_cache_add(const char *name, > + InjectionPointCallback callback) > +{ > ... snip ... > + > + entry = (InjectionPointCacheEntry *) > + hash_search(InjectionPointCache, name, HASH_ENTER, &found); > + > + if (!found) > > The function is called only when the injection point is not found in the local > cache. So this condition will always be true. An Assert will help to make it > clear and also prevent an unintended callback replacement. Right, as coded that seems pointless to make the found conditional. I think that I coded it this way when doing some earlier work with this code, and finished with a simpler thing. > +#ifdef USE_INJECTION_POINTS > +static bool > +file_exists(const char *name) > > There's similar function in jit.c and dfmgr.c. Can we not reuse that code? This has been mentioned in a different comment. Refactored as of 0001, but there is something here related to EACCES for the JIT path. Seems weird to me that we would not fail if the JIT library cannot be accessed when stat() fails. > + /* Save the entry */ > + memcpy(entry_by_name->name, name, sizeof(entry_by_name->name)); > + entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0'; > + memcpy(entry_by_name->library, library, sizeof(entry_by_name->library)); > + entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0'; > + memcpy(entry_by_name->function, function, sizeof(entry_by_name->function)); > + entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0'; > > Most of the code is using strncpy instead of memcpy. Why is this code different? strncpy() is less used in the backend code. It comes to a matter of taste, IMO. > + injection_callback = injection_point_cache_get(name); > + if (injection_callback == NULL) > + { > + char path[MAXPGPATH]; > + > + /* Found, so just run the callback registered */ > > The condition indicates that the callback was not found. Comment looks wrong. Fixed. > Consider case > > Backend 2 > InjectionPointAttach("xyz", "abc", "pqr"); > > Backend 1 > INJECTION_POINT("xyz"); > > Backend 2 > InjectionPointDetach("xyz"); > InjectionPointAttach("xyz", "uvw", "lmn"); > > Backend 1 > INJECTION_POINT("xyz"); > > IIUC, the last INJECTION_POINT would run abc.pqr instead of uvw.lmn. > Am I correct? Yeah, that's an intended design choice to keep the code simpler and faster as there is no need to track the library and function names in the local caches or implement something similar to invalidation messages for this facility because it would impact performance anyway in the call paths. In short, just don't do that, or use two distinct points. On Thu, Jan 04, 2024 at 06:22:35PM +0530, Ashutosh Bapat wrote: > One more comment on 0001 > InjectionPointAttach() doesn't test whether the given function exists > in the given library. Even if InjectionPointAttach() succeeds, > INJECTION_POINT might throw error because the function doesn't exist. > This can be seen as an unwanted behaviour. I think > InjectionPointAttach() should test whether the function exists and > possibly load it as well by adding it to the local cache. This has the disadvantage of filling the local cache but that may not be necessary with an extra load_external_function() in the attach path. I agree to make things safer, but I would do that when attempting to run the callback instead. Perhaps there's an argument for the case of somebody replacing a library on-the-fly. I don't really buy it, but people like doing fancy things sometimes. > 0002 comments > --- /dev/null > +++ b/src/test/modules/test_injection_points/expected/test_injection_points.out > > When built without injection point support, this test fails. We should > add an alternate output file for such a build so that the behaviour > with and without injection point support is tested. Or set up things > such that the test is not run under make check in that directory. I > will prefer the first option. src/test/modules/Makefile has a safeguard for ./configure, and there's one in test_injection_points/meson.build for Meson. The test is not run when the switches are not used, rather than using an alternate output file. There was a different issue when moving the tests to src/test/recovery/, though, where we need to make the execution of the tests conditional on get_option('injection_points'). > + > +SELECT test_injection_points_run('TestInjectionError'); -- error > +ERROR: error triggered for injection point TestInjectionError > +-- Re-load and run again. > > What's getting Re-loaded here? \c will create a new connection and > thus a new backend. Maybe the comment should say "test in a fresh > backend" or something of that sort? The local cache is reloaded. Reworded. > Looks confusing to me, we are testing the one removed as well. Am I > missing something? > [...] > We aren't removing all entries TestInjectionLog2 is still there. Am I > missing something? Reworded all that. > +# after recovery, the server will not start, and log PANIC: could not > locate a valid checkpoint record > > IIUC the comment describes the behaviour with 7863ee4def65 reverted. > But the test after this comment is written for the behaviour with > 7863ee4def65. That's confusing. Is the intent to describe both > behaviours in the comment? This came from the original test case posted on the thread that treated this bug. There's more that bugs me for this script that I would like to polish. Let's focus on 0001 and 0002 for now.. > According to the prologue of ConditionVariableSleep(), that function > should be called in a loop checking for the desired condition. All the > callers that I examined follow that pattern. I think we need to follow > that pattern here as well. > > Below comment from ConditionVariableTimedSleep() makes me think that > the caller of ConditionVariableSleep() can be woken up even if the > condition variable was not signaled. That's why the while() loop > around ConditionVariableSleep(). That's the thing here, we don't have an extra condition to check after. The variable sleep is what triggers the stop. :) Perhaps this could be made smarter or with something else, I'm OK to revisit that with the polishing for 0003 I'm planning. We could use a separate shared state, for example, but that does not improve the test readability, either. Attached is a v7 series. What do you think? 0004 and 0005 for the extra tests still need more discussion and much more polishing, IMO. -- Michael
Attachment
- signature.asc
- v7-0001-Refactor-code-to-check-file-file-existence.patch
- v7-0002-Add-backend-facility-for-injection-points.patch
- v7-0003-Add-test-module-injection_points.patch
- v7-0004-Add-regression-test-to-show-snapbuild-consistency.patch
- v7-0005-Add-basic-test-for-promotion-and-restart-race-con.patch
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Tue, Jan 9, 2024 at 10:09 AM Michael Paquier <michael@paquier.xyz> wrote: > > Okay, I can see your point. I have reorganized the docs in the > following order: > - INJECTION_POINT > - Attach > - Detach Thanks. This looks better. Needs further wordsmithy. But that can wait till the code has been reviewed. > > > I think, exposing the current injection point strings as > > #defines and encouraging users to use these macros instead of string > > literals will be a good start. > > Nah, I disagree with this one actually. It is easy to grep for the > macro INJECTION_POINT to be able to achieve the same research job, and > this would make the code more inconsistent when callbacks are run > within extensions which don't care about a #define in a backend > header. The macros should be in extension facing header, ofc. But I take back this suggestion since defining these macros is extra work (every time a new injection point is declared) and your suggestion to grep practically works. We can improve things as needed. > > > With the current implementation it's possible to "declare" injection > > point with same name at multiple places. It's useful but is it > > intended? > > Yes. I would recommend not doing that, but I don't see why there > would be a point in restricting that, either. Since an unintentional misspelling might trigger an unintended injection point. But we will see how much of that happens in practice. > > +#ifdef USE_INJECTION_POINTS > > +static bool > > +file_exists(const char *name) > > > > There's similar function in jit.c and dfmgr.c. Can we not reuse that code? > > This has been mentioned in a different comment. Refactored as of > 0001, but there is something here related to EACCES for the JIT path. > Seems weird to me that we would not fail if the JIT library cannot be > accessed when stat() fails. I agree with this change to jit. Without having search permissions on every directory in the path, the function can not determine if the file exists or not. So throwing an error is better than just returning false which means that the file does not exist. > > > + /* Save the entry */ > > + memcpy(entry_by_name->name, name, sizeof(entry_by_name->name)); > > + entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0'; > > + memcpy(entry_by_name->library, library, sizeof(entry_by_name->library)); > > + entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0'; > > + memcpy(entry_by_name->function, function, sizeof(entry_by_name->function)); > > + entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0'; > > > > Most of the code is using strncpy instead of memcpy. Why is this code different? > > strncpy() is less used in the backend code. It comes to a matter of > taste, IMO. To me using memcpy implies that the contents of the memory being copied can be non-character. For a buffer containing a character string I would prefer strncpy. But I wouldn't argue furher.. > > Yeah, that's an intended design choice to keep the code simpler and > faster as there is no need to track the library and function names in > the local caches or implement something similar to invalidation > messages for this facility because it would impact performance anyway > in the call paths. In short, just don't do that, or use two distinct > points. In practice the InjectionPointDetach() and InjectionPointAttach() calls may not be close by and the user may not be able to figure out why the injection points are behaviour weird. It may impact performance but unexpected behaviour should be avoided, IMO. If nothing else this should be documented. > > On Thu, Jan 04, 2024 at 06:22:35PM +0530, Ashutosh Bapat wrote: > > One more comment on 0001 > > InjectionPointAttach() doesn't test whether the given function exists > > in the given library. Even if InjectionPointAttach() succeeds, > > INJECTION_POINT might throw error because the function doesn't exist. > > This can be seen as an unwanted behaviour. I think > > InjectionPointAttach() should test whether the function exists and > > possibly load it as well by adding it to the local cache. > > This has the disadvantage of filling the local cache but that may not > be necessary with an extra load_external_function() in the attach > path. I agree to make things safer, but I would do that when > attempting to run the callback instead. Perhaps there's an argument > for the case of somebody replacing a library on-the-fly. I don't > really buy it, but people like doing fancy things sometimes. I am ok with not populating the cache but checking with just load_external_function(). This is again another ease of use scenario where a silly mistake by user is caught earlier making user's life easier. That at least should be the goal of the first cut. > > > 0002 comments > > --- /dev/null > > +++ b/src/test/modules/test_injection_points/expected/test_injection_points.out > > > > When built without injection point support, this test fails. We should > > add an alternate output file for such a build so that the behaviour > > with and without injection point support is tested. Or set up things > > such that the test is not run under make check in that directory. I > > will prefer the first option. > > src/test/modules/Makefile has a safeguard for ./configure, and there's > one in test_injection_points/meson.build for Meson. The test is not > run when the switches are not used, rather than using an alternate > output file. With v6 I could run the test when built with enable_injection_point false. I just ran make check in that folder. Didn't test meson build. > There was a different issue when moving the tests to > src/test/recovery/, though, where we need to make the execution of the > tests conditional on get_option('injection_points'). > > > + > > +SELECT test_injection_points_run('TestInjectionError'); -- error > > +ERROR: error triggered for injection point TestInjectionError > > +-- Re-load and run again. > > > > What's getting Re-loaded here? \c will create a new connection and > > thus a new backend. Maybe the comment should say "test in a fresh > > backend" or something of that sort? > > The local cache is reloaded. Reworded. We are starting a new backend not "re"loading a cache in an existing backend per say. > > That's the thing here, we don't have an extra condition to check > after. The variable sleep is what triggers the stop. :) > Perhaps this could be made smarter or with something else, I'm OK to > revisit that with the polishing for 0003 I'm planning. We could use a > separate shared state, for example, but that does not improve the test > readability, either. Yeah, I think we have to use another shared state. If the waiting backend moves ahead without test_injection_point_wake() being called, that could lead to unexpected and very weird behaviour. It looks like ConditionVariable just remembers the processes that need to be woken up during broadcast or signal. But by itself it doesn't guarantee desired condition when woken up. > > Attached is a v7 series. What do you think? 0004 and 0005 for the > extra tests still need more discussion and much more polishing, IMO. Generally I think the 0001 and 0002 are in good shape. However, I would like them to be more easy to use - like catching simple user errors that can be easily caught. That saves a lot of frustration because of unexpected behaviour. I will review 0001 and 0002 from v7 in detail again, but it might take a few days. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Wed, Jan 10, 2024 at 03:21:03PM +0530, Ashutosh Bapat wrote: > On Tue, Jan 9, 2024 at 10:09 AM Michael Paquier <michael@paquier.xyz> wrote: >>> +#ifdef USE_INJECTION_POINTS >>> +static bool >>> +file_exists(const char *name) >>> >>> There's similar function in jit.c and dfmgr.c. Can we not reuse that code? >> >> This has been mentioned in a different comment. Refactored as of >> 0001, but there is something here related to EACCES for the JIT path. >> Seems weird to me that we would not fail if the JIT library cannot be >> accessed when stat() fails. > > I agree with this change to jit. Without having search permissions on > every directory in the path, the function can not determine if the > file exists or not. So throwing an error is better than just returning > false which means that > the file does not exist. I was looking at the original set of threads related to JIT, and this has been mentioned nowhere. I think that I'm going to give it a shot and see how the buildfarm reacts. If that finishes with red, we could always revert this part of the patch in jit.c still keep the refactored routine. >> Yeah, that's an intended design choice to keep the code simpler and >> faster as there is no need to track the library and function names in >> the local caches or implement something similar to invalidation >> messages for this facility because it would impact performance anyway >> in the call paths. In short, just don't do that, or use two distinct >> points. > > In practice the InjectionPointDetach() and InjectionPointAttach() > calls may not be close by and the user may not be able to figure out > why the injection points are behaviour weird. It may impact > performance but unexpected behaviour should be avoided, IMO. > > If nothing else this should be documented. In all the infrastructures I've looked at, folks did not really care about having an invalidation for the callbacks loaded. Still I'm OK to add something in the documentation about that, say among the lines of an extra sentence like: "The callbacks loaded by a process are cached within each process. There is no invalidation facility for the callbacks attached to injection points, hence updating a callback for an injection point requires a restart of the process to release its cache and the previous callbacks attached to it." > I am ok with not populating the cache but checking with just > load_external_function(). This is again another ease of use scenario > where a silly mistake by user is caught earlier making user's life > easier. That at least should be the goal of the first cut. I don't really aim for complicated here, just useful. > With v6 I could run the test when built with enable_injection_point > false. I just ran make check in that folder. Didn't test meson build. The CI has been failing because 041_invalid_checkpoint_after_promote was loading Time::HiRes::nanosleep and Windows does not support it. > Yeah, I think we have to use another shared state. If the waiting > backend moves ahead without test_injection_point_wake() being called, > that could lead to unexpected and very weird behaviour. > > It looks like ConditionVariable just remembers the processes that need > to be woken up during broadcast or signal. But by itself it doesn't > guarantee desired condition when woken up. Yeah, I'm not sure yet about how to do that in the most elegant way. But this part could always happen after 0001~0003. >> Attached is a v7 series. What do you think? 0004 and 0005 for the >> extra tests still need more discussion and much more polishing, IMO. > > Generally I think the 0001 and 0002 are in good shape. However, I > would like them to be more easy to use - like catching simple user > errors that can be easily caught. That saves a lot of frustration > because of unexpected behaviour. I will review 0001 and 0002 from v7 > in detail again, but it might take a few days. Thanks again for the reviews. I still intend to focus solely on 0001, 0002 and 0003 for the current commit fest to have something able to enforce error states in backends, at least. There have been quite a few bugs that could have coverage thanks for that. -- Michael
Attachment
- v8-0001-Refactor-code-to-check-file-file-existence.patch
- v8-0002-Add-backend-facility-for-injection-points.patch
- v8-0003-Add-test-module-injection_points.patch
- v8-0004-Add-regression-test-to-show-snapbuild-consistency.patch
- v8-0005-Add-basic-test-for-promotion-and-restart-race-con.patch
- signature.asc
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Thu, Jan 11, 2024 at 9:42 AM Michael Paquier <michael@paquier.xyz> wrote: > > >> Yeah, that's an intended design choice to keep the code simpler and > >> faster as there is no need to track the library and function names in > >> the local caches or implement something similar to invalidation > >> messages for this facility because it would impact performance anyway > >> in the call paths. In short, just don't do that, or use two distinct > >> points. > > > > In practice the InjectionPointDetach() and InjectionPointAttach() > > calls may not be close by and the user may not be able to figure out > > why the injection points are behaviour weird. It may impact > > performance but unexpected behaviour should be avoided, IMO. > > > > If nothing else this should be documented. > > In all the infrastructures I've looked at, folks did not really care > about having an invalidation for the callbacks loaded. Still I'm OK > to add something in the documentation about that, say among the lines > of an extra sentence like: > "The callbacks loaded by a process are cached within each process. > There is no invalidation facility for the callbacks attached to > injection points, hence updating a callback for an injection point > requires a restart of the process to release its cache and the > previous callbacks attached to it." It doesn't behave exactly like that either. If the INJECTION_POINT is run after detach (but before Attach), the local cache will be updated. A subsequent attach and INJECTION_POINT call would fetch the new callback. > > > I am ok with not populating the cache but checking with just > > load_external_function(). This is again another ease of use scenario > > where a silly mistake by user is caught earlier making user's life > > easier. That at least should be the goal of the first cut. > > I don't really aim for complicated here, just useful. It isn't complicated. Such simple error check improve user's confidence on the feature and better be part of the 1st cut. > > > With v6 I could run the test when built with enable_injection_point > > false. I just ran make check in that folder. Didn't test meson build. > > The CI has been failing because 041_invalid_checkpoint_after_promote > was loading Time::HiRes::nanosleep and Windows does not support it. Some miscommunication here. The SQL test under injection_point module can be run in a build without injection_point and it fails. I think it's better to have an alternate output for the same or prohibit the test running itself. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Thu, Jan 11, 2024 at 02:47:27PM +0530, Ashutosh Bapat wrote: > On Thu, Jan 11, 2024 at 9:42 AM Michael Paquier <michael@paquier.xyz> wrote: >> I don't really aim for complicated here, just useful. > > It isn't complicated. Such simple error check improve user's > confidence on the feature and better be part of the 1st cut. I'm really not sure about that, because it does not impact the scope of the facility even with all the use cases I've seen where injection points could be used. It could always be added later if there's a strong push for it. For testing, I'm biased about attempting to load callbacks in the process attaching them. >>> With v6 I could run the test when built with enable_injection_point >>> false. I just ran make check in that folder. Didn't test meson build. >> >> The CI has been failing because 041_invalid_checkpoint_after_promote >> was loading Time::HiRes::nanosleep and Windows does not support it. > > Some miscommunication here. The SQL test under injection_point module > can be run in a build without injection_point and it fails. I think > it's better to have an alternate output for the same or prohibit the > test running itself. The same problem exists if you try to run the SSL tests in src/test/ssl/ without support build for them. Protections at the upper levels are good enough for the CI and the buildfarm, while making the overall maintenance cheaper, so I'm happy with just these. It also seems like you've missed this message, where this has been mentioned (spoiler: first version of the patch used an alternate output): https://www.postgresql.org/message-id/ZUnuzPimkZCOVEcz@paquier.xyz -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Fri, Jan 12, 2024 at 5:05 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jan 11, 2024 at 02:47:27PM +0530, Ashutosh Bapat wrote: > > On Thu, Jan 11, 2024 at 9:42 AM Michael Paquier <michael@paquier.xyz> wrote: > >> I don't really aim for complicated here, just useful. > > > > It isn't complicated. Such simple error check improve user's > > confidence on the feature and better be part of the 1st cut. > > I'm really not sure about that, because it does not impact the scope > of the facility even with all the use cases I've seen where injection > points could be used. It could always be added later if there's a > strong push for it. For testing, I'm biased about attempting to load > callbacks in the process attaching them. > I am not able to understand the objection to adding another handful of lines of code. The core code is quite minimal and better to be robust. We may seek someone else's opinion to break the tie. > >>> With v6 I could run the test when built with enable_injection_point > >>> false. I just ran make check in that folder. Didn't test meson build. > >> > >> The CI has been failing because 041_invalid_checkpoint_after_promote > >> was loading Time::HiRes::nanosleep and Windows does not support it. > > > > Some miscommunication here. The SQL test under injection_point module > > can be run in a build without injection_point and it fails. I think > > it's better to have an alternate output for the same or prohibit the > > test running itself. > > The same problem exists if you try to run the SSL tests in > src/test/ssl/ without support build for them. Protections at the > upper levels are good enough for the CI and the buildfarm, while > making the overall maintenance cheaper, so I'm happy with just these. > > It also seems like you've missed this message, where this has been > mentioned (spoiler: first version of the patch used an alternate > output): > https://www.postgresql.org/message-id/ZUnuzPimkZCOVEcz@paquier.xyz Ah! sorry for missing that. If there's a precedent, I am ok. If the confusion arises we can fix it later. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Fri, Jan 12, 2024 at 08:35:42AM +0900, Michael Paquier wrote: > It also seems like you've missed this message, where this has been > mentioned (spoiler: first version of the patch used an alternate > output): > https://www.postgresql.org/message-id/ZUnuzPimkZCOVEcz@paquier.xyz The refactoring of 0001 has now been applied as of e72a37528dda, and the buildfarm looks stable (at least for now). Here is a rebased patch set of the rest. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
Hi Michael, There is some overlap between Dtrace functionality and this functionality. But I see differences too. E.g. injection points offer deeper integration whereas dtrace provides more information to the probe like callstack and argument values etc. We need to assess whether these functionality can co-exist and whether we need both of them. If the answer to both of these questions is yes, it will be good to add documentation explaining the differences and similarities and also some guidance on when to use what. On Fri, Jan 12, 2024 at 9:56 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jan 12, 2024 at 08:35:42AM +0900, Michael Paquier wrote: > > It also seems like you've missed this message, where this has been > > mentioned (spoiler: first version of the patch used an alternate > > output): > > https://www.postgresql.org/message-id/ZUnuzPimkZCOVEcz@paquier.xyz > > The refactoring of 0001 has now been applied as of e72a37528dda, and > the buildfarm looks stable (at least for now). > > Here is a rebased patch set of the rest. + +#ifdef USE_INJECTION_POINTS +static bool +file_exists(const char *name) +{ + struct stat st; + + Assert(name != NULL); + if (stat(name, &st) == 0) + return !S_ISDIR(st.st_mode); + else if (!(errno == ENOENT || errno == ENOTDIR)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not access file \"%s\": %m", name))); + return false; +} Shouldn't this be removed now? The code should use one from fd.c Other code changes look good. I think the documentation and comments need some changes esp. considering the users point of view. Have attached two patches (0003, and 0004) with those changes to be applied on top of 0001 and 0002 respectively. Please review them. Might need some wordsmithy and language correction. Attaching the whole patch set to keep cibot happy. This is review of 0001 and 0002 only. Once we take care of these comments I think those patches will be ready for commit except one point of contention mentioned in [1]. We haven't heard any third opinion yet. [1] https://www.postgresql.org/message-id/CAExHW5sc_ar7=W9XCcC9TwYxZF71Ghc6poQ_+u4HXTXmNB7KAw@mail.gmail.com -- Best Wishes, Ashutosh Bapat
Attachment
- 0002-Comment-and-documentation-suggestions.patch
- 0004-Comment-and-documentation-suggestions.patch
- 0003-Add-test-module-injection_points.patch
- 0001-Add-backend-facility-for-injection-points.patch
- 0005-Add-regression-test-to-show-snapbuild-consistency.patch
- 0006-Add-basic-test-for-promotion-and-restart-race-condit.patch
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Thu, Jan 18, 2024 at 10:56:09AM +0530, Ashutosh Bapat wrote: > There is some overlap between Dtrace functionality and this > functionality. But I see differences too. E.g. injection points offer > deeper integration whereas dtrace provides more information to the > probe like callstack and argument values etc. We need to assess > whether these functionality can co-exist and whether we need both of > them. If the answer to both of these questions is yes, it will be good > to add documentation explaining the differences and similarities and > also some guidance on when to use what. Perhaps, I'm not sure how much we want to do regarding that yet, injection points have no external dependencies and will work across all environments as long as dlsym() (or an equivalent) is able to work, while being cheaper because they don't spawn an external process to trace the call. > + > +#ifdef USE_INJECTION_POINTS > +static bool > +file_exists(const char *name) > +{ > + struct stat st; > + > + Assert(name != NULL); > + if (stat(name, &st) == 0) > + return !S_ISDIR(st.st_mode); > + else if (!(errno == ENOENT || errno == ENOTDIR)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not access file \"%s\": %m", name))); > + return false; > +} > > Shouldn't this be removed now? The code should use one from fd.c Yep, removed that. > Other code changes look good. I think the documentation and comments > need some changes esp. considering the users point of view. Have > attached two patches (0003, and 0004) with those changes to be applied > on top of 0001 and 0002 respectively. Please review them. Might need > some wordsmithy and language correction. Attaching the whole patch set > to keep cibot happy. The CF bot was perhaps happy but your 0004 has forgotten to update the expected output. There were also a few typos, some markups and edits required for 0002 but as a whole what you have suggested was an improvement. Thanks. > This is review of 0001 and 0002 only. Once we take care of these > comments I think those patches will be ready for commit except one > point of contention mentioned in [1]. We haven't heard any third > opinion yet. 0001~0004 have been now applied, and I'm marking the CF entry as committed. I'll create a new thread once I have put more energy into the regression test improvements. Now the fun can really begin. I am also going to switch my buildfarm animals to use the new ./configure switch. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Ashutosh Bapat
Date:
On Mon, Jan 22, 2024 at 10:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jan 18, 2024 at 10:56:09AM +0530, Ashutosh Bapat wrote: > > There is some overlap between Dtrace functionality and this > > functionality. But I see differences too. E.g. injection points offer > > deeper integration whereas dtrace provides more information to the > > probe like callstack and argument values etc. We need to assess > > whether these functionality can co-exist and whether we need both of > > them. If the answer to both of these questions is yes, it will be good > > to add documentation explaining the differences and similarities and > > also some guidance on when to use what. > > Perhaps, I'm not sure how much we want to do regarding that yet, > injection points have no external dependencies and will work across > all environments as long as dlsym() (or an equivalent) is able to > work, while being cheaper because they don't spawn an external process > to trace the call. Yes. Both have their advantages and disadvantages. So I believe both will stay but that means the guidance is necessary. We may want to see reception and add the guidance later in the release cycle. > > > Other code changes look good. I think the documentation and comments > > need some changes esp. considering the users point of view. Have > > attached two patches (0003, and 0004) with those changes to be applied > > on top of 0001 and 0002 respectively. Please review them. Might need > > some wordsmithy and language correction. Attaching the whole patch set > > to keep cibot happy. > > The CF bot was perhaps happy but your 0004 has forgotten to update the > expected output. There were also a few typos, some markups and edits > required for 0002 but as a whole what you have suggested was an > improvement. Thanks. Sorry for that. Glad that you found those suggestions acceptable. -- Best Wishes, Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Heikki Linnakangas
Date:
On 22/01/2024 06:38, Michael Paquier wrote: > 0001~0004 have been now applied, and I'm marking the CF entry as > committed. Woo-hoo! I wrote the attached patch to enable injection points in the Cirrus CI config, to run the injection tests I wrote for a GIN bug today [1]. But that led to a crash in the asan-enabled build [2]. I didn't investigate it yet. [1] https://www.postgresql.org/message-id/d8f0b068-0e6e-4b2c-8932-62507eb7e1c6%40iki.fi [2] https://cirrus-ci.com/task/5242888636858368 -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Heikki Linnakangas
Date:
On 22/01/2024 18:08, Heikki Linnakangas wrote: > I wrote the attached patch to enable injection points in the Cirrus CI > config, to run the injection tests I wrote for a GIN bug today [1]. But > that led to a crash in the asan-enabled build [2]. I didn't investigate > it yet. Pushed a fix for the crash. What do you think of enabling this in the Cirrus CI config? -- Heikki Linnakangas Neon (https://neon.tech)
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Mon, Jan 22, 2024 at 09:02:48PM +0200, Heikki Linnakangas wrote: > On 22/01/2024 18:08, Heikki Linnakangas wrote: >> I wrote the attached patch to enable injection points in the Cirrus CI >> config, to run the injection tests I wrote for a GIN bug today [1]. But >> that led to a crash in the asan-enabled build [2]. I didn't investigate >> it yet. > > Pushed a fix for the crash. That's embarrassing. Thanks for the quick fix. > What do you think of enabling this in the Cirrus CI config? That was on my TODO list of things to tackle and propose, but perhaps there is no point in waiting more so I've applied your patch. -- Michael
Attachment
Re: Adding facility for injection points (or probe points?) for more advanced tests
From
Michael Paquier
Date:
On Tue, Jan 23, 2024 at 12:08:17PM +0900, Michael Paquier wrote: > That was on my TODO list of things to tackle and propose, but perhaps > there is no point in waiting more so I've applied your patch. Slightly off topic and while I don't forget about it.. Please find attached a copy of the patch posted around [1] to be able to define injection points with input arguments, so as it is possible to execute callbacks with values coming from the code path where the point is attached. For example, a backend could use this kind of macro to have a callback attached to this point use some runtime value: INJECTION_POINT_1ARG("InjectionPointBoo", &some_value); [1]: https://www.postgresql.org/message-id/Za8TLyD9HIjzFlhJ%40paquier.xyz -- Michael