Thread: Injection points: some tools to wait and wake
Hi all, (Ashutosh in CC as he was involved in the discussion last time.) I have proposed on the original thread related to injection points to have more stuff to be able to wait at an arbtrary point and wake at will the process waiting so as it is possible to control the order of actions taken in a test: https://www.postgresql.org/message-id/ZTiV8tn_MIb_H2rE%40paquier.xyz I didn't do that in the other thread out of time, but here is a patch set to complete what I wanted, using a condition variable to wait and wake processes: - State is in shared memory, using a DSM tracked by the registry and an integer counter. - Callback to wait on a condition variable. - SQL function to update the shared state and broadcast the update to the condition variable. - Use a custom wait event to track the wait in pg_stat_activity. 0001 requires no backend changes, only more stuff into the test module injection_points so that could be backpatched assuming that the backend is able to support injection points. This could be expanded into using more variables and/or states, but I don't really see a point in introducing more without a reason to do so, and I have no need for more at the moment. 0002 is a polished version of the TAP test that makes use of this facility, providing coverage for the bug fixed by 7863ee4def65 (reverting this commit causes the test to fail), where a restart point runs across a promotion request. The trick is to stop the checkpointer in the middle of a restart point and issue a promotion in-between. Thoughts and comments are welcome. -- Michael
Attachment
On Mon, Feb 19, 2024 at 03:01:40PM +0900, Michael Paquier wrote: > 0002 is a polished version of the TAP test that makes use of this > facility, providing coverage for the bug fixed by 7863ee4def65 > (reverting this commit causes the test to fail), where a restart point > runs across a promotion request. The trick is to stop the > checkpointer in the middle of a restart point and issue a promotion > in-between. The CF bot has been screaming at this one on Windows because the process started with IPC::Run::start was not properly finished, so attached is an updated version to bring that back to green. -- Michael
Attachment
> On 19 Feb 2024, at 09:01, Michael Paquier <michael@paquier.xyz> wrote: > > Thoughts and comments are welcome. Hi Michael, thanks for your work on injection points! I want to test a bunch of stuff using this facility. I have a wishlist of functionality that I'd like to see in injection points. I hope you will find some of these ideas usefulto improve the feature. 1. injection_points_wake() will wake all of waiters. But it's not suitable for complex tests. I think there must be a wayto wake only specific waiter by injection point name. 2. Alexander Korotkov's stopevents could be used in isolation tests. This kind of tests is perfect for describing complexrace conditions. (as a side note, I'd be happy if we could have primary\standby in isolation tests too) 3. Can we have some Perl function for this? +# Wait until the checkpointer is in the middle of the restart point +# processing, relying on the custom wait event generated in the +# wait callback used in the injection point previously attached. +ok( $node_standby->poll_query_until( + 'postgres', + qq[SELECT count(*) FROM pg_stat_activity + WHERE backend_type = 'checkpointer' AND wait_event = 'injection_wait' ;], + '1'), + 'checkpointer is waiting in restart point' +) or die "Timed out while waiting for checkpointer to run restart point"; Perhaps something like $node->do_a_query_and_wait_for_injection_point_observed(sql,injection_point_name); 4. Maybe I missed it, but I'd like to see a guideline on how to name injection points. 5. In many cases we need to have injection point under critical section. I propose to have a "prepared injection point".See [0] for example in v2-0003-Test-multixact-CV-sleep.patch + INJECTION_POINT_PREPARE("GetNewMultiXactId-done"); + START_CRIT_SECTION(); + INJECTION_POINT_RUN_PREPARED(); 6. Currently our codebase have files injection_point.c and injection_points.c. It's very difficult to remember which is where... 7. This is extremely distant, but some DBMSs allow to enable injection points by placing files on the filesystem. That wouldallow to test something during recovery when no SQL interface is present. Let's test all the neat stuff! Thank you! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/0925F9A9-4D53-4B27-A87E-3D83A757B0E0@yandex-team.ru
Hi, On Mon, Feb 19, 2024 at 04:51:45PM +0900, Michael Paquier wrote: > On Mon, Feb 19, 2024 at 03:01:40PM +0900, Michael Paquier wrote: > > 0002 is a polished version of the TAP test that makes use of this > > facility, providing coverage for the bug fixed by 7863ee4def65 > > (reverting this commit causes the test to fail), where a restart point > > runs across a promotion request. The trick is to stop the > > checkpointer in the middle of a restart point and issue a promotion > > in-between. > > The CF bot has been screaming at this one on Windows because the > process started with IPC::Run::start was not properly finished, so > attached is an updated version to bring that back to green. Thanks for the patch, that's a very cool feature! Random comments: 1 === +CREATE FUNCTION injection_points_wake() what about injection_points_wakeup() instead? 2 === +/* Shared state information for injection points. */ +typedef struct InjectionPointSharedState +{ + /* protects accesses to wait_counts */ + slock_t lock; + + /* Counter advancing when injection_points_wake() is called */ + int wait_counts; If slock_t protects "only" one counter, then what about using pg_atomic_uint64 or pg_atomic_uint32 instead? And btw do we need wait_counts at all? (see comment number 4) 3 === + * SQL function for waking a condition variable waking up instead? 4 === + for (;;) + { + int new_wait_counts; + + SpinLockAcquire(&inj_state->lock); + new_wait_counts = inj_state->wait_counts; + SpinLockRelease(&inj_state->lock); + + if (old_wait_counts != new_wait_counts) + break; + ConditionVariableSleep(&inj_state->wait_point, injection_wait_event); + } I'm wondering if this loop and wait_counts are needed, why not doing something like (and completely get rid of wait_counts)? " ConditionVariablePrepareToSleep(&inj_state->wait_point); ConditionVariableSleep(&inj_state->wait_point, injection_wait_event); ConditionVariableCancelSleep(); " It's true that the comment above ConditionVariableSleep() mentions that: " * This should be called in a predicate loop that tests for a specific exit * condition and otherwise sleeps " but is it needed in our particular case here? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 19, 2024 at 11:54:20AM +0300, Andrey M. Borodin wrote: > 1. injection_points_wake() will wake all of waiters. But it's not > suitable for complex tests. I think there must be a way to wake only > specific waiter by injection point name. I don't disagree with that, but I don't have a strong argument for implementing that until there is an explicit need for it in core. It is also possible to plug in your own module, outside of core, if you are looking for something more specific. The backend APIs allow that. > 2. Alexander Korotkov's stopevents could be used in isolation > tests. This kind of tests is perfect for describing complex race > conditions. (as a side note, I'd be happy if we could have > primary\standby in isolation tests too) This requires plugging is more into src/test/isolation/, with multiple connection strings. This has been suggested in the past. > 3. Can we have some Perl function for this? > +# Wait until the checkpointer is in the middle of the restart point > +# processing, relying on the custom wait event generated in the > +# wait callback used in the injection point previously attached. > +ok( $node_standby->poll_query_until( > + 'postgres', > + qq[SELECT count(*) FROM pg_stat_activity > + WHERE backend_type = 'checkpointer' AND wait_event = 'injection_wait' ;], > + '1'), > + 'checkpointer is waiting in restart point' > +) or die "Timed out while waiting for checkpointer to run restart point"; > > Perhaps something like > $node->do_a_query_and_wait_for_injection_point_observed(sql,injection_point_name); I don't see why not. But I'm not sure how much I need to plug in into the main modules yet. > 4. Maybe I missed it, but I'd like to see a guideline on how to name > injection points. I don't think we have any of that, or even that we need one. In short, I'm OK to be more consistent with the choice of ginbtree.c and give priority to it and make it more official in the docs. > 5. In many cases we need to have injection point under critical > section. I propose to have a "prepared injection point". See [0] for > example in v2-0003-Test-multixact-CV-sleep.patch > + INJECTION_POINT_PREPARE("GetNewMultiXactId-done"); > + > START_CRIT_SECTION(); > > + INJECTION_POINT_RUN_PREPARED(); I don't see how that's different from a wait/wake logic? The only thing you've changed is to stop a wait when a point is detached and you want to make the stop conditional. Plugging in a condition variable is more flexible than a hardcoded sleep in terms of wait, while being more responsive. > 7. This is extremely distant, but some DBMSs allow to enable > injection points by placing files on the filesystem. That would > allow to test something during recovery when no SQL interface is > present. Yeah, I could see why you'd want to do something like that. If a use case pops up, that can surely be discussed. -- Michael
Attachment
On Mon, Feb 19, 2024 at 02:28:04PM +0000, Bertrand Drouvot wrote: > +CREATE FUNCTION injection_points_wake() > > what about injection_points_wakeup() instead? Sure. > +/* Shared state information for injection points. */ > +typedef struct InjectionPointSharedState > +{ > + /* protects accesses to wait_counts */ > + slock_t lock; > + > + /* Counter advancing when injection_points_wake() is called */ > + int wait_counts; > > If slock_t protects "only" one counter, then what about using pg_atomic_uint64 > or pg_atomic_uint32 instead? And btw do we need wait_counts at all? (see comment > number 4) We could, indeed, even if we use more than one counter. Still, I would be tempted to keep it in case more data is added to this structure as that would make for less stuff to do while being normally non-critical. This sentence may sound pedantic, though.. > + * SQL function for waking a condition variable > > waking up instead? Okay. > I'm wondering if this loop and wait_counts are needed, why not doing something > like (and completely get rid of wait_counts)? > > " > ConditionVariablePrepareToSleep(&inj_state->wait_point); > ConditionVariableSleep(&inj_state->wait_point, injection_wait_event); > ConditionVariableCancelSleep(); > " > > It's true that the comment above ConditionVariableSleep() mentions that: Perhaps not, but it encourages good practices around the use of condition variables, and we need to track all that in shared memory anyway. Ashutosh has argued in favor of the approach taken by the patch in the original thread when I've sent a version doing exactly what you are saying now to not track a state in shmem. -- Michael
Attachment
> On 20 Feb 2024, at 02:21, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Feb 19, 2024 at 11:54:20AM +0300, Andrey M. Borodin wrote: >> 1. injection_points_wake() will wake all of waiters. But it's not >> suitable for complex tests. I think there must be a way to wake only >> specific waiter by injection point name. > > I don't disagree with that, but I don't have a strong argument for > implementing that until there is an explicit need for it in core. It > is also possible to plug in your own module, outside of core, if you > are looking for something more specific. The backend APIs allow that. In [0] I want to create a test for edge case of reading recent multixact. External module does not allow to have a test withinrepository. In that test I need to sync backends in 3 steps (backend 1 starts to wait in injection point, backend 2 starts to sleep inother point, backend 1 is released and observe 3rd injection point). "wake them all" implementation allows only 2-stepsynchronization. I will try to simplify test to 2-step, but it would be much easier to implement if injection points could be awaken independently. >> 2. Alexander Korotkov's stopevents could be used in isolation >> tests. This kind of tests is perfect for describing complex race >> conditions. (as a side note, I'd be happy if we could have >> primary\standby in isolation tests too) > > This requires plugging is more into src/test/isolation/, with multiple > connection strings. This has been suggested in the past. I think standby isolation tests are just a sugar-on-top feature here. Wrt injection points, I'd like to see a function to wait until some injection point is observed. With this function at hand developer can implement race condition tests as an isolation test. >> 5. In many cases we need to have injection point under critical >> section. I propose to have a "prepared injection point". See [0] for >> example in v2-0003-Test-multixact-CV-sleep.patch >> + INJECTION_POINT_PREPARE("GetNewMultiXactId-done"); >> + >> START_CRIT_SECTION(); >> >> + INJECTION_POINT_RUN_PREPARED(); > > I don't see how that's different from a wait/wake logic? The only > thing you've changed is to stop a wait when a point is detached and > you want to make the stop conditional. Plugging in a condition > variable is more flexible than a hardcoded sleep in terms of wait, > while being more responsive. No, "prepared injection point" is not about wait\wake logic. It's about having injection point in critical section. Normal injection point will pstrdup(name) and fail. In [0] I need a test that waits after multixact generation before WAL-loggingit. It's only possible in a critical section. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/0925F9A9-4D53-4B27-A87E-3D83A757B0E0@yandex-team.ru
Hi, On Tue, Feb 20, 2024 at 08:28:28AM +0900, Michael Paquier wrote: > On Mon, Feb 19, 2024 at 02:28:04PM +0000, Bertrand Drouvot wrote: > > If slock_t protects "only" one counter, then what about using pg_atomic_uint64 > > or pg_atomic_uint32 instead? And btw do we need wait_counts at all? (see comment > > number 4) > > We could, indeed, even if we use more than one counter. Still, I > would be tempted to keep it in case more data is added to this > structure as that would make for less stuff to do while being normally > non-critical. This sentence may sound pedantic, though.. Okay, makes sense to keep this as it is as a "template" in case more stuff is added. + /* Counter advancing when injection_points_wake() is called */ + int wait_counts; In that case what about using an unsigned instead? (Nit) > > I'm wondering if this loop and wait_counts are needed, why not doing something > > like (and completely get rid of wait_counts)? > > > > " > > ConditionVariablePrepareToSleep(&inj_state->wait_point); > > ConditionVariableSleep(&inj_state->wait_point, injection_wait_event); > > ConditionVariableCancelSleep(); > > " > > > > It's true that the comment above ConditionVariableSleep() mentions that: > > Perhaps not, but it encourages good practices around the use of > condition variables, and we need to track all that in shared memory > anyway. Ashutosh has argued in favor of the approach taken by the > patch in the original thread when I've sent a version doing exactly > what you are saying now to not track a state in shmem. Oh okay I missed this previous discussion, let's keep it as it is then. New comments: 1 === +void +injection_wait(const char *name) Looks like name is not used in the function. I guess the reason it is a parameter is because that's the way the callback function is being called in InjectionPointRun()? 2 === +PG_FUNCTION_INFO_V1(injection_points_wake); +Datum +injection_points_wake(PG_FUNCTION_ARGS) +{ I think that This function will wake up all the "wait" injection points. Would that make sense to implement some filtering based on the name? That could be useful for tests that would need multiple wait injection points and that want to wake them up "sequentially". We could think about it if there is such a need in the future though. 3 === +# Register a injection point on the standby so as the follow-up typo: "an injection"? 4 === +for (my $i = 0; $i < 3000; $i++) +{ is 3000 due to?: +checkpoint_timeout = 30s If so, would that make sense to reduce the value for both? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 20, 2024 at 03:55:08PM +0000, Bertrand Drouvot wrote: > Okay, makes sense to keep this as it is as a "template" in case more stuff is > added. > > + /* Counter advancing when injection_points_wake() is called */ > + int wait_counts; > > In that case what about using an unsigned instead? (Nit) uint32. Sure. > 1 === > > +void > +injection_wait(const char *name) > > Looks like name is not used in the function. I guess the reason it is a parameter > is because that's the way the callback function is being called in > InjectionPointRun()? Right. The callback has to define this argument. > 2 === > > +PG_FUNCTION_INFO_V1(injection_points_wake); > +Datum > +injection_points_wake(PG_FUNCTION_ARGS) > +{ > > I think that This function will wake up all the "wait" injection points. > Would that make sense to implement some filtering based on the name? That could > be useful for tests that would need multiple wait injection points and that want > to wake them up "sequentially". > > We could think about it if there is such a need in the future though. Well, both you and Andrey are asking for it now, so let's do it. The implementation is simple: - Store in InjectionPointSharedState an array of wait_counts and an array of names. There is only one condition variable. - When a point wants to wait, it takes the spinlock and looks within the array of names until it finds a free slot, adds its name into the array to reserve a wait counter at the same position, releases the spinlock. Then it loops on the condition variable for an update of the counter it has reserved. It is possible to make something more efficient, but at a small size it would not really matter. - The wakeup takes a point name in argument, acquires the spinlock, and checks if it can find the point into the array, pinpoints the location of the counter to update and updates it. Then it broadcasts the change. - The wait loop checks its counter, leaves its loop, cancels the sleep, takes the spinlock to unregister from the array, and leaves. I would just hardcode the number of points that can wait, say 5 of them tracked in shmem? Does that look like what you are looking at? > +# Register a injection point on the standby so as the follow-up > > typo: "an injection"? Oops. Fixed locally. > +for (my $i = 0; $i < 3000; $i++) > +{ > > is 3000 due to?: > > +checkpoint_timeout = 30s > > If so, would that make sense to reduce the value for both? That had better be based on PostgreSQL::Test::Utils::timeout_default, actually, as in something like: foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default) -- Michael
Attachment
On Tue, Feb 20, 2024 at 05:32:38PM +0300, Andrey M. Borodin wrote: > I will try to simplify test to 2-step, but it would be much easier > to implement if injection points could be awaken independently. I don't mind implementing something that wakes only a given point name, that's just more state data to track in shmem for the module. > No, "prepared injection point" is not about wait\wake logic. It's > about having injection point in critical section. > Normal injection point will pstrdup(name) and fail. In [0] I need a > test that waits after multixact generation before WAL-logging > it. It's only possible in a critical section. It took me some time to understand what you mean here. You are referring to the allocations done in load_external_function() -> expand_dynamic_library_name() -> substitute_libpath_macro(), which is something that has to happen the first time a callback is loaded into the local cache of a process. So what you want to achieve is to preload the callback in its cache in a first step without running it, then be able to run it, so your Prepare[d] layer is just a way to split InjectionPointRun() into two. Fancy use case. You could disable the critical section around the INJECTION_POINT() as one solution, though having something to pre-warm the local cache for a point name and avoid the allocations done in the external load would be a second way to achieve that. "Prepare" is not the best term I would have used, perhaps just have one INJECTION_POINT_PRELOAD() macro to warm up the cache outside the critical section with a wrapper routine? You could then leave InjectionPointRun() as it is now. Having a check on CritSectionCount in the injection point internals to disable temporarily a critical section would not feel right as this is used as a safety check anywhere else. -- Michael
Attachment
On Wed, Feb 21, 2024 at 07:08:03AM +0900, Michael Paquier wrote: > Well, both you and Andrey are asking for it now, so let's do it. The > implementation is simple: > - Store in InjectionPointSharedState an array of wait_counts and an > array of names. There is only one condition variable. > - When a point wants to wait, it takes the spinlock and looks within > the array of names until it finds a free slot, adds its name into the > array to reserve a wait counter at the same position, releases the > spinlock. Then it loops on the condition variable for an update of > the counter it has reserved. It is possible to make something more > efficient, but at a small size it would not really matter. > - The wakeup takes a point name in argument, acquires the spinlock, > and checks if it can find the point into the array, pinpoints the > location of the counter to update and updates it. Then it broadcasts > the change. > - The wait loop checks its counter, leaves its loop, cancels the > sleep, takes the spinlock to unregister from the array, and leaves. > > I would just hardcode the number of points that can wait, say 5 of > them tracked in shmem? Does that look like what you are looking at? I was looking at that, and it proves to work OK, so you can do stuff like waits and wakeups for multiple processes in a controlled manner. The attached patch authorizes up to 32 waiters. I have switched things so as the information reported in pg_stat_activity is the name of the injection point itself. Comments and ideas are welcome. -- Michael
Attachment
Hi, On Wed, Feb 21, 2024 at 07:08:03AM +0900, Michael Paquier wrote: > On Tue, Feb 20, 2024 at 03:55:08PM +0000, Bertrand Drouvot wrote: > > +PG_FUNCTION_INFO_V1(injection_points_wake); > > +Datum > > +injection_points_wake(PG_FUNCTION_ARGS) > > +{ > > > > I think that This function will wake up all the "wait" injection points. > > Would that make sense to implement some filtering based on the name? That could > > be useful for tests that would need multiple wait injection points and that want > > to wake them up "sequentially". > > > > We could think about it if there is such a need in the future though. > > Well, both you and Andrey are asking for it now, so let's do it. Thanks! > The implementation is simple: > - Store in InjectionPointSharedState an array of wait_counts and an > array of names. There is only one condition variable. > - When a point wants to wait, it takes the spinlock and looks within > the array of names until it finds a free slot, adds its name into the > array to reserve a wait counter at the same position, releases the > spinlock. Then it loops on the condition variable for an update of > the counter it has reserved. It is possible to make something more > efficient, but at a small size it would not really matter. > - The wakeup takes a point name in argument, acquires the spinlock, > and checks if it can find the point into the array, pinpoints the > location of the counter to update and updates it. Then it broadcasts > the change. > - The wait loop checks its counter, leaves its loop, cancels the > sleep, takes the spinlock to unregister from the array, and leaves. > I think that makes sense and now the "counter" makes more sense to me (thanks to it we don't need multiple CV). > I would just hardcode the number of points that can wait, say 5 of > them tracked in shmem? Does that look like what you are looking at? I think so yes and more than 5 points would look like a complicated test IMHO. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Feb 21, 2024 at 04:46:00PM +0900, Michael Paquier wrote: > On Wed, Feb 21, 2024 at 07:08:03AM +0900, Michael Paquier wrote: > > Well, both you and Andrey are asking for it now, so let's do it. The > > implementation is simple: > > - Store in InjectionPointSharedState an array of wait_counts and an > > array of names. There is only one condition variable. > > - When a point wants to wait, it takes the spinlock and looks within > > the array of names until it finds a free slot, adds its name into the > > array to reserve a wait counter at the same position, releases the > > spinlock. Then it loops on the condition variable for an update of > > the counter it has reserved. It is possible to make something more > > efficient, but at a small size it would not really matter. > > - The wakeup takes a point name in argument, acquires the spinlock, > > and checks if it can find the point into the array, pinpoints the > > location of the counter to update and updates it. Then it broadcasts > > the change. > > - The wait loop checks its counter, leaves its loop, cancels the > > sleep, takes the spinlock to unregister from the array, and leaves. > > > > I would just hardcode the number of points that can wait, say 5 of > > them tracked in shmem? Does that look like what you are looking at? > > I was looking at that, and it proves to work OK, so you can do stuff > like waits and wakeups for multiple processes in a controlled manner. > The attached patch authorizes up to 32 waiters. I have switched > things so as the information reported in pg_stat_activity is the name > of the injection point itself. Thanks! I think the approach is fine and the hardcoded value is "large" enough (it would be surprising, at least to me, to write a test that would need more than 32 waiters). A few comments: 1 === +-- Wakes a condition variable I think "up" is missing at several places in the patch where "wake" is used. I could be wrong as non native english speaker though. 2 === + /* Counters advancing when injection_points_wakeup() is called */ + int wait_counts[INJ_MAX_WAIT]; uint32? (here and other places where counter is manipulated). 3 === + /* Remove us from the waiting list */ "Remove this injection wait name from the waiting list" instead? 4 === + * SQL function for waking a condition variable. s/a condition variable/an injection wait point/ ? 5 === +PG_FUNCTION_INFO_V1(injection_points_wakeup); +Datum +injection_points_wakeup(PG_FUNCTION_ARGS) Empty line missing before "Datum"? 6 === Also maybe some comments are missing above injection_point_init_state(), injection_init_shmem() but it's more a Nit. 7 === While at it, should we add a second injection wait point in t/041_invalid_checkpoint_after_promote.pl to check that they are wake up individually? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Feb 21, 2024 at 11:50:21AM +0000, Bertrand Drouvot wrote: > I think the approach is fine and the hardcoded value is "large" enough (it would > be surprising, at least to me, to write a test that would need more than 32 > waiters). This could use less. I've never used more than 3 of them in a single test, and that was already very complex to follow. > A few comments: > > 1 === > I think "up" is missing at several places in the patch where "wake" is used. > I could be wrong as non native english speaker though. Patched up three places to be more consisten. > 2 === > > + /* Counters advancing when injection_points_wakeup() is called */ > + int wait_counts[INJ_MAX_WAIT]; > > uint32? (here and other places where counter is manipulated). Okay, why not. > "Remove this injection wait name from the waiting list" instead? > s/a condition variable/an injection wait point/ ? Okay. > 5 === > > +PG_FUNCTION_INFO_V1(injection_points_wakeup); > +Datum > +injection_points_wakeup(PG_FUNCTION_ARGS) > > Empty line missing before "Datum"? I've used the same style as anywhere else. > Also maybe some comments are missing above injection_point_init_state(), > injection_init_shmem() but it's more a Nit. Sure. > While at it, should we add a second injection wait point in > t/041_invalid_checkpoint_after_promote.pl to check that they are wake up > individually? I'm not sure that's a good idea for the sake of this test, TBH, as that would provide coverage outside the original scope of the restartpoint/promote check. I have also looked at if it would be possible to get an isolation test out of that, but the arbitrary wait does not make that possible with the existing isolation APIs, see GetSafeSnapshotBlockingPids() used in pg_isolation_test_session_is_blocked(). isolation/README seems to be a bit off here, actually, mentioning pg_locks.. We could use some tricks with transactions or locks internally, but I'm not really tempted to make the wait callback more complicated for the sake of more coverage as I'd rather keep it generic for anybody who wants to control the order of events across processes. Attaching a v3 for reference with everything that has been mentioned up to now. -- Michael
Attachment
Hi, On Thu, Feb 22, 2024 at 12:02:01PM +0900, Michael Paquier wrote: > On Wed, Feb 21, 2024 at 11:50:21AM +0000, Bertrand Drouvot wrote: > > A few comments: > > > > 1 === > > I think "up" is missing at several places in the patch where "wake" is used. > > I could be wrong as non native english speaker though. > > Patched up three places to be more consisten. Thanks! > > 5 === > > > > +PG_FUNCTION_INFO_V1(injection_points_wakeup); > > +Datum > > +injection_points_wakeup(PG_FUNCTION_ARGS) > > > > Empty line missing before "Datum"? > > I've used the same style as anywhere else. humm, looking at src/test/regress/regress.c for example, I can see an empty line between PG_FUNCTION_INFO_V1 and Datum for all the "PG_FUNCTION_INFO_V1" ones. > > While at it, should we add a second injection wait point in > > t/041_invalid_checkpoint_after_promote.pl to check that they are wake up > > individually? > > I'm not sure that's a good idea for the sake of this test, TBH, as > that would provide coverage outside the original scope of the > restartpoint/promote check. Yeah, that was my doubt too. > I have also looked at if it would be possible to get an isolation test > out of that, but the arbitrary wait does not make that possible with > the existing isolation APIs, see GetSafeSnapshotBlockingPids() used in > pg_isolation_test_session_is_blocked(). isolation/README seems to be > a bit off here, actually, mentioning pg_locks.. We could use some > tricks with transactions or locks internally, but I'm not really > tempted to make the wait callback more complicated for the sake of > more coverage as I'd rather keep it generic for anybody who wants to > control the order of events across processes. Makes sense to me, let's keep it as it is. > > Attaching a v3 for reference with everything that has been mentioned > up to now. Thanks! Sorry, I missed those ones previously: 1 === +/* Maximum number of wait usable in injection points at once */ s/Maximum number of wait/Maximum number of waits/ ? 2 === +# Check the logs that the restart point has started on standby. This is +# optional, but let's be sure. +my $log = slurp_file($node_standby->logfile, $logstart); +my $checkpoint_start = 0; +if ($log =~ m/restartpoint starting: immediate wait/) +{ + $checkpoint_start = 1; +} +is($checkpoint_start, 1, 'restartpoint has started'); what about? ok( $node_standby->log_contains( "restartpoint starting: immediate wait", $logstart), "restartpoint has started"); Except for the above, v3 looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Feb 22, 2024 at 08:00:24AM +0000, Bertrand Drouvot wrote: > +/* Maximum number of wait usable in injection points at once */ > > s/Maximum number of wait/Maximum number of waits/ ? Thanks. I've edited a few more places while scanning the whole. > > 2 === > > +# Check the logs that the restart point has started on standby. This is > +# optional, but let's be sure. > +my $log = slurp_file($node_standby->logfile, $logstart); > +my $checkpoint_start = 0; > +if ($log =~ m/restartpoint starting: immediate wait/) > +{ > + $checkpoint_start = 1; > +} > +is($checkpoint_start, 1, 'restartpoint has started'); > > what about? > > ok( $node_standby->log_contains( "restartpoint starting: immediate wait", $logstart), > "restartpoint has started"); And I'm behind the commit that introduced it (392ea0c78fdb). It is possible to remove the dependency to slurp_file() entirely by switching the second location checking the logs for the checkpoint completion. > Except for the above, v3 looks good to me. Thanks. I'm looking at applying that at the beginning of next week if there are no more comments, to get something by the feature freeze. We could be more flexible for all that as that's related to testing, but let's be in the clear. I've also renamed the test to 041_checkpoint_at_promote.pl, as now that the original is fixed, the checkpoint is not invalid. That's cleaner this way. -- Michael
Attachment
Hi, On Mon, Feb 26, 2024 at 12:57:09PM +0900, Michael Paquier wrote: > On Thu, Feb 22, 2024 at 08:00:24AM +0000, Bertrand Drouvot wrote: > > +/* Maximum number of wait usable in injection points at once */ > > > > s/Maximum number of wait/Maximum number of waits/ ? > > Thanks. I've edited a few more places while scanning the whole. Thanks! > > > > 2 === > > > > +# Check the logs that the restart point has started on standby. This is > > +# optional, but let's be sure. > > +my $log = slurp_file($node_standby->logfile, $logstart); > > +my $checkpoint_start = 0; > > +if ($log =~ m/restartpoint starting: immediate wait/) > > +{ > > + $checkpoint_start = 1; > > +} > > +is($checkpoint_start, 1, 'restartpoint has started'); > > > > what about? > > > > ok( $node_standby->log_contains( "restartpoint starting: immediate wait", $logstart), > > "restartpoint has started"); > > And I'm behind the commit that introduced it (392ea0c78fdb). ;-) > It is > possible to remove the dependency to slurp_file() entirely by > switching the second location checking the logs for the checkpoint > completion. Yeah right. > > Except for the above, v3 looks good to me. > > Thanks. I'm looking at applying that at the beginning of next week if > there are no more comments, to get something by the feature freeze. > We could be more flexible for all that as that's related to testing, > but let's be in the clear. Sounds reasonable to me. > I've also renamed the test to 041_checkpoint_at_promote.pl, as now > that the original is fixed, the checkpoint is not invalid. That's > cleaner this way. Agree. I'll try to submit the POC patch in [1] before beginning of next week now that we're "just waiting" if there is more comments on this current thread. [1]: https://www.postgresql.org/message-id/ZdTNafYSxwnKNIhj%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> On 26 Feb 2024, at 08:57, Michael Paquier <michael@paquier.xyz> wrote: > > <v4-0001-injection_points-Add-routines-to-wait-and-wake-pr.patch> Would it be possible to have a helper function to check this: +ok( $node_standby->poll_query_until( + 'postgres', + qq[SELECT count(*) FROM pg_stat_activity + WHERE backend_type = 'checkpointer' AND wait_event = 'CreateRestartPoint' ;], + '1'), + 'checkpointer is waiting in restart point' +) or die "Timed out while waiting for checkpointer to run restart point”; So that we could do something like ok(node_standby->await_injection_point(“CreateRestartPoint”,”checkpointer")); IMO, this could make many tests cleaner. Or, perhaps, it’s a functionality for a future development? Thanks! Best regards, Andrey Borodin.
On Mon, Feb 26, 2024 at 02:10:49PM +0500, Andrey M. Borodin wrote: > So that we could do something like > > ok(node_standby->await_injection_point(“CreateRestartPoint”,”checkpointer")); It would be more flexible with a full string to describe the test rather than a process name in the second argument. > IMO, this could make many tests cleaner. > Or, perhaps, it’s a functionality for a future development? This could just happen as separate refactoring, I guess. But I'd wait to see if more tests requiring scans to pg_stat_activity pop up. For example, the test just posted here does not rely on that: https://www.postgresql.org/message-id/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal -- Michael
Attachment
> On 27 Feb 2024, at 04:29, Michael Paquier <michael@paquier.xyz> wrote: > > For > example, the test just posted here does not rely on that: > https://www.postgresql.org/message-id/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal Instead, that test is scanning logs + # Note: $node_primary->wait_for_replay_catchup($node_standby) would be + # hanging here due to the injection point, so check the log instead.+ + my $terminated = 0; + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) + { + if ($node_standby->log_contains( + 'terminating process .* to release replication slot \"injection_activeslot\"', $logstart)) + { + $terminated = 1; + last; + } + usleep(100_000); + } But, AFAICS, the purpose is the same: wait until event happened. Best regards, Andrey Borodin.
Hi, On Tue, Feb 27, 2024 at 11:00:10AM +0500, Andrey M. Borodin wrote: > > > > On 27 Feb 2024, at 04:29, Michael Paquier <michael@paquier.xyz> wrote: > > > > For > > example, the test just posted here does not rely on that: > > https://www.postgresql.org/message-id/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal > > Instead, that test is scanning logs > > + # Note: $node_primary->wait_for_replay_catchup($node_standby) would be > + # hanging here due to the injection point, so check the log instead.+ > + my $terminated = 0; > + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) > + { > + if ($node_standby->log_contains( > + 'terminating process .* to release replication slot \"injection_activeslot\"', $logstart)) > + { > + $terminated = 1; > + last; > + } > + usleep(100_000); > + } > > But, AFAICS, the purpose is the same: wait until event happened. I think it's easier to understand the tests (I mean what the purpose of the injection points are) if we don't use an helper function. While the helper function would make the test easier to read / cleaner, I think it may make them more difficult to understand as 'await_injection_point' would probably be too generic. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, > On 27 Feb 2024, at 16:08, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Feb 27, 2024 at 11:00:10AM +0500, Andrey M. Borodin wrote: >> >> But, AFAICS, the purpose is the same: wait until event happened. > > I think it's easier to understand the tests (I mean what the purpose of the > injection points are) if we don't use an helper function. While the helper > function would make the test easier to read / cleaner, I think it may make them > more difficult to understand as 'await_injection_point' would probably be too > generic. For the record: I’m fine if there is no such function. There will be at least one implementation of this function in every single test with waiting injection points. That’s the case where we might want something generic. What the specific there might be? The test can check that - conditions are logged - injection point reached in specific backend (e.g. checkpointer) etc I doubt that this specific checks worth cleanness of the test. And sacrificing test readability in favour of teaching readerwhat injection points are sounds strange. Best regards, Andrey Borodin.
Hi, On Tue, Feb 27, 2024 at 04:49:03PM +0500, Andrey M. Borodin wrote: > Hi, > > > On 27 Feb 2024, at 16:08, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > On Tue, Feb 27, 2024 at 11:00:10AM +0500, Andrey M. Borodin wrote: > >> > >> But, AFAICS, the purpose is the same: wait until event happened. > > > > I think it's easier to understand the tests (I mean what the purpose of the > > injection points are) if we don't use an helper function. While the helper > > function would make the test easier to read / cleaner, I think it may make them > > more difficult to understand as 'await_injection_point' would probably be too > > generic. > > For the record: I’m fine if there is no such function. > There will be at least one implementation of this function in every single test with waiting injection points. > That’s the case where we might want something generic. > What the specific there might be? The test can check that > - conditions are logged > - injection point reached in specific backend (e.g. checkpointer) > etc > > I doubt that this specific checks worth cleanness of the test. And sacrificing test readability in favour of teaching readerwhat injection points are sounds strange. I'm giving a second thought and it does not have to be exclusive, for example in this specific test we could: 1) check that the injection point is reached with the helper (querying pg_stat_activity looks good to me) And 2) check in the log So even if two checks might sound "too much" they both make sense to give 1) better readability and 2) better understanding of the test. So, I'm ok with the new helper too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 27, 2024 at 01:39:59PM +0000, Bertrand Drouvot wrote: > So, I'm ok with the new helper too. If both of you feel strongly about that, I'm OK with introducing something like that. Now, a routine should be only about waiting on pg_stat_activity to report something, as for the logs we already have log_contains(). And a test may want one check, but unlikely both. Even if both are wanted, it's just a matter of using log_contains() and the new routine that does pg_stat_activity lookups. -- Michael
Attachment
> On 28 Feb 2024, at 09:26, Michael Paquier <michael@paquier.xyz> wrote: > > Now, a routine should be only about waiting on > pg_stat_activity to report something BTW, if we had an SQL function such as injection_point_await(name), we could use it in our isolation tests as well as ourTAP tests :) However, this might well be a future improvement, if we had a generic “await" Perl function - we wouldn’t need to re-usenew SQL code in so many places. Thanks! Best regards, Andrey Borodin.
Hi, On Wed, Feb 28, 2024 at 01:26:46PM +0900, Michael Paquier wrote: > On Tue, Feb 27, 2024 at 01:39:59PM +0000, Bertrand Drouvot wrote: > > So, I'm ok with the new helper too. > > If both of you feel strongly about that, I'm OK with introducing > something like that. Thanks! > Now, a routine should be only about waiting on > pg_stat_activity to report something, as for the logs we already have > log_contains(). Agree. > And a test may want one check, but unlikely both. > Even if both are wanted, it's just a matter of using log_contains() > and the new routine that does pg_stat_activity lookups. Yeah, that's also my point of view. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Feb 28, 2024 at 10:27:52AM +0500, Andrey M. Borodin wrote: > BTW, if we had an SQL function such as injection_point_await(name), > we could use it in our isolation tests as well as our TAP tests :) I am not quite sure to follow here. The isolation test facility now relies on the in-core function pg_isolation_test_session_is_blocked() to check the state of backends getting blocked, and that's outside of the scope of the module injection_points. -- Michael
Attachment
On Wed, Feb 28, 2024 at 06:20:41AM +0000, Bertrand Drouvot wrote: > On Wed, Feb 28, 2024 at 01:26:46PM +0900, Michael Paquier wrote: >> On Tue, Feb 27, 2024 at 01:39:59PM +0000, Bertrand Drouvot wrote: >> > So, I'm ok with the new helper too. >> >> If both of you feel strongly about that, I'm OK with introducing >> something like that. > > Thanks! (Cough. As in, "feel free to send a patch" on top of what's already proposed if any of you feel that's better at the end.) ;) -- Michael
Attachment
Hi, On Thu, Feb 29, 2024 at 02:56:23PM +0900, Michael Paquier wrote: > On Wed, Feb 28, 2024 at 06:20:41AM +0000, Bertrand Drouvot wrote: > > On Wed, Feb 28, 2024 at 01:26:46PM +0900, Michael Paquier wrote: > >> On Tue, Feb 27, 2024 at 01:39:59PM +0000, Bertrand Drouvot wrote: > >> > So, I'm ok with the new helper too. > >> > >> If both of you feel strongly about that, I'm OK with introducing > >> something like that. > > > > Thanks! > > (Cough. As in, "feel free to send a patch" on top of what's already > proposed if any of you feel that's better at the end.) > > ;) okay ;-) Something like the attached? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
> On 29 Feb 2024, at 13:35, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Something like the attached? There's extraneous print "done\n"; Also can we have optional backend type :) Best regards, Andrey Borodin.
Hi, On Thu, Feb 29, 2024 at 02:34:35PM +0500, Andrey M. Borodin wrote: > > > > On 29 Feb 2024, at 13:35, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > Something like the attached? > > There's extraneous print "done\n"; doh! bad copy/paste ;-) > Also can we have optional backend type :) done in v2 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
> On 29 Feb 2024, at 15:20, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > done in v2 attached. Works fine for me. Thanks! Best regards, Andrey Borodin.
On Thu, Feb 29, 2024 at 06:19:58PM +0500, Andrey M. Borodin wrote: > Works fine for me. Thanks! + "timed out waiting for the backend type to wait for the injection point name"; The log should provide some context about the caller failing, meaning that the backend type and the injection point name should be mentioned in these logs to help in debugging issues. -- Michael
Attachment
Hi, On Fri, Mar 01, 2024 at 11:02:01AM +0900, Michael Paquier wrote: > On Thu, Feb 29, 2024 at 06:19:58PM +0500, Andrey M. Borodin wrote: > > Works fine for me. Thanks! > > + "timed out waiting for the backend type to wait for the injection point name"; > > The log should provide some context about the caller failing, meaning > that the backend type and the injection point name should be mentioned > in these logs to help in debugging issues. Yeah, done in v3 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Feb 26, 2024 at 08:24:04AM +0000, Bertrand Drouvot wrote: > I'll try to submit the POC patch in [1] before beginning of next week now that > we're "just waiting" if there is more comments on this current thread. I'll look at what you have there in more details. > [1]: https://www.postgresql.org/message-id/ZdTNafYSxwnKNIhj%40ip-10-97-1-34.eu-west-3.compute.internal The main routines have been now applied as 37b369dc67bc, with the test in 6782709df81f. I have used the same naming policy as 6a1ea02c491d for consistency, naming the injection point create-restart-point. -- Michael
Attachment
On Fri, Mar 01, 2024 at 06:52:45AM +0000, Bertrand Drouvot wrote: > + if (defined($backend_type)) > + { > + $backend_type = qq('$backend_type'); > + $die_message = "the backend type $backend_type"; > + } > + else > + { > + $backend_type = 'backend_type'; > + $die_message = 'one backend'; > + > + } > + > + $self->poll_query_until( > + 'postgres', qq[ > + SELECT count(*) > 0 FROM pg_stat_activity > + WHERE backend_type = $backend_type AND wait_event = '$injection_name' > + ]) > + or die > + qq(timed out waiting for $die_message to wait for the injection point '$injection_name'); I was looking at that, and found v3 to be an overkill. First, I think that we should encourage callers to pass down a backend_type. Perhaps I am wrong to assume so, but that's based on my catalog of tests waiting in my stack. A second thing is that this is entirely unrelated to injection points, because a test may want to wait for a given wait_event on a backend_type without using the module injection_points. At the end, I have renamed the routine to wait_for_event(), tweaked a bit its internals, and the result looked fine so I have applied it while updating 041_checkpoint_at_promote.pl to use it. Its internals could always be expanded more depending on the demand for it. -- Michael
Attachment
I noticed this was committed, and took a quick look. It sounds very useful for testing Citus to be able to use injection points too, but I noticed this code is included in src/test/modules. It sounds like that location will make it somewhat hard to install. If that's indeed the case, would it be possible to move it to contrib instead?
> On 4 Mar 2024, at 06:44, Michael Paquier <michael@paquier.xyz> wrote: > so I have applied it Great! Thank you! A really useful stuff for an asynchronous testing! > On 4 Mar 2024, at 09:17, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > this code is included in src/test/modules. It sounds like that > location will make it somewhat hard to install. +1. I joined the thread too late to ask why it’s not in core. But, it seems to me that separating logic even further is notnecessary, given build option —with-injection-points off by default. > If that's indeed the > case, would it be possible to move it to contrib instead? There’s no point in shipping this to users, especially with the build without injection points compiled. Best regards, Andrey Borodin.
On Mon, Mar 04, 2024 at 05:17:52AM +0100, Jelte Fennema-Nio wrote: > I noticed this was committed, and took a quick look. It sounds very > useful for testing Citus to be able to use injection points too, but I > noticed this code is included in src/test/modules. It sounds like that > location will make it somewhat hard to install. If that's indeed the > case, would it be possible to move it to contrib instead? One problem with installing that in contrib/ is that it would require more maintenance as a stable and "released" module. The aim of this module is to be more flexible than that, so as it is possible to extend it at will even in the back branches to be able to implement features that could help with tests that we'd want to implement in stable branches. I have mentioned that on a separate thread, but adding more extension maintenance burden while implementing complex tests does not sound like a good idea for me in the long-run. Perhaps we could consider that as an exception in "contrib", or have a separate path for test modules we're OK to install (the calls had better be superuser-only if we do that). Another thing with the backend support of injection points is that you could implement your own extension within citus, able to do what you mimic this in-core module, and get inspiration from it. Duplication is never cool, I agree, though. -- Michael
Attachment
Hi, On Mon, Mar 04, 2024 at 10:44:34AM +0900, Michael Paquier wrote: > On Fri, Mar 01, 2024 at 06:52:45AM +0000, Bertrand Drouvot wrote: > > + if (defined($backend_type)) > > + { > > + $backend_type = qq('$backend_type'); > > + $die_message = "the backend type $backend_type"; > > + } > > + else > > + { > > + $backend_type = 'backend_type'; > > + $die_message = 'one backend'; > > + > > + } > > + > > + $self->poll_query_until( > > + 'postgres', qq[ > > + SELECT count(*) > 0 FROM pg_stat_activity > > + WHERE backend_type = $backend_type AND wait_event = '$injection_name' > > + ]) > > + or die > > + qq(timed out waiting for $die_message to wait for the injection point '$injection_name'); > > I was looking at that, and found v3 to be an overkill. First, I think > that we should encourage callers to pass down a backend_type. Perhaps > I am wrong to assume so, but that's based on my catalog of tests > waiting in my stack. Works for me. > A second thing is that this is entirely unrelated to injection points, > because a test may want to wait for a given wait_event on a > backend_type without using the module injection_points. At the end, I > have renamed the routine to wait_for_event(), Good idea, fully makes sense. > tweaked a bit its > internals, and the result looked fine so I have applied it Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, 4 Mar 2024 at 06:27, Michael Paquier <michael@paquier.xyz> wrote: > I have mentioned that on a separate thread, Yeah, I didn't read all emails related to this feature > Perhaps we could consider that as an exception in "contrib", or have a > separate path for test modules we're OK to install (the calls had > better be superuser-only if we do that). Yeah, it makes sense that you'd want to backport fixes/changes to this. As long as you put a disclaimer in the docs that you can do that for this module, I think it would be fine. Our tests fairly regularly break anyway when changing minor versions of postgres in our CI, e.g. due to improvements in the output of isolationtester. So if changes to this module require some changes that's fine by me. Seems much nicer than having to copy-paste the code.
On Mon, Mar 04, 2024 at 10:51:41AM +0100, Jelte Fennema-Nio wrote: > Yeah, it makes sense that you'd want to backport fixes/changes to > this. As long as you put a disclaimer in the docs that you can do that > for this module, I think it would be fine. Our tests fairly regularly > break anyway when changing minor versions of postgres in our CI, e.g. > due to improvements in the output of isolationtester. So if changes to > this module require some changes that's fine by me. Seems much nicer > than having to copy-paste the code. In my experience, anybody who does serious testing with their product integrated with Postgres have two or three types of builds with their own scripts: one with assertions, -DG and other developer-oriented options enabled, and one for production deployments with more optimized options like -O2. Once there are custom scripts to build and package Postgres, do we really need to move that to contrib/ at all? make install would work for a test module as long as the command is run locally in its directory. -- Michael
Attachment
On Mon, 4 Mar 2024 at 23:23, Michael Paquier <michael@paquier.xyz> wrote: > In my experience, anybody who does serious testing with their product > integrated with Postgres have two or three types of builds with their > own scripts: one with assertions, -DG and other developer-oriented > options enabled, and one for production deployments with more > optimized options like -O2. Once there are custom scripts to build > and package Postgres, do we really need to move that to contrib/ at > all? I do think there is quite a bit of a difference from a user perspective to providing a few custom configure flags and having to go to a separate directory and run "make install" there too. Simply because it's yet another step. For dev environments most/all of my team uses pgenv: https://github.com/theory/pgenv There I agree we could add functionality to it to also install test modules when given a certain flag/environment variable, but it would be nice if that wasn't needed. One big downside to not having it in contrib is that we also run tests of Citus against official PGDG postgres packages and those would likely not include these test modules, so we wouldn't be able to run all our tests then. Also I think the injection points extension is quite different from the other modules in src/modules/test. All of these other modules are basically test binaries that need to be separate modules, but they don't provide any useful functionality when installing them. The injection_poinst one actually provides useful functionality itself, that can be used by other people testing things against postgres. > make install would work for a test module as long as the command > is run locally in its directory. What would I need to do for meson builds? I tried the following command but it doesn't seem to actually install the injection points command: ninja -C build install-test-files
On Tue, Mar 05, 2024 at 09:43:03AM +0100, Jelte Fennema-Nio wrote: > I do think there is quite a bit of a difference from a user > perspective to providing a few custom configure flags and having to go > to a separate directory and run "make install" there too. Simply > because it's yet another step. For dev environments most/all of my > team uses pgenv: https://github.com/theory/pgenv There I agree we > could add functionality to it to also install test modules when given > a certain flag/environment variable, but it would be nice if that > wasn't needed. Didn't know this one, TBH. That's interesting. My own scripts emulate the same things with versioning, patching, etc. > One big downside to not having it in contrib is that we also run tests > of Citus against official PGDG postgres packages and those would > likely not include these test modules, so we wouldn't be able to run > all our tests then. No idea about this part, unfortunately. One thing that I'd do is first check if the in-core module is enough to satisfy the requirements Citus would want. I got pinged about Timescale and greenplum recently, and they can reuse the backends APIs, but they've also wanted a few more callbacks than the in-core module so they will need to have their own code for tests with a custom callback library. Perhaps we could move that to contrib/ and document that this is a module for testing, that can be updated without notice even in minor upgrades and that there are no version upgrades done, or something like that. I'm open to that if there's enough demand for it, but I don't know how much we should accomodate with the existing requirements of contrib/ for something that's only developer-oriented. > Also I think the injection points extension is quite different from > the other modules in src/modules/test. All of these other modules are > basically test binaries that need to be separate modules, but they > don't provide any useful functionality when installing them. The > injection_poinst one actually provides useful functionality itself, > that can be used by other people testing things against postgres. I'm not sure about that yet, TBH. Anything that gets added to this module should be used in some way by the in-core tests, or just not be there. That's a line I don't want to cross, which is why it's a test module. FWIW, it would be really annoying to have documentation requirements, actually, because that increases maintenance and I'm not sure it's a good idea to add a module maintenance on top of what could require more facility in the module to implement a test for a bug fix. > What would I need to do for meson builds? I tried the following > command but it doesn't seem to actually install the injection points > command: > ninja -C build install-test-files Weird, that works here. I was digging into the meson tree and noticed that this was the only run_target() related to the test module install data, and injection_points gets installed as well as all the other modules. This should just need -Dinjection_points=true. -- Michael
Attachment
On Wed, 6 Mar 2024 at 07:17, Michael Paquier <michael@paquier.xyz> wrote: > I'm open to that if there's enough demand for it, but I > don't know how much we should accomodate with the existing > requirements of contrib/ for something that's only developer-oriented. There's quite a few developer-oriented GUCs as well, even requiring their own DEVELOPER_OPTIONS section. So I don't think it's too weird to have some developer-oriented extensions too. > I'm not sure about that yet, TBH. Anything that gets added to this > module should be used in some way by the in-core tests, or just not be > there. That's a line I don't want to cross, which is why it's a test > module. That seems fine, if people need more functionality they can indeed create their own test helpers. It mainly seems annoying for people to have to copy-paste the ones from core to their own extension. What I mainly meant is that anything in src/test/modules is not even close to somewhat useful for other people to use. They are really just very specific tests that need to be written in C. Afaict all those modules are not even used by tests outside of their own module. But these functions are helper functions, to be used by other tests. And limiting the users of those helper functions to just be in-core Postgres code seems a bit annoying. I feel like these functions are more akin to the pgregress/isolationtester binaries in their usage, than akin to other modules in src/test/modules. > FWIW, it would be really annoying to have documentation > requirements, actually, because that increases maintenance and I'm not > sure it's a good idea to add a module maintenance on top of what could > require more facility in the module to implement a test for a bug fix. Quite a few contrib modules have very limited documentation. I think it would be fine for this as well. > This should just need -Dinjection_points=true. Ugh... Sorry... I didn't realize that it needed a dedicated configure flag. When providing that flag it indeed installs the expected files. I guess that rules out testing against PGDG packages, because those packages almost certainly wouldn't specify this flag.
On Wed, Mar 06, 2024 at 10:19:41AM +0100, Jelte Fennema-Nio wrote: > What I mainly meant is that anything in src/test/modules is not even > close to somewhat useful for other people to use. They are really just > very specific tests that need to be written in C. Afaict all those > modules are not even used by tests outside of their own module. But > these functions are helper functions, to be used by other tests. And > limiting the users of those helper functions to just be in-core > Postgres code seems a bit annoying. I feel like these functions are > more akin to the pgregress/isolationtester binaries in their usage, > than akin to other modules in src/test/modules. Perhaps. I think that we're still in the discovery phase for this stuff, and more people should get used to it first (this will take some time and everybody is busy with their own stuff for the last commit fest). At least it does not seem good to rush any decisions at this stage. >> FWIW, it would be really annoying to have documentation >> requirements, actually, because that increases maintenance and I'm not >> sure it's a good idea to add a module maintenance on top of what could >> require more facility in the module to implement a test for a bug fix. > > Quite a few contrib modules have very limited documentation. I think > it would be fine for this as well. I'd argue that we should try to improve the existing documentation rather that use that as an argument to add more modules with limited documentation ;) > Ugh... Sorry... I didn't realize that it needed a dedicated configure > flag. When providing that flag it indeed installs the expected files. > I guess that rules out testing against PGDG packages, because those > packages almost certainly wouldn't specify this flag. The CI enables the switch, and I've updated all my buildfarm members to use it. In terms of coverage, that's already quite good. -- Michael