Re: Restart pg_usleep when interrupted - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Restart pg_usleep when interrupted |
Date | |
Msg-id | ZsYCdsRnidLOiCM5@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Restart pg_usleep when interrupted (Sami Imseih <samimseih@gmail.com>) |
Responses |
Re: Restart pg_usleep when interrupted
|
List | pgsql-hackers |
Hi, On Tue, Aug 20, 2024 at 02:25:19PM -0500, Sami Imseih wrote: > > As it looks like we have a consensus that reducing the number of interrupts also > > makes sense, I just provided a rebase version of the 1 Hz version (see [0], that > > also makes clear in the doc that the new field might show slightly old values). > > That makes sense. However, I suspect the "1 Hz" work code will no longer > be needed if CF entry 5118 [1] mentioned by Thomas [2] a few days back > goes through. Maybe this extra work can be removed if [1] goes through. > What do you think? Yeah agree that the "1 Hz" work wouldn't be needed anymore if the CF entry 5118 goes through *and* if vacuum delays keep doing pg_usleep() (as the leader won't receive SIGUSR1 from the parallel workers while executing nanosleep() anymore). It could still receive SIGHUP or such but that's outside of the PqMsg_Progress case though. > With regards to CF 5118 and what it means to the current discussion, below > are my thoughts. > > I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in place, > pg_usleep is able to sleep the full requested, as mentioned by Thomas [3]. This is > because certain interrupts like Parallel Message and others are not signaled > by SIGUSR1 any longer, but with latches. > Yeah. > From this discussion, there is desire for a sleep function that: > 1/ Sleeps for the full duration of the requested time > 2/ Continues to handle important interrupts during the sleep > > While something like CF 5118 will take care of point #1, I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I agree that the leader won't be interrupted by PqMsg_Progress anymore. > it will not deal > with #2. Also, the v11 [4] patch does not do #2 either. So I think > in the sleep loop, we need a C_F_I call. The same type of loop can > also be used to call WaitForSingleObject. > > If CF 5118 gets committed, will still need similar loop that calls C_F_I, > but the function will need to call WaitLatchUs [5]. > > Thoughts? If CF 5118 gets committed, then I think we would need to move to using WaitLatchUs() to react to urgent request. We'd also need to find a way to ensure that we'd wait for the full duration of the requested time depending of the reason why we waked up (well, only if we agree that 1/ is needed and I'm not sure we got a consensus). So I think that: 1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF 5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break anything if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would still prevent the leader to be waked up too frequently by the parallel workers. 2. still discuss the "need" and get a consensus regarding a sleep that could ensure the wait duration (in some cases and depending of the reason why the process is waked up). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: