Re: Restart pg_usleep when interrupted - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Restart pg_usleep when interrupted
Date
Msg-id ZsYCdsRnidLOiCM5@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Restart pg_usleep when interrupted  (Sami Imseih <samimseih@gmail.com>)
Responses Re: Restart pg_usleep when interrupted
List pgsql-hackers
Hi,

On Tue, Aug 20, 2024 at 02:25:19PM -0500, Sami Imseih wrote:
> > As it looks like we have a consensus that reducing the number of interrupts also 
> > makes sense, I just provided a rebase version of the 1 Hz version (see [0], that
> > also makes clear in the doc that the new field might show slightly old values).
> 
> That makes sense. However, I suspect the "1 Hz" work code will no longer
> be needed if CF entry 5118 [1] mentioned by Thomas [2] a few days back 
> goes through. Maybe this extra work can be removed if [1] goes through.
> What do you think?

Yeah agree that the "1 Hz" work wouldn't be needed anymore if the CF entry 5118
goes through *and* if vacuum delays keep doing pg_usleep() (as the leader won't
receive SIGUSR1 from the parallel workers while executing nanosleep() anymore).
It could still receive SIGHUP or such but that's outside of the PqMsg_Progress
case though.

> With regards to CF 5118 and what it means to the current discussion, below
> are my thoughts.
> 
> I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in place, 
> pg_usleep is able to sleep the full requested, as mentioned by Thomas [3]. This is
> because certain interrupts like Parallel Message and others are not signaled
> by SIGUSR1 any longer, but with latches.
>

Yeah.
 
> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
> 
> While something like CF 5118 will take care of point #1,

I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I
agree that the leader won't be interrupted by PqMsg_Progress anymore.

> it will not deal
> with #2. Also, the v11 [4] patch does not do #2 either. So I think
> in the sleep loop, we need a C_F_I call. The same type of loop can
> also be used to call WaitForSingleObject.
> 
> If CF 5118 gets committed, will still need similar loop that calls C_F_I, 
> but the function will need to call WaitLatchUs [5].
> 
> Thoughts?

If CF 5118 gets committed, then I think we would need to move to using WaitLatchUs()
to react to urgent request. We'd also need to find a way to ensure that we'd
wait for the full duration of the requested time depending of the reason why we
waked up (well, only if we agree that 1/ is needed and I'm not sure we got a
consensus).

So I think that:

1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break anything
if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would still
prevent the leader to be waked up too frequently by the parallel workers.

2. still discuss the "need" and get a consensus regarding a sleep that could
ensure the wait duration (in some cases and depending of the reason why the
process is waked up).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg_combinebackup does not detect missing files
Next
From: Nathan Bossart
Date:
Subject: Re: Remove dependence on integer wrapping