Thread: pg_usleep for multisecond delays

pg_usleep for multisecond delays

From
Nathan Bossart
Date:
I just found myself carefully counting the zeros in a call to pg_usleep().
Besides getting my eyes checked, perhaps there should be a wrapper called
pg_ssleep() than can be used for multisecond sleeps.  Or maybe the
USECS_PER_SEC macro should be used more widely.  I attached a patch for the
former approach.  I don't have a strong opinion, but I do think it's worth
improving readability a bit here.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: pg_usleep for multisecond delays

From
Andres Freund
Date:
Hi,

On 2023-02-09 12:59:29 -0800, Nathan Bossart wrote:
> I just found myself carefully counting the zeros in a call to pg_usleep().
> Besides getting my eyes checked, perhaps there should be a wrapper called
> pg_ssleep() than can be used for multisecond sleeps.  Or maybe the
> USECS_PER_SEC macro should be used more widely.  I attached a patch for the
> former approach.  I don't have a strong opinion, but I do think it's worth
> improving readability a bit here.

pg_usleep() should pretty much never used for sleeps that long, at least in
the backend - depending on the platform they're not interruptible. Most of the
things changed here are debugging tools, but even so, it's e.g. pretty
annoying. E.g. you can't normally shut down while a backend is in
pre_auth_delay.

So I'm not sure it's the right direction to make pg_usleep() easier to use...

Greetings,

Andres Freund



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Thu, Feb 09, 2023 at 01:30:27PM -0800, Andres Freund wrote:
> On 2023-02-09 12:59:29 -0800, Nathan Bossart wrote:
>> I just found myself carefully counting the zeros in a call to pg_usleep().
>> Besides getting my eyes checked, perhaps there should be a wrapper called
>> pg_ssleep() than can be used for multisecond sleeps.  Or maybe the
>> USECS_PER_SEC macro should be used more widely.  I attached a patch for the
>> former approach.  I don't have a strong opinion, but I do think it's worth
>> improving readability a bit here.
> 
> pg_usleep() should pretty much never used for sleeps that long, at least in
> the backend - depending on the platform they're not interruptible. Most of the
> things changed here are debugging tools, but even so, it's e.g. pretty
> annoying. E.g. you can't normally shut down while a backend is in
> pre_auth_delay.
> 
> So I'm not sure it's the right direction to make pg_usleep() easier to use...

Hm...  We might be able to use WaitLatch() for some of these.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: pg_usleep for multisecond delays

From
Kyotaro Horiguchi
Date:
At Thu, 9 Feb 2023 14:51:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Thu, Feb 09, 2023 at 01:30:27PM -0800, Andres Freund wrote:
> > So I'm not sure it's the right direction to make pg_usleep() easier to use...
> Hm...  We might be able to use WaitLatch() for some of these.

And I think we are actully going to reduce or eliminate the use of
pg_sleep().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_usleep for multisecond delays

From
Michael Paquier
Date:
On Thu, Feb 09, 2023 at 02:51:14PM -0800, Nathan Bossart wrote:
> Hm...  We might be able to use WaitLatch() for some of these.

Perhaps less than you think as a bit of work has been done in the last
few years to reduce the gap and make such code paths more responsive,
still I would not be surprised to find a couple of these..

I would let the portions of the code related to things like
pre_auth_delay or XLOG_REPLAY_DELAY as they are, though, without an
extra pg_Xsleep() implementation.
--
Michael

Attachment

Re: pg_usleep for multisecond delays

From
Alvaro Herrera
Date:
On 2023-Feb-09, Nathan Bossart wrote:

> I just found myself carefully counting the zeros in a call to pg_usleep().
> Besides getting my eyes checked, perhaps there should be a wrapper called
> pg_ssleep() than can be used for multisecond sleeps.  Or maybe the
> USECS_PER_SEC macro should be used more widely.  I attached a patch for the
> former approach.  I don't have a strong opinion, but I do think it's worth
> improving readability a bit here.

Hmm, it seems about half the patch would go away if you were to add a
PostAuthDelaySleep() function.

> @@ -2976,7 +2976,7 @@ _bt_pendingfsm_finalize(Relation rel, BTVacState *vstate)
>       * never be effective without some other backend concurrently consuming an
>       * XID.
>       */
> -    pg_usleep(5000000L);
> +    pg_ssleep(5);

Maybe for these cases where a WaitLatch is not desired, it'd be simpler
to do pg_usleep (5L * 1000 * 1000);

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: pg_usleep for multisecond delays

From
Robert Haas
Date:
On Fri, Feb 10, 2023 at 3:30 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Maybe for these cases where a WaitLatch is not desired, it'd be simpler
> to do pg_usleep (5L * 1000 * 1000);

I somehow feel that we should be trying to get rid of cases where
WaitLatch is not desired.

That's probably overly simplistic - there might be cases where the
caller isn't just polling and has a really legitimate need to wait for
5 seconds of wall clock time. But even in that case, it seems like we
want to respond to barriers and interrupts during that time, in almost
all cases.

I wonder if we should have a wrapper around WaitLatch() that documents
that if the latch is set before the time expires, it will reset the
latch and try again to wait for the remaining time, after checking for
interrupts etc.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_usleep for multisecond delays

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I somehow feel that we should be trying to get rid of cases where
> WaitLatch is not desired.

+1

> I wonder if we should have a wrapper around WaitLatch() that documents
> that if the latch is set before the time expires, it will reset the
> latch and try again to wait for the remaining time, after checking for
> interrupts etc.

Resetting the latch seems not very friendly for general-purpose use
... although I guess we could set it again on the way out.

BTW, we have an existing pg_sleep() function that deals with all
of this except re-setting the latch.

            regards, tom lane



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Fri, Feb 10, 2023 at 10:18:34AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I wonder if we should have a wrapper around WaitLatch() that documents
>> that if the latch is set before the time expires, it will reset the
>> latch and try again to wait for the remaining time, after checking for
>> interrupts etc.
> 
> Resetting the latch seems not very friendly for general-purpose use
> ... although I guess we could set it again on the way out.
> 
> BTW, we have an existing pg_sleep() function that deals with all
> of this except re-setting the latch.

That seems workable.  I think it might also need to accept a function
pointer for custom interrupt checking (e.g., archiver code should use
HandlePgArchInterrupts()).  I'll put something together if that sounds
alright.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Fri, Feb 10, 2023 at 10:51:20AM -0800, Nathan Bossart wrote:
> On Fri, Feb 10, 2023 at 10:18:34AM -0500, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I wonder if we should have a wrapper around WaitLatch() that documents
>>> that if the latch is set before the time expires, it will reset the
>>> latch and try again to wait for the remaining time, after checking for
>>> interrupts etc.
>> 
>> Resetting the latch seems not very friendly for general-purpose use
>> ... although I guess we could set it again on the way out.
>> 
>> BTW, we have an existing pg_sleep() function that deals with all
>> of this except re-setting the latch.
> 
> That seems workable.  I think it might also need to accept a function
> pointer for custom interrupt checking (e.g., archiver code should use
> HandlePgArchInterrupts()).  I'll put something together if that sounds
> alright.

Here is a work-in-progress patch.  I quickly scanned through my previous
patch and applied this new function everywhere it seemed safe to use (which
unfortunately is rather limited).

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: pg_usleep for multisecond delays

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 10, 2023 at 10:51:20AM -0800, Nathan Bossart wrote:
>> On Fri, Feb 10, 2023 at 10:18:34AM -0500, Tom Lane wrote:
>>> BTW, we have an existing pg_sleep() function that deals with all
>>> of this except re-setting the latch.

> Here is a work-in-progress patch.  I quickly scanned through my previous
> patch and applied this new function everywhere it seemed safe to use (which
> unfortunately is rather limited).

A quick grep for pg_usleep with large intervals finds rather more
than you touched:

contrib/auth_delay/auth_delay.c: 46:         pg_usleep(1000L * auth_delay_milliseconds);
src/backend/access/nbtree/nbtpage.c: 2979:     pg_usleep(5000000L);
src/backend/access/transam/xlog.c: 5109:         pg_usleep(60000000L);
src/backend/libpq/pqcomm.c: 717:         pg_usleep(100000L);        /* wait 0.1 sec */
src/backend/postmaster/bgwriter.c: 199:         pg_usleep(1000000L);
src/backend/postmaster/checkpointer.c: 313:         pg_usleep(1000000L);
src/backend/postmaster/checkpointer.c: 1009:         pg_usleep(100000L);        /* wait 0.1 sec, then retry */
src/backend/postmaster/postmaster.c: 4295:         pg_usleep(PreAuthDelay * 1000000L);
src/backend/postmaster/walwriter.c: 192:         pg_usleep(1000000L);
src/backend/postmaster/bgworker.c: 762:         pg_usleep(PostAuthDelay * 1000000L);
src/backend/postmaster/pgarch.c: 456:                 pg_usleep(1000000L);
src/backend/postmaster/pgarch.c: 488:                 pg_usleep(1000000L);    /* wait a bit before retrying */
src/backend/postmaster/autovacuum.c: 444:         pg_usleep(PostAuthDelay * 1000000L);
src/backend/postmaster/autovacuum.c: 564:         pg_usleep(1000000L);
src/backend/postmaster/autovacuum.c: 690:                 pg_usleep(1000000L);    /* 1s */
src/backend/postmaster/autovacuum.c: 1711:             pg_usleep(PostAuthDelay * 1000000L);
src/backend/storage/ipc/procarray.c: 3799:         pg_usleep(100 * 1000L); /* 100ms */
src/backend/utils/init/postinit.c: 985:             pg_usleep(PostAuthDelay * 1000000L);
src/backend/utils/init/postinit.c: 1192:         pg_usleep(PostAuthDelay * 1000000L);

Did you have reasons for excluding the rest of these?

Taking a step back, I think it might be a mistake to try to share code
with the SQL-exposed function; at least, that is causing some API
decisions that feel odd.  I have mixed emotions about both the use
of double as the sleep-time datatype, and about the choice of seconds
(rather than, say, msec) as the unit.  Admittedly this is not all bad
--- for example, several of the calls I list above are delaying for
100ms, which we can easily accommodate in this scheme as "0.1", and
maybe it'd be a good idea to hit up the stuff waiting for 10ms too.
It still seems unusual for an internal support function though.
I haven't done the math on it, but are we limiting the precision
of the sleep (due to roundoff error) in any way that would matter?

A bigger issue is that we almost certainly ought to pass through
a wait event code instead of allowing all these cases to be
WAIT_EVENT_PG_SLEEP.

I'd skip the unconditional latch reset you added to pg_sleep_sql.
I realize that's bug-compatible with what happens now, but I still
think it's expending extra code to do what might well be the
wrong thing.

We should update the header comment for pg_usleep to direct people
to this new function.

            regards, tom lane



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Fri, Mar 10, 2023 at 12:28:28PM -0500, Tom Lane wrote:
> A quick grep for pg_usleep with large intervals finds rather more
> than you touched:
> 
> [...]
> 
> Did you have reasons for excluding the rest of these?

I'm still looking into each of these, but my main concerns were 1) ensuring
latch support had been set up and 2) figuring out the right interrupt
function to use.  Thus far, I don't think latch support is a problem
because InitializeLatchSupport() is called quite early.  However, I'm not
as confident with the interrupt functions yet.  Some of these multisecond
sleeps are done very early before the signal handlers are set up.  Others
are done within the sigsetjmp() block.  And at least one is in a code path
that both the startup process and single-user mode touch.

At the moment, I'm thinking about either removing the check_interrupts
function pointer argument altogether or making it optional for code paths
where we might not want any interrupt handling to run.  In the former
approach, we could simply call WaitLatch() without a latch.  While this
wouldn't do any interrupt handling, we'd still get custom wait event
support and better responsiveness when the postmaster dies, which is
strictly better than what's there today.  And many of these sleeps are in
uncommon or debug paths, so delaying interrupt handling might be okay.  In
the latter approach, we would have interrupt handling, but I'm worried that
would be easy to get wrong.  I think it would be nice to have interrupt
handling if possible, so I'm still (over)thinking about this.

I agree with the rest of your comments.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: pg_usleep for multisecond delays

From
"Gregory Stark (as CFM)"
Date:
On Mon, 13 Mar 2023 at 17:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Mar 10, 2023 at 12:28:28PM -0500, Tom Lane wrote:
> > A quick grep for pg_usleep with large intervals finds rather more
> > than you touched:
> >
> > [...]
> >
> > Did you have reasons for excluding the rest of these?
>
> I'm still looking into each of these, but my main concerns were 1) ensuring
> latch support had been set up and 2) figuring out the right interrupt
> function to use.  Thus far, I don't think latch support is a problem
> because InitializeLatchSupport() is called quite early.  However, I'm not
> as confident with the interrupt functions yet.  Some of these multisecond
> sleeps are done very early before the signal handlers are set up.  Others
> are done within the sigsetjmp() block.  And at least one is in a code path
> that both the startup process and single-user mode touch.

Is this still a WIP? Is it targeting this release? There are only a
few days left before the feature freeze. I'm guessing it should just
move to the next CF for the next release?



-- 
Gregory Stark
As Commitfest Manager



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Mon, Apr 03, 2023 at 04:33:27PM -0400, Gregory Stark (as CFM) wrote:
> Is this still a WIP? Is it targeting this release? There are only a
> few days left before the feature freeze. I'm guessing it should just
> move to the next CF for the next release?

I moved it to the next commitfest and marked the target version as v17.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: pg_usleep for multisecond delays

From
Daniel Gustafsson
Date:
> On 4 Apr 2023, at 05:31, Nathan Bossart <nathandbossart@gmail.com> wrote:
> 
> On Mon, Apr 03, 2023 at 04:33:27PM -0400, Gregory Stark (as CFM) wrote:
>> Is this still a WIP? Is it targeting this release? There are only a
>> few days left before the feature freeze. I'm guessing it should just
>> move to the next CF for the next release?
> 
> I moved it to the next commitfest and marked the target version as v17.

Have you had any chance to revisit this such that there is a new patch to
review, or should it be returned/moved for now?

--
Daniel Gustafsson




Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Mon, Jul 10, 2023 at 10:12:36AM +0200, Daniel Gustafsson wrote:
> Have you had any chance to revisit this such that there is a new patch to
> review, or should it be returned/moved for now?

Not yet.  I moved it to the next commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: pg_usleep for multisecond delays

From
Nathan Bossart
Date:
On Mon, Mar 13, 2023 at 02:16:31PM -0700, Nathan Bossart wrote:
> At the moment, I'm thinking about either removing the check_interrupts
> function pointer argument altogether or making it optional for code paths
> where we might not want any interrupt handling to run.  In the former
> approach, we could simply call WaitLatch() without a latch.  While this
> wouldn't do any interrupt handling, we'd still get custom wait event
> support and better responsiveness when the postmaster dies, which is
> strictly better than what's there today.  And many of these sleeps are in
> uncommon or debug paths, so delaying interrupt handling might be okay.  In
> the latter approach, we would have interrupt handling, but I'm worried that
> would be easy to get wrong.  I think it would be nice to have interrupt
> handling if possible, so I'm still (over)thinking about this.

I started on the former approach (work-in-progress patch attached), but I
figured I'd ask whether folks are alright with this before I spend more
time on it.  Many of the sleeps in question are relatively short, are
intended for debugging, or are meant to prevent errors from repeating as
fast as possible, so I don't know if we should bother adding interrupt
handling support.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: pg_usleep for multisecond delays

From
Kyotaro Horiguchi
Date:
At Wed, 26 Jul 2023 16:41:25 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Mon, Mar 13, 2023 at 02:16:31PM -0700, Nathan Bossart wrote:
> I started on the former approach (work-in-progress patch attached), but I
> figured I'd ask whether folks are alright with this before I spend more
> time on it.  Many of the sleeps in question are relatively short, are
> intended for debugging, or are meant to prevent errors from repeating as
> fast as possible, so I don't know if we should bother adding interrupt
> handling support.

It is responsive to an immediate shutdown.  I think that's fine, as
things might become overly complex if we aim for it to respond to fast
shutdown as well.

The pg_msleep() implemented in the patch accepts a wait event type,
and some event types other than WAIT_EVENT_PG_SLEEP are passed to it.

WAIT_EVENT_CHECKPOINTER_MAIN:

 - At checkpointer.c:309, it is in a long-jump'ed block, where out of
   the main loop.

 - At checkpointer.c:1005: RequestCheckpoint is not executed on checkpointer.

WAIT_EVENT_ARCHIVER_MAIN:
 - At pgarch.c:453,485: They are not at the main-loop level idle-waiting.

WAIT_EVENT_PRE/POST_AUTH_DELAY:

 - These are really useless since they're not seen anywhere. Client
   backends don't show up in pg_stat_activity until this sleep
   finishes. (We could use ps-display instead, but...)

WAIT_EVENT_VACUUM_DELAY:

 - This behaves the same as it did before the change. Actually, we
   don't want to use WAIT_EVENT_PG_SLEEP for it.

So, we have at least one sensible use case for the parameter, which
seems to be sufficient reason.

I'm not sure if others will agree, but I'm leaning towards providing a
dedicated WaitEventSet for the new sleep function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center