Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken - Mailing list pgsql-hackers
From | Nathan Bossart |
---|---|
Subject | Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken |
Date | |
Msg-id | 20230314185428.GA431737@nathanxps13 |
In response to | Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken |
List | pgsql-hackers |
On Tue, Mar 14, 2023 at 03:38:45PM +1300, Thomas Munro wrote:
> On Tue, Mar 14, 2023 at 12:10 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> > * NOTE: although the delay is specified in microseconds, the effective
>> > - * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect
>> > - * the requested delay to be rounded up to the next resolution boundary.
>> > + * resolution is only 1/HZ on systems that use periodic kernel ticks to wake
>> > + * up. This may cause sleeps to be rounded up by 1-20 milliseconds on older
>> > + * Unixen and Windows.
>>
>> nitpick: Could the 1/HZ versus 20 milliseconds discrepancy cause confusion?
>> Otherwise, I think this is the right idea.
>
> Better words welcome; 1-20ms summarises the range I actually measured,
> and if reports are correct about Windows' HZ=64 (1/HZ = 15.625ms) then
> it neatly covers that too, so I don't feel too bad about not chasing
> down the reason for that 10ms/20ms discrepancy; maybe I looked at the
> wrong HZ number (which you can change, anyway), I'm not too used to
> NetBSD... BTW they have a project plan to fix that
> https://wiki.netbsd.org/projects/project/tickless/

Here is roughly what I had in mind:

    NOTE: Although the delay is specified in microseconds, older Unixen and
    Windows use periodic kernel ticks to wake up, which might increase the
    delay time significantly. We've observed delay increases as large as 20
    milliseconds on supported platforms.

>> > + * CAUTION: if interrupted by a signal, this function will return, but its
>> > + * interface doesn't report that. It's not a good idea to use this
>> > + * for long sleeps in the backend, because backends are expected to respond to
>> > + * interrupts promptly. Better practice for long sleeps is to use WaitLatch()
>> > + * with a timeout.
>>
>> I'm not sure this argument follows. If pg_usleep() returns if interrupted,
>> then why are we concerned about delayed responses to interrupts?
>
> Because you can't rely on it:
>
> 1. Maybe the signal is delivered just before pg_usleep() begins, and
> a handler sets some flag we would like to react to. Now pg_usleep()
> will not be interrupted. That problem is solved by using latches
> instead.
> 2. Maybe the signal is one that is no longer handled by a handler at
> all; these days, latches use SIGURG, which pops out when you read a
> signalfd or kqueue, so pg_usleep() will not wake up. That problem is
> solved by using latches instead.
>
> (The word "interrupt" is a bit overloaded, which doesn't help with
> this discussion.)

Yeah, I think it would be clearer if "interrupt" was disambiguated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
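
For readers following along, the first failure mode quoted above (a signal that lands just before the sleep begins) can be illustrated with a small standalone POSIX sketch. This is not PostgreSQL code: it swaps in the classic self-pipe trick as a stand-in for a latch, and every name in it (wake_pipe, on_sigusr1, demo_setup, plain_sleep_ms, latch_like_wait_ms) is hypothetical. The point it tries to show is only that a plain sleep cannot notice a wakeup that happened before it started, while a wait on a readable descriptor can.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical self-pipe standing in for a latch (not PostgreSQL's code). */
static int wake_pipe[2];

static void on_sigusr1(int signo)
{
    int save_errno = errno;
    ssize_t rc;

    (void) signo;
    /* write() is async-signal-safe; the byte stays readable until drained. */
    rc = write(wake_pipe[1], "x", 1);
    (void) rc;
    errno = save_errno;
}

/* Create the non-blocking pipe and install the handler; 0 on success. */
static int demo_setup(void)
{
    struct sigaction sa;

    if (pipe(wake_pipe) != 0)
        return -1;
    if (fcntl(wake_pipe[0], F_SETFL, O_NONBLOCK) == -1 ||
        fcntl(wake_pipe[1], F_SETFL, O_NONBLOCK) == -1)
        return -1;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Plain sleep: returns 1 only if a signal cut the sleep short, else 0. */
static int plain_sleep_ms(int ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    return (nanosleep(&ts, NULL) == -1 && errno == EINTR);
}

/* Latch-like wait: returns 1 if a wakeup was pending or arrived, 0 on timeout. */
static int latch_like_wait_ms(int ms)
{
    struct pollfd pfd;
    char buf[16];
    int rc;

    assert(ms >= 0);
    pfd.fd = wake_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* If a signal interrupts poll(), its byte is already in the pipe,
     * so the retry returns immediately. */
    do {
        rc = poll(&pfd, 1, ms);
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0)
        return 0;

    /* Drain all pending wakeup bytes so the next wait starts clean. */
    while (read(wake_pipe[0], buf, sizeof buf) > 0)
        ;
    return 1;
}
```

In a single-threaded program, calling demo_setup(), then raise(SIGUSR1), then plain_sleep_ms(50) should sleep the full 50 ms, because the handler has already run by the time the sleep starts. A following latch_like_wait_ms() call with any timeout should return at once, since the wakeup byte is still sitting in the pipe. The second failure mode in the quoted list (signals such as SIGURG that are consumed via signalfd or kqueue rather than a handler) is not modelled here.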