Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Date
Msg-id CA+hUKGLaL4n=WVWkuK5ETUkk6BDDU10St5q+p7VotpPa+dA4Zg@mail.gmail.com
Whole thread Raw
In response to Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken  (Thomas Munro <thomas.munro@gmail.com>)
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > Is it reasonable to assume that all modern platforms can time
> > millisecond delays accurately?  Ten years ago I'd have suggested
> > truncating the delay to a multiple of 10msec and using this logic
> > to track the remainder, but maybe now that's unnecessary.
>
> If so, it might also be worth updating or removing this comment in
> pgsleep.c:
>
>      * NOTE: although the delay is specified in microseconds, the effective
>      * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
>      * the requested delay to be rounded up to the next resolution boundary.
>
> I've had doubts for some time about whether this is still accurate...

What I see with the old select(), or a more modern clock_nanosleep()
call, is that Linux, FreeBSD, macOS are happy sleeping for .1ms, .5ms,
1ms, 2ms, 3ms, and through innaccuracies and scheduling overheads etc
it works out to about 5-25% extra sleep time (I expect that can be
affected by choice of time source/available hardware, and perhaps
various system calls use different tricks).  I definitely recall the
behaviour described, back in the old days where more stuff was
scheduler-tick based.  I have no clue for Windows; quick googling
tells me that it might still be pretty chunky, unless you do certain
other stuff that I didn't follow up; we could probably get more
accurate sleep times by rummaging through nt.dll.  It would be good to
find out how well WaitEventSet does on Windows; perhaps we should have
a little timing accuracy test in the tree to collect build farm data?

FWIW epoll has a newer _pwait2() call that has higher res timeout
argument, and Windows WaitEventSet could also do high res timers if
you add timer events rather than using the timeout argument, and I
guess conceptually even the old poll() thing could do the equivalent
with a signal alarm timer, but it sounds a lot like a bad idea to do
very short sleeps to me, burning so much CPU on scheduling.  I kinda
wonder if the 10ms + residual thing might even turn out to be a better
idea... but I dunno.

The 1ms residual thing looks pretty good to me as a fix to the
immediate problem report, but we might also want to adjust the wording
in config.sgml?



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: improving user.c error messages
Next
From: Thomas Munro
Date:
Subject: Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken