Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken |
Date | |
Msg-id | CA+hUKG+ogAon8_V223Ldv6taPR2uKH3X_UJ_A7LJAf3-VRARPA@mail.gmail.com |
In response to | Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
On Fri, Mar 10, 2023 at 1:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
> > On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > > Is it reasonable to assume that all modern platforms can time
> > > millisecond delays accurately?  Ten years ago I'd have suggested
> > > truncating the delay to a multiple of 10msec and using this logic
> > > to track the remainder, but maybe now that's unnecessary.
> >
> > If so, it might also be worth updating or removing this comment in
> > pgsleep.c:
> >
> >  * NOTE: although the delay is specified in microseconds, the effective
> >  * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
> >  * the requested delay to be rounded up to the next resolution boundary.
> >
> > I've had doubts for some time about whether this is still accurate...

Unfortunately I was triggered by this Unix archeology discussion, and
wasted some time this weekend testing every Unix we target.  I found 3
groups:

1.  OpenBSD, NetBSD: Like the comment says, kernel ticks still control
sleep resolution.  I measure an average time of ~20ms when I ask for
1ms sleeps in a loop with select() or nanosleep().  I don't actually
understand why it's not ~10ms because HZ is 100 on these systems, but
I didn't look harder.

2.  AIX, Solaris, illumos: select() can sleep for 1ms accurately, but
not fractions of 1ms.  If you use nanosleep() instead of select(),
then AIX joins the third group (huh, maybe it's just that its
select(us) calls poll(ms) under the covers?), but Solaris does not
(maybe it's still tick-based, but HZ == 1000?).

3.  Linux, FreeBSD, macOS: sub-ms sleeps are quite accurate (via
various system calls).

I didn't test Windows but it sounds a lot like it is in group 1 if you
use WaitForMultipleObjects() or SleepEx(), as we do.
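For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a loop that asks for short sleeps with nanosleep() and reports the average time each one actually took. It is not the test program used for the numbers above; the names now_ns() and average_sleep_ns() are made up for illustration, and it assumes a system that provides clock_gettime(CLOCK_MONOTONIC).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the monotonic clock in nanoseconds. */
static int64_t
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * Hypothetical helper: request 'iterations' sleeps of 'request_ns'
 * nanoseconds each and return the average observed duration of one
 * sleep, in nanoseconds.
 */
static int64_t
average_sleep_ns(long request_ns, int iterations)
{
    struct timespec req;
    int64_t     start;

    assert(request_ns >= 0 && iterations > 0);

    req.tv_sec = request_ns / 1000000000L;
    req.tv_nsec = request_ns % 1000000000L;

    start = now_ns();
    for (int i = 0; i < iterations; i++)
        nanosleep(&req, NULL);  /* an early return on a signal is ignored */

    return (now_ns() - start) / iterations;
}
```

Calling average_sleep_ns(1000000L, 1000) requests 1ms sleeps. On a tick-based kernel the result should come out well above 1000000, while on systems with accurate sub-ms sleeps it should stay close to the requested value.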
You can probably tune some of the above; for example FreeBSD can go
back to the old way with kern.eventtimer.periodic=1 to get a thousand
interrupts per second (kern.hz) instead of programming a hardware
timer to get an interrupt at just the right time, and then 0.5ms sleep
requests get rounded to an average of 1ms, just like on Solaris.  And
power usage probably goes up.

As for what to do about it, I dunno, how about this?

  * NOTE: although the delay is specified in microseconds, the effective
- * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
- * the requested delay to be rounded up to the next resolution boundary.
+ * resolution is only 1/HZ on systems that use periodic kernel ticks to limit
+ * sleeping.  This may cause sleeps to be rounded up by as much as 1-20
+ * milliseconds on old Unixen and Windows.

As for the following paragraph about the dangers of select() and
interrupts and restarts, I suspect it is describing the HP-UX
behaviour (a dropped platform), which I guess must have led to POSIX's
reluctance to standardise that properly, but in any case all
hypothetical concerns would disappear if we just used POSIX
[clock_]nanosleep(), no?  It has defined behaviour on signals, and it
also tells you the remaining time (if we cared, which we wouldn't).
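To make the nanosleep() idea concrete, a microsecond sleep built on it could look roughly like the sketch below. This is only an illustration, not a proposed patch: sleep_us() is a hypothetical name, it uses plain nanosleep() rather than clock_nanosleep() for simplicity, and it returns early on a signal instead of retrying, reporting the unslept time only for callers that ask for it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical microsecond sleep built on POSIX nanosleep().
 *
 * Returns 0 if the whole interval elapsed.  Returns -1 if nanosleep()
 * failed; when the cause was a signal handler (errno == EINTR) and
 * 'remaining_us' is not NULL, the time not yet slept is stored there
 * in microseconds.
 */
static int
sleep_us(long microsec, long *remaining_us)
{
    struct timespec req;
    struct timespec rem = {0, 0};

    assert(microsec >= 0);

    req.tv_sec = microsec / 1000000L;
    req.tv_nsec = (microsec % 1000000L) * 1000L;

    if (remaining_us != NULL)
        *remaining_us = 0;

    if (nanosleep(&req, &rem) == 0)
        return 0;               /* slept for the full interval */

    /* On EINTR, nanosleep() has filled in 'rem' with the time left. */
    if (errno == EINTR && remaining_us != NULL)
        *remaining_us = rem.tv_sec * 1000000L + rem.tv_nsec / 1000L;

    return -1;
}
```

A caller that does not care about the remainder can pass NULL for the second argument and ignore the return value, which matches the "if we cared, which we wouldn't" case.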