Hi,
On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote:
>
> Thanks for the feedback!
>
> > On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Sami Imseih <samimseih@gmail.com> writes:
> >> Reattaching the patch.
> >
> > I feel like this is fundamentally a wrong solution, for the reasons
> > cited in the comment for pg_usleep: long sleeps are a bad idea
> > because of the resulting uncertainty about whether we'll respond to
> > interrupts and such promptly. An example here is that if we get
> > a query cancel interrupt, we should probably not insist on finishing
> > out the current sleep before responding.
>
> The case which brought up this discussion is the pg_usleep that
> is called within the vacuum_delay_point being interrupted.
>
> When I read the same code comment you cited, it sounded to me
> that “long sleeps” are those that are in seconds or minutes. The longest
> vacuum delay allowed is 100ms.
I think that with the proposed patch the actual wait time can be "long".
Indeed, the time between the interruptions and restarts of the nanosleep() call
will lead to drift (as mentioned in the nanosleep() man page). So, with a large
number of interruptions, the actual wait time could be way longer than the
expected wait time.
To put numbers, I did a few tests with the patch (and with v2 shared in [1]):
cost delay is 1ms and cost limit is 10.
With 50 indexes and 10 parallel workers I can see things like:
2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803
Means we waited more than the max allowed cost delay (100ms).
With 49 parallel workers, it's worst as I can see things like:
2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985
The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).
To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()
with an absolute time value, that might be another idea to explore.
[1]: https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com