Hi,
On Mon, Jun 10, 2024 at 05:58:13PM -0400, Robert Haas wrote:
> I'm always suspicious of this sort of thing. I tend to find nothing
> gives me the right answer unless I assume that the actual sleep times
> are randomly and systematically different from the intended sleep
> times but arbitrarily large amounts. I think we should at least do
> some testing: if we measure both the intended sleep time and the
> actual sleep time, how close are they? Does it change if the system is
> under crushing load (which might elongate sleeps) or if we spam
> SIGUSR1 against the vacuum process (which might shorten them)?
Though I (now) think that it would make sense to record the actual delay time
instead (see [1]), I think it's interesting to do some testing as you suggested.
With record_actual_time.txt (attached) applied on top of v1, we can see the
intended and actual wait time.
On my system, "no load at all" except the vacuum running, I see no diff:
Tue Jun 11 09:22:06 2024 (every 1s)
pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 41107 | 41107 | 00:00:42.301851
(1 row)
Tue Jun 11 09:22:07 2024 (every 1s)
pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 42076 | 42076 | 00:00:43.301848
(1 row)
Tue Jun 11 09:22:08 2024 (every 1s)
pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 43045 | 43045 | 00:00:44.301854
(1 row)
But if I launch pg_reload_conf() 10 times in a row, I can see:
Tue Jun 11 09:22:09 2024 (every 1s)
pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 44064 | 44034 | 00:00:45.302965
(1 row)
Tue Jun 11 09:22:10 2024 (every 1s)
pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 45033 | 45003 | 00:00:46.301858
As we can see the actual wait time is 30ms less than the intended wait time with
this simple test. So I still think we should go with 1) actual wait time and 2)
report the number of waits (as mentioned in [1]). Does that make sense to you?
[1]: https://www.postgresql.org/message-id/Zmf712A5xcOM9Hlg%40ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com