Re: Track the amount of time waiting due to cost_delay - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Track the amount of time waiting due to cost_delay
Date
Msg-id Zmgdl2EC3uiheEm8@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Track the amount of time waiting due to cost_delay  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Track the amount of time waiting due to cost_delay
List pgsql-hackers
Hi,

On Mon, Jun 10, 2024 at 05:58:13PM -0400, Robert Haas wrote:
> I'm always suspicious of this sort of thing. I tend to find nothing
> gives me the right answer unless I assume that the actual sleep times
> are randomly and systematically different from the intended sleep
> times but arbitrarily large amounts. I think we should at least do
> some testing: if we measure both the intended sleep time and the
> actual sleep time, how close are they? Does it change if the system is
> under crushing load (which might elongate sleeps) or if we spam
> SIGUSR1 against the vacuum process (which might shorten them)?

Though I (now) think that it would make sense to record the actual delay time 
instead (see [1]), I think it's interesting to do some testing as you suggested.

With record_actual_time.txt (attached) applied on top of v1, we can see the
intended and actual wait time.

On my system, "no load at all" except the vacuum running, I see no diff:

                         Tue Jun 11 09:22:06 2024 (every 1s)

  pid  | relid |     phase     | time_delayed | actual_time_delayed |    duration
-------+-------+---------------+--------------+---------------------+-----------------
 54754 | 16385 | scanning heap |        41107 |               41107 | 00:00:42.301851
(1 row)

                         Tue Jun 11 09:22:07 2024 (every 1s)

  pid  | relid |     phase     | time_delayed | actual_time_delayed |    duration
-------+-------+---------------+--------------+---------------------+-----------------
 54754 | 16385 | scanning heap |        42076 |               42076 | 00:00:43.301848
(1 row)

                         Tue Jun 11 09:22:08 2024 (every 1s)

  pid  | relid |     phase     | time_delayed | actual_time_delayed |    duration
-------+-------+---------------+--------------+---------------------+-----------------
 54754 | 16385 | scanning heap |        43045 |               43045 | 00:00:44.301854
(1 row)

But if I launch pg_reload_conf() 10 times in a row, I can see:

                         Tue Jun 11 09:22:09 2024 (every 1s)

  pid  | relid |     phase     | time_delayed | actual_time_delayed |    duration
-------+-------+---------------+--------------+---------------------+-----------------
 54754 | 16385 | scanning heap |        44064 |               44034 | 00:00:45.302965
(1 row)

                         Tue Jun 11 09:22:10 2024 (every 1s)

  pid  | relid |     phase     | time_delayed | actual_time_delayed |    duration
-------+-------+---------------+--------------+---------------------+-----------------
 54754 | 16385 | scanning heap |        45033 |               45003 | 00:00:46.301858


As we can see the actual wait time is 30ms less than the intended wait time with
this simple test. So I still think we should go with 1) actual wait time and 2)
report the number of waits (as mentioned in [1]). Does that make sense to you?


[1]: https://www.postgresql.org/message-id/Zmf712A5xcOM9Hlg%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Conflict Detection and Resolution
Next
From: Ashutosh Sharma
Date:
Subject: Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions