From: Bertrand Drouvot
Subject: Re: Track the amount of time waiting due to cost_delay
Msg-id: ZmreasV0BKZ5BHIM@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Track the amount of time waiting due to cost_delay (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
Hi,

On Tue, Jun 11, 2024 at 08:26:23AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Jun 11, 2024 at 04:07:05PM +0900, Masahiko Sawada wrote:
> 
> > Thank you for the proposal and the patch. I understand the motivation
> > of this patch.
> 
> Thanks for looking at it!
> 
> > Beside the point Nathan mentioned, I'm slightly worried
> > that massive parallel messages could be sent to the leader process
> > when the cost_limit value is low.
> 
> I see, I can/will do some testing in this area and share the numbers.

Here are the results of the test. It was launched several times and produced the
same (surprising) result each time.

====================== Context ================================================

The testing has been done with this relation (large_tbl) and its indexes:

postgres=# SELECT n.nspname, c.relname, count(*) AS buffers
             FROM pg_buffercache b JOIN pg_class c
             ON b.relfilenode = pg_relation_filenode(c.oid) AND
                b.reldatabase IN (0, (SELECT oid FROM pg_database
                                      WHERE datname = current_database()))
             JOIN pg_namespace n ON n.oid = c.relnamespace
             GROUP BY n.nspname, c.relname
             ORDER BY 3 DESC
             LIMIT 22;

 nspname |      relname       | buffers
---------+--------------------+---------
 public  | large_tbl          |  222280
 public  | large_tbl_filler13 |  125000
 public  | large_tbl_filler6  |  125000
 public  | large_tbl_filler5  |  125000
 public  | large_tbl_filler3  |  125000
 public  | large_tbl_filler15 |  125000
 public  | large_tbl_filler4  |  125000
 public  | large_tbl_filler20 |  125000
 public  | large_tbl_filler18 |  125000
 public  | large_tbl_filler14 |  125000
 public  | large_tbl_filler8  |  125000
 public  | large_tbl_filler11 |  125000
 public  | large_tbl_filler19 |  125000
 public  | large_tbl_filler7  |  125000
 public  | large_tbl_filler1  |  125000
 public  | large_tbl_filler12 |  125000
 public  | large_tbl_filler9  |  125000
 public  | large_tbl_filler17 |  125000
 public  | large_tbl_filler16 |  125000
 public  | large_tbl_filler10 |  125000
 public  | large_tbl_filler2  |  125000
 public  | large_tbl_pkey     |    5486
(22 rows)

All of them completely fit in memory (to avoid I/O read latency during the vacuum).
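As a quick sanity check on the "fits in memory" claim, the buffer counts above can
be converted to bytes (PostgreSQL pages are 8kB by default):

```python
# Buffer counts taken from the pg_buffercache output above:
# the table, 20 filler indexes, and the primary key (8kB pages).
buffers = 222280 + 20 * 125000 + 5486
size_gb = buffers * 8192 / 1024**3

print(buffers)             # 2727766 buffers
print(round(size_gb, 1))   # ~20.8 GB, comfortably under shared_buffers = 30GB
```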

The non-default configuration is:

max_wal_size = 4GB
shared_buffers = 30GB
vacuum_cost_delay = 1
autovacuum_vacuum_cost_delay = 1
max_parallel_maintenance_workers = 8
max_parallel_workers = 10
vacuum_cost_limit = 10
autovacuum_vacuum_cost_limit = 10

My system is not overloaded, has enough resources to run this test, and only this
test is running.

====================== Results ================================================

========== With v2 (attached) applied on master

postgres=# VACUUM (PARALLEL 8) large_tbl;
VACUUM
Time: 1146873.016 ms (19:06.873)

The duration breaks down as follows:

Vacuum phase: cumulative time (cumulative time delayed)
=======================================================
scanning heap: 00:08:16.414628 (time_delayed: 444370)
vacuuming indexes: 00:14:55.314699 (time_delayed: 2545293)
vacuuming heap: 00:19:06.814617 (time_delayed: 2767540)
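The phase times above are cumulative, so each phase's own duration is the
difference between consecutive entries; a small sketch of that subtraction
(times hardcoded from the v2 run above):

```python
from datetime import timedelta

# Cumulative elapsed time at the end of each vacuum phase (v2 run above).
cumulative = {
    "scanning heap":     timedelta(minutes=8,  seconds=16.414628),
    "vacuuming indexes": timedelta(minutes=14, seconds=55.314699),
    "vacuuming heap":    timedelta(minutes=19, seconds=6.814617),
}

# Per-phase duration = cumulative time minus the previous phase's cumulative time.
prev = timedelta(0)
per_phase = {}
for phase, cum in cumulative.items():
    per_phase[phase] = cum - prev
    prev = cum

for phase, dur in per_phase.items():
    print(f"{phase}: {dur}")
```

So the vacuuming indexes phase itself took about 6m39s under v2, which is the
number to compare against master below.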

I sampled active sessions from pg_stat_activity (one-second interval); here is
the summary during the vacuuming indexes phase (ordered by count):

 leader_pid |  pid   |   wait_event   | count
------------+--------+----------------+-------
     452996 | 453225 | VacuumDelay    |   366
     452996 | 453223 | VacuumDelay    |   363
     452996 | 453226 | VacuumDelay    |   362
     452996 | 453224 | VacuumDelay    |   361
     452996 | 453222 | VacuumDelay    |   359
     452996 | 453221 | VacuumDelay    |   359
            | 452996 | VacuumDelay    |   331
            | 452996 | CPU            |    30
     452996 | 453224 | WALWriteLock   |    23
     452996 | 453222 | WALWriteLock   |    20
     452996 | 453226 | WALWriteLock   |    20
     452996 | 453221 | WALWriteLock   |    19
            | 452996 | WalSync        |    18
     452996 | 453225 | WALWriteLock   |    18
     452996 | 453223 | WALWriteLock   |    16
            | 452996 | WALWriteLock   |    15
     452996 | 453221 | CPU            |    14
     452996 | 453222 | CPU            |    14
     452996 | 453223 | CPU            |    12
     452996 | 453224 | CPU            |    10
     452996 | 453226 | CPU            |    10
     452996 | 453225 | CPU            |     8
     452996 | 453223 | WalSync        |     4
     452996 | 453221 | WalSync        |     2
     452996 | 453226 | WalWrite       |     2
     452996 | 453221 | WalWrite       |     1
            | 452996 | ParallelFinish |     1
     452996 | 453224 | WalSync        |     1
     452996 | 453225 | WalSync        |     1
     452996 | 453222 | WalWrite       |     1
     452996 | 453225 | WalWrite       |     1
     452996 | 453222 | WalSync        |     1
     452996 | 453226 | WalSync        |     1


========== On master (v2 not applied)

postgres=# VACUUM (PARALLEL 8) large_tbl;
VACUUM
Time: 1322598.087 ms (22:02.598)

Surprisingly, the vacuum took about 3 minutes longer on master.

Let's see how the time breaks down:

Vacuum phase: cumulative time
=============================
scanning heap: 00:08:07.061196
vacuuming indexes: 00:17:50.961228
vacuuming heap: 00:22:02.561199

I sampled active sessions from pg_stat_activity (one-second interval); here is
the summary during the vacuuming indexes phase (ordered by count):

 leader_pid |  pid   |    wait_event     | count
------------+--------+-------------------+-------
     468682 | 468858 | VacuumDelay       |   548
     468682 | 468862 | VacuumDelay       |   547
     468682 | 468859 | VacuumDelay       |   547
     468682 | 468860 | VacuumDelay       |   545
     468682 | 468857 | VacuumDelay       |   543
     468682 | 468861 | VacuumDelay       |   542
            | 468682 | VacuumDelay       |   378
            | 468682 | ParallelFinish    |   182
     468682 | 468861 | WALWriteLock      |    19
     468682 | 468857 | WALWriteLock      |    19
     468682 | 468859 | WALWriteLock      |    18
     468682 | 468858 | WALWriteLock      |    16
     468682 | 468860 | WALWriteLock      |    15
     468682 | 468862 | WALWriteLock      |    15
     468682 | 468862 | CPU               |    12
     468682 | 468857 | CPU               |    10
     468682 | 468859 | CPU               |    10
     468682 | 468861 | CPU               |    10
            | 468682 | CPU               |     9
     468682 | 468860 | CPU               |     9
     468682 | 468860 | WalSync           |     8
            | 468682 | WALWriteLock      |     7
     468682 | 468858 | WalSync           |     6
     468682 | 468858 | CPU               |     6
     468682 | 468862 | WalSync           |     3
     468682 | 468857 | WalSync           |     3
     468682 | 468861 | WalSync           |     3
     468682 | 468859 | WalSync           |     2
     468682 | 468861 | WalWrite          |     2
     468682 | 468857 | WalWrite          |     1
     468682 | 468858 | WalWrite          |     1
     468682 | 468861 | WALBufMappingLock |     1
     468682 | 468857 | WALBufMappingLock |     1
            | 468682 | WALBufMappingLock |     1

====================== Observations ===========================================

Compared to v2:

1. scanning heap time is about the same
2. vacuuming indexes time is about 3 minutes longer on master
3. vacuuming heap time is about the same

One difference visible in the sampling is that on master the leader was sampled
in "ParallelFinish" about 182 times (which, at one sample per second, could
account for _the_ 3 minutes of interest).

On master, the vacuuming indexes phase ran between 2024-06-13 10:11:34 and
2024-06-13 10:21:15. Extracting the per-minute counts for the "ParallelFinish"
wait event gives:

 minute |   wait_event   | count
--------+----------------+-------
     18 | ParallelFinish |    48
     19 | ParallelFinish |    60
     20 | ParallelFinish |    60
     21 | ParallelFinish |    14

So it's likely that the leader waited on ParallelFinish for about 3 minutes at
the end of the vacuuming indexes phase (as this wait appeared in consecutive
samples).
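A quick sanity check on that arithmetic (one sample per second, so sample counts
approximate seconds waited):

```python
# Per-minute ParallelFinish sample counts from the table above.
per_minute = {18: 48, 19: 60, 20: 60, 21: 14}

# One sample per second => the total count approximates seconds waited.
total_samples = sum(per_minute.values())

print(total_samples)        # 182 seconds
print(total_samples / 60)   # ~3 minutes
```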

====================== Conclusion =============================================

1. During the scanning heap and vacuuming heap phases, no noticeable performance
degradation has been observed with v2 applied (compared to master) (cc'ing
Robert as it's also related to his question about a noticeable hit when everything
is in-memory in [1]).

2. During the vacuuming indexes phase, v2 has been faster than master. The reason
is that on master the leader waited for about 3 minutes on "ParallelFinish" at
the end.

====================== Remarks ================================================

Since v2 is attached, please find below a summary of the current state of this
thread:

1. v2 implements delay_time as the actual wait time (and not the intended wait
time as proposed in v1).

2. some measurements have been done to check the impact of this new
instrumentation (see this email and [2]): no noticeable performance degradation
has been observed (and surprisingly, as mentioned above, it is quite the opposite).

3. there is an ongoing discussion about exposing the number of waits [2].

4. there is an ongoing discussion about exposing the effective cost limit [3].

5. it could be interesting to take a closer look at why the leader waits for
3 minutes on "ParallelFinish" on master but not with v2 applied (though that's
probably out of scope for this thread).

[1]:
https://www.postgresql.org/message-id/CA%2BTgmoZiC%3DzeCDYuMpB%2BGb2yK%3DrTQCGMu0VoxehocKyHxr9Erg%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/ZmmOOPwMFIltkdsN%40ip-10-97-1-34.eu-west-3.compute.internal
[3]: https://www.postgresql.org/message-id/Zml9%2Bu37iS7DFkJL%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
