Re: Track the amount of time waiting due to cost_delay - Mailing list pgsql-hackers

From: Bertrand Drouvot
Subject: Re: Track the amount of time waiting due to cost_delay
Msg-id: ZmreasV0BKZ5BHIM@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Track the amount of time waiting due to cost_delay (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
Hi,

On Tue, Jun 11, 2024 at 08:26:23AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jun 11, 2024 at 04:07:05PM +0900, Masahiko Sawada wrote:
> >
> > Thank you for the proposal and the patch. I understand the motivation
> > of this patch.
>
> Thanks for looking at it!
>
> > Beside the point Nathan mentioned, I'm slightly worried
> > that massive parallel messages could be sent to the leader process
> > when the cost_limit value is low.
>
> I see, I can/will do some testing in this area and share the numbers.

Here is the result of the test. It has been launched several times and it
produced the same (surprising) result each time.

====================== Context ================================================

The testing has been done with this relation (large_tbl) and its indexes:

postgres=# SELECT n.nspname, c.relname, count(*) AS buffers
           FROM pg_buffercache b
           JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
                AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                          WHERE datname = current_database()))
           JOIN pg_namespace n ON n.oid = c.relnamespace
           GROUP BY n.nspname, c.relname
           ORDER BY 3 DESC
           LIMIT 22;

 nspname |      relname       | buffers
---------+--------------------+---------
 public  | large_tbl          |  222280
 public  | large_tbl_filler13 |  125000
 public  | large_tbl_filler6  |  125000
 public  | large_tbl_filler5  |  125000
 public  | large_tbl_filler3  |  125000
 public  | large_tbl_filler15 |  125000
 public  | large_tbl_filler4  |  125000
 public  | large_tbl_filler20 |  125000
 public  | large_tbl_filler18 |  125000
 public  | large_tbl_filler14 |  125000
 public  | large_tbl_filler8  |  125000
 public  | large_tbl_filler11 |  125000
 public  | large_tbl_filler19 |  125000
 public  | large_tbl_filler7  |  125000
 public  | large_tbl_filler1  |  125000
 public  | large_tbl_filler12 |  125000
 public  | large_tbl_filler9  |  125000
 public  | large_tbl_filler17 |  125000
 public  | large_tbl_filler16 |  125000
 public  | large_tbl_filler10 |  125000
 public  | large_tbl_filler2  |  125000
 public  | large_tbl_pkey     |    5486
(22 rows)

All of them fit completely in memory (to avoid I/O read latency during the
vacuum).

The configuration, outside of the defaults, is:

max_wal_size = 4GB
shared_buffers = 30GB
vacuum_cost_delay = 1
autovacuum_vacuum_cost_delay = 1
max_parallel_maintenance_workers = 8
max_parallel_workers = 10
vacuum_cost_limit = 10
autovacuum_vacuum_cost_limit = 10

My system is not overloaded, it has enough resources to run this test, and
only this test is running.
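The post does not include the DDL for the test relation. As a minimal sketch of
one way to build a similar setup: the single filler column, the md5() content,
and the row count below are assumptions (the real test may well have used 20
distinct filler columns); only the relation and index names come from the
buffer cache output above.

-- Hypothetical setup; sizes are rough guesses aimed at the buffer counts above.
CREATE TABLE large_tbl (id bigint PRIMARY KEY, filler text);

INSERT INTO large_tbl
SELECT g, md5(g::text)
FROM generate_series(1, 30000000) g;

-- 20 equally-sized indexes, so the 8 parallel workers have plenty to do.
DO $$
BEGIN
    FOR i IN 1..20 LOOP
        EXECUTE format('CREATE INDEX large_tbl_filler%s ON large_tbl (filler)', i);
    END LOOP;
END $$;

-- Generate dead tuples so the VACUUM has real work:
DELETE FROM large_tbl WHERE id % 10 = 0;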
====================== Results ================================================

========== With v2 (attached) applied on master

postgres=# VACUUM (PARALLEL 8) large_tbl;
VACUUM
Time: 1146873.016 ms (19:06.873)

The duration is split this way:

Vacuum phase: cumulative time (cumulative time delayed)
=======================================================
scanning heap:     00:08:16.414628 (time_delayed: 444370)
vacuuming indexes: 00:14:55.314699 (time_delayed: 2545293)
vacuuming heap:    00:19:06.814617 (time_delayed: 2767540)

I sampled active sessions from pg_stat_activity (one-second interval); here is
the summary during the vacuuming indexes phase, ordered by count (a sketch of
the sampling method follows the table):

 leader_pid |  pid   |   wait_event   | count
------------+--------+----------------+-------
     452996 | 453225 | VacuumDelay    |   366
     452996 | 453223 | VacuumDelay    |   363
     452996 | 453226 | VacuumDelay    |   362
     452996 | 453224 | VacuumDelay    |   361
     452996 | 453222 | VacuumDelay    |   359
     452996 | 453221 | VacuumDelay    |   359
            | 452996 | VacuumDelay    |   331
            | 452996 | CPU            |    30
     452996 | 453224 | WALWriteLock   |    23
     452996 | 453222 | WALWriteLock   |    20
     452996 | 453226 | WALWriteLock   |    20
     452996 | 453221 | WALWriteLock   |    19
            | 452996 | WalSync        |    18
     452996 | 453225 | WALWriteLock   |    18
     452996 | 453223 | WALWriteLock   |    16
            | 452996 | WALWriteLock   |    15
     452996 | 453221 | CPU            |    14
     452996 | 453222 | CPU            |    14
     452996 | 453223 | CPU            |    12
     452996 | 453224 | CPU            |    10
     452996 | 453226 | CPU            |    10
     452996 | 453225 | CPU            |     8
     452996 | 453223 | WalSync        |     4
     452996 | 453221 | WalSync        |     2
     452996 | 453226 | WalWrite       |     2
     452996 | 453221 | WalWrite       |     1
            | 452996 | ParallelFinish |     1
     452996 | 453224 | WalSync        |     1
     452996 | 453225 | WalSync        |     1
     452996 | 453222 | WalWrite       |     1
     452996 | 453225 | WalWrite       |     1
     452996 | 453222 | WalSync        |     1
     452996 | 453226 | WalSync        |     1
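The post does not show how the samples were captured. A minimal sketch of one
way to do it, assuming a hypothetical activity_samples table populated once per
second (e.g. with psql's \watch 1) while the VACUUM runs; the "CPU" rows in the
summaries presumably correspond to a NULL wait_event (backend on CPU). The same
capture applies to the master run below.

-- Hypothetical capture table (not part of the patch or the post).
CREATE TABLE activity_samples (
    sample_time timestamptz,
    leader_pid  integer,
    pid         integer,
    wait_event  text
);

-- Run once per second while the VACUUM executes, e.g. with \watch 1
-- (excluding the sampling session itself):
INSERT INTO activity_samples
SELECT now(), leader_pid, pid, coalesce(wait_event, 'CPU')
FROM pg_stat_activity
WHERE backend_type IN ('client backend', 'parallel worker')
  AND state = 'active'
  AND pid <> pg_backend_pid();

-- Summary over a phase's time window (phase boundaries can be read from
-- pg_stat_progress_vacuum.phase on the leader; the window below is the
-- vacuuming indexes phase of the master run):
SELECT leader_pid, pid, wait_event, count(*)
FROM activity_samples
WHERE sample_time BETWEEN '2024-06-13 10:11:34' AND '2024-06-13 10:21:15'
GROUP BY leader_pid, pid, wait_event
ORDER BY count(*) DESC;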
========== On master (v2 not applied)

postgres=# VACUUM (PARALLEL 8) large_tbl;
VACUUM
Time: 1322598.087 ms (22:02.598)

Surprisingly, it has been longer on master by about 3 minutes.

Let's see how the time is split:

Vacuum phase: cumulative time
=============================
scanning heap:     00:08:07.061196
vacuuming indexes: 00:17:50.961228
vacuuming heap:    00:22:02.561199

I sampled active sessions from pg_stat_activity (one-second interval); here is
the summary during the vacuuming indexes phase, ordered by count:

 leader_pid |  pid   |    wait_event     | count
------------+--------+-------------------+-------
     468682 | 468858 | VacuumDelay       |   548
     468682 | 468862 | VacuumDelay       |   547
     468682 | 468859 | VacuumDelay       |   547
     468682 | 468860 | VacuumDelay       |   545
     468682 | 468857 | VacuumDelay       |   543
     468682 | 468861 | VacuumDelay       |   542
            | 468682 | VacuumDelay       |   378
            | 468682 | ParallelFinish    |   182
     468682 | 468861 | WALWriteLock      |    19
     468682 | 468857 | WALWriteLock      |    19
     468682 | 468859 | WALWriteLock      |    18
     468682 | 468858 | WALWriteLock      |    16
     468682 | 468860 | WALWriteLock      |    15
     468682 | 468862 | WALWriteLock      |    15
     468682 | 468862 | CPU               |    12
     468682 | 468857 | CPU               |    10
     468682 | 468859 | CPU               |    10
     468682 | 468861 | CPU               |    10
            | 468682 | CPU               |     9
     468682 | 468860 | CPU               |     9
     468682 | 468860 | WalSync           |     8
            | 468682 | WALWriteLock      |     7
     468682 | 468858 | WalSync           |     6
     468682 | 468858 | CPU               |     6
     468682 | 468862 | WalSync           |     3
     468682 | 468857 | WalSync           |     3
     468682 | 468861 | WalSync           |     3
     468682 | 468859 | WalSync           |     2
     468682 | 468861 | WalWrite          |     2
     468682 | 468857 | WalWrite          |     1
     468682 | 468858 | WalWrite          |     1
     468682 | 468861 | WALBufMappingLock |     1
     468682 | 468857 | WALBufMappingLock |     1
            | 468682 | WALBufMappingLock |     1

====================== Observations ===========================================

Compared to v2:

1. the scanning heap time is about the same
2. the vacuuming indexes time is about 3 minutes longer on master
3. the vacuuming heap time is about the same

One difference we can see in the sampling is that on master the
"ParallelFinish" wait event has been sampled about 182 times for the leader (so
it could account for _the_ 3 minutes of interest).

On master, the vacuuming indexes phase ran between 2024-06-13 10:11:34 and
2024-06-13 10:21:15. If I extract the exact minutes and the counts for the
"ParallelFinish" wait event, I get:

 minute |   wait_event   | count
--------+----------------+-------
     18 | ParallelFinish |    48
     19 | ParallelFinish |    60
     20 | ParallelFinish |    60
     21 | ParallelFinish |    14

So it's likely that the leader waited on ParallelFinish for about 3 minutes at
the end of the vacuuming indexes phase (as this wait appeared in consecutive
samples).
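A sketch of how the per-minute breakdown above could be produced from the same
hypothetical activity_samples table (468682 is the leader's PID in the master
run; its leader_pid is NULL in pg_stat_activity):

SELECT extract(minute FROM sample_time)::int AS minute,
       wait_event,
       count(*)
FROM activity_samples
WHERE pid = 468682                    -- the leader
  AND wait_event = 'ParallelFinish'
GROUP BY 1, 2
ORDER BY 1;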
====================== Conclusion =============================================

1. During the scanning heap and vacuuming heap phases, no noticeable
performance degradation has been observed with v2 applied (compared to master)
(cc'ing Robert, as it's also related to his question about a noticeable hit
when everything is in memory in [1]).

2. During the vacuuming indexes phase, v2 has been faster (compared to master).
The reason is that on master the leader waited for about 3 minutes on
"ParallelFinish" at the end.

====================== Remarks ================================================

As v2 is attached, please find below a summary of the current state of this
thread:

1. v2 implements delay_time as the actual wait time (and not the intended wait
time as proposed in v1).
2. Some measurements have been done to check the impact of this new
instrumentation (see this email and [2]): no noticeable performance degradation
has been observed (surprisingly, it's rather the opposite, as mentioned above).
3. There is an ongoing discussion about exposing the number of waits [2].
4. There is an ongoing discussion about exposing the effective cost limit [3].
5. It could be interesting to take a closer look at why the leader waits for
about 3 minutes on "ParallelFinish" on master but not with v2 applied (but
that's probably out of scope for this thread).

[1]: https://www.postgresql.org/message-id/CA%2BTgmoZiC%3DzeCDYuMpB%2BGb2yK%3DrTQCGMu0VoxehocKyHxr9Erg%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/ZmmOOPwMFIltkdsN%40ip-10-97-1-34.eu-west-3.compute.internal
[3]: https://www.postgresql.org/message-id/Zml9%2Bu37iS7DFkJL%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com