Re: Track the amount of time waiting due to cost_delay - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Track the amount of time waiting due to cost_delay |
Date | |
Msg-id | ZmmOOPwMFIltkdsN@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: Track the amount of time waiting due to cost_delay (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi,

On Tue, Jun 11, 2024 at 01:13:48PM -0400, Robert Haas wrote:
> On Tue, Jun 11, 2024 at 5:49 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > As we can see the actual wait time is 30ms less than the intended wait time with
> > this simple test. So I still think we should go with 1) actual wait time and 2)
> > report the number of waits (as mentioned in [1]). Does that make sense to you?
>
> I like the idea of reporting the actual wait time better,

+1

> provided
> that we verify that doing so isn't too expensive. I think it probably
> isn't, because in a long-running VACUUM there is likely to be disk
> I/O, so the CPU overhead of a few extra gettimeofday() calls should be
> fairly low by comparison.

Agree.

> I wonder if there's a noticeable hit when
> everything is in-memory. I guess probably not, because with any sort
> of normal configuration, we shouldn't be delaying after every block we
> process, so the cost of those gettimeofday() calls should still be
> getting spread across quite a bit of real work.

I did some testing, with:

shared_buffers = 12GB
vacuum_cost_delay = 1
autovacuum_vacuum_cost_delay = 1
max_parallel_maintenance_workers = 0
max_parallel_workers = 0

added to a default config file.
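For readers who want to see the bookkeeping under discussion, here is a minimal Python sketch of it: take a timestamp before and after each delay and accumulate the actual elapsed time next to the intended one and a wait counter. This is not the patch's code. The names DelayStats and delay_and_account are hypothetical, and time.sleep() merely stands in for the vacuum delay. Note that Python's time.sleep() resumes after a signal, so unlike the case described earlier in the thread, this sketch will not show an actual wait shorter than the intended one; it only illustrates the accounting and its per-delay cost of two clock reads.

```python
import time


class DelayStats:
    """Hypothetical accumulator for cost-delay accounting (illustration only)."""

    def __init__(self) -> None:
        self.waits = 0          # number of times we went through the delay path
        self.intended_ms = 0.0  # sum of the requested delays
        self.actual_ms = 0.0    # sum of the measured delays


def delay_and_account(stats: DelayStats, intended_ms: float) -> None:
    """Sleep for intended_ms and record both intended and measured time."""
    start = time.perf_counter()
    time.sleep(intended_ms / 1000.0)  # stand-in for the real delay
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    stats.waits += 1
    stats.intended_ms += intended_ms
    stats.actual_ms += elapsed_ms


stats = DelayStats()
for _ in range(20):
    delay_and_account(stats, 1.0)  # 1 ms, mirroring vacuum_cost_delay = 1

print(f"waits={stats.waits} intended={stats.intended_ms:.1f}ms "
      f"actual={stats.actual_ms:.1f}ms")
```

The two clock reads per delay are the only extra work, which is why the overhead is expected to be small compared with the sleep itself.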
A table and all its indexes were fully in memory, the numbers are:

postgres=# SELECT n.nspname, c.relname, count(*) AS buffers
             FROM pg_buffercache b JOIN pg_class c
             ON b.relfilenode = pg_relation_filenode(c.oid) AND
                b.reldatabase IN (0, (SELECT oid FROM pg_database
                                      WHERE datname = current_database()))
             JOIN pg_namespace n ON n.oid = c.relnamespace
             GROUP BY n.nspname, c.relname
             ORDER BY 3 DESC
             LIMIT 11;

 nspname |      relname      | buffers
---------+-------------------+---------
 public  | large_tbl         |  222280
 public  | large_tbl_pkey    |    5486
 public  | large_tbl_filler7 |    1859
 public  | large_tbl_filler4 |    1859
 public  | large_tbl_filler1 |    1859
 public  | large_tbl_filler6 |    1859
 public  | large_tbl_filler3 |    1859
 public  | large_tbl_filler2 |    1859
 public  | large_tbl_filler5 |    1859
 public  | large_tbl_filler8 |    1859
 public  | large_tbl_version |    1576
(11 rows)

The observed timings when vacuuming this table are:

On master:

vacuum phase: cumulative duration
---------------------------------

scanning heap: 00:00:37.808184
vacuuming indexes: 00:00:41.808176
vacuuming heap: 00:00:54.808156

On master patched with actual time delayed:

vacuum phase: cumulative duration
---------------------------------

scanning heap: 00:00:36.502104 (time_delayed: 22202)
vacuuming indexes: 00:00:41.002103 (time_delayed: 23769)
vacuuming heap: 00:00:54.302096 (time_delayed: 34886)

As we can see there is no noticeable degradation while the vacuum entered about
34886 times in this instrumentation code path (cost_delay was set to 1).

> That said, I'm not sure this experiment shows a real problem with the
> idea of showing intended wait time. It does establish the concept that
> repeated signals can throw our numbers off, but 30ms isn't much of a
> discrepancy.

Yeah, the idea was just to show how easy it is to create a 30ms discrepancy.

> I'm worried about being off by a factor of two, or an
> order of magnitude.
> I think we still don't know if that can happen,
> but if we're going to show actual wait time anyway, then we don't need
> to explore the problems with other hypothetical systems too much.

Agree.

> I'm not convinced that reporting the number of waits is useful. If we
> were going to report a possibly-inaccurate amount of actual waiting,
> then also reporting the number of waits might make it easier to figure
> out when the possibly-inaccurate number was in fact inaccurate. But I
> think it's way better to report an accurate amount of actual waiting,
> and then I'm not sure what we gain by also reporting the number of
> waits.

Sami shared his thoughts in [1] and [2] and so did I in [3]. If some of us still
don't think that reporting the number of waits is useful then we can probably
start without it.

[1]: https://www.postgresql.org/message-id/0EA474B6-BF88-49AE-82CA-C1A9A3C17727%40amazon.com
[2]: https://www.postgresql.org/message-id/E12435E2-5FCA-49B0-9ADB-0E7153F95E2D%40amazon.com
[3]: https://www.postgresql.org/message-id/ZmmGG4e%2BqTBD2kfn%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
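To make the master versus patched comparison in the timings reported in this message easier to check, here is a small Python sketch that subtracts the cumulative durations phase by phase. The durations and time_delayed values are copied from the message. The helper name parse_duration is hypothetical, and reading time_delayed as milliseconds is an assumption (it fits the remark that the delay path was entered about 34886 times with cost_delay set to 1).

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse 'HH:MM:SS.ffffff' into a timedelta without float rounding."""
    hours, minutes, seconds = text.split(":")
    whole, _, frac = seconds.partition(".")
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(whole),
        microseconds=int(frac.ljust(6, "0")[:6]),
    )


# Cumulative durations copied from the message.
master = {
    "scanning heap": "00:00:37.808184",
    "vacuuming indexes": "00:00:41.808176",
    "vacuuming heap": "00:00:54.808156",
}
patched = {
    "scanning heap": "00:00:36.502104",
    "vacuuming indexes": "00:00:41.002103",
    "vacuuming heap": "00:00:54.302096",
}
# ASSUMPTION: time_delayed is expressed in milliseconds.
time_delayed_ms = {
    "scanning heap": 22202,
    "vacuuming indexes": 23769,
    "vacuuming heap": 34886,
}

# Patched minus master at each phase; negative means the patched run was ahead.
deltas = {
    phase: parse_duration(patched[phase]) - parse_duration(master[phase])
    for phase in master
}
for phase, delta in deltas.items():
    print(f"{phase}: {delta.total_seconds():+.6f} s (patched minus master)")

# Fraction of the whole patched run spent in cost-based delay, under the
# millisecond assumption above.
total = parse_duration(patched["vacuuming heap"])
delayed_share = timedelta(milliseconds=time_delayed_ms["vacuuming heap"]) / total
print(f"share of patched run spent delayed: {delayed_share:.1%}")
```

All three deltas come out negative, so the patched run was not slower at any phase, which matches the "no noticeable degradation" reading. Differences of this size may well be run-to-run noise rather than a real speedup.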