Add heap and index vacuum timings to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Add heap and index vacuum timings to pg_stat_progress_vacuum
Date
Msg-id CALj2ACVD08XgDcQpTSegKDiyJHwJ6eK89X+1WksWqUJxT86A_A@mail.gmail.com
Whole thread
List pgsql-hackers
Hi,

When vacuum is running for hours on a table with many indexes, we can see what phase it is in and how many index vacuum cycles have happened (index_vacuum_count), but not how much time each phase has taken. That makes it hard to tell what is actually slow.

The attached patch adds three columns to pg_stat_progress_vacuum that report the total time spent in heap vacuuming, index vacuuming, and index cleanup. The values are accumulated across overflow cycles when the dead TID store fills up.

These columns make it easy to see whether the bottleneck is heap scanning or index processing, analyze why vacuum is slow, and adjust maintenance_work_mem accordingly to reduce TID store overflow cycles. They also make it possible to estimate how long
vacuum takes to finish - polling these timing columns along with scanned blocks gives a good estimated time.

We could put this behind a track_vacuum_timing GUC similar to track_io_timing and track_wal_timing of the time capturing calls seem costly. I'd prefer not to add another GUC, but I'm open to thoughts on this.

For example, I tested with a 3M-row table (3.9 GB), 4 btree indexes, and maintenance_work_mem=256kB (kept low to force TID-store overflows). About 34 minutes in, only 13% through the vacuum [1]. index_vacuum_time is 2006s vs heap_vacuum_time of 2.1s - index processing is where all the time goes. 13,188 overflow cycles × 4 indexes = 52,752 index scans so far. In production, tables with more indexes, these columns show exactly where vacuum time is going and whether raising maintenance_work_mem would help.

With the help of the attached sql function, the estimated time to finish the vacuum is about 3.5 hours [2].

Thoughts?

[1]
  SELECT * FROM pg_stat_progress_vacuum;
   pid                  | 18764
   relid                | 16384
   phase                | vacuuming indexes
   heap_blks_total      | 500000
   heap_blks_scanned    | 66202
   heap_blks_vacuumed   | 66202
   index_vacuum_count   | 13188
   max_dead_tuple_bytes | 262144
   dead_tuple_bytes     | 0
   num_dead_item_ids    | 0
   indexes_total        | 4
   indexes_processed    | 4
   delay_time           | 0
   mode                 | manual
   started_by           | user
   heap_vacuum_time     | 2.098345
   index_vacuum_time    | 2005.931383
   index_cleanup_time   | 0

[2]
  SELECT * FROM estimate_vacuum_completion();
   pid            | 18764
   relname        | vac_timing_test
   pct_done       | 13.24
   time_remaining | 03:35:39


--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
Attachment

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Include schema-qualified names in publication error messages.
Next
From: shveta malik
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication