Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: Add index scan progress to pg_stat_progress_vacuum |
Date | |
Msg-id | 7874FB21-FAA5-49BD-8386-2866552656C7@amazon.com Whole thread Raw |
In response to | Add index scan progress to pg_stat_progress_vacuum ("Imseih (AWS), Sami" <simseih@amazon.com>) |
Responses |
Re: Add index scan progress to pg_stat_progress_vacuum
Re: Add index scan progress to pg_stat_progress_vacuum |
List | pgsql-hackers |
On 12/1/21, 3:02 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote: > The current implementation of pg_stat_progress_vacuum does not > provide progress on which index is being vacuumed making it > difficult for a user to determine if the "vacuuming indexes" phase > is making progress. By exposing which index is being scanned as well > as the total progress the scan has made for the current cycle, a > user can make better estimations on when the vacuum will complete. +1 > The proposed patch adds 4 new columns to pg_stat_progress_vacuum: > > 1. indrelid - the relid of the index being vacuumed > 2. index_blks_total - total number of blocks to be scanned in the > current cycle > 3. index_blks_scanned - number of blocks scanned in the current > cycle > 4. leader_pid - if the pid for the pg_stat_progress_vacuum entry is > a leader or a vacuum worker. This patch places an entry for every > worker pid ( if parallel ) as well as the leader pid nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it is more analogous to heap_blks_vacuumed. This will tell us which indexes are currently being vacuumed and the current progress of those operations, but it doesn't tell us which indexes have already been vacuumed or which ones are pending vacuum. I think such information is necessary to truly understand the current progress of vacuuming indexes, and I can think of a couple of ways we might provide it: 1. Make the new columns you've proposed return arrays. This isn't very clean, but it would keep all the information for a given vacuum operation in a single row. The indrelids column would be populated with all the indexes that have been vacuumed, need to be vacuumed, or are presently being vacuumed. The other index- related columns would then have the associated stats and the worker PID (which might be the same as the pid column depending on whether parallel index vacuum was being done). Alternatively, the index column could have an array of records, each containing all the information for a given index. 2. Create a new view for just index vacuum progress information. This would have similar information as 1. There would be an entry for each index that has been vacuumed, needs to be vacuumed, or is currently being vacuumed. And there would be an easy way to join with pg_stat_progress_vacuum (e.g., leader_pid, which again might be the same as our index vacuum PID depending on whether we were doing parallel index vacuum). Note that it would be possible for the PID of these entries to be null before and after we process the index. 3. Instead of adding columns to pg_stat_progress_vacuum, adjust the current ones to be more general, and then add new entries for each of the indexes that have been, need to be, or currently are being vacuumed. This is the most similar option to your current proposal, but instead of introducing a column like index_blks_total, we'd rename heap_blks_total to blks_total and use that for both the heap and indexes. I think we'd still want to add a leader_pid column. Again, we have to be prepared for the PID to be null in this case. Or we could just make the pid column always refer to the leader, and we could introduce a worker_pid column. That might create confusion, though. I wish option #1 was cleaner, because I think it would be really nice to have all this information in a single row. However, I don't expect much support for a 3-dimensional view, so I suspect option #2 (creating a separate view for index vacuum progress) is the way to go. The other benefit of option #2 versus option #3 or your original proposal is that it cleanly separates the top-level vacuum operations and the index vacuum operations, which are related at the moment, but which might not always be tied so closely together. Nathan
pgsql-hackers by date: