Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id FD24C9C5-4F9B-4136-9A09-078AE7048B8D@amazon.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum
List pgsql-hackers
On 12/29/21, 8:44 AM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:
> In "pg_stat_progress_vacuum", introduce 2 columns:
>
> * total_index_vacuum : This is the # of indexes that will be vacuumed. Keep in mind that if failsafe mode kicks in
mid-flightto the vacuum, Postgres may choose to forgo index scans. This value will be adjusted accordingly.
 
> * max_index_vacuum_cycle_time : The total elapsed time for a index vacuum cycle is calculated and this value will be
updatedto reflect the longest vacuum cycle. Until the first cycle completes, this value will be 0. The purpose of this
columnis to give the user an idea of how long an index vacuum cycle takes to complete.
 

I think that total_index_vacuum is a good thing to have.  I would
expect this to usually just be the number of indexes on the table, but
as you pointed out, this can be different when we are skipping
indexes.  My only concern with this new column is the potential for
confusion when compared with the index_vacuum_count value.
index_vacuum_count indicates the number of vacuum cycles completed,
but total_index_vacuum indicates the number of indexes that will be
vacuumed.  However, the names sound like they could refer to the same
thing to me.  Perhaps we should rename index_vacuum_count to
index_vacuum_cycles/index_vacuum_cycle_count, and the new column
should be something like num_indexes_to_vacuum or index_vacuum_total.

I don't think we need the max_index_vacuum_cycle_time column.  While
the idea is to give users a rough estimate for how long an index cycle
will take, I don't think it will help generate any meaningful
estimates for how much longer the vacuum operation will take.  IIUC we
won't have any idea how many total index vacuum cycles will be needed.
Even if we did, the current cycle could take much more or much less
time.  Also, none of the other progress views seem to provide any
timing information, which I suspect is by design to avoid inaccurate
estimates.

> Introduce a new view called "pg_stat_progress_vacuum_index". This view will track the progress of a worker ( or
leaderPID ) while it's vacuuming an index. It will expose some key columns:
 
>
> * pid: The PID of the worker process
>
> * leader_pid: The PID of the leader process. This is the column that can be joined with "pg_stat_progress_vacuum".
leader_pidand pid can have the same value as a leader can also perform an index vacuum.
 
>
> * indrelid: The relid of the index currently being vacuumed
>
> * vacuum_cycle_ordinal_position: The processing position of the index being vacuumed. This can be useful to determine
howmany indexes out of the total indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed
 
>
> * index_tuples_vacuumed: This is the number of index tuples vacuumed for the index overall. This is useful to show
thatthe vacuum is actually doing work, as the # of tuples keeps increasing. 
 

Should we also provide some information for determining the progress
of the current cycle?  Perhaps there should be an
index_tuples_vacuumed_current_cycle column that users can compare with
the num_dead_tuples value in pg_stat_progress_vacuum.  However,
perhaps the number of tuples vacuumed in the current cycle can already
be discovered via index_tuples_vacuumed % max_dead_tuples.

+void
+rusage_adjust(const PGRUsage *ru0, PGRUsage *ru1)
+{
+    if (ru1->tv.tv_usec < ru0->tv.tv_usec)
+    {
+        ru1->tv.tv_sec--;
+        ru1->tv.tv_usec += 1000000;
+    }
+    if (ru1->ru.ru_stime.tv_usec < ru0->ru.ru_stime.tv_usec)
+    {
+        ru1->ru.ru_stime.tv_sec--;
+        ru1->ru.ru_stime.tv_usec += 1000000;
+    }
+    if (ru1->ru.ru_utime.tv_usec < ru0->ru.ru_utime.tv_usec)
+    {
+        ru1->ru.ru_utime.tv_sec--;
+        ru1->ru.ru_utime.tv_usec += 1000000;
+    }
+}

I think this function could benefit from a comment.  Without going
through it line by line, it is not clear to me exactly what it is
doing.

I know we're still working on what exactly this stuff should look
like, but I would suggest adding the documentation changes in the near
future.

Nathan


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: pl/pgsql feature request: shorthand for argument and local variable references
Next
From: Peter Geoghegan
Date:
Subject: Re: Add 64-bit XIDs into PostgreSQL 15