Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoDc2UAqZV6JbWyFkCAhauVRxYOz6gNBB-kZ6uY87kBGWQ@mail.gmail.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
List pgsql-hackers
On Tue, Oct 11, 2022 at 10:50 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> >    One idea would be to add a flag, say report_parallel_vacuum_progress,
> >    to IndexVacuumInfo struct and expect index AM to check and update the
> >    parallel index vacuum progress, say every 1GB blocks processed. The
> >    flag is true only when the leader process is vacuuming an index.
>
> >    Regards,
>
> Sorry for the long delay on this. I have taken the approach as suggested
> by Sawada-san and Robert and attached is v12.

Thank you for updating the patch!

>
> 1. The patch introduces a new counter in the same shared memory already
> used by the parallel leader and workers to keep track of the number
> of indexes completed. This way there is no reason to loop through
> the index status every time we want to get the status of indexes completed.

While it seems to be a good idea to have the atomic counter for the
number of indexes completed, I think we should not use the global
variable referencing the counter as follow:

+static pg_atomic_uint32 *index_vacuum_completed = NULL;
:
+void
+parallel_vacuum_progress_report(void)
+{
+   if (IsParallelWorker() || !report_parallel_vacuum_progress)
+       return;
+
+   pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                pg_atomic_read_u32(index_vacuum_completed));
+}

I think we can pass ParallelVacuumState (or PVIndStats) to the
reporting function so that it can check the counter and report the
progress.

> 2. A new function in vacuumparallel.c will be used to update
> the progress of indexes completed by reading from the
> counter created in point #1.
>
> 3. The function is called during the vacuum_delay_point as a
> matter of convenience, since this is called in all major vacuum
> loops. The function will only do something if the caller
> sets a boolean to report progress. Doing so will also ensure
> progress is being reported in case the parallel workers completed
> before the leader.

Robert pointed out:

---
I am not too sure that the idea of using the vacuum delay points is the best
plan. I think we should try to avoid piggybacking on such general
infrastructure if we can, and instead look for a way to tie this to
something that is specific to parallel vacuum.
---

I agree with this part.

Instead, I think we can add a boolean and the pointer for
ParallelVacuumState to IndexVacuumInfo. If the boolean is true, an
index AM can call the reporting function with the pointer to
ParallelVacuumState while scanning index blocks, for example, for
every 1GB. The boolean can be true only for the leader process.

>
> 4. Rather than adding any complexity to WaitForParallelWorkersToFinish
> and introducing a new callback, vacuumparallel.c will wait until
> the number of vacuum workers is 0 and then call
> WaitForParallelWorkersToFinish as it does currently.

Agreed, but with the following change, the leader process waits in a
busy loop, which should not be acceptable:

+   if (VacuumActiveNWorkers)
+   {
+       while (pg_atomic_read_u32(VacuumActiveNWorkers) > 0)
+       {
+           parallel_vacuum_progress_report();
+       }
+   }
+

Also, I think it's better to check whether idx_completed_progress
equals to the number indexes instead.

> 5. Went back to the idea of adding a new view called pg_stat_progress_vacuum_index
> which is accomplished by adding a new type called VACUUM_PARALLEL in progress.h

Probably, we can devide the patch into two patches. One for adding the
new statistics of the number of vacuumed indexes to
pg_stat_progress_vacuum and another one for adding new statistics view
pg_stat_progress_vacuum_index.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Ronan Dunklau
Date:
Subject: Re: Fix gin index cost estimation
Next
From: Виктория Шепард
Date:
Subject: Re: Re[2]: Possible solution for masking chosen columns when using pg_dump