Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Add index scan progress to pg_stat_progress_vacuum |
Date | |
Msg-id | CAD21AoDc2UAqZV6JbWyFkCAhauVRxYOz6gNBB-kZ6uY87kBGWQ@mail.gmail.com Whole thread Raw |
In response to | Re: Add index scan progress to pg_stat_progress_vacuum ("Imseih (AWS), Sami" <simseih@amazon.com>) |
Responses |
Re: Add index scan progress to pg_stat_progress_vacuum
|
List | pgsql-hackers |
On Tue, Oct 11, 2022 at 10:50 PM Imseih (AWS), Sami <simseih@amazon.com> wrote: > > > One idea would be to add a flag, say report_parallel_vacuum_progress, > > to IndexVacuumInfo struct and expect index AM to check and update the > > parallel index vacuum progress, say every 1GB blocks processed. The > > flag is true only when the leader process is vacuuming an index. > > > Regards, > > Sorry for the long delay on this. I have taken the approach as suggested > by Sawada-san and Robert and attached is v12. Thank you for updating the patch! > > 1. The patch introduces a new counter in the same shared memory already > used by the parallel leader and workers to keep track of the number > of indexes completed. This way there is no reason to loop through > the index status every time we want to get the status of indexes completed. While it seems to be a good idea to have the atomic counter for the number of indexes completed, I think we should not use the global variable referencing the counter as follow: +static pg_atomic_uint32 *index_vacuum_completed = NULL; : +void +parallel_vacuum_progress_report(void) +{ + if (IsParallelWorker() || !report_parallel_vacuum_progress) + return; + + pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED, + pg_atomic_read_u32(index_vacuum_completed)); +} I think we can pass ParallelVacuumState (or PVIndStats) to the reporting function so that it can check the counter and report the progress. > 2. A new function in vacuumparallel.c will be used to update > the progress of indexes completed by reading from the > counter created in point #1. > > 3. The function is called during the vacuum_delay_point as a > matter of convenience, since this is called in all major vacuum > loops. The function will only do something if the caller > sets a boolean to report progress. Doing so will also ensure > progress is being reported in case the parallel workers completed > before the leader. Robert pointed out: --- I am not too sure that the idea of using the vacuum delay points is the best plan. I think we should try to avoid piggybacking on such general infrastructure if we can, and instead look for a way to tie this to something that is specific to parallel vacuum. --- I agree with this part. Instead, I think we can add a boolean and the pointer for ParallelVacuumState to IndexVacuumInfo. If the boolean is true, an index AM can call the reporting function with the pointer to ParallelVacuumState while scanning index blocks, for example, for every 1GB. The boolean can be true only for the leader process. > > 4. Rather than adding any complexity to WaitForParallelWorkersToFinish > and introducing a new callback, vacuumparallel.c will wait until > the number of vacuum workers is 0 and then call > WaitForParallelWorkersToFinish as it does currently. Agreed, but with the following change, the leader process waits in a busy loop, which should not be acceptable: + if (VacuumActiveNWorkers) + { + while (pg_atomic_read_u32(VacuumActiveNWorkers) > 0) + { + parallel_vacuum_progress_report(); + } + } + Also, I think it's better to check whether idx_completed_progress equals to the number indexes instead. > 5. Went back to the idea of adding a new view called pg_stat_progress_vacuum_index > which is accomplished by adding a new type called VACUUM_PARALLEL in progress.h Probably, we can devide the patch into two patches. One for adding the new statistics of the number of vacuumed indexes to pg_stat_progress_vacuum and another one for adding new statistics view pg_stat_progress_vacuum_index. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: