Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Add index scan progress to pg_stat_progress_vacuum |
Date | |
Msg-id | CAD21AoAnqAM3Eagdat_b9gMDyBRZT4=48FHd=FbZ20JndZjn2w@mail.gmail.com Whole thread Raw |
In response to | Re: Add index scan progress to pg_stat_progress_vacuum ("Imseih (AWS), Sami" <simseih@amazon.com>) |
Responses |
Re: Add index scan progress to pg_stat_progress_vacuum
|
List | pgsql-hackers |
On Thu, Jan 12, 2023 at 11:02 PM Imseih (AWS), Sami <simseih@amazon.com> wrote: > > Thanks for the feedback and I apologize for the delay in response. > > > I think the problem here is that you're basically trying to work around the > > lack of an asynchronous state update mechanism between leader and workers. The > > workaround is to add a lot of different places that poll whether there has > > been any progress. And you're not doing that by integrating with the existing > > machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by > > developing a new mechanism. > > > I think your best bet would be to integrate with HandleParallelMessages(). > > You are correct. I have been trying to work around the async nature > of parallel workers performing the index vacuum. As you have pointed out, > integrating with HandleParallelMessages does appear to be the proper way. > Doing so will also avoid having to do any progress updates in the index AMs. Very interesting idea. I need to study the parallel query infrastructure more to consider potential downside of this idea but it seems okay as far as I researched so far. > In the attached patch, the parallel workers send a new type of protocol message > type to the leader called 'P' which signals the leader that it should handle a > progress update. The leader then performs the progress update by > invoking a callback set in the ParallelContext. This is done inside HandleParallelMessages. > In the index vacuum case, the callback is parallel_vacuum_update_progress. > > The new message does not contain a payload, and it's merely used to > signal the leader that it can invoke a progress update. Thank you for updating the patch. Here are some comments for v22 patch: --- + <para> + Number of indexes that will be vacuumed or cleaned up. This value will be + <literal>0</literal> if the phase is not <literal>vacuuming indexes</literal> + or <literal>cleaning up indexes</literal>, <literal>INDEX_CLEANUP</literal> + is set to <literal>OFF</literal>, index vacuum is skipped due to very + few dead tuples in the table, or vacuum failsafe is triggered. I think that if INDEX_CLEANUP is set to OFF or index vacuum is skipped due to failsafe mode, we enter neither vacuum indexes phase nor cleanup indexes phase. So probably we can say something like: Number of indexes that will be vacuumed or cleaned up. This counter only advances when the phase is vacuuming indexes or cleaning up indexes. --- - /* Report that we are now vacuuming indexes */ - pgstat_progress_update_param(PROGRESS_VACUUM_PHASE, - PROGRESS_VACUUM_PHASE_VACUUM_INDEX); + /* + * Report that we are now vacuuming indexes + * and the number of indexes to vacuum. + */ + progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX; + progress_start_val[1] = vacrel->nindexes; + pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val); According to our code style guideline[1], we limit line lengths so that the code is readable in an 80-column window. Some comments updated in this patch seem too short. --- + StringInfoData msgbuf; + + pq_beginmessage(&msgbuf, 'P'); + pq_endmessage(&msgbuf); I think we can use pq_putmessage() instead. --- +/* progress callback definition */ +typedef void (*ParallelProgressCallback) (void *parallel_progress_callback_state); I think it's better to define "void *arg". --- + /* + * A Leader process that receives this message + * must be ready to update progress. + */ + Assert(pcxt->parallel_progress_callback); + Assert(pcxt->parallel_progress_callback_arg); + + /* Report progress */ + pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg); I think the parallel query infra should not require parallel_progress_callback_arg to always be set. I think it can be NULL. --- +void +parallel_vacuum_update_progress(void *arg) +{ + ParallelVacuumState *pvs = (ParallelVacuumState *)arg; + + Assert(!IsParallelWorker()); + + if (pvs) + pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED, + pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1)); +} Since parallel vacuum always sets the arg, I think we don't need to check it. Regards, [1] https://www.postgresql.org/docs/devel/source-format.html -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: