Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoAnqAM3Eagdat_b9gMDyBRZT4=48FHd=FbZ20JndZjn2w@mail.gmail.com
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum
List pgsql-hackers
On Thu, Jan 12, 2023 at 11:02 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the feedback and I apologize for the delay in response.
>
> >    I think the problem here is that you're basically trying to work around the
> >    lack of an asynchronous state update mechanism between leader and workers. The
> >    workaround is to add a lot of different places that poll whether there has
> >    been any progress. And you're not doing that by integrating with the existing
> >    machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by
> >    developing a new mechanism.
>
> >    I think your best bet would be to integrate with HandleParallelMessages().
>
> You are correct. I have been trying to work around the async nature
> of parallel workers performing the index vacuum. As you have pointed out,
> integrating with HandleParallelMessages does appear to be the proper way.
> Doing so will also avoid having to do any progress updates in the index AMs.

Very interesting idea. I need to study the parallel query
infrastructure more to consider the potential downsides of this idea,
but it seems okay as far as I have researched so far.

> In the attached patch, the parallel workers send a new protocol message
> type to the leader called 'P' which signals the leader that it should handle a
> progress update. The leader then performs the progress update by
> invoking a callback set in the ParallelContext. This is done inside HandleParallelMessages.
> In the index vacuum case, the callback is parallel_vacuum_update_progress.
>
> The new message does not contain a payload, and it's merely used to
> signal the leader that it can invoke a progress update.
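
To make the mechanism described above concrete, here is a minimal
standalone sketch of a leader dispatching a payload-less 'P' message to
a registered callback. It is not the patch's code: DemoParallelContext,
demo_handle_message() and demo_count_progress() are hypothetical names
invented for illustration, and the real logic would live inside
HandleParallelMessages().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the callback type proposed in the patch. */
typedef void (*demo_progress_callback) (void *arg);

/* Hypothetical, heavily reduced stand-in for ParallelContext. */
typedef struct DemoParallelContext
{
	demo_progress_callback progress_cb; /* set by the leader before launch */
	void	   *progress_cb_arg;	/* opaque state passed back to the callback */
} DemoParallelContext;

/*
 * Leader-side dispatch for one message type byte received from a worker.
 * 'P' carries no payload: it only tells the leader to run the callback.
 * Returns 1 if the message type was handled, 0 otherwise.
 */
int
demo_handle_message(DemoParallelContext *pcxt, char msgtype)
{
	switch (msgtype)
	{
		case 'P':
			/* A leader receiving 'P' must have registered a callback. */
			assert(pcxt->progress_cb != NULL);
			pcxt->progress_cb(pcxt->progress_cb_arg);
			return 1;
		default:
			return 0;
	}
}

/* Example callback: counts how many progress signals the leader has seen. */
void
demo_count_progress(void *arg)
{
	int		   *completed = (int *) arg;

	(*completed)++;
}
```

In this sketch the leader would fill in progress_cb and progress_cb_arg
before launching workers, then call demo_handle_message() for each
message type byte it reads from a worker's queue.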

Thank you for updating the patch. Here are some comments for v22 patch:

---
+      <para>
+       Number of indexes that will be vacuumed or cleaned up. This value will be
+       <literal>0</literal> if the phase is not <literal>vacuuming indexes</literal>
+       or <literal>cleaning up indexes</literal>, <literal>INDEX_CLEANUP</literal>
+       is set to <literal>OFF</literal>, index vacuum is skipped due to very
+       few dead tuples in the table, or vacuum failsafe is triggered.

I think that if INDEX_CLEANUP is set to OFF or index vacuum is skipped
due to failsafe mode, we enter neither the vacuuming indexes phase nor
the cleaning up indexes phase. So probably we can say something like:

Number of indexes that will be vacuumed or cleaned up. This counter only
advances when the phase is vacuuming indexes or cleaning up indexes.

---
-        /* Report that we are now vacuuming indexes */
-        pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-                                     PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+        /*
+         * Report that we are now vacuuming indexes
+         * and the number of indexes to vacuum.
+         */
+        progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+        progress_start_val[1] = vacrel->nindexes;
+        pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);

According to our code style guideline[1], we limit line lengths so
that the code is readable in an 80-column window. Some comments
updated in this patch are wrapped much earlier than that.

---
+                StringInfoData msgbuf;
+
+                pq_beginmessage(&msgbuf, 'P');
+                pq_endmessage(&msgbuf);

I think we can use pq_putmessage() instead.

---
+/* progress callback definition */
+typedef void (*ParallelProgressCallback) (void *parallel_progress_callback_state);

I think it's better to name the parameter "void *arg".

---
+                                /*
+                                 * A Leader process that receives this message
+                                 * must be ready to update progress.
+                                 */
+                                Assert(pcxt->parallel_progress_callback);
+                                Assert(pcxt->parallel_progress_callback_arg);
+
+                                /* Report progress */
+                                pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);

I think the parallel query infra should not require
parallel_progress_callback_arg to always be set. I think it can be
NULL.

---
+void
+parallel_vacuum_update_progress(void *arg)
+{
+        ParallelVacuumState *pvs = (ParallelVacuumState *)arg;
+
+        Assert(!IsParallelWorker());
+
+        if (pvs)
+                pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                             pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
+}

Since parallel vacuum always sets the arg, I think we don't need to check it.
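
For reference, the counter update in that function relies on
add-then-fetch semantics: pg_atomic_add_fetch_u32() returns the value
after the addition, so the reported number is the new count of
completed indexes. A rough standalone sketch of the same semantics
using C11 atomics follows. DemoSharedState and
demo_mark_index_completed() are hypothetical names, and C11 atomics
are used here only as an approximation of PostgreSQL's own atomics
API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared state referenced by pvs->shared. */
typedef struct DemoSharedState
{
	_Atomic uint32_t nindexes_completed;
} DemoSharedState;

/*
 * Atomically bump the counter and return the NEW value.
 * C11's atomic_fetch_add() returns the OLD value, so add 1 to its result
 * to get add-then-fetch behavior.
 */
uint32_t
demo_mark_index_completed(DemoSharedState *shared)
{
	return atomic_fetch_add(&shared->nindexes_completed, 1) + 1;
}
```

Because each caller gets back the counter value that includes its own
increment, the value passed to the progress report never lags behind
the index that was just finished.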

Regards,

[1] https://www.postgresql.org/docs/devel/source-format.html

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


