Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoBduTv=AQS_V0or50Fdbz7NjS2o4EWnMCaXTJ9yYJr7ew@mail.gmail.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, Apr 7, 2022 at 10:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 6, 2022 at 5:22 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> > >    At the beginning of a parallel operation, we allocate a chunk of>
> > >    dynamic shared memory which persists even after some or all workers
> > >    have exited. It's only torn down at the end of the parallel operation.
> > >    That seems like the appropriate place to be storing any kind of data
> > >    that needs to be propagated between parallel workers. The current
> > >    patch uses the main shared memory segment, which seems unacceptable to
> > >    me.
> >
> > Correct, DSM does track shared data. However only participating
> > processes in the parallel vacuum can attach and lookup this data.
> >
> > The purpose of the main shared memory is to allow a process that
> > Is querying the progress views to retrieve the information.
>
> Sure, but I think that you should likely be doing what Andres
> recommended before:
>
> # Why isn't the obvious thing to do here to provide a way to associate workers
> # with their leaders in shared memory, but to use the existing progress fields
> # to report progress? Then, when querying progress, the leader and workers
> # progress fields can be combined to show the overall progress?
>
> That is, I am imagining that you would want to use DSM to propagate
> data from workers back to the leader and then have the leader report
> the data using the existing progress-reporting facilities. Now, if we
> really need a whole row from each worker that doesn't work, but why do
> we need that?

+1

I also proposed the same idea before[1]. The leader can know how many
indexes are processed so far by checking PVIndStats.status allocated
on DSM for each index. We can have the leader check it and update the
progress information before and after vacuuming one index. If we want
to update the progress information more timely, probably we can pass a
callback function to ambulkdelete and amvacuumcleanup so that the
leader can do that periodically, e.g., every 1000 blocks, while
vacuuming an index.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBW6SMJ96CNoMeu%2Bf_BR4jmatPcfVA016FdD2hkLDsaTA%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: why pg_walfile_name() cannot be executed during recovery?
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: How about a psql backslash command to show GUCs?