Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: Add index scan progress to pg_stat_progress_vacuum
Date	April 7, 2022 15:38:36
Msg-id	CAD21AoBduTv=AQS_V0or50Fdbz7NjS2o4EWnMCaXTJ9yYJr7ew@mail.gmail.com
In response to	Re: Add index scan progress to pg_stat_progress_vacuum (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Add index scan progress to pg_stat_progress_vacuum
List	pgsql-hackers

On Thu, Apr 7, 2022 at 10:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 6, 2022 at 5:22 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> > >    At the beginning of a parallel operation, we allocate a chunk of
> > >    dynamic shared memory which persists even after some or all workers
> > >    have exited. It's only torn down at the end of the parallel operation.
> > >    That seems like the appropriate place to be storing any kind of data
> > >    that needs to be propagated between parallel workers. The current
> > >    patch uses the main shared memory segment, which seems unacceptable to
> > >    me.
> >
> > Correct, DSM does track shared data. However, only processes
> > participating in the parallel vacuum can attach to it and look up this data.
> >
> > The purpose of the main shared memory is to allow a process that
> > is querying the progress views to retrieve the information.
>
> Sure, but I think that you should likely be doing what Andres
> recommended before:
>
> # Why isn't the obvious thing to do here to provide a way to associate workers
> # with their leaders in shared memory, but to use the existing progress fields
> # to report progress? Then, when querying progress, the leader and workers
> # progress fields can be combined to show the overall progress?
>
> That is, I am imagining that you would want to use DSM to propagate
> data from workers back to the leader and then have the leader report
> the data using the existing progress-reporting facilities. Now, if we
> really need a whole row from each worker, that doesn't work, but why do
> we need that?

+1

I also proposed the same idea before [1]. The leader can tell how many
indexes have been processed so far by checking PVIndStats.status, which is
allocated on DSM for each index. We can have the leader check it and update
the progress information before and after vacuuming each index. If we want
to update the progress information in a more timely manner, we could
probably pass a callback function to ambulkdelete and amvacuumcleanup so
that the leader can do that periodically, e.g., every 1000 blocks, while
vacuuming an index.
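
To make the idea concrete, here is a rough standalone sketch of what I
have in mind. It is not PostgreSQL code: every type and function name
below (SketchIndStats, sketch_count_completed, sketch_vacuum_one_index,
and so on) is hypothetical, the status array merely stands in for the
per-index PVIndStats.status on DSM, and the loop only simulates an index
scan. It shows the leader counting completed indexes and refreshing that
count from a callback before the scan, every 1000 blocks, and after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-index status stored on DSM. */
typedef enum
{
    SKETCH_INDEX_INITIAL,
    SKETCH_INDEX_IN_PROGRESS,
    SKETCH_INDEX_COMPLETED
} SketchIndexStatus;

typedef struct
{
    SketchIndexStatus status;
} SketchIndStats;

/* Hypothetical callback type that an index AM would invoke periodically. */
typedef void (*SketchProgressCallback) (void *arg);

/* Leader-side state: where to look, and what was last reported. */
typedef struct
{
    const SketchIndStats *stats;    /* per-index status array */
    int         nindexes;
    int         reported;           /* last "indexes completed" value */
    int         ncalls;             /* how often progress was refreshed */
} SketchLeaderProgress;

/* Assumed reporting interval, taken from the "every 1000 blocks" example. */
#define SKETCH_REPORT_INTERVAL 1000

/* Count the indexes whose status has been marked completed. */
int
sketch_count_completed(const SketchIndStats *stats, int nindexes)
{
    int         done = 0;

    assert(stats != NULL || nindexes == 0);
    for (int i = 0; i < nindexes; i++)
    {
        if (stats[i].status == SKETCH_INDEX_COMPLETED)
            done++;
    }
    return done;
}

/* Callback body: the leader re-reads the statuses and records the count. */
void
sketch_report_progress(void *arg)
{
    SketchLeaderProgress *progress = (SketchLeaderProgress *) arg;

    progress->reported = sketch_count_completed(progress->stats,
                                                progress->nindexes);
    progress->ncalls++;
}

/* Simulated index vacuum that reports before, during, and after the scan. */
void
sketch_vacuum_one_index(SketchIndStats *self, long nblocks,
                        SketchProgressCallback callback, void *callback_arg)
{
    self->status = SKETCH_INDEX_IN_PROGRESS;
    if (callback != NULL)
        callback(callback_arg);     /* before vacuuming this index */

    for (long blkno = 1; blkno <= nblocks; blkno++)
    {
        /* ... the real per-block work would happen here ... */
        if (callback != NULL && blkno % SKETCH_REPORT_INTERVAL == 0)
            callback(callback_arg); /* periodic refresh */
    }

    self->status = SKETCH_INDEX_COMPLETED;
    if (callback != NULL)
        callback(callback_arg);     /* after vacuuming this index */
}
```

In the real code the callback would presumably publish the count through
the existing progress-reporting facilities rather than store it in a
local struct, and the interval would need some thought.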

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBW6SMJ96CNoMeu%2Bf_BR4jmatPcfVA016FdD2hkLDsaTA%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
