Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CA+fd4k4Gi1yrrSbfb_8gbOLYeMOi7ZaKv1n0c9aUFn8nJo4Wng@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, 9 Jan 2020 at 19:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > What do you think of the attached?  Sawada-san, kindly verify the
> > > changes and let me know your opinion.
> >
> > I agreed to not include both the FAST option patch and
> > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> > on the main part and we can discuss and add them later if want.
> >
> > I've looked at the latest version patch you shared. Overall it looks
> > good and works fine. I have a few small comments:
> >
>
> I have addressed all your comments and slightly change nearby comments
> and ran pgindent.  I think we can commit the first two preparatory
> patches now unless you or someone else has any more comments on those.

Yes.

I'd like to briefly summarize the
v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel patch for
other reviewers who want to start reviewing it:

The patch introduces a PARALLEL option to the VACUUM command. Parallel
vacuum is enabled by default. When the user doesn't specify the
parallel degree or omits the PARALLEL option, the number of parallel
workers is determined based on the number of indexes that support
parallel index vacuum. Specifying PARALLEL 0 disables parallel vacuum.
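
To make the rule above concrete, here is a rough Python sketch of how
the worker count could be chosen. It is only an illustration, not the
patch's code: the function name, the use of None for "PARALLEL
omitted", and the max_workers cap are assumptions made for this
example.

```python
from typing import Optional, Sequence


def choose_parallel_workers(index_supports_parallel: Sequence[bool],
                            requested: Optional[int],
                            max_workers: int = 8) -> int:
    """Return how many workers to launch; 0 means no parallel vacuum.

    requested=None models an omitted PARALLEL option (assumption of
    this sketch); requested=0 models PARALLEL 0.
    max_workers is a hypothetical upper bound, not taken from the mail.
    """
    if requested == 0:
        return 0  # PARALLEL 0 disables parallel vacuum

    # Parallel vacuum is only usable with at least 2 indexes.
    if len(index_supports_parallel) < 2:
        return 0

    # Count the indexes that support parallel index vacuum.
    n_parallel = sum(1 for ok in index_supports_parallel if ok)

    # The leader always takes one index, so it needs one worker fewer
    # (assumed interpretation of "the leader always takes one index").
    workers = max(n_parallel - 1, 0)

    # An explicit parallel degree can only lower the count.
    if requested is not None:
        workers = min(workers, requested)

    return min(workers, max_workers)
```

With three indexes that all support parallel vacuum and no PARALLEL
option, this sketch picks two workers, since the leader handles the
third index itself.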

In the parallel vacuum of this patch, only the leader process scans
the heap and collects dead tuple TIDs in the DSM segment. Before
starting index vacuum or index cleanup, the leader launches the
parallel workers and performs it together with them. Each individual
index is processed by one vacuum worker process. Therefore parallel
vacuum can be used when the table has at least 2 indexes (the leader
always takes one index). After launching the parallel workers, the
leader process first vacuums the indexes that don't support parallel
index vacuum. The parallel workers process the indexes that support
parallel index vacuum, and the leader process joins them as a worker
after completing the indexes that don't. Once all indexes are
processed, the parallel worker processes exit.  After that, the leader
process re-initializes the parallel context so that it can use the
same DSM for multiple passes of index vacuum and for performing index
cleanup.  Updating the index statistics requires updating the system
table, and since updates are not allowed during parallel mode, we
update the index statistics after exiting parallel mode.
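
The division of work between the leader and the workers described
above can be sketched as follows. This is a toy model using Python
threads in place of parallel worker processes; the function name, the
log format, and the shared list standing in for the DSM state are all
assumptions of the example, and appending to the log stands in for the
actual index vacuum.

```python
import threading


def parallel_index_vacuum(indexes, nworkers):
    """indexes: list of (name, supports_parallel) pairs.

    Returns a list of (index_name, processed_by) entries.
    """
    log = []
    lock = threading.Lock()
    # Indexes that support parallel vacuum are shared among all
    # participants; each one is handed to exactly one of them.
    pending = [name for name, supports_parallel in indexes
               if supports_parallel]

    def vacuum_parallel_indexes(who):
        while True:
            with lock:
                if not pending:
                    return  # nothing left, the participant exits
                name = pending.pop(0)
                log.append((name, who))  # stand-in for vacuuming

    # The leader launches the workers first.
    workers = [threading.Thread(target=vacuum_parallel_indexes,
                                args=("worker%d" % i,))
               for i in range(nworkers)]
    for w in workers:
        w.start()

    # The leader then handles the indexes that don't support
    # parallel index vacuum.
    for name, supports_parallel in indexes:
        if not supports_parallel:
            with lock:
                log.append((name, "leader"))

    # After that, the leader joins in as one more worker.
    vacuum_parallel_indexes("leader")

    # Wait until all workers have exited.
    for w in workers:
        w.join()
    return log
```

Which participant ends up with which parallel-capable index depends on
scheduling, but every index is processed exactly once and the
non-parallel ones always go to the leader.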

When the cost-based vacuum delay is enabled, parallel vacuum is
throttled as well. The basic idea of a cost-based vacuum delay for
parallel index vacuuming is to allow all parallel vacuum workers,
including the leader process, to have a shared view of cost related
parameters (mainly VacuumCostBalance). We allow each worker to update
it as and when it has incurred any cost, and then based on that decide
whether it needs to sleep.  We let the worker sleep in proportion to
the work it has done and reduce VacuumSharedCostBalance by the amount
consumed by the current worker (VacuumCostBalanceLocal).  This avoids
putting to sleep workers that have done little or no I/O compared to
other workers, and therefore ensures that workers doing more I/O are
throttled more.
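
A simplified model of this accounting, in Python, may help when
reading the patch. The class names, the report_cost() method, and the
exact sleep formula (delay scaled by local balance over the cost
limit) are assumptions of this sketch rather than the patch's code. It
is single-threaded for clarity; real workers would need atomic updates
of the shared balance.

```python
class SharedCostState:
    """Cost parameters visible to the leader and all workers."""

    def __init__(self, cost_limit, cost_delay_ms):
        self.cost_limit = cost_limit
        self.cost_delay_ms = cost_delay_ms
        self.shared_balance = 0  # stands in for VacuumSharedCostBalance


class VacuumParticipant:
    """The leader or one parallel worker."""

    def __init__(self, shared):
        self.shared = shared
        self.local_balance = 0  # stands in for VacuumCostBalanceLocal

    def report_cost(self, cost):
        """Record incurred cost; return how long to sleep, in ms."""
        self.shared.shared_balance += cost
        self.local_balance += cost

        # Sleep only once the shared balance reaches the limit and
        # this participant has actually contributed some cost.
        if (self.shared.shared_balance < self.shared.cost_limit
                or self.local_balance == 0):
            return 0.0

        # Sleep in proportion to this participant's own work
        # (assumed formula for the sketch).
        sleep_ms = (self.shared.cost_delay_ms
                    * self.local_balance / self.shared.cost_limit)

        # Give back only what this participant consumed.
        self.shared.shared_balance -= self.local_balance
        self.local_balance = 0
        return sleep_ms
```

For example, with a cost limit of 200 and a delay of 20 ms, a
participant that accumulated 200 units sleeps for the full 20 ms,
while the small balance of a mostly idle participant stays in the
shared counter and costs it nothing until it does more I/O.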

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


