Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CAA4eK1K-=JqvJoe-MBE0PW0Q6LNtGMcEr65GJDiroFimMDxHBw@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > I've attached the latest version patch set. The patch set includes all
> > > > > the discussed points regarding index AM options as well as shared cost
> > > > > balance. I also added some test cases using all types of index AM.
> > > > >
> > > > > During development I had one concern about the number of parallel
> > > > > workers to launch. In the current design each index AM can choose
> > > > > whether to participate in parallel bulk-deletion and parallel cleanup.
> > > > > That also means the number of parallel workers to launch might differ
> > > > > between parallel bulk-deletion and parallel cleanup. In the current
> > > > > patch the leader always launches as many workers as there are indexes
> > > > > that support either one, but that is not efficient in some cases. For
> > > > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > > 5 workers for each phase, but some workers would do nothing at all.
> > > > > To deal with this problem, I wonder if we can improve the parallel
> > > > > query infrastructure so that the leader process creates a parallel
> > > > > context with the maximum number of indexes and can launch a subset
> > > > > of the workers instead of all of them.
> > > > >
> > > >
> > > > Can't we choose the number of workers as a maximum of
> > > > "num_of_indexes_that_support_bulk_del" and
> > > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > > always launch the required number of workers for each phase (bulk_del,
> > > > cleanup).  In your above example, it should choose 3 workers while
> > > > creating a parallel context.  Do you see any problem with that?
> > >
> > > I might be missing something, but if we create the parallel context
> > > with 3 workers the leader process always launches 3 workers. Therefore
> > > in the above case it launches 3 workers even for cleanup, although 2
> > > workers would be enough.
> > >
> >
> > Right, so we can either extend the parallel API to launch fewer workers
> > than the parallel context has, as suggested by you, or we can use a
> > separate parallel context for each phase.  Going with the former has
> > the benefit that we don't need to recreate the parallel context, and
> > the latter has the advantage that we won't keep additional shared
> > memory allocated.
>
> I also thought of using separate parallel contexts for each phase, but
> can the same DSM be used by parallel workers initiated from
> different parallel contexts? If not, I think that doesn't work, because
> parallel vacuum needs to store data in the DSM during ambulkdelete and
> the parallel workers for amvacuumcleanup then need to access it.
>

We can probably copy the stats into local memory instead of pointing
to the DSM after bulk-deletion, but I think that would add unnecessary
overhead and doesn't sound like a good idea.

> >  BTW, what kind of API change do you have in mind for
> > the approach you are suggesting?
>
> I was thinking of adding a new API, say LaunchParallelNWorkers(pcxt, n),
> where n is the number of workers the caller wants to launch and should
> be lower than the value in the parallel context.
>

For that, won't you need to duplicate most of the code of
LaunchParallelWorkers, or maybe move the entire code into
LaunchParallelNWorkers and have LaunchParallelWorkers call it?
Another idea could be to extend the existing API
LaunchParallelWorkers to take the number of workers as an input
parameter.  Do you see any problem with that, or is there a reason you
prefer a new API for this?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


