Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: [HACKERS] Block level parallel vacuum
Msg-id: CA+fd4k4_FA+TMMENYTh-4NfOQ1h-OxitAS8p813fvg5JtwE_OA@mail.gmail.com
In response to: Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > I've attached the latest version of the patch set. It includes all
> > > > the discussed points regarding index AM options as well as shared
> > > > cost balance. I also added some test cases that use all types of
> > > > index AM.
> > > >
> > > > During development I had one concern about the number of parallel
> > > > workers to launch. In the current design each index AM can choose
> > > > whether to participate in parallel bulk-deletion and parallel
> > > > cleanup. That also means the number of parallel workers to launch
> > > > might differ between parallel bulk-deletion and parallel cleanup.
> > > > In the current patch the leader always launches as many workers as
> > > > the number of indexes that support either one, but that would not
> > > > be efficient in some cases. For example, if we have 3 indexes
> > > > supporting only parallel bulk-deletion and 2 indexes supporting
> > > > only parallel index cleanup, we would launch 5 workers in each
> > > > phase but some workers would do nothing at all. To deal with this
> > > > problem, I wonder if we can improve the parallel query
> > > > infrastructure so that the leader process creates a parallel
> > > > context with the maximum required number of workers and can launch
> > > > only a part of them instead of all.
> > > >
> > >
> > > Can't we choose the number of workers as the maximum of
> > > "num_of_indexes_that_support_bulk_del" and
> > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > always launch the required number of workers for each phase (bulk_del,
> > > cleanup).  In your above example, it should choose 3 workers while
> > > creating a parallel context.  Do you see any problem with that?
> >
> > I might be missing something, but if we create the parallel context
> > with 3 workers, the leader process always launches 3 workers.
> > Therefore in the above case it launches 3 workers even during
> > cleanup, although 2 workers would be enough.
> >
>
> Right, so we can either extend the parallel API to launch fewer
> workers than the parallel context has, as you suggested, or we can
> use a separate parallel context for each phase.  Going with the
> former has the benefit that we don't need to recreate the parallel
> context, and the latter has the advantage that we won't keep
> additional shared memory allocated.

I also thought about using separate parallel contexts for each phase,
but can the same DSM be used by parallel workers launched from
different parallel contexts? If not, I don't think that approach
works, because parallel vacuum needs to store data in the DSM during
ambulkdelete and the parallel workers for amvacuumcleanup then need
to access it.
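
To illustrate why, here is a minimal sketch of the per-phase-context
approach (assuming the existing infrastructure in access/parallel.h;
the entry point name "parallel_vacuum_main" and the variable names
are just placeholders): the DSM segment is created by
InitializeParallelDSM() and owned by the context, so destroying the
context between phases discards whatever the bulk-delete phase
stored there.

    ParallelContext *pcxt;

    EnterParallelMode();
    pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
                                 nworkers);
    InitializeParallelDSM(pcxt);    /* allocates the shared segment */

    /* ... bulk-delete phase stores index stats in pcxt->toc ... */

    DestroyParallelContext(pcxt);   /* the DSM is freed here */
    ExitParallelMode();

    /* a fresh context for cleanup would start with an empty DSM */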

> BTW, what kind of API change do you have in mind for the approach
> you are suggesting?

I was thinking of adding a new API, say LaunchParallelNWorkers(pcxt, n),
where n is the number of workers the caller wants to launch and must
not exceed the number of workers the parallel context was created with.
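
As a rough sketch (the name and the clamping trick here are my
assumptions, not code from the patch), it could be layered on top of
the existing LaunchParallelWorkers():

    static int
    LaunchParallelNWorkers(ParallelContext *pcxt, int n)
    {
        int     save_nworkers = pcxt->nworkers;

        Assert(n > 0 && n <= pcxt->nworkers);

        /*
         * Temporarily clamp the worker count so that the existing
         * launch loop registers only n background workers, then
         * restore it so a later phase can ask for more again.
         */
        pcxt->nworkers = n;
        LaunchParallelWorkers(pcxt);
        pcxt->nworkers = save_nworkers;

        return pcxt->nworkers_launched;
    }

The caller would create the parallel context with the maximum of the
bulk-deletion and cleanup worker counts, as you suggested, and then
pass the per-phase count to this function.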

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


