Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CA+fd4k7z_O==GDzxbRc5wcCKCu_NbZtB3f3Q0xrgDms1X0FDxA@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > After more thoughts, I think we can have a ternary value: never,
> > > > > always, once. If it's 'never' the index never participates in parallel
> > > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > > > > index always participates regardless of vacrelstats->num_index_scans. I
> > > > > > guess gin, brin and bloom use 'always'. Finally, if it's 'once' the
> > > > > > index participates in parallel cleanup only when it's the first time
> > > > > > (that is, vacrelstats->num_index_scans == 0). I guess btree, gist and
> > > > > spgist use 'once'.
> > > > >
> > > >
> > > > I think this 'once' option is confusing especially because it also
> > > > depends on 'num_index_scans' which the IndexAM has no control over.
> > > > It might be that the option name is not good, but I am not sure.
> > > > Another thing is that for brin indexes, we don't want bulkdelete to
> > > > participate in parallelism.
> > >
> > > I thought brin should set amcanparallelvacuum to false and
> > > amcanparallelcleanup to 'always'.
> > >
> >
> > In that case, it is better to name the variable as amcanparallelbulkdelete.
> >
> > > > Do we want to have separate variables for
> > > > ambulkdelete and amvacuumcleanup which decides whether the particular
> > > > phase can be done in parallel?
> > >
> > > You mean adding variables to ambulkdelete and amvacuumcleanup as
> > > function arguments?
> > >
> >
> > No, I mean separate variables amcanparallelbulkdelete (bool) and
> > amcanparallelvacuumcleanup (uint16).
> >
> > >
> > > > Another possibility could be to just
> > > > have one variable (say uint16 amparallelvacuum) which will tell us all
> > > > the options but I don't think that will be a popular approach
> > > > considering all the other methods and variables exposed.  What do you
> > > > think?
> > >
> > > Adding only one variable that can have flags would also be a good
> > > idea, instead of having multiple variables for each option. For
> > > instance FDW API uses such interface (see eflags of BeginForeignScan).
> > >
> >
> > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> >
> > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > vacuumcleanup) can't be performed in parallel
> > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > performed in parallel (hash index will set this flag)
>
> Maybe we don't want this option? If neither 3 nor 4 is set, then we
> will not do the cleanup in parallel anyway, right?
>
> > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > flag)
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > gin, gist, spgist, bloom will set this flag)
> > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > and bloom will set this flag)
> >
> > Does something like this make sense?

3 and 4 confused me because 4 also looks conditional. How about having
two flags instead: one for doing parallel cleanup only when bulk
deletion has not been performed yet
(VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another for always doing
parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? That way, we can
have the flags as follows, and each index AM chooses two of them: one
from the first two flags for bulk deletion and one from the next three
flags for cleanup.

VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
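
To make the "one flag from each group" rule concrete, here is a rough
sketch of how the leader might validate and interpret the flags. This is
not taken from the patch: the mask macros and the helper functions
parallel_vacuum_options_valid() and index_joins_parallel_cleanup() are
hypothetical names used only for illustration, and I'm assuming the
conditional case is decided by num_index_scans == 0 as discussed
upthread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as proposed in this mail; the final names may differ. */
#define VACUUM_OPTION_PARALLEL_NO_BULKDEL    (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL       (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP    (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP       (1 << 4)

/* Hypothetical masks grouping the flags by phase. */
#define PARALLEL_BULKDEL_MASK \
	(VACUUM_OPTION_PARALLEL_NO_BULKDEL | VACUUM_OPTION_PARALLEL_BULKDEL)
#define PARALLEL_CLEANUP_MASK \
	(VACUUM_OPTION_PARALLEL_NO_CLEANUP | \
	 VACUUM_OPTION_PARALLEL_COND_CLEANUP | \
	 VACUUM_OPTION_PARALLEL_CLEANUP)

/* True if exactly one bit is set in v. */
static bool
exactly_one_bit(uint8_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical check: an index AM must pick exactly one bulk-deletion
 * flag and exactly one cleanup flag, and nothing else.
 */
static bool
parallel_vacuum_options_valid(uint8_t opts)
{
	if ((opts & ~(PARALLEL_BULKDEL_MASK | PARALLEL_CLEANUP_MASK)) != 0)
		return false;
	return exactly_one_bit(opts & PARALLEL_BULKDEL_MASK) &&
		exactly_one_bit(opts & PARALLEL_CLEANUP_MASK);
}

/*
 * Hypothetical decision for the cleanup phase.  Assumes "conditional"
 * means "only if no index scan (bulk deletion) has happened yet".
 */
static bool
index_joins_parallel_cleanup(uint8_t opts, int num_index_scans)
{
	assert(parallel_vacuum_options_valid(opts));

	if (opts & VACUUM_OPTION_PARALLEL_CLEANUP)
		return true;
	if (opts & VACUUM_OPTION_PARALLEL_COND_CLEANUP)
		return num_index_scans == 0;
	return false;				/* VACUUM_OPTION_PARALLEL_NO_CLEANUP */
}
```

With this reading, an AM like btree would presumably set
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP,
while brin would set VACUUM_OPTION_PARALLEL_NO_BULKDEL |
VACUUM_OPTION_PARALLEL_CLEANUP. The per-AM assignments are still guesses
at this point.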

> Yeah, something like that seems better to me.
>
> > If we all agree on this, then I
> > think we can summarize the part of the discussion related to this API
> > and get feedback from a broader audience.
>
> Makes sense.

+1

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


