Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CAFiTN-trDu37W9u76oTQ+RGUVFoQ3zEvHdRPhXE3S_sWpTzkyw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > >
> > > > + /*
> > > > + * Generally index cleanup does not scan the index when index
> > > > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > > > + * index cleanup with parallel workers only if we have not
> > > > + * performed index vacuuming yet.  Otherwise, we do it in the
> > > > + * leader process alone.
> > > > + */
> > > > + if (vacrelstats->num_index_scans == 0)
> > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > > + stats, lps);
> > > >
> > > > Today, I was thinking about this point where this check will work for
> > > > most cases, but still, exceptions are there like for brin index, the
> > > > main work is done in amvacuumcleanup function.  Similarly, I think
> > > > there are few more indexes like gin, bloom where it seems we take
> > > > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > > > we should try to allow parallel workers for such cases?  If so, I
> > > > don't have any great ideas on how to do that, but what comes to my
> > > > mind is to indicate that via stats (
> > > > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > > > acceptable to have indexam API for this.
> > > >
> > > > Thoughts?
> > >
> > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > hash and gist do it during bulkdelete. There are three types of index
> > > AM just inside postgres code. An idea I came up with is that we can
> > > control parallel vacuum and parallel cleanup separately.  That is,
> > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> > > can be stored locally if both amcanparallelvacuum and
> > > amcanparallelcleanup of an index are false because only the leader
> > > process deals with such indexes. Otherwise we need to store it in DSM
> > > as always.
> > >
> > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > heavy work during cleanup irrespective of whether bulkdelete is called
> > or not e.g. gin?
>
> Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> might set amcanparallevacuum to true as well).
>
> >  If so, along with an amcanparallelcleanup flag, we
> > need to consider vacrelstats->num_index_scans right? So if
> > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > worker for all the indexes who support amcanparallelvacuum and if
> > vacrelstats->num_index_scans > 0 then only for those who has
> > amcanparallelcleanup as true.
>
> Yes, you're right. But this won't work fine for brin indexes who don't
> want to participate in parallel vacuum but always want to participate
> in parallel cleanup.
Yeah, right.
>
> After more thoughts, I think we can have a ternary value: never,
> always, once. If it's 'never' the index never participates in parallel
> cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> index always participates regardless of vacrelstats->num_index_scan. I
> guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> index participates in parallel cleanup only when it's the first time
> (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> spgist use 'once'.
Yeah, this make sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: Re: tableam vs. TOAST
Next
From: Michael Paquier
Date:
Subject: Re: 'Invalid lp' during heap_xlog_delete