Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CAFiTN-scevRCZdCLpFkaSq7a_+Euc2umKSiYngsQJnyk9WYQ3w@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Re: [HACKERS] Block level parallel vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > this information for index AMs. Basically, the index AM will expose a
> > > > > > variable amparallelvacuumoptions which can have the options below:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # neither bulkdelete nor
> > > > > > vacuumcleanup can be performed in parallel
> > > > >
> > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs that don't
> > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > >
> > > > Makes sense.
> > > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > >
> > > > > I think gin and bloom don't need to set both but should set only
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > >
> > > > > And I'm going to disallow index AMs from setting both
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > with an assertion; is that okay?
> > > > >
> > > >
> > > > Sounds reasonable to me.
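
For my own understanding, the proposal would then amount to roughly the
following definitions plus an assertion when validating the AM (this is
only a sketch; the values follow the list above, with NO_PARALLEL taken
as 0 per the suggestion, and amoptions standing in for the AM's
amparallelvacuumoptions value):

#define VACUUM_OPTION_NO_PARALLEL           0
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3)

/* an index AM must not claim both cleanup variants */
Assert(((amoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0) ||
       ((amoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0));
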
> > > >
> > > > Are you planning to include the changes related to I/O throttling
> > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > that if you agree with the conclusion in the last email [1]; otherwise,
> > > > we can explore it separately.
> > >
> > > Yes, I agree. I'm going to include those changes in the next version
> > > of the patches, and I think we will be able to have more discussion
> > > based on the patch.
> > >
> >
> > I've attached the latest version patch set. The patch set includes all
> > the discussed points regarding index AM options as well as the shared
> > cost balance. I also added some test cases that use all types of index AM.
> >
> > During development I had one concern about the number of parallel
> > workers to launch. In the current design each index AM can choose
> > whether to participate in parallel bulk-deletion and parallel cleanup.
> > That also means the number of parallel workers to launch might differ
> > between parallel bulk-deletion and parallel cleanup. In the current
> > patch the leader always launches as many workers as there are indexes
> > that support either one, but that would not be efficient in some cases.
> > For example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each phase, but some workers would do nothing at all.
> > To deal with this problem, I wonder if we can improve parallel query so
> > that the leader process creates a parallel context sized for the maximum
> > number of indexes but can launch only a subset of the workers instead of
> > all of them.
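
Regarding the worker-count concern above, my understanding is that the
per-phase participation could be counted along these lines (a rough
sketch only; Irel and nindexes follow vacuumlazy.c naming, but the
snippet itself is illustrative and not from the patch):

int nworkers_bulkdel = 0;
int nworkers_cleanup = 0;

for (int i = 0; i < nindexes; i++)
{
    uint8 opts = Irel[i]->rd_indam->amparallelvacuumoptions;

    if (opts & VACUUM_OPTION_PARALLEL_BULKDEL)
        nworkers_bulkdel++;
    if (opts & (VACUUM_OPTION_PARALLEL_COND_CLEANUP |
                VACUUM_OPTION_PARALLEL_CLEANUP))
        nworkers_cleanup++;
}

/*
 * Launching Max(nworkers_bulkdel, nworkers_cleanup) workers for every
 * phase gives the idle workers in the 3-bulkdelete / 2-cleanup example;
 * launching only the per-phase count would avoid that.
 */
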
> >
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
> +
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> +     (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> +     /* compute sleep time based on the local cost balance */
> +     msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> +     new_balance = shared_balance - VacuumCostBalanceLocal;
> +     VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +                                    &shared_balance,
> +                                    new_balance))
> + {
> +     /* Updated successfully, break */
> +     break;
> + }
> While looking at the shared costing delay part, I noticed that when
> checking the delay condition we consider local_balance, which is
> VacuumCostBalanceLocal + VacuumCostBalance, but when computing the new
> balance we reduce the shared balance only by VacuumCostBalanceLocal.
> Shouldn't it be reduced by local_balance instead?  I see that later we
> add VacuumCostBalance to VacuumCostBalanceLocal, so we are not losing
> the accounting for this balance.  Still, I feel it is not right to
> compare based on one value and operate based on another.  I think we
> can set VacuumCostBalanceLocal += VacuumCostBalance immediately, before
> checking the condition.
>
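
To be concrete about the reordering suggested above, something like the
following is what I have in mind (sketch only, not tested):

/* fold the newly accrued balance into the local counter up front */
VacuumCostBalanceLocal += VacuumCostBalance;

shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
new_balance = shared_balance + VacuumCostBalance;

if ((new_balance >= VacuumCostLimit) &&
    (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
{
    /* the same value now drives both the check and the shared update */
    msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
    new_balance = shared_balance - VacuumCostBalanceLocal;
    VacuumCostBalanceLocal = 0;
}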

+/*
+ * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
+ *
+ * Currently, we don't pass any information to the AM-specific estimator,
+ * so it can probably only return a constant.  In the future, we might need
+ * to pass more information.
+ */
+Size
+index_parallelvacuum_estimate(Relation indexRelation)
+{
+    Size        nbytes;
+
+    RELATION_CHECKS;
+
+    /*
+     * If amestimateparallelvacuum is not provided, assume only
+     * IndexBulkDeleteResult is needed.
+     */
+    if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
+    {
+        nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
+        Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
+    }
+    else
+        nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
+
+    return nbytes;
+}

In the v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I
am a bit doubtful about this kind of arrangement, where the code in the
"if" branch is unreachable with the current AMs.  I am not sure what the
best way to handle this is; should we just drop amestimateparallelvacuum
altogether?  Currently we provide only a size-estimate function without a
copy function.  Even if in the future some AM gives an estimate of the
size of its stats, we cannot directly memcpy the stats from local memory
to shared memory; we would then also need a copy function from the AM so
that it can flatten the stats and store them in the proper format.
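
To illustrate the last point, the extra callback could look something
like this (purely hypothetical; neither the name nor the signature is
part of the patch):

/*
 * amcopyparallelvacuum - flatten the AM's private bulk-delete stats
 * into the shared-memory area sized by amestimateparallelvacuum().
 */
typedef void (*amcopyparallelvacuum_function) (Relation indexRelation,
                                               IndexBulkDeleteResult *stats,
                                               void *shared_dest);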

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


