Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: cost based vacuum (parallel)
Date
Msg-id CA+fd4k4v=PHYkpOjEV8unF6MBGpmu7=6B8n6qQEqkEWop7b6gg@mail.gmail.com
In response to Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, 6 Nov 2019 at 15:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 5, 2019 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > >
> > > > The two approaches to solve this problem being discussed in that
> > > > thread [1] are as follows:
> > > > (a) Allow the parallel workers and master backend to have a shared
> > > > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > > > allow each worker to update it and then based on that decide whether
> > > > it needs to sleep.  Sawada-San has done the POC for this approach.
> > > > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > > > drawback of this approach could be that we allow the worker to sleep
> > > > even though the I/O has been performed by some other worker.
> > >
> > > I don't understand this drawback.
> > >
> >
> > I think the problem could be that the system is not properly
> > throttled when it is supposed to be.  Let me try a simple example:
> > say we have two workers, w-1 and w-2.  w-2 is doing most of the I/O
> > and w-1 very little, but unfortunately whenever w-1 checks, it
> > finds that the cost_limit has been exceeded and it goes to sleep,
> > while w-2 continues.  Now, in such a situation we have made one of
> > the workers sleep for the required time, but ideally the worker
> > that was doing the I/O should have slept.  The aim is to make the
> > system stop doing I/O whenever the limit has been exceeded, so that
> > might not work in the above situation.
> >
>
> One idea to fix this drawback is to avoid letting workers sleep that
> have done little or no I/O compared to the other workers; that way we
> can, to a good extent, ensure that the workers doing more I/O are
> throttled more.  What we can do is allow a worker to sleep only if it
> has performed I/O above a certain threshold and the overall balance
> exceeds the cost_limit set by the system.  Then we let the worker
> sleep in proportion to the work done by it, and reduce
> VacuumSharedCostBalance by the amount consumed by the current worker.
> Something like:
>
> if (VacuumSharedCostBalance >= VacuumCostLimit &&
>     MyCostBalance > threshold * VacuumCostLimit / nworkers)
> {
>     VacuumSharedCostBalance -= MyCostBalance;
>     Sleep(delay * MyCostBalance / VacuumSharedCostBalance);
> }
>
> Assume the threshold is 0.5.  What that means is: if a worker has
> done more than 50% of the work expected from it, and the overall
> shared cost balance has been exceeded, then we will consider that
> worker for sleeping.
>
> What do you guys think?

I think the idea that workers consuming more I/O sleep for longer
seems good. It does not seem to have the drawback of approach (b),
which unnecessarily delays vacuum if some indexes are very small or if
index bulk-deletion does almost nothing, as with brin. But on the
other hand, it's possible that workers don't sleep even though the
shared cost balance already exceeds the limit, because sleeping
requires that the local balance exceed the worker's share of the limit
(the limit divided by the number of workers). For example, one worker
is scheduled, does I/O, and substantially exceeds the limit while the
other two workers do less I/O. Then those two workers are scheduled
and consume I/O. The total cost balance already exceeds the limit, but
the workers will not sleep as long as their local balances are less
than (limit / # of workers).

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


