Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From: Dilip Kumar
Subject: Re: cost based vacuum (parallel)
Msg-id: CAFiTN-snLc5qioVcY1MK8mB11LZqTftKctDx55x0RF2HvfnfSQ@mail.gmail.com
In response to: Re: cost based vacuum (parallel)  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Wed, Nov 6, 2019 at 9:21 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Amit Kapila (amit.kapila16@gmail.com) wrote:
> > On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > * Andres Freund (andres@anarazel.de) wrote:
> > > > That's quite doable independent of parallelism, as we don't have tables
> > > > or indexes spanning more than one tablespace.  True, you could then make
> > > > the processing of an individual vacuum faster by allowing to utilize
> > > > multiple tablespace budgets at the same time.
> > >
> > > Yes, it's possible to do independent of parallelism, but what I was
> > > trying to get at above is that it might not be worth the effort.  When
> > > it comes to parallel vacuum though, I'm not sure that you can just punt
> > > on this question since you'll naturally end up spanning multiple
> > > tablespaces concurrently, at least if the heap+indexes are spread across
> > > multiple tablespaces and you're operating against more than one of those
> > > relations at a time
> >
> > Each parallel worker operates on a separate index.  It might be worth
> > exploring per-tablespace vacuum throttling, but that should not be a
> > requirement for the currently proposed patch.
>
> Right, that each operates on a separate index in parallel is what I had
> figured was probably happening, and that's why I brought up the question
> of "well, what does IO throttling mean when you've got multiple
> tablespaces involved with presumably independent IO channels...?" (or,
> at least, that's what I was trying to go for).
>
> This isn't a question with the current system and way the code works
> within a single vacuum operation, as we're never operating on more than
> one relation concurrently in that case.
>
> Of course, we don't currently do anything to manage IO utilization
> across tablespaces when there are multiple autovacuum workers running
> concurrently, which I suppose goes to Andres' point that we aren't
> really doing anything to deal with this today and therefore this is
> perhaps not all that new of an issue just with the addition of
> parallel vacuum.  I'd still argue that it becomes a lot more apparent
> when you're talking about one parallel vacuum, but ultimately we should
> probably be thinking about how to manage the resources across all the
> vacuums and tablespaces and queries and such.
>
> In an ideal world, we'd track the i/o from front-end queries, have some
> idea of the total i/o possible for each IO channel, and allow vacuum and
> whatever other background processes need to run to scale up and down,
> with enough buffer to avoid ever being maxed out on i/o, but keeping up
> a consistent rate of i/o that lets everything finish as quickly as
> possible.

IMHO, suppose that in the future we improve the I/O throttling per
tablespace, maybe by maintaining an independent balance for the
relation and for each of its indexes, or maybe a combined balance for
the indexes that are on the same tablespace.  Each balance can then be
checked against its tablespace's I/O limit.  If we get such a
mechanism in the future, it seems that it would be easily extensible
to parallel vacuum, wouldn't it?  Across workers, too, we can track a
tablespace-wise shared balance (if we go with the shared costing
approach, for example).
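
To make the idea concrete, here is a minimal sketch of a tablespace-wise
shared balance.  It is written in Python for brevity and is not
PostgreSQL code: the class TablespaceBalances, its charge() method, the
tablespace names, and the limits and per-page cost are all hypothetical
values chosen for illustration.  Each process adds its page cost to the
balance of the tablespace it is touching, and whichever process pushes
that balance over the tablespace's limit is the one that naps.

```python
import threading
from collections import defaultdict


class TablespaceBalances:
    """Hypothetical shared per-tablespace cost balances (illustration only)."""

    def __init__(self, limits):
        # limits: tablespace name -> cost limit reached before a nap is due
        self._limits = dict(limits)
        self._balances = defaultdict(int)
        self._lock = threading.Lock()

    def charge(self, tablespace, cost):
        """Add cost to the shared balance of one tablespace.

        Returns True when the limit is reached.  The balance is then reset,
        and the caller is expected to sleep for its configured delay.
        """
        with self._lock:
            self._balances[tablespace] += cost
            if self._balances[tablespace] >= self._limits[tablespace]:
                self._balances[tablespace] = 0
                return True
            return False

    def balance(self, tablespace):
        with self._lock:
            return self._balances[tablespace]


def simulate():
    # Assumed layout: one process works in ts_heap, two work on indexes
    # that both live in ts_index and therefore share one balance.
    shared = TablespaceBalances({"ts_heap": 200, "ts_index": 50})
    naps = defaultdict(int)
    processes = [
        ("proc0", "ts_heap"),
        ("proc1", "ts_index"),
        ("proc2", "ts_index"),
    ]
    for _page in range(10):
        for _name, tablespace in processes:
            # Every page is charged a flat, made-up cost of 10.
            if shared.charge(tablespace, 10):
                naps[tablespace] += 1  # a real process would sleep here
    return naps, shared


if __name__ == "__main__":
    naps, shared = simulate()
    print(dict(naps), shared.balance("ts_heap"), shared.balance("ts_index"))
```

In this sketch the two index processes drain the ts_index budget
together, so throttling follows the tablespace rather than the
individual process, while the process on ts_heap is unaffected by them.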

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


