Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: cost based vacuum (parallel)
Msg-id 20191104193340.GG6962@tamriel.snowman.net
In response to Re: cost based vacuum (parallel)  (Andres Freund <andres@anarazel.de>)
Responses Re: cost based vacuum (parallel)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > * Jeff Janes (jeff.janes@gmail.com) wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > For parallel vacuum [1], we were discussing what is the best way to
> > > > divide the cost among parallel workers but we didn't get many inputs
> > > > apart from people who are very actively involved in patch development.
> > > > I feel that we need some more inputs before we finalize anything, so
> > > > starting a new thread.
> > >
> > > Maybe I just don't have experience in the type of system that parallel
> > > vacuum is needed for, but if there is any meaningful IO throttling which is
> > > active, then what is the point of doing the vacuum in parallel in the first
> > > place?
> >
> > With parallelization across indexes, you could have a situation where
> > the individual indexes are on different tablespaces with independent
> > i/o, therefore the parallelization ends up giving you an increase in i/o
> > throughput, not just additional CPU time.
>
> How's that related to IO throttling being active or not?

You might find that you have to throttle the IO down when operating
exclusively against one IO channel, but if you have multiple IO channels
then the acceptable IO utilization could be higher as it would be
spread across the different IO channels.

In other words, the overall i/o allowance for a given operation might be
able to be higher if it's spread across multiple i/o channels, as it
wouldn't completely consume the i/o resources of any of them, whereas
with a higher allowance and a single i/o channel, there would likely be
an impact to other operations.
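As a worked illustration of that reasoning, here is a sketch that assumes n independent i/o channels of equal capacity B, a tolerated share α of any single channel's capacity, and vacuum work that divides evenly across the channels. The symbols n, B and α are hypothetical and are not PostgreSQL settings:

```latex
% Hypothetical symbols: n = number of independent i/o channels,
% B = capacity of each channel, \alpha = tolerated share of any one channel.
\begin{aligned}
\text{one channel:}\quad & A_1 = \alpha B \\
n \text{ channels, evenly loaded:}\quad & A_n = n\,\alpha B,
  \qquad \text{load per channel} = \frac{A_n}{n} = \alpha B
\end{aligned}
```

Under these assumptions, the overall allowance can grow with the number of channels, while each channel still sees only the load that was acceptable for a single channel.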

Whether this is really relevant only to parallel operations is a bit of
an interesting question: these considerations might not require actual
parallel operations, as a single process might be able to go through
multiple indexes concurrently and still hit the i/o limit that was set
for it overall across the tablespaces.  I don't know that it would
actually be interesting or useful to spend the effort to make that work,
though, so, from a practical perspective, it's probably only interesting
to think about this when talking about parallel vacuum.

I've been wondering if the accounting system should consider the cost
per tablespace when there are multiple tablespaces involved, instead of
throttling the overall process without consideration for the
per-tablespace utilization.
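A minimal sketch in C of what per-tablespace cost accounting might look like. It is loosely modeled on the existing global scheme, in which a cost balance accumulates until it reaches the limit and the process then sleeps for roughly `delay * balance / limit`, capped at four times the delay. `TablespaceCost` and `charge_tablespace()` are hypothetical names, not PostgreSQL APIs, and each tablespace is assumed to carry its own limit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-tablespace cost bucket (illustrative, not a PostgreSQL struct). */
typedef struct TablespaceCost
{
    unsigned int spcid;   /* tablespace identifier, standing in for an Oid */
    int          balance; /* cost accumulated since this tablespace last slept */
    int          limit;   /* cost allowed before this tablespace triggers a sleep */
} TablespaceCost;

/*
 * Charge `cost` units of I/O to a single tablespace and return the number of
 * milliseconds the caller should sleep (0 while still under that tablespace's
 * limit).  The sleep formula loosely mirrors the global one: delay * balance /
 * limit, capped at 4 * delay.  Only this tablespace's balance is reset, so
 * I/O against other tablespaces is not throttled by it.
 */
static int
charge_tablespace(TablespaceCost *ts, int cost, int delay_ms)
{
    int msec;

    assert(ts != NULL && ts->limit > 0);
    ts->balance += cost;
    if (ts->balance < ts->limit)
        return 0;

    msec = delay_ms * ts->balance / ts->limit;
    if (msec > delay_ms * 4)
        msec = delay_ms * 4;
    ts->balance = 0;
    return msec;
}
```

For example, with two tablespaces each limited to 200 and charged 150 apiece, a single global limit of 200 would already have forced a sleep, whereas under this sketch neither tablespace has reached its own limit yet.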

Thanks,

Stephen
