Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: cost based vacuum (parallel)
Date
Msg-id 20191104201205.GH6962@tamriel.snowman.net
In response to Re: cost based vacuum (parallel)  (Andres Freund <andres@anarazel.de>)
Responses Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > > With parallelization across indexes, you could have a situation where
> > > > the individual indexes are on different tablespaces with independent
> > > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > > throughput, not just additional CPU time.
> > >
> > > How's that related to IO throttling being active or not?
> >
> > You might find that you have to throttle the IO down when operating
> > exclusively against one IO channel, but if you have multiple IO channels
> > then the acceptable IO utilization could be higher as it would be
> > spread across the different IO channels.
> >
> > In other words, the overall i/o allowance for a given operation could
> > be higher if it's spread across multiple i/o channels, since it
> > wouldn't completely consume the i/o resources of any one of them,
> > whereas a higher allowance against a single i/o channel would likely
> > impact other operations.
> >
> > Whether this is really relevant only to parallel operations is a bit
> > of an interesting question: these considerations might not require
> > actual parallelism, as a single process might be able to work through
> > multiple indexes concurrently and still hit the overall i/o limit set
> > for it across the tablespaces.  I don't know that it would actually be
> > interesting or useful to spend the effort to make that work, though,
> > so, from a practical perspective, it's probably only worth thinking
> > about this in the context of parallel vacuum.
>
> But you could just apply different budgets for different tablespaces?

Yes, that would be one approach to addressing this, though it would
change the existing meaning of those cost parameters.  I'm not sure
whether we think that's an issue: if we only do this in the case of a
parallel vacuum then it's probably fine, but I'm less sure it'd be
alright to change that on an upgrade.
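
As a rough illustration of why per-tablespace budgets would change what the existing settings mean, here is a back-of-the-envelope sketch (Python, purely for illustration). It assumes the throttle behaves roughly like cost-based vacuum delay: each page access adds a cost, and once the accumulated balance reaches vacuum_cost_limit the process sleeps for about vacuum_cost_delay. The `budgets` parameter is hypothetical and stands in for one independent budget per tablespace being worked on at once; the numbers are illustrative, not the defaults of any particular release.

```python
def max_pages_per_sec(cost_limit: int, cost_delay_ms: float,
                      page_cost: int, budgets: int = 1) -> float:
    """Rough ceiling on pages/second admitted by cost-based throttling.

    Each budget can accumulate cost_limit cost units before sleeping for
    about cost_delay_ms, so, ignoring the time spent on the work itself,
    each budget admits at most cost_limit / page_cost pages per delay.
    `budgets` is hypothetical: the number of independent budgets in use at
    once (e.g. one per tablespace being vacuumed concurrently).
    """
    periods_per_sec = 1000.0 / cost_delay_ms
    pages_per_period = cost_limit / page_cost
    return budgets * pages_per_period * periods_per_sec


if __name__ == "__main__":
    # Illustrative values only.
    print(max_pages_per_sec(200, 2.0, 10))             # 10000.0
    print(max_pages_per_sec(200, 2.0, 10, budgets=3))  # 30000.0
```

With the same three settings, three independent budgets triple the aggregate ceiling, which is the change in meaning for anyone who tuned those values against a single budget.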

> That's quite doable independent of parallelism, as we don't have tables
> or indexes spanning more than one tablespace.  True, you could then make
> the processing of an individual vacuum faster by allowing it to utilize
> multiple tablespace budgets at the same time.

Yes, it's possible to do independently of parallelism, but what I was
trying to get at above is that it might not be worth the effort.  When
it comes to parallel vacuum, though, I'm not sure that you can just punt
on this question, since you'll naturally end up spanning multiple
tablespaces concurrently, at least if the heap and indexes are spread
across multiple tablespaces and you're operating against more than one
of those relations at a time.  I admit I'm not 100% sure that's actually
happening with this proposed patch set.  If it isn't, then this isn't
really an issue, though that would be pretty unfortunate, as you then
couldn't leverage multiple i/o channels concurrently, and Jeff's
question about why you'd be doing parallel vacuum with IO throttling
would be a pretty good one.
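
To make the concurrency question concrete, the sketch below (again Python, with made-up tablespace names and costs) estimates the minimum time a parallel vacuum would spend sleeping, assuming one worker per relation, all running at once. With a single shared budget, the total cost across all workers drives the sleeps; with hypothetical per-tablespace budgets, each tablespace throttles independently and the busiest one sets the floor. It counts only full budgets and ignores both the time spent doing the I/O and any capping of individual sleeps, so treat it as an approximation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def throttle_floor_ms(work: List[Tuple[str, int]], cost_limit: int,
                      cost_delay_ms: float, per_tablespace: bool) -> float:
    """Approximate minimum sleep time (ms) for relations vacuumed concurrently.

    work: one (tablespace, total_cost) entry per relation, each handled by
    its own worker.  Only full budgets are counted, and the time spent doing
    the actual I/O is ignored.
    """
    if not per_tablespace:
        # One shared balance: every worker's cost counts against the same
        # limit, so the total cost across all relations drives the sleeps.
        total = sum(cost for _, cost in work)
        return float((total // cost_limit) * cost_delay_ms)

    # Hypothetical scheme: one balance per tablespace.  Tablespaces throttle
    # independently, so the busiest tablespace sets the floor.
    per_ts: Dict[str, int] = defaultdict(int)
    for tablespace, cost in work:
        per_ts[tablespace] += cost
    return max((float((c // cost_limit) * cost_delay_ms)
                for c in per_ts.values()), default=0.0)


if __name__ == "__main__":
    # Illustrative numbers: heap and two indexes on separate tablespaces.
    spread = [("ts_heap", 4000), ("ts_idx1", 4000), ("ts_idx2", 4000)]
    print(throttle_floor_ms(spread, 200, 2.0, per_tablespace=False))  # 120.0
    print(throttle_floor_ms(spread, 200, 2.0, per_tablespace=True))   # 40.0
```

If all three relations sat on the same tablespace, both schemes would give the same floor, which matches the point that this only matters when the relations processed concurrently live on different tablespaces.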

Thanks,

Stephen
