Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: cost based vacuum (parallel)
Msg-id 20191104194224.wodignsgdnnjhkes@alap3.anarazel.de
In response to Re: cost based vacuum (parallel)  (Stephen Frost <sfrost@snowman.net>)
Responses Re: cost based vacuum (parallel)  (Stephen Frost <sfrost@snowman.net>)
Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > With parallelization across indexes, you could have a situation where
> > > the individual indexes are on different tablespaces with independent
> > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > throughput, not just additional CPU time.
> > 
> > How's that related to IO throttling being active or not?
> 
> You might find that you have to throttle the IO down when operating
> exclusively against one IO channel, but if you have multiple IO channels
> then the acceptable IO utilization could be higher as it would be 
> spread across the different IO channels.
> 
> In other words, the overall i/o allowance for a given operation might be
> able to be higher if it's spread across multiple i/o channels, as it
> wouldn't completely consume the i/o resources of any of them, whereas
> with a higher allowance and a single i/o channel, there would likely be
> an impact to other operations.
> 
> As for if this is really relevant only when it comes to parallel
> operations is a bit of an interesting question- these considerations
> might not require actual parallel operations as a single process might
> be able to go through multiple indexes concurrently and still hit the
> i/o limit that was set for it overall across the tablespaces.  I don't
> know that it would actually be interesting or useful to spend the effort
> to make that work though, so, from a practical perspective, it's
> probably only interesting to think about this when talking about
> parallel vacuum.

But you could just apply different budgets for different tablespaces?
That's quite doable independent of parallelism, as we don't have tables
or indexes spanning more than one tablespace.  True, you could then make
the processing of an individual vacuum faster by allowing it to utilize
multiple tablespace budgets at the same time.
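
To illustrate what separate budgets per tablespace could look like, here
is a minimal sketch. It is not PostgreSQL code: the class names, the
tablespace names, and the limit values are all hypothetical, and it only
models the accounting (accumulate cost per tablespace, sleep once that
tablespace's own limit is reached).

```python
from dataclasses import dataclass, field


@dataclass
class TablespaceBudget:
    """Hypothetical per-tablespace cost budget (not a PostgreSQL structure)."""
    cost_limit: int   # cost units allowed before a sleep is due
    balance: int = 0  # cost accumulated since the last sleep


@dataclass
class PerTablespaceThrottle:
    """Tracks accumulated cost separately for each tablespace."""
    delay_ms: float
    budgets: dict = field(default_factory=dict)

    def charge(self, tablespace: str, cost: int) -> float:
        """Charge `cost` to one tablespace; return how long to sleep (ms)."""
        budget = self.budgets[tablespace]
        budget.balance += cost
        if budget.balance >= budget.cost_limit:
            # This tablespace used up its budget: reset it and ask for a sleep.
            budget.balance = 0
            return self.delay_ms
        return 0.0


# Illustrative setup: names and limits are assumptions, not real settings.
throttle = PerTablespaceThrottle(
    delay_ms=2.0,
    budgets={
        "fast_ssd": TablespaceBudget(cost_limit=400),
        "slow_hdd": TablespaceBudget(cost_limit=100),
    },
)
```

Because each table or index lives in exactly one tablespace, every page
access can be charged to a single budget, and work on "fast_ssd" is not
slowed down by cost accumulated on "slow_hdd".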


> I've been wondering if the accounting system should consider the cost
> per tablespace when there's multiple tablespaces involved, instead of
> throttling the overall process without consideration for the
> per-tablespace utilization.

This all seems like a feature proposal, or two, independent of the
patch/question at hand. I think there's a good argument to be had that
we should severely overhaul the current vacuum cost limiting - it's way,
way too hard to understand the bandwidth that it's allowed to
consume. But unless one of the proposals makes that measurably harder or
easier, I think we don't gain anything by entangling an already complex
patchset with something new.
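
As a rough illustration of why the allowed bandwidth is hard to read off
the settings, the sketch below converts a cost limit, a cost delay, and
a per-page cost into an upper bound on throughput. The function names
are hypothetical, the input values are only example settings, and the
result ignores the time spent actually doing the work between sleeps, so
real throughput would be lower.

```python
BLOCK_SIZE = 8192  # bytes; PostgreSQL's default block size


def max_pages_per_second(cost_limit: int, cost_delay_ms: float,
                         cost_per_page: int) -> float:
    """Upper bound on pages processed per second under cost-based delay.

    Each round may accumulate `cost_limit` cost units before sleeping for
    `cost_delay_ms`; processing time itself is assumed to be zero.
    """
    pages_per_round = cost_limit / cost_per_page
    rounds_per_second = 1000.0 / cost_delay_ms
    return pages_per_round * rounds_per_second


def max_bytes_per_second(cost_limit: int, cost_delay_ms: float,
                         cost_per_page: int) -> float:
    """Same bound expressed in bytes per second."""
    return max_pages_per_second(cost_limit, cost_delay_ms,
                                cost_per_page) * BLOCK_SIZE


# Example inputs (assumed for illustration): limit 200, delay 2 ms,
# and every page charged a cost of 10.
example_pages = max_pages_per_second(200, 2.0, 10)
example_bytes = max_bytes_per_second(200, 2.0, 10)
```

With those example inputs the bound is 10000 pages per second, or
81920000 bytes per second (about 78 MiB/s). In practice the per-page
cost varies with whether a page is a hit, a miss, or dirtied, which is
part of what makes the effective limit hard to predict.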


Greetings,

Andres Freund


