Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: cost based vacuum (parallel)
Date
Msg-id 20191104182829.57bkz64qn5k3uwc3@alap3.anarazel.de
In response to Re: cost based vacuum (parallel)  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> >
>
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

I am wondering the same - but to be fair, it's pretty easy to run into
cases where VACUUM is CPU bound. E.g. because most pages are in
shared_buffers, and, compared to the size of the indexes, the number of
tids that need to be pruned is fairly small (also [1]). That means a lot
of pages need to be scanned, without a whole lot of IO going on. The
problem is just that the defaults for vacuum throttling will also apply
here; I've never seen anybody tune vacuum_cost_page_hit = 0,
vacuum_cost_page_dirty = 0, or the like (on the contrary,
vacuum_cost_page_dirty is currently the highest of the page costs).
Nor do we reduce the cost of vacuum_cost_page_dirty for unlogged tables.
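To make the above concrete, here's a sketch of what such tuning could look like in postgresql.conf. The GUC names are real; the zero values are purely illustrative of the "don't throttle buffer hits / dirtied pages" idea discussed above, not recommendations:

```
# Illustrative only: stop charging vacuum cost for pages found in
# shared_buffers and for pages dirtied, so a mostly-cached VACUUM
# isn't throttled on work that causes little actual IO.
vacuum_cost_page_hit = 0      # default 1
vacuum_cost_page_dirty = 0    # default 20, the highest of the page costs
# vacuum_cost_page_miss left at its default, so real reads still count
```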

So while it doesn't seem unreasonable to want to use cost limiting to
protect against vacuum unexpectedly causing too much IO (especially read
IO), I'm doubtful it has much current practical relevance.

I'm wondering how much of the benefit of parallel vacuum really is just
to work around vacuum ringbuffers often massively hurting performance
(see e.g. [2]). Surely not all, but I'd be very unsurprised if it were a
large fraction.

Greetings,

Andres Freund

[1] I don't think the patch addresses this, IIUC it's only running index
    vacuums in parallel, but it's very easy to run into being CPU
    bottlenecked when vacuuming a busily updated table. heap_hot_prune
    can be really expensive, especially with longer update chains (I
    think it may have an O(n^2) worst case even).
[2] https://www.postgresql.org/message-id/20160406105716.fhk2eparljthpzp6%40alap3.anarazel.de


