Re: Should we increase the default vacuum_cost_limit? - Mailing list pgsql-hackers

From: David Rowley
Subject: Re: Should we increase the default vacuum_cost_limit?
Date:
Msg-id: CAKJS1f8s5Cxat65_-uE97oUpoonO3p_2rFMf-cKA7kpAaS1-OA@mail.gmail.com
In response to: Re: Should we increase the default vacuum_cost_limit?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Should we increase the default vacuum_cost_limit?
           Re: Should we increase the default vacuum_cost_limit?
List: pgsql-hackers
On Sat, 9 Mar 2019 at 16:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I propose therefore that instead of increasing vacuum_cost_limit,
> what we ought to be doing is reducing vacuum_cost_delay by a similar
> factor.  And, to provide some daylight for people to reduce it even
> more, we ought to arrange for it to be specifiable in microseconds
> not milliseconds.  There's no GUC_UNIT_US right now, but it's time.
> (Perhaps we should also look into using other delay APIs, such as
> nanosleep(2), where available.)

It does seem like a genuine concern that this might be too
all-or-nothing.  It's no good being on a high-speed train if it stops
at every platform.

I agree that vacuum_cost_delay might not be granular enough.
However, if we're going to change vacuum_cost_delay to microseconds,
then I'm a little concerned that it'll silently break existing code
that sets it.  Scripts that do manual off-peak vacuums are pretty
common out in the wild.

In an ideal world we'd just redesign the vacuum throttling to have
MB/s limits for hit/read/dirty, and possibly also a WAL write rate.
I'm not sure exactly how they'd cooperate, but we could likely
minimise gettimeofday() calls by sampling the time it took to process
N pages; if those N pages didn't take the time we wanted them to
take, we could set N = Max(N * ($target_gettimeofday_sample_rate /
$timetaken), 1).  e.g. if N was 2000 and it just took us 1 second to
do 2000 pages, but we want to sleep every millisecond, then just do
N *= (0.001 / 1), so the next run we only do 2 pages before checking
how long we should sleep for.  If we happened to process those 2
pages in 0.5 milliseconds, then N would become 4, etc.

We'd just need to hard-code the $target_gettimeofday_sample_rate;
probably 1 millisecond would be about right.  We'd also need to guess
the first value of N, but if we guess a low value, it'll quickly
correct itself after the first batch of pages.
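
Just as a rough sketch of the idea (this is not real vacuum code; the
names, the 1ms target and the initial guess of 16 pages are all made
up for illustration), the loop might look something like this:

#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

#define TARGET_SAMPLE_USECS 1000    /* check the clock roughly every 1ms */

/* stand-in for vacuuming one heap page */
static void
process_one_page(void)
{
}

static int64_t
elapsed_usecs(const struct timeval *a, const struct timeval *b)
{
    return (int64_t) (b->tv_sec - a->tv_sec) * 1000000 +
           (b->tv_usec - a->tv_usec);
}

int
main(void)
{
    int64_t     pages_left = 100000;
    int64_t     batch = 16;         /* low initial guess; self-corrects */

    while (pages_left > 0)
    {
        struct timeval start, end;
        int64_t     usecs;

        gettimeofday(&start, NULL);
        for (int64_t i = 0; i < batch && pages_left > 0; i++, pages_left--)
            process_one_page();
        gettimeofday(&end, NULL);

        usecs = elapsed_usecs(&start, &end);

        /* rescale so the next batch should take roughly the target time */
        if (usecs > 0)
            batch = (int64_t) (batch * ((double) TARGET_SAMPLE_USECS / usecs));
        if (batch < 1)
            batch = 1;

        usleep(1000);               /* stand-in for the cost-based sleep */
    }
    return 0;
}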

If anyone thinks that idea has any potential, then maybe it's better
to leave the new vacuum_cost_limit default in place and consider
redesigning this for PG13... as such a change is too late for PG12.

It may also be possible to make this a vacuum rate limit in %.  Say,
10% would just sleep for 10x as long as it took to process the last
set of pages.  The problem with this is that if the server were under
heavy load then auto-vacuum might crawl along, but that might be the
exact opposite of what's required, since the server might be crawling
precisely because of inadequate vacuuming.
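
As a sketch of the arithmetic only (rate_limit_sleep_usecs is a
made-up name), making the percentage the actual fraction of
wall-clock time spent on page processing would be something like:

/*
 * Sleep long enough that the page work accounts for roughly
 * rate_pct percent of wall-clock time.  For rate_pct = 10 this
 * returns 9x the work time, i.e. in the same ballpark as the
 * "sleep for 10x as long" figure above.
 */
static long
rate_limit_sleep_usecs(long work_usecs, int rate_pct)
{
    if (rate_pct <= 0 || rate_pct >= 100)
        return 0;               /* treat out-of-range as "no limit" */
    return work_usecs * (100 - rate_pct) / rate_pct;
}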

> I don't have any particular objection to kicking up the maximum
> value of vacuum_cost_limit by 10X or so, if anyone's hot to do that.
> But that's not where we ought to be focusing our concern.  And there
> really is a good reason, not just nannyism, not to make that
> setting huge --- it's just the wrong thing to do, as compared to
> reducing vacuum_cost_delay.

My vote is to 10x the maximum for vacuum_cost_limit and consider
changing how it all works in PG13.  If nothing happens before this
time next year then we can consider making vacuum_cost_delay a
microseconds GUC.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

