Thread: Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

From
Andres Freund
Date:
Hi,

right now the defaults for autovacuum cost limiting are so low that they
regularly cause problems for our users. It's not exactly obvious that
any installation above a couple gigabytes definitely needs to change
autovacuum_vacuum_cost_delay &
autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
anti-wraparound/full table vacuums basically take forever with the
default settings.

On the other hand we don't want a database of a couple hundred megabytes
to be vacuumed as fast as possible and trash the poor tiny system. So we
can't just massively increase the limits by default; although I believe
some default adjustment would be appropriate anyway.

I wonder if it makes sense to compute the delays / limits in relation to
either cluster or relation size. If you have a 10 TB table, you
obviously don't want to scan with a few megabytes a second, which the
default settings will do for you. With that in mind we could just go for
something like the autovacuum_*_scale_factor settings. But e.g. for
partitioned workloads with a hundreds of tables in the couple gigabyte
range that'd not work that well.

Somehow computing the speed in relation to the cluster/database size is
probably possible, but I wonder how we can do so without constantly
re-computing something relatively expensive?

Thoughts?


- Andres



On 1/12/16 6:42 AM, Andres Freund wrote:
> Somehow computing the speed in relation to the cluster/database size is
> probably possible, but I wonder how we can do so without constantly
> re-computing something relatively expensive?

ISTM relpages would probably be good enough for this, if we take the 
extra step of getting actual relation size when relpages is 0.

I'm not sure a straght scale factor is the way to go though... it seems 
that might be problematic? I think we'd at least one a minimum default 
value; you certainly don't want even a small system running vacuum at 
1kB/s...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



On Tue, Jan 12, 2016 at 6:12 PM, Andres Freund <andres@anarazel.de> wrote:
> right now the defaults for autovacuum cost limiting are so low that they
> regularly cause problems for our users. It's not exactly obvious that
> any installation above a couple gigabytes definitely needs to change
> autovacuum_vacuum_cost_delay &
> autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
> anti-wraparound/full table vacuums basically take forever with the
> default settings.
>
> On the other hand we don't want a database of a couple hundred megabytes
> to be vacuumed as fast as possible and trash the poor tiny system. So we
> can't just massively increase the limits by default; although I believe
> some default adjustment would be appropriate anyway.
>
> I wonder if it makes sense to compute the delays / limits in relation to
> either cluster or relation size. If you have a 10 TB table, you
> obviously don't want to scan with a few megabytes a second, which the
> default settings will do for you. With that in mind we could just go for
> something like the autovacuum_*_scale_factor settings. But e.g. for
> partitioned workloads with a hundreds of tables in the couple gigabyte
> range that'd not work that well.
>
> Somehow computing the speed in relation to the cluster/database size is
> probably possible, but I wonder how we can do so without constantly
> re-computing something relatively expensive?
>
> Thoughts?

Thanks for bringing this up.  I fully agree we should try to do
something about this.  This comes up quite regularly in EnterpriseDB
support discussions, and I'm sure lots of other people have problems
with it too.  It seems to me that what we really want to do is try to
finish vacuuming the table before we again need to vacuum the table.
For the sake of simplicity, just consider the anti-wraparound case for
a second.  If it takes three days to vacuum the table and we consume
200 million XIDs in two days, we are pretty clearly not vacuuming fast
enough.

I think we should do something similar to what we do for checkpoints.
We estimate when the table will next need vacuuming based on the rate
of XID advancement and the rate at which dead tuples are being
created.  We can also estimate what percentage of the relation we've
vacuumed and derive some estimate of when we'll be done - perhaps
assuming only one index pass, for the sake of simplicity.  If we're
behind, we should vacuum faster to try to catch up.  We could even try
to include some fudge factor in the calculation - e.g. if the time
until the next vacuum is estimated to be 30 hours from the start of
the current vacuum, we try to make the current vacuum finish in no
more than 75% * 30 hours = 22.5 hours.

I think this is better than your proposal to scale it just based on
the size of the relation because it may be find for the vacuum to run
slowly if we're creating very few dead tuples and consuming very few
XIDs.  IME, there's one very specific scenario where the wheels come
off, and that's when the table doesn't get fully vacuumed before it's
due to be vacuumed again.  Of course, anything we did here wouldn't be
perfect - it would all be based on estimates - but I bet we could make
things a lot better.  There's an even more global version of this
problem, which is that you could have a situation when any given table
gets vacuumed it runs quick enough to finish before that table gets
vacuumed again, but there are lots of large tables so overall we don't
make enough progress.  It would be nice to fix that, too, but even
something simple that ignored that more global problem would help a
lot of people.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 02/23/2016 10:23 PM, Robert Haas wrote:
> On Tue, Jan 12, 2016 at 6:12 PM, Andres Freund <andres@anarazel.de> wrote:
>> right now the defaults for autovacuum cost limiting are so low that they
>> regularly cause problems for our users. It's not exactly obvious that
>> any installation above a couple gigabytes definitely needs to change
>> autovacuum_vacuum_cost_delay &
>> autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
>> anti-wraparound/full table vacuums basically take forever with the
>> default settings.
>>
>> On the other hand we don't want a database of a couple hundred megabytes
>> to be vacuumed as fast as possible and trash the poor tiny system. So we
>> can't just massively increase the limits by default; although I believe
>> some default adjustment would be appropriate anyway.
>>
>> I wonder if it makes sense to compute the delays / limits in relation to
>> either cluster or relation size. If you have a 10 TB table, you
>> obviously don't want to scan with a few megabytes a second, which the
>> default settings will do for you. With that in mind we could just go for
>> something like the autovacuum_*_scale_factor settings. But e.g. for
>> partitioned workloads with a hundreds of tables in the couple gigabyte
>> range that'd not work that well.
>>
>> Somehow computing the speed in relation to the cluster/database size is
>> probably possible, but I wonder how we can do so without constantly
>> re-computing something relatively expensive?
>>
>> Thoughts?
>
> Thanks for bringing this up.  I fully agree we should try to do
> something about this.  This comes up quite regularly in EnterpriseDB
> support discussions, and I'm sure lots of other people have problems
> with it too.  It seems to me that what we really want to do is try to
> finish vacuuming the table before we again need to vacuum the table.
> For the sake of simplicity, just consider the anti-wraparound case for
> a second.  If it takes three days to vacuum the table and we consume
> 200 million XIDs in two days, we are pretty clearly not vacuuming fast
> enough.

In my experience it is almost always best to run autovacuum very often
and very aggressively. That generally means tuning scaling factor and
thresholds as well, such that there are never more than say 50-100k dead
rows. Then running vacuum with no delays or limits runs quite fast is is
generally not noticeable/impactful.

However that strategy does not work well for vacuums which run long,
such as an anti-wraparound vacuum. So in my opinion we need to think
about this as at least two distinct cases requiring different solutions.


> I think we should do something similar to what we do for checkpoints.
> We estimate when the table will next need vacuuming based on the rate
> of XID advancement and the rate at which dead tuples are being
> created.  We can also estimate what percentage of the relation we've
> vacuumed and derive some estimate of when we'll be done - perhaps
> assuming only one index pass, for the sake of simplicity.  If we're
> behind, we should vacuum faster to try to catch up.  We could even try
> to include some fudge factor in the calculation - e.g. if the time
> until the next vacuum is estimated to be 30 hours from the start of
> the current vacuum, we try to make the current vacuum finish in no
> more than 75% * 30 hours = 22.5 hours.

This seems reasonable for the anti-wraparound case.

> I think this is better than your proposal to scale it just based on
> the size of the relation because it may be find for the vacuum to run
> slowly if we're creating very few dead tuples and consuming very few
> XIDs.  IME, there's one very specific scenario where the wheels come
> off, and that's when the table doesn't get fully vacuumed before it's
> due to be vacuumed again.  Of course, anything we did here wouldn't be
> perfect - it would all be based on estimates - but I bet we could make
> things a lot better.  There's an even more global version of this
> problem, which is that you could have a situation when any given table
> gets vacuumed it runs quick enough to finish before that table gets
> vacuumed again, but there are lots of large tables so overall we don't
> make enough progress.  It would be nice to fix that, too, but even
> something simple that ignored that more global problem would help a
> lot of people.

This brings up a third scenario I've seen, which is lots (think
thousands) of individual tables needing vacuum almost constantly. In
that case autovacuum_naptime and autovacuum_max_workers also need tuning
or you never get to all of them.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Joe Conway wrote:

> In my experience it is almost always best to run autovacuum very often
> and very aggressively. That generally means tuning scaling factor and
> thresholds as well, such that there are never more than say 50-100k dead
> rows. Then running vacuum with no delays or limits runs quite fast is is
> generally not noticeable/impactful.
> 
> However that strategy does not work well for vacuums which run long,
> such as an anti-wraparound vacuum. So in my opinion we need to think
> about this as at least two distinct cases requiring different solutions.

With the freeze map there is no need for anti-wraparound vacuums to be
terribly costly, since they don't need to scan the whole table each
time.  That patch probably changes things a lot in this area.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 02/24/2016 08:54 AM, Alvaro Herrera wrote:
> Joe Conway wrote:
>
>> In my experience it is almost always best to run autovacuum very often
>> and very aggressively. That generally means tuning scaling factor and
>> thresholds as well, such that there are never more than say 50-100k dead
>> rows. Then running vacuum with no delays or limits runs quite fast is is
>> generally not noticeable/impactful.
>>
>> However that strategy does not work well for vacuums which run long,
>> such as an anti-wraparound vacuum. So in my opinion we need to think
>> about this as at least two distinct cases requiring different solutions.
>
> With the freeze map there is no need for anti-wraparound vacuums to be
> terribly costly, since they don't need to scan the whole table each
> time.  That patch probably changes things a lot in this area.

Yes, I had forgotten about that. It would be a huge help.


--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development