Thread: New GUC autovacuum_max_threshold ?
Hello,

I would like to suggest a new parameter, autovacuum_max_threshold, which would set an upper limit on the number of tuples to delete/update/insert prior to vacuum/analyze. A good default might be 500000.

The idea would be to replace the following calculation:

    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

with this one:

    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
                (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);

(and the same for the others, vacinsthresh and anlthresh).

The attached graph plots vacthresh against pg_class.reltuples, with default settings:

    autovacuum_vacuum_threshold = 50
    autovacuum_vacuum_scale_factor = 0.2

and autovacuum_max_threshold = 500000 (the suggested default).

Thus, for small tables, vacthresh is only slightly smaller than 0.2 * pg_class.reltuples, but it grows towards 500000 as reltuples → ∞.

The idea is to reduce the need for autovacuum tuning. The attached (draft) patch further illustrates the idea.

My guess is that a similar proposal has already been submitted... and rejected 🙂 If so, I'm very sorry for the useless noise.

Best regards,
Frédéric
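For concreteness, the proposed curve can be reproduced with a minimal standalone C sketch (an illustration only, not the attached patch; the names simply mirror the formula above):

    #include <stdio.h>

    /*
     * Illustration only -- not the attached patch. Computes the proposed
     * threshold with the default settings discussed in this thread.
     */
    static double
    vac_threshold(double reltuples, double base_thresh,
                  double scale_factor, double max_thresh)
    {
        return base_thresh + scale_factor * reltuples /
               (1.0 + scale_factor * reltuples / max_thresh);
    }

    int
    main(void)
    {
        double sizes[] = {1e4, 1e6, 1e8, 1e9};

        for (int i = 0; i < 4; i++)
            printf("reltuples = %10.0f -> vacthresh = %6.0f\n",
                   sizes[i], vac_threshold(sizes[i], 50, 0.2, 500000));
        return 0;
    }

With the defaults this prints roughly 2042, 142907, 487855 and 498803: close to 0.2 * reltuples for small tables, and flattening out just below 500000 for very large ones.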
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
>
> Hello,
>
> I would like to suggest a new parameter, autovacuum_max_threshold, which would set an upper limit on the number of tuples to delete/update/insert prior to vacuum/analyze.

Hi Frédéric, thanks for the proposal! You are tackling a very tough problem. I would also find it useful to know more about what led you to suggest this particular solution. I am very interested in user stories around difficulties with what tables are autovacuumed and when.

Am I correct in thinking that one of the major goals here is for a very large table to be more likely to be vacuumed?

> The idea would be to replace the following calculation:
>
>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>
> with this one:
>
>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
>                 (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);
>
> (and the same for the others, vacinsthresh and anlthresh).

My first thought when reviewing the GUC and how it is used is wondering if its description is a bit misleading.

autovacuum_vacuum_threshold is the "minimum number of updated or deleted tuples needed to trigger a vacuum". That is, if this many tuples are modified, it *may* trigger a vacuum, but we also may skip vacuuming the table for other reasons or due to other factors. autovacuum_max_threshold's proposed definition is the upper limit/maximum number of tuples to insert/update/delete prior to vacuum/analyze. This implies that if that many tuples have been modified or inserted, the table will definitely be vacuumed -- which isn't true. Maybe that is okay, but I thought I would bring it up.

> The attached (draft) patch further illustrates the idea.

Thanks for including a patch!

> My guess is that a similar proposal has already been submitted... and rejected 🙂 If so, I'm very sorry for the useless noise.

I rooted around in the hackers archive and couldn't find any threads on this specific proposal. I copied some other hackers I knew of who have worked on this problem and thought about it in the past, in case they know of some existing threads or prior work on this specific topic.

- Melanie
On Wed, Apr 24, 2024 at 03:10:27PM -0400, Melanie Plageman wrote:
> On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
>> I would like to suggest a new parameter, autovacuum_max_threshold, which would set an upper limit on the number of tuples to delete/update/insert prior to vacuum/analyze.
>
> Hi Frédéric, thanks for the proposal! You are tackling a very tough problem. I would also find it useful to know more about what led you to suggest this particular solution. I am very interested in user stories around difficulties with what tables are autovacuumed and when.
>
> Am I correct in thinking that one of the major goals here is for a very large table to be more likely to be vacuumed?

If this is indeed the goal, +1 from me for doing something along these lines.

>> The idea would be to replace the following calculation:
>>
>>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>>
>> with this one:
>>
>>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
>>                 (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);
>>
>> (and the same for the others, vacinsthresh and anlthresh).
>
> My first thought when reviewing the GUC and how it is used is wondering if its description is a bit misleading.

Yeah, I'm having trouble following the proposed mechanics for this new GUC, and it's difficult to understand how users would choose a value. If we just want to cap the number of tuples required before autovacuum takes action, perhaps we could simplify it to something like

    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
    vacthresh = Min(vacthresh, vac_max_thresh);

This would effectively cause autovacuum_vacuum_scale_factor to be overridden for large tables where the scale factor would otherwise cause the calculated threshold to be extremely high.

>> My guess is that a similar proposal has already been submitted... and rejected 🙂 If so, I'm very sorry for the useless noise.
>
> I rooted around in the hackers archive and couldn't find any threads on this specific proposal. I copied some other hackers I knew of who have worked on this problem and thought about it in the past, in case they know of some existing threads or prior work on this specific topic.

FWIW I have heard about this problem in the past, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
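For comparison with the sketch upthread, here is how this clamped variant behaves under the same defaults (an illustration only; Min is just the usual two-argument minimum macro):

    #include <stdio.h>

    #define Min(x, y) ((x) < (y) ? (x) : (y))

    /* Illustration only: the clamped threshold suggested above. */
    static double
    vac_threshold_capped(double reltuples, double base_thresh,
                         double scale_factor, double max_thresh)
    {
        double vacthresh = base_thresh + scale_factor * reltuples;

        return Min(vacthresh, max_thresh);
    }

    int
    main(void)
    {
        /* With the defaults, the cap takes over near 2.5 million tuples. */
        printf("%.0f\n", vac_threshold_capped(1e6, 50, 0.2, 500000)); /* 200050 */
        printf("%.0f\n", vac_threshold_capped(1e8, 50, 0.2, 500000)); /* 500000 */
        return 0;
    }

Below roughly 2.5 million tuples the clamp never binds and behaviour is unchanged from today.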
Le 24/04/2024 à 21:10, Melanie Plageman a écrit :
> On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
>>
>> Hello,
>>
>> I would like to suggest a new parameter, autovacuum_max_threshold, which would set an upper limit on the number of tuples to delete/update/insert prior to vacuum/analyze.
>
> Hi Frédéric, thanks for the proposal! You are tackling a very tough problem. I would also find it useful to know more about what led you to suggest this particular solution. I am very interested in user stories around difficulties with what tables are autovacuumed and when.

Hi Melanie! I can certainly start compiling user stories about that.

Recently, one of my colleagues wrote an email to our DBA team saying something along these lines:

« Hey, here are our suggested settings for per-table autovacuum configuration:

| *autovacuum*           | L < 1 million | L >= 1 million | L >= 5 million | L >= 10 million |
|:-----------------------|--------------:|---------------:|---------------:|----------------:|
| `vacuum_scale_factor`  | 0.2 (default) | 0.1            | 0.05           | 0.0             |
| `vacuum_threshold`     | 50 (default)  | 50 (default)   | 50 (default)   | 500000          |
| `analyze_scale_factor` | 0.1 (default) | 0.1 (default)  | 0.05           | 0.0             |
| `analyze_threshold`    | 50 (default)  | 50 (default)   | 50 (default)   | 500000          |

Let's update this table with values for the vacuum_insert_* parameters. »

I wasn't aware that we had this table, and although the settings made sense to me, I thought it was rather ugly and cumbersome for the user, and I started thinking about how Postgres could make his life easier.

> Am I correct in thinking that one of the major goals here is for a very large table to be more likely to be vacuumed?

Absolutely.

>> The idea would be to replace the following calculation:
>>
>>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>>
>> with this one:
>>
>>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
>>                 (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);
>>
>> (and the same for the others, vacinsthresh and anlthresh).
>
> My first thought when reviewing the GUC and how it is used is wondering if its description is a bit misleading.
>
> autovacuum_vacuum_threshold is the "minimum number of updated or deleted tuples needed to trigger a vacuum". That is, if this many tuples are modified, it *may* trigger a vacuum, but we also may skip vacuuming the table for other reasons or due to other factors. autovacuum_max_threshold's proposed definition is the upper limit/maximum number of tuples to insert/update/delete prior to vacuum/analyze. This implies that if that many tuples have been modified or inserted, the table will definitely be vacuumed -- which isn't true. Maybe that is okay, but I thought I would bring it up.

I'm not too sure I understand. What are the reasons it might be skipped? I can think of a concurrent index creation on the same table, or anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the sort of thing you are talking about?

Perhaps a better name for the GUC would be autovacuum_asymptotic_limit... or something like that?

>> The attached (draft) patch further illustrates the idea.
>
> Thanks for including a patch!
>
>> My guess is that a similar proposal has already been submitted... and rejected 🙂 If so, I'm very sorry for the useless noise.
>
> I rooted around in the hackers archive and couldn't find any threads on this specific proposal. I copied some other hackers I knew of who have worked on this problem and thought about it in the past, in case they know of some existing threads or prior work on this specific topic.

Thanks!
Hi Nathan, thanks for your review.

Le 24/04/2024 à 21:57, Nathan Bossart a écrit :
> Yeah, I'm having trouble following the proposed mechanics for this new GUC, and it's difficult to understand how users would choose a value. If we just want to cap the number of tuples required before autovacuum takes action, perhaps we could simplify it to something like
>
>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>     vacthresh = Min(vacthresh, vac_max_thresh);
>
> This would effectively cause autovacuum_vacuum_scale_factor to be overridden for large tables where the scale factor would otherwise cause the calculated threshold to be extremely high.

This would indeed work, and the parameter would be easier to define in the user documentation.

I prefer a continuous function... but that is personal taste. It seems to me that autovacuum tuning is quite hard anyway, and that it wouldn't be that much more difficult with this kind of asymptotic limit parameter.

But I think the most important thing is to avoid per-table configuration for most of the users, or even autovacuum tuning at all, so either of these two formulas would do.
On Thu, Apr 25, 2024 at 09:13:07AM +0200, Frédéric Yhuel wrote:
> Le 24/04/2024 à 21:57, Nathan Bossart a écrit :
>> Yeah, I'm having trouble following the proposed mechanics for this new GUC, and it's difficult to understand how users would choose a value. If we just want to cap the number of tuples required before autovacuum takes action, perhaps we could simplify it to something like
>>
>>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>>     vacthresh = Min(vacthresh, vac_max_thresh);
>>
>> This would effectively cause autovacuum_vacuum_scale_factor to be overridden for large tables where the scale factor would otherwise cause the calculated threshold to be extremely high.
>
> This would indeed work, and the parameter would be easier to define in the user documentation. I prefer a continuous function... but that is personal taste. It seems to me that autovacuum tuning is quite hard anyway, and that it wouldn't be that much more difficult with this kind of asymptotic limit parameter.

I do think this is a neat idea, but would the two approaches really be much different in practice? The scale factor parameters already help keep the limit smaller for small tables and larger for large ones, so it strikes me as needless complexity. I think we'd need some sort of tangible reason to think the asymptotic limit is better.

> But I think the most important thing is to avoid per-table configuration for most of the users, or even autovacuum tuning at all, so either of these two formulas would do.

Yeah, I agree with the goal of minimizing the need for per-table configurations.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 25, 2024 at 2:52 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
>
> Le 24/04/2024 à 21:10, Melanie Plageman a écrit :
> > On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
> >>
> >> Hello,
> >>
> >> I would like to suggest a new parameter, autovacuum_max_threshold, which would set an upper limit on the number of tuples to delete/update/insert prior to vacuum/analyze.
> >
> > Hi Frédéric, thanks for the proposal! You are tackling a very tough problem. I would also find it useful to know more about what led you to suggest this particular solution. I am very interested in user stories around difficulties with what tables are autovacuumed and when.
>
> Hi Melanie! I can certainly start compiling user stories about that.

Cool! That would be very useful.

> >> The idea would be to replace the following calculation:
> >>
> >>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> >>
> >> with this one:
> >>
> >>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
> >>                 (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);
> >>
> >> (and the same for the others, vacinsthresh and anlthresh).
> >
> > My first thought when reviewing the GUC and how it is used is wondering if its description is a bit misleading.
> >
> > autovacuum_vacuum_threshold is the "minimum number of updated or deleted tuples needed to trigger a vacuum". That is, if this many tuples are modified, it *may* trigger a vacuum, but we also may skip vacuuming the table for other reasons or due to other factors. autovacuum_max_threshold's proposed definition is the upper limit/maximum number of tuples to insert/update/delete prior to vacuum/analyze. This implies that if that many tuples have been modified or inserted, the table will definitely be vacuumed -- which isn't true. Maybe that is okay, but I thought I would bring it up.
>
> I'm not too sure I understand. What are the reasons it might be skipped? I can think of a concurrent index creation on the same table, or anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the sort of thing you are talking about?

No, I was thinking more literally that, if reltuples (assuming reltuples is modified/inserted tuples) > autovacuum_max_threshold, I would expect the table to be vacuumed. However, with your formula, that wouldn't necessarily be true.

I think there are values of reltuples and autovacuum_max_threshold at which reltuples > autovacuum_max_threshold but reltuples <= vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)

I tried to reduce the formula to come up with a precise definition of the range of values for which this is true, however I wasn't able to reduce it to something nice.

Here is just an example of a case:

    vac_base_thresh = 2000
    vac_scale_factor = 0.9
    reltuples = 3200
    autovacuum_max_threshold = 2500

    total_thresh = vac_base_thresh + vac_scale_factor * reltuples /
        (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)

    total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500

so there are more dead tuples than the max threshold, so it should trigger a vacuum, but it doesn't because the total calculated threshold is higher than the number of dead tuples.

This of course may not be a realistic scenario in practice. It works best the closer the scale factor is to 1 (wish I had derived the formula successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh. So, maybe it is not an issue.

> Perhaps a better name for the GUC would be autovacuum_asymptotic_limit... or something like that?

If we keep the asymptotic part, that makes sense. I wonder if we have to add another "vacuum" in there (e.g. autovacuum_vacuum_max_threshold) to be consistent with the other GUCs. I don't really know why they have that extra "vacuum" in them, though. Makes the names so long.

- Melanie
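Plugging the example's numbers into a quick standalone check (an illustration only) confirms the arithmetic:

    #include <stdio.h>

    /* Quick check of the counter-example above (illustration only). */
    int
    main(void)
    {
        double vac_base_thresh = 2000;
        double vac_scale_factor = 0.9;
        double reltuples = 3200;
        double autovacuum_max_threshold = 2500;

        double total_thresh = vac_base_thresh +
            vac_scale_factor * reltuples /
            (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);

        /*
         * Prints ~3338, which is higher than both the 3200 dead tuples
         * and the 2500 cap, so no vacuum would be triggered.
         */
        printf("total_thresh = %.0f\n", total_thresh);
        return 0;
    }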
On Wed, Apr 24, 2024 at 3:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Yeah, I'm having trouble following the proposed mechanics for this new GUC, and it's difficult to understand how users would choose a value. If we just want to cap the number of tuples required before autovacuum takes action, perhaps we could simplify it to something like
>
>     vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>     vacthresh = Min(vacthresh, vac_max_thresh);
>
> This would effectively cause autovacuum_vacuum_scale_factor to be overridden for large tables where the scale factor would otherwise cause the calculated threshold to be extremely high.

+1 for this. It seems a lot easier to understand than the original proposal. And in fact, when I was working on my 2024.pgconf.dev presentation, I suggested exactly this idea on one of my slides.

I believe that the underlying problem here can be summarized in this way: just because I'm OK with 2MB of bloat in my 10MB table doesn't mean that I'm OK with 2TB of bloat in my 10TB table. One reason for this is simply that I can afford to waste 2MB much more easily than I can afford to waste 2TB -- and that applies both on disk and in memory. Another reason, at least in existing releases, is that at some point index vacuuming hits a wall because we run out of space for dead tuples. We *most definitely* want to do index vacuuming before we get to the point where we're going to have to do multiple cycles of index vacuuming.

That latter problem should be fixed in v17 by the recent dead TID storage changes. But even so, you generally want to contain bloat before too many pages get added to your tables or indexes, because you can't easily get rid of them again afterward, so I think there's still a good case for preventing autovacuum from scaling the threshold out to infinity.

What does surprise me is that Frédéric suggests a default value of 500,000. If half a million tuples (proposed default) is 20% of your table (default value of autovacuum_vacuum_scale_factor) then your table has 2.5 million tuples. Unless those tuples are very wide, that table isn't even 1GB in size. I'm not aware that there's any problem at all with the current formula on a table of that size, or even ten times that size. I think you need to have tables that are hundreds of gigabytes in size at least before this starts to become a serious problem. Looking at this from another angle, in existing releases, the maximum usable amount of autovacuum_work_mem is 1GB, which means we can store one-sixth of a billion dead TIDs, or roughly 166 million. And that limit has been a source of occasional complaints for years. So we have those complaints on the one hand, suggesting that 166 million is not enough, and then we have this proposal, saying that more than half a million is too much. That's really strange; my initial hunch is that the value should be 100-500x higher than what Frédéric proposed.

I'm also sort of wondering how much the tuple width matters here. I'm not quite sure.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
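The two back-of-the-envelope figures above can be checked directly (an illustration only; 6 bytes is sizeof(ItemPointerData), the on-disk size of a TID):

    #include <stdio.h>

    /* Back-of-the-envelope numbers from the paragraph above. */
    int
    main(void)
    {
        /*
         * Table size at which a 500k cap overrides the default 20% scale
         * factor: 0.2 * reltuples = 500000  =>  reltuples = 2.5 million.
         */
        printf("crossover: %.1f million tuples\n", 500000 / 0.2 / 1e6);

        /*
         * Dead TIDs that fit in 1GB of autovacuum_work_mem in existing
         * releases, at 6 bytes per TID: about one-sixth of a billion.
         */
        printf("TIDs in 1GB: %.1f million\n", 1e9 / 6 / 1e6);
        return 0;
    }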
On Thu, Apr 25, 2024 at 02:33:05PM -0400, Robert Haas wrote:
> What does surprise me is that Frédéric suggests a default value of 500,000. If half a million tuples (proposed default) is 20% of your table (default value of autovacuum_vacuum_scale_factor) then your table has 2.5 million tuples. Unless those tuples are very wide, that table isn't even 1GB in size. I'm not aware that there's any problem at all with the current formula on a table of that size, or even ten times that size. I think you need to have tables that are hundreds of gigabytes in size at least before this starts to become a serious problem. Looking at this from another angle, in existing releases, the maximum usable amount of autovacuum_work_mem is 1GB, which means we can store one-sixth of a billion dead TIDs, or roughly 166 million. And that limit has been a source of occasional complaints for years. So we have those complaints on the one hand, suggesting that 166 million is not enough, and then we have this proposal, saying that more than half a million is too much. That's really strange; my initial hunch is that the value should be 100-500x higher than what Frédéric proposed.

Agreed, the default should probably be on the order of 100-200M minimum.

The original proposal also seems to introduce one parameter that would affect all three of autovacuum_vacuum_threshold, autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is that okay? Or do we need to introduce a "limit" GUC for each? I guess the question is whether we anticipate any need to have different values for these limits, which might be unlikely.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 25, 2024 at 3:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Agreed, the default should probably be on the order of 100-200M minimum.
>
> The original proposal also seems to introduce one parameter that would affect all three of autovacuum_vacuum_threshold, autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is that okay? Or do we need to introduce a "limit" GUC for each? I guess the question is whether we anticipate any need to have different values for these limits, which might be unlikely.

I don't think we should make the same limit apply to more than one of those. I would phrase the question in the opposite way that you did: is there any particular reason to believe that the limits should be the same? I don't see one. I think it would be OK to introduce limits for some and leave the others uncapped, but I don't like the idea of reusing the same limit for different things.

My intuition is strongest for the vacuum threshold -- that's such an expensive operation, takes so long, and has such dire consequences if it isn't done. We need to force the table to be vacuumed before it bloats out of control.

Maybe essentially the same logic applies to the insert threshold, namely, that we should vacuum before the number of not-all-visible pages gets too large, but I think it's less clear. It's just not nearly as bad if that happens. Sure, it may not be great when vacuum eventually runs and hits a ton of pages all at once, but it's not even close to being as catastrophic as the vacuum case.

The analyze case, I feel, is really murky. autovacuum_analyze_scale_factor stands for the proposition that as the table becomes larger, analyze doesn't need to be done as often. If what you're concerned about is the frequency estimates, that's true: an injection of a million new rows can shift frequencies dramatically in a small table, but the effect is blunted in a large one. But a lot of the cases I've seen have involved the histogram boundaries. If you're inserting data into a table in increasing order, every new million rows shifts the boundary of the last histogram bucket by the same amount. You either need those rows included in the histogram to get good query plans, or you don't. If you do, the frequency with which you need to analyze does not change as the table grows. If you don't, then it probably does. But the answer doesn't really depend on how big the table is already, but on your workload. So it's unclear to me that the proposed parameter is the right idea here at all. It's also unclear to me that the existing system is the right idea. :-)

So overall I guess I'd lean toward just introducing a cap for the "vacuum" case and leave the "insert" and "analyze" cases as ideas for possible future consideration, but I'm not 100% sure.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Le 25/04/2024 à 21:21, Nathan Bossart a écrit :
> On Thu, Apr 25, 2024 at 02:33:05PM -0400, Robert Haas wrote:
>> What does surprise me is that Frédéric suggests a default value of 500,000. If half a million tuples (proposed default) is 20% of your table (default value of autovacuum_vacuum_scale_factor) then your table has 2.5 million tuples. Unless those tuples are very wide, that table isn't even 1GB in size. I'm not aware that there's any problem at all with the current formula on a table of that size, or even ten times that size. I think you need to have tables that are hundreds of gigabytes in size at least before this starts to become a serious problem. Looking at this from another angle, in existing releases, the maximum usable amount of autovacuum_work_mem is 1GB, which means we can store one-sixth of a billion dead TIDs, or roughly 166 million. And that limit has been a source of occasional complaints for years. So we have those complaints on the one hand, suggesting that 166 million is not enough, and then we have this proposal, saying that more than half a million is too much. That's really strange; my initial hunch is that the value should be 100-500x higher than what Frédéric proposed.
>
> Agreed, the default should probably be on the order of 100-200M minimum.

I'm not sure... 500000 comes from the table given in a previous message. It may not be large enough. But vacuum also updates the visibility map, and a few hundred thousand heap fetches can already hurt the performance of an index-only scan, even if most of the blocks are read from cache.

> The original proposal also seems to introduce one parameter that would affect all three of autovacuum_vacuum_threshold, autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is that okay? Or do we need to introduce a "limit" GUC for each? I guess the question is whether we anticipate any need to have different values for these limits, which might be unlikely.

I agree with you, it seems unlikely. This is also an answer to Melanie's question about the name of the GUC: I deliberately left out the other "vacuum" because I thought we only needed one parameter for these three thresholds.

Now I have just read Robert's new message, and I understand his point. But is there a real problem with triggering analyze after every 500000 (or more) modifications in the table anyway?
Le 25/04/2024 à 18:51, Melanie Plageman a écrit :
>> I'm not too sure I understand. What are the reasons it might be skipped? I can think of a concurrent index creation on the same table, or anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the sort of thing you are talking about?
>
> No, I was thinking more literally that, if reltuples (assuming reltuples is modified/inserted tuples) > autovacuum_max_threshold, I would expect the table to be vacuumed. However, with your formula, that wouldn't necessarily be true.
>
> I think there are values of reltuples and autovacuum_max_threshold at which reltuples > autovacuum_max_threshold but reltuples <= vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)
>
> I tried to reduce the formula to come up with a precise definition of the range of values for which this is true, however I wasn't able to reduce it to something nice.
>
> Here is just an example of a case:
>
>     vac_base_thresh = 2000
>     vac_scale_factor = 0.9
>     reltuples = 3200
>     autovacuum_max_threshold = 2500
>
>     total_thresh = vac_base_thresh + vac_scale_factor * reltuples /
>         (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)
>
>     total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500
>
> so there are more dead tuples than the max threshold, so it should trigger a vacuum, but it doesn't because the total calculated threshold is higher than the number of dead tuples.

OK, thank you! I got it.

> This of course may not be a realistic scenario in practice. It works best the closer the scale factor is to 1 (wish I had derived the formula successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh. So, maybe it is not an issue.

I haven't thought much about this yet. I hope we can avoid such an extreme scenario by imposing some kind of constraint on this parameter, in relation to the others.

Anyway, with Nathan and Robert upvoting the simpler formula, this will probably become irrelevant anyway :-)
On Thu, Apr 25, 2024 at 4:57 PM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
> Now I have just read Robert's new message, and I understand his point. But is there a real problem with triggering analyze after every 500000 (or more) modifications in the table anyway?

It depends on the situation, but even on a laptop, you can do that number of modifications in one second. You could easily have a moderately large number of tables that hit that threshold every minute, and thus get auto-analyzed every minute when an autovacuum worker is launched in that database. Now, in some situations, that could be a good thing, because I suspect it's not very hard to construct a workload where constantly analyzing all of your busy tables is necessary to maintain query performance. But in general I think what would happen with such a low threshold is that you'd end up with autovacuum spending an awful lot of its available resources on useless analyze operations, which would waste I/O and CPU time, and more importantly, interfere with its ability to get vacuums done.

To put it another way, suppose my tables contain 100 million tuples each, which is not particularly large. The analyze scale factor is 10%, so currently I'd analyze after ten million table modifications. Your proposal drops that to half a million, so I'm going to start analyzing 20 times more often. If you start doing ANYTHING to a database twenty times more often, it can cause a problem. Twenty times more selects, twenty times more checkpoints, twenty times more vacuuming, whatever. It's just a lot of resources to spend on something if that thing isn't actually necessary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
> I believe that the underlying problem here can be summarized in this way: just because I'm OK with 2MB of bloat in my 10MB table doesn't mean that I'm OK with 2TB of bloat in my 10TB table. One reason for this is simply that I can afford to waste 2MB much more easily than I can afford to waste 2TB -- and that applies both on disk and in memory.

I don't find that convincing. Why are 2TB of wasted space in a 10TB table worse than 2TB of wasted space in 100 tables of 100GB each?

> Another reason, at least in existing releases, is that at some point index vacuuming hits a wall because we run out of space for dead tuples. We *most definitely* want to do index vacuuming before we get to the point where we're going to have to do multiple cycles of index vacuuming.

That is more convincing. But do we need a GUC for that? What about making a table eligible for autovacuum as soon as the number of dead tuples reaches 90% of what you can hold in "autovacuum_work_mem"?

Yours,
Laurenz Albe
Le 26/04/2024 à 04:24, Laurenz Albe a écrit :
> On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
>> I believe that the underlying problem here can be summarized in this way: just because I'm OK with 2MB of bloat in my 10MB table doesn't mean that I'm OK with 2TB of bloat in my 10TB table. One reason for this is simply that I can afford to waste 2MB much more easily than I can afford to waste 2TB -- and that applies both on disk and in memory.
>
> I don't find that convincing. Why are 2TB of wasted space in a 10TB table worse than 2TB of wasted space in 100 tables of 100GB each?

Good point, but another way of summarizing the problem would be that the autovacuum_*_scale_factor parameters work well as long as we have a more or less evenly distributed access pattern in the table.

Suppose my very large table gets updated only for its 1% most recent rows. We probably want to decrease autovacuum_analyze_scale_factor and autovacuum_vacuum_scale_factor for this one.

Partitioning would be a good solution, but IMHO Postgres should be able to handle this case anyway, ideally without per-table configuration.
Hi,

On Fri, Apr 26, 2024 at 04:24:45AM +0200, Laurenz Albe wrote:
> On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
> > Another reason, at least in existing releases, is that at some point index vacuuming hits a wall because we run out of space for dead tuples. We *most definitely* want to do index vacuuming before we get to the point where we're going to have to do multiple cycles of index vacuuming.
>
> That is more convincing. But do we need a GUC for that? What about making a table eligible for autovacuum as soon as the number of dead tuples reaches 90% of what you can hold in "autovacuum_work_mem"?

Due to the improvements in v17, this would basically never trigger according to my understanding, or at least only after an excessive amount of bloat has been accumulated.


Michael
Le 25/04/2024 à 22:21, Robert Haas a écrit :
> The analyze case, I feel, is really murky. autovacuum_analyze_scale_factor stands for the proposition that as the table becomes larger, analyze doesn't need to be done as often. If what you're concerned about is the frequency estimates, that's true: an injection of a million new rows can shift frequencies dramatically in a small table, but the effect is blunted in a large one. But a lot of the cases I've seen have involved the histogram boundaries. If you're inserting data into a table in increasing order, every new million rows shifts the boundary of the last histogram bucket by the same amount. You either need those rows included in the histogram to get good query plans, or you don't. If you do, the frequency with which you need to analyze does not change as the table grows. If you don't, then it probably does. But the answer doesn't really depend on how big the table is already, but on your workload. So it's unclear to me that the proposed parameter is the right idea here at all. It's also unclear to me that the existing system is the right idea. 🙂

This is very interesting. And what about ndistinct? I believe it could be problematic, too, in some (admittedly rare or pathological) cases.

For example, suppose that the actual number of distinct values grows from 1000 to 200000 after a batch of insertions, for a particular column. OK, in such a case, the default analyze sampling isn't large enough to compute an ndistinct close enough to reality anyway. But without any analyze at all, it can lead to very bad planning: think of a Nested Loop with a parallel seq scan for the outer table instead of a simple efficient index scan, because the index scan of the inner table is overestimated (both the cost of each index scan and the number of rows returned).
On Fri, 2024-04-26 at 09:35 +0200, Frédéric Yhuel wrote:
> Le 26/04/2024 à 04:24, Laurenz Albe a écrit :
> > On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
> > > I believe that the underlying problem here can be summarized in this way: just because I'm OK with 2MB of bloat in my 10MB table doesn't mean that I'm OK with 2TB of bloat in my 10TB table. One reason for this is simply that I can afford to waste 2MB much more easily than I can afford to waste 2TB -- and that applies both on disk and in memory.
> >
> > I don't find that convincing. Why are 2TB of wasted space in a 10TB table worse than 2TB of wasted space in 100 tables of 100GB each?
>
> Good point, but another way of summarizing the problem would be that the autovacuum_*_scale_factor parameters work well as long as we have a more or less evenly distributed access pattern in the table.
>
> Suppose my very large table gets updated only for its 1% most recent rows. We probably want to decrease autovacuum_analyze_scale_factor and autovacuum_vacuum_scale_factor for this one.
>
> Partitioning would be a good solution, but IMHO Postgres should be able to handle this case anyway, ideally without per-table configuration.

I agree that you may well want autovacuum and autoanalyze to treat your large table differently from your small tables.

But I am reluctant to accept even more autovacuum GUCs. It's not like we don't have enough of them, rather the opposite. You can slap on more GUCs to treat more special cases, but we will never reach the goal of having a default that will make everybody happy.

I believe that the defaults should work well in moderately sized databases with moderate usage characteristics. If you have large tables or a high number of transactions per second, you can be expected to make the effort and adjust the settings for your case. Adding more GUCs makes life *harder* for the users who are trying to understand and configure how autovacuum works.

Yours,
Laurenz Albe
Hi,

On Fri, Apr 26, 2024 at 10:18:00AM +0200, Laurenz Albe wrote:
> On Fri, 2024-04-26 at 09:35 +0200, Frédéric Yhuel wrote:
> > Le 26/04/2024 à 04:24, Laurenz Albe a écrit :
> > > On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
> > > > I believe that the underlying problem here can be summarized in this way: just because I'm OK with 2MB of bloat in my 10MB table doesn't mean that I'm OK with 2TB of bloat in my 10TB table. One reason for this is simply that I can afford to waste 2MB much more easily than I can afford to waste 2TB -- and that applies both on disk and in memory.
> > >
> > > I don't find that convincing. Why are 2TB of wasted space in a 10TB table worse than 2TB of wasted space in 100 tables of 100GB each?
> >
> > Good point, but another way of summarizing the problem would be that the autovacuum_*_scale_factor parameters work well as long as we have a more or less evenly distributed access pattern in the table.
> >
> > Suppose my very large table gets updated only for its 1% most recent rows. We probably want to decrease autovacuum_analyze_scale_factor and autovacuum_vacuum_scale_factor for this one.
> >
> > Partitioning would be a good solution, but IMHO Postgres should be able to handle this case anyway, ideally without per-table configuration.
>
> I agree that you may well want autovacuum and autoanalyze to treat your large table differently from your small tables.
>
> But I am reluctant to accept even more autovacuum GUCs. It's not like we don't have enough of them, rather the opposite. You can slap on more GUCs to treat more special cases, but we will never reach the goal of having a default that will make everybody happy.
>
> I believe that the defaults should work well in moderately sized databases with moderate usage characteristics. If you have large tables or a high number of transactions per second, you can be expected to make the effort and adjust the settings for your case. Adding more GUCs makes life *harder* for the users who are trying to understand and configure how autovacuum works.

Well, I disagree to some degree. I agree that the defaults should work well in moderately sized databases with moderate usage characteristics. But I also think we can do better than telling DBAs that they have to manually fine-tune autovacuum for large tables (and frequently implementing by hand what this patch proposes, namely setting autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to a high number), as this is cumbersome and needs adult supervision that is not always available. Of course, it would be great if we just slap some AI into the autovacuum launcher that figures things out automagically, but I don't think we are there, yet.

So this proposal (probably along with a higher default threshold than 500000, but IMO less than what Robert and Nathan suggested) sounds like a step forward to me. DBAs can set the threshold lower if they want, or maybe we can just turn it off by default if we cannot agree on a sane default, but I think this (using the simplified formula from Nathan) is a good approach that takes some pain away from autovacuum tuning and reserves that for the really difficult cases.


Michael
On 4/26/24 04:43, Michael Banck wrote:
> So this proposal (probably along with a higher default threshold than 500000, but IMO less than what Robert and Nathan suggested) sounds like a step forward to me. DBAs can set the threshold lower if they want, or maybe we can just turn it off by default if we cannot agree on a sane default, but I think this (using the simplified formula from Nathan) is a good approach that takes some pain away from autovacuum tuning and reserves that for the really difficult cases.

+1 to the above

Although I don't think 500000 is necessarily too small. In my view, having autovac run very quickly, even if more frequently, provides an overall better user experience.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 25, 2024 at 10:24 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I don't find that convincing. Why are 2TB of wasted space in a 10TB table worse than 2TB of wasted space in 100 tables of 100GB each?

It's not worse, but it's more avoidable. No matter what you do, any table that suffers a reasonable number of updates and/or deletes is going to have some wasted space. When a tuple is deleted or updated, the old one has to stick around until its xmax is all-visible, and then after that until the page is HOT pruned, which may not happen immediately, and then even after that the line pointer sticks around until the next vacuum, which doesn't happen instantly either. No matter how aggressive you make autovacuum, or even no matter how aggressively you vacuum manually, non-insert-only tables are always going to end up containing some bloat.

But how much? Well, it's basically given by

    RATE_AT_WHICH_SPACE_IS_WASTED * AVERAGE_TIME_UNTIL_SPACE_IS_RECLAIMED.

Which, you'll note, does not really depend on the table size. It does a little bit, because the time until a tuple is fully removed, including the line pointer, depends on how long vacuum takes, and vacuum takes longer on a big table than a small one. But the effect is much less than linear, I believe, because you can HOT-prune as soon as the xmax is all-visible, which reclaims most of the space instantly. So in practice, the minimum feasible steady-state bloat for a table depends a great deal on how fast updates and deletes are happening, but only weakly on the size of the table.

Which, in plain English, means that you should be able to vacuum a 10TB table often enough that it doesn't accumulate 2TB of bloat, if you want to. It's going to be harder to vacuum a 10GB table often enough that it doesn't accumulate 2GB of bloat. And it's going to be *really* hard to vacuum a 10MB table often enough that it doesn't accumulate 2MB of bloat. The only way you're going to be able to do that last one at all is if the update rate is very low.

> > Another reason, at least in existing releases, is that at some point index vacuuming hits a wall because we run out of space for dead tuples. We *most definitely* want to do index vacuuming before we get to the point where we're going to have to do multiple cycles of index vacuuming.
>
> That is more convincing. But do we need a GUC for that? What about making a table eligible for autovacuum as soon as the number of dead tuples reaches 90% of what you can hold in "autovacuum_work_mem"?

That would have been a good idea to do in existing releases, a long time before now, but we didn't. However, the new dead TID store changes the picture, because if I understand John Naylor's remarks correctly, the new TID store can hold so many TIDs so efficiently that you basically won't run out of memory. So now I think this wouldn't be effective - yet I still think it's wrong to let the vacuum threshold scale without bound as the table size increases.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
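A toy model of that steady-state argument, with purely hypothetical numbers:

    #include <stdio.h>

    /*
     * Toy model of the reasoning above: steady-state bloat is roughly
     * the wastage rate times the average reclaim time, independent of
     * table size. Both numbers below are hypothetical.
     */
    int
    main(void)
    {
        double updates_per_sec = 1000;  /* rate at which space is wasted */
        double reclaim_secs = 60;       /* avg time until space is reclaimed */

        /* ~60000 wasted tuples at steady state, for a 10MB table and a
         * 10TB table alike */
        printf("steady-state wasted tuples ~ %.0f\n",
               updates_per_sec * reclaim_secs);
        return 0;
    }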
On Fri, Apr 26, 2024 at 9:22 AM Joe Conway <mail@joeconway.com> wrote:
> Although I don't think 500000 is necessarily too small. In my view, having autovac run very quickly, even if more frequently, provides an overall better user experience.

Can you elaborate on why you think that? I mean, to me, that's almost equivalent to removing autovacuum_vacuum_scale_factor entirely, because only for very small tables will that calculation produce a value lower than 500k.

We might need to try to figure out some test cases here. My intuition is that this is going to vacuum large tables insanely aggressively.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Apr 26, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
> > I believe that the defaults should work well in moderately sized databases with moderate usage characteristics. If you have large tables or a high number of transactions per second, you can be expected to make the effort and adjust the settings for your case. Adding more GUCs makes life *harder* for the users who are trying to understand and configure how autovacuum works.
>
> Well, I disagree to some degree. I agree that the defaults should work well in moderately sized databases with moderate usage characteristics. But I also think we can do better than telling DBAs that they have to manually fine-tune autovacuum for large tables (and frequently implementing by hand what this patch proposes, namely setting autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to a high number), as this is cumbersome and needs adult supervision that is not always available. Of course, it would be great if we just slap some AI into the autovacuum launcher that figures things out automagically, but I don't think we are there, yet.
>
> So this proposal (probably along with a higher default threshold than 500000, but IMO less than what Robert and Nathan suggested) sounds like a step forward to me. DBAs can set the threshold lower if they want, or maybe we can just turn it off by default if we cannot agree on a sane default, but I think this (using the simplified formula from Nathan) is a good approach that takes some pain away from autovacuum tuning and reserves that for the really difficult cases.

I agree with this. If having an extra setting substantially reduces the number of cases that require manual tuning, it's totally worth it. And I think it will.

To be clear, I don't think this is the biggest problem with the autovacuum algorithm, not by quite a bit. But it's a relatively easy one to fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On 4/26/24 09:31, Robert Haas wrote:
> On Fri, Apr 26, 2024 at 9:22 AM Joe Conway <mail@joeconway.com> wrote:
>> Although I don't think 500000 is necessarily too small. In my view, having autovac run very quickly, even if more frequently, provides an overall better user experience.
>
> Can you elaborate on why you think that? I mean, to me, that's almost equivalent to removing autovacuum_vacuum_scale_factor entirely, because only for very small tables will that calculation produce a value lower than 500k.

If I understood Nathan's proposed calc, for small tables you would still get (thresh + sf * numtuples). Once that number exceeds the new limit parameter, then the latter would kick in. So small tables would retain the current behavior and large enough tables would be clamped.

> We might need to try to figure out some test cases here. My intuition is that this is going to vacuum large tables insanely aggressively.

It depends on the workload, to be sure. Just because a table is large, it doesn't mean that dead rows are generated that fast.

Admittedly it has been quite a while since I looked at all this that closely, but if A/V runs on some large busy table for a few milliseconds once every few minutes, that is far less disruptive than A/V running for tens of seconds once every few hours or for minutes once every few days -- or whatever. The key thing to me is the "few milliseconds" runtime. The short duration means that no one notices an impact, and the longer duration almost guarantees that an impact will be felt.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Apr 26, 2024 at 9:40 AM Joe Conway <mail@joeconway.com> wrote:
> > Can you elaborate on why you think that? I mean, to me, that's almost equivalent to removing autovacuum_vacuum_scale_factor entirely, because only for very small tables will that calculation produce a value lower than 500k.
>
> If I understood Nathan's proposed calc, for small tables you would still get (thresh + sf * numtuples). Once that number exceeds the new limit parameter, then the latter would kick in. So small tables would retain the current behavior and large enough tables would be clamped.

Right. But with a 500k threshold, "large enough" is not very large at all. The default scale factor is 20%, so the crossover point is at 2.5 million tuples. That's pgbench scale factor 25, which is a 320MB table.

> It depends on the workload, to be sure. Just because a table is large, it doesn't mean that dead rows are generated that fast.

That is true, as far as it goes.

> Admittedly it has been quite a while since I looked at all this that closely, but if A/V runs on some large busy table for a few milliseconds once every few minutes, that is far less disruptive than A/V running for tens of seconds once every few hours or for minutes once every few days -- or whatever. The key thing to me is the "few milliseconds" runtime. The short duration means that no one notices an impact, and the longer duration almost guarantees that an impact will be felt.

Sure, I mean, I totally agree with that, but how is a vacuum on a large table going to run for milliseconds? If it can skip index vacuuming, sure, then it's quick, because it only needs to scan the heap pages that are not all-visible. But as soon as index vacuuming is triggered, it's going to take a while. You can't afford to trigger that constantly.

Let's compare the current situation to the situation post-patch with a cap of 500k. Consider a table 1024 times larger than the one I mentioned above, so pgbench scale factor 25600, size on disk 320GB. Currently, that table will be vacuumed for bloat when the number of dead tuples exceeds 20% of the table size, because that's the default value of autovacuum_vacuum_scale_factor. The table has 2.56 billion tuples, so that means that we're going to vacuum it when there are more than 510 million dead tuples. Post-patch, we will vacuum when we have 500 thousand dead tuples. Suppose a uniform workload that slowly updates rows in the table. If we were previously autovacuuming the table once per day (1440 minutes) we're now going to try to vacuum it almost every minute (1440 minutes / 1024 = 84 seconds).

Unless I'm missing something major, that's completely bonkers. It might be true that it would be a good idea to vacuum such a table more often than we do at present, but there's no shot that we want to do it that much more often. The pgbench_accounts_pkey index will, I believe, be on the order of 8-10GB at that scale. We can't possibly want to incur that much extra I/O every minute, and I don't think it's going to finish in milliseconds, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
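The frequency arithmetic in that last paragraph, spelled out (an illustration only):

    #include <stdio.h>

    /* Checking the vacuum-frequency math above (illustration only). */
    int
    main(void)
    {
        double reltuples = 2.56e9;              /* pgbench scale 25600 */
        double cur_trigger = 0.2 * reltuples;   /* ~512 million dead tuples */
        double new_trigger = 500000;            /* proposed cap */

        double speedup = cur_trigger / new_trigger;     /* 1024x */
        double interval_secs = 1440.0 * 60.0 / speedup; /* ~84 seconds */

        printf("speedup = %.0fx, vacuum interval = %.0f s\n",
               speedup, interval_secs);
        return 0;
    }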
I've been following this discussion and would like to add my 2 cents.

> Unless I'm missing something major, that's completely bonkers. It might be true that it would be a good idea to vacuum such a table more often than we do at present, but there's no shot that we want to do it that much more often.

This is really an important point.

Too small of a threshold and a/v will constantly be vacuuming a fairly large and busy table with many indexes.

If the threshold is large, say 100 or 200 million, I question if you want autovacuum to be doing the work of cleanup here? That long of a period without an autovacuum on a table means there may be something misconfigured in your autovacuum settings.

At that point aren't you just better off performing a manual vacuum and taking advantage of parallel index scans?

Regards,

Sami Imseih
Amazon Web Services (AWS)
On Wed, May 1, 2024 at 2:19 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> > Unless I'm missing something major, that's completely bonkers. It might be true that it would be a good idea to vacuum such a table more often than we do at present, but there's no shot that we want to do it that much more often.
>
> This is really an important point.
>
> Too small of a threshold and a/v will constantly be vacuuming a fairly large and busy table with many indexes.
>
> If the threshold is large, say 100 or 200 million, I question if you want autovacuum to be doing the work of cleanup here? That long of a period without an autovacuum on a table means there may be something misconfigured in your autovacuum settings.
>
> At that point aren't you just better off performing a manual vacuum and taking advantage of parallel index scans?

As far as that last point goes, it would be good if we taught autovacuum about several things it doesn't currently know about; parallelism is one. IMHO, it's probably not the most important one, but it's certainly on the list. I think, though, that we should confine ourselves on this thread to talking about what the threshold ought to be.

And as far as that goes, I'd like you - and others - to spell out more precisely why you think 100 or 200 million tuples is too much. It might be, or maybe it is in some cases but not in others. To me, that's not a terribly large amount of data. Unless your tuples are very wide, it's a few tens of gigabytes. That is big enough that I can believe that you *might* want autovacuum to run when you hit that threshold, but it's definitely not *obvious* to me that you want autovacuum to run when you hit that threshold.

To make that concrete: If the table is 10TB, do you want to vacuum to reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to reclaim 20GB of heap space - is that the right thing to do? If yes, why?

I do think it's interesting that other people seem to think we should be vacuuming more often on tables that are substantially smaller than the ones that seem like a big problem to me. I'm happy to admit that my knowledge of this topic is not comprehensive and I'd like to learn from the experience of others. But I think it's clearly and obviously unworkable to multiply the current frequency of vacuuming for large tables by a three or four digit number. Possibly what we need here is something other than a cap, where, say, we vacuum a 10GB table twice as often as now, a 100GB table four times as often, and a 1TB table eight times as often. Or whatever the right answer is. But we can't just pull numbers out of the air like that: we need to be able to justify our choices. I think we all agree that big tables need to be vacuumed more often than the current formula does, but we seem to be rather far apart on the values of "big" and "more".

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, 27 Apr 2024 at 02:13, Robert Haas <robertmhaas@gmail.com> wrote: > Let's compare the current situation to the situation post-patch with a > cap of 500k. Consider a table 1024 times larger than the one I > mentioned above, so pgbench scale factor 25600, size on disk 320GB. > Currently, that table will be vacuumed for bloat when the number of > dead tuples exceeds 20% of the table size, because that's the default > value of autovacuum_vacuum_scale_factor. The table has 2.56 billion > tuples, so that means that we're going to vacuum it when there are > more than 510 million dead tuples. Post-patch, we will vacuum when we > have 500 thousand dead tuples. Suppose a uniform workload that slowly > updates rows in the table. If we were previously autovacuuming the > table once per day (1440 minutes) we're now going to try to vacuum it > almost every minute (1440 minutes / 1024 = 84 seconds). I've not checked your maths, but if that's true, that's not going to work. I think there are fundamental problems with the parameters that drive autovacuum that need to be addressed before we can consider a patch like this one. Here are some of the problems that I know about: 1. Autovacuum has exactly zero forward vision and operates reactively rather than proactively. This "blind operating" causes tables to either not need vacuumed or suddenly need vacuumed without any consideration of how busy autovacuum is at that current moment. 2. There is no prioritisation for the order in which tables are autovacuumed. 3. With the default scale factor, the larger a table becomes, the more infrequent the autovacuums. 4. Autovacuum is more likely to trigger when the system is busy because more transaction IDs are being consumed and there is more DML occurring. This results in autovacuum having less work to do during quiet periods when there are more free resources to be doing the vacuum work. In my opinion, the main problem with Frédéric's proposed GUC/reloption is that it increases the workload that autovacuum is responsible for and, because of #2, it becomes more likely that autovacuum works on some table that isn't the highest priority table to work on which can result in autovacuum starvation of tables that are more important to vacuum now. I think we need to do a larger overhaul of autovacuum to improve points 1-4 above. I also think that there's some work coming up that might force us into this sooner than we think. As far as I understand it, AIO will break vacuum_cost_page_miss because everything (providing IO keeps up) will become vacuum_cost_page_hit. Maybe that's not that important as that costing is quite terrible anyway. Here's a sketch of an idea that's been in my head for a while: Round 1: 1a) Give autovacuum forward vision (#1 above) and instead of vacuuming a table when it (atomically) crosses some threshold, use the existing scale_factors and autovacuum_freeze_max_age to give each table an autovacuum "score", which could be a number from 0-100, where 0 means do nothing and 100 means nuclear meltdown. Let's say a table gets 10 points for the dead tuples meeting the current scale_factor and maybe an additional point for each 10% of proportion the size of the table is according to the size of the database (gives some weight to space recovery for larger tables). For relfrozenxid, make the score the maximum of dead tuple score vs the percentage of the age(relfrozenxid) is to 2 billion. Use a similar maximum score calc for age(relminmxid) 2 billion. 
1b) Add a new GUC that defines the minimum score a table must reach before autovacuum will consider it.

1c) Change autovacuum to vacuum the tables with the highest scores first.

Round 2:

2a) Have autovacuum monitor the score of the highest scoring table over time with buckets for each power of 2 seconds in history from now. Let's say 20 buckets, about 12 days of history. Migrate scores into older buckets to track the score over time.

2b) Have autovacuum cost limits adapt according to the history so that if the maximum score of any table is trending upwards, autovacuum speeds up until the score buckets trend downwards towards the present.

2c) Add another GUC to define the minimum score at which autovacuum will be "proactive". Must be less than the minimum score to consider autovacuum (or at least, ignored unless it is). This GUC would not cause an autovacuum speedup due to 2b), as we'd only consider tables which meet the GUC added in 1b) in the score history array in 2a). This stops autovacuum running faster than autovacuum_cost_limit when trying to be proactive.

While the above isn't a well-baked idea (the exact way to calculate the scores isn't well thought through, certainly), I do think it's an idea that we should consider and improve upon. I believe 2c) helps solve the problem of large tables becoming bloated, as autovacuum could get to these sooner when the workload is low enough for it to run proactively.

I think we need at least 1a) before we can give autovacuum more work to do, especially if we do something like multiply its workload by 1024x, per your comment above.

David
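To make 1a) a little more concrete, here is one way such a score might be computed. This is only a rough sketch of David's description, not code from any patch: the struct, its field names, the weights, and the function name are all hypothetical.

/*
 * Hypothetical sketch of the scoring idea described above.
 * All names and weights here are placeholders, not PostgreSQL APIs.
 */
#include <math.h>

typedef struct TableStats
{
	double		dead_tuples;		/* estimated dead tuples in the table */
	double		reltuples;			/* estimated live tuples */
	double		table_bytes;		/* size of the table */
	double		database_bytes;		/* size of the whole database */
	double		relfrozenxid_age;	/* age(relfrozenxid) */
	double		relminmxid_age;		/* age(relminmxid) */
} TableStats;

static double
autovacuum_score(const TableStats *t, double base_thresh, double scale_factor)
{
	double		dead_score = 0.0;
	double		wrap_score;

	/* 10 points for meeting the current dead-tuple threshold */
	if (t->dead_tuples > base_thresh + scale_factor * t->reltuples)
		dead_score = 10.0;

	/* plus a point for each 10% of the database that this table occupies */
	dead_score += 10.0 * (t->table_bytes / t->database_bytes);

	/* wraparound pressure: percentage of the way to 2 billion */
	wrap_score = 100.0 * fmax(t->relfrozenxid_age, t->relminmxid_age) / 2e9;

	/* the final 0-100 score is the worst of the two concerns */
	return fmin(100.0, fmax(dead_score, wrap_score));
}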
On 01/05/2024 at 20:50, Robert Haas wrote:
> Possibly what we need here is
> something other than a cap, where, say, we vacuum a 10GB table twice
> as often as now, a 100GB table four times as often, and a 1TB table
> eight times as often. Or whatever the right answer is.

IMO, that would make more sense. So maybe something like this:

vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
                vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);

(it could work to compute a score, too, like in David's proposal)
> And as far as that goes, I'd like you - and others - to spell out more
> precisely why you think 100 or 200 million tuples is too much. It
> might be, or maybe it is in some cases but not in others. To me,
> that's not a terribly large amount of data. Unless your tuples are
> very wide, it's a few tens of gigabytes. That is big enough that I can
> believe that you *might* want autovacuum to run when you hit that
> threshold, but it's definitely not *obvious* to me that you want
> autovacuum to run when you hit that threshold.

Vacuuming the heap alone gets faster the more I do it, thanks to the visibility map. However, the more indexes I have, and the larger the indexes become (in the TBs), the more autovacuum workers will be monopolized vacuuming them.

At 500k tuples, I am constantly vacuuming large indexes and monopolizing autovacuum workers. At 100 or 200 million tuples, I will also monopolize autovacuum workers as they vacuum indexes for many minutes or hours. At 100 or 200 million, the monopolization will occur less often, but it will still occur, leading an operator to perhaps have to terminate the autovacuum and kick off a manual vacuum.

I am not convinced a new tuple-based threshold will address this, but I may also be misunderstanding the intention of this GUC.

> To make that concrete: If the table is 10TB, do you want to vacuum to
> reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to
> reclaim 20GB of heap space - is that the right thing to do? If yes,
> why?

No, I would not want to run autovacuum on 5TB indexes to reclaim a small amount of bloat.

Regards,

Sami
On Wed, May 1, 2024 at 10:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Here are some of the problems that I know about:
>
> 1. Autovacuum has exactly zero forward vision and operates reactively
> rather than proactively. This "blind operating" causes tables either to
> not need vacuuming or to suddenly need vacuuming, without any
> consideration of how busy autovacuum is at that current moment.
> 2. There is no prioritisation for the order in which tables are autovacuumed.
> 3. With the default scale factor, the larger a table becomes, the more
> infrequent the autovacuums.
> 4. Autovacuum is more likely to trigger when the system is busy
> because more transaction IDs are being consumed and there is more DML
> occurring. This results in autovacuum having less work to do during
> quiet periods when there are more free resources to be doing the
> vacuum work.

I agree with all of these points. For a while now, I've been thinking that we really needed a prioritization scheme, so that we don't waste our time on low-priority tasks when there are high-priority tasks that need to be completed. But lately I've started to think that what matters most is the rate at which autovacuum work is happening overall. I feel like prioritization is mostly going to matter when we're not keeping up, and I think the primary goal should be to keep up. I think we could use the same data to make both decisions -- if autovacuum were proactive rather than reactive, that would mean that we know something about what is going to happen in the future, and I think that data could be used both to decide whether we're keeping up, and also to prioritize. But if I had to pick a first target, I'd forget about trying to make things happen in the right order and just try to make sure we get all the things done.

> I think we need at least 1a) before we can give autovacuum more work
> to do, especially if we do something like multiply its workload by
> 1024x, per your comment above.

I guess I view it differently. It seems to me that right now, we're not vacuuming large tables often enough. We should fix that, independently of anything else. If the result is that small and medium sized tables get vacuumed less often, then that just means there were never enough resources to go around in the first place. We haven't taken a system that was working fine and broken it: we've just moved the problem from one category of tables (the big ones) to a different category of tables. If the user wants to solve that problem, they need to bump up the cost limit or add hardware. I don't see that we have any particular reason to believe such users will be worse off on average than they are today. On the other hand, users who do have a sufficiently high cost limit and enough hardware will be better off, because we'll start doing all the vacuuming work that needs to be done instead of only some of it.

Now, if we start vacuuming any class of table whatsoever 1024x as often as we do today, we are going to lose. But that would still be true even if we did everything on your list. Large tables need to be vacuumed more frequently than we now do, but not THAT much more frequently. Any system that produces that result is just using a wrong algorithm, or wrong constants, or something. Even if all the necessary resources are available, nobody is going to thank us for vacuuming gigantic tables in a tight loop. The problem with such a large increase is not that we don't have prioritization, but that such a large increase is fundamentally the wrong thing to do.
On the other hand, I think a more modest increase is the right thing to do, and I think it's the right thing to do whether we have prioritization or not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 07, 2024 at 10:31:00AM -0400, Robert Haas wrote:
> On Wed, May 1, 2024 at 10:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> I think we need at least 1a) before we can give autovacuum more work
>> to do, especially if we do something like multiply its workload by
>> 1024x, per your comment above.
>
> I guess I view it differently. It seems to me that right now, we're not
> vacuuming large tables often enough. We should fix that, independently of
> anything else. If the result is that small and medium sized tables get
> vacuumed less often, then that just means there were never enough
> resources to go around in the first place. We haven't taken a system that
> was working fine and broken it: we've just moved the problem from one
> category of tables (the big ones) to a different category of tables. If
> the user wants to solve that problem, they need to bump up the cost limit
> or add hardware. I don't see that we have any particular reason to
> believe such users will be worse off on average than they are today. On
> the other hand, users who do have a sufficiently high cost limit and
> enough hardware will be better off, because we'll start doing all the
> vacuuming work that needs to be done instead of only some of it.
>
> Now, if we start vacuuming any class of table whatsoever 1024x as often
> as we do today, we are going to lose. But that would still be true even
> if we did everything on your list. Large tables need to be vacuumed more
> frequently than we now do, but not THAT much more frequently. Any system
> that produces that result is just using a wrong algorithm, or wrong
> constants, or something. Even if all the necessary resources are
> available, nobody is going to thank us for vacuuming gigantic tables in a
> tight loop. The problem with such a large increase is not that we don't
> have prioritization, but that such a large increase is fundamentally the
> wrong thing to do. On the other hand, I think a more modest increase is
> the right thing to do, and I think it's the right thing to do whether we
> have prioritization or not.

This is about how I feel, too. In any case, I +1'd a higher default because I think we need to be pretty conservative with these changes, at least until we have a better prioritization strategy. While folks may opt to set this value super low, I think that's more likely to lead to some interesting secondary effects. If the default is high, hopefully these secondary effects will be minimized or avoided.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> This is about how I feel, too. In any case, I +1'd a higher default
> because I think we need to be pretty conservative with these changes, at
> least until we have a better prioritization strategy. While folks may opt
> to set this value super low, I think that's more likely to lead to some
> interesting secondary effects. If the default is high, hopefully these
> secondary effects will be minimized or avoided.

There is also an alternative of making this GUC -1 by default, which means it has no effect, and any larger value will be used in the threshold calculation of autovacuum. A user will have to be careful not to set it too low, but that is going to be a concern either way.

This idea may be worth considering as it does not change the default behavior of the autovac threshold calculation, and if a user has cases in which they have many tables with a few billion tuples that they wish to see autovacuumed more often, they now have a GUC to make that possible and potentially avoid per-table threshold configuration.

Also, I think coming up with a good default will be challenging, and perhaps this idea is a good middle ground.

Regards,

Sami
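For what it's worth, a -1-disables default like the one Sami describes would presumably only take a couple of lines in the threshold computation. The following is just a sketch of that idea in the style of the pseudo-code earlier in the thread, assuming a hypothetical vac_max_thresh variable backed by the new GUC/reloption; it is not taken from any posted patch:

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
if (vac_max_thresh >= 0)
    vacthresh = Min(vacthresh, (float4) vac_max_thresh);

This would mirror the common PostgreSQL convention where -1 means "feature disabled", so the existing behavior is unchanged until the user explicitly sets a cap.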
On Wed, May 8, 2024 at 1:30 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> There is also an alternative of making this GUC -1 by default, which
> means it has no effect, and any larger value will be used in the threshold
> calculation of autovacuum. A user will have to be careful not to set it
> too low, but that is going to be a concern either way.

Personally, I'd much rather ship it with a reasonable default. If we ship it disabled, most people won't end up using it at all, which sucks, and those who do will be more likely to set it to a ridiculous value, which also sucks. If we ship it with a value that has a good chance of being within 2x or 3x of the optimal value on a given user's system, then a lot more people will benefit from it.

> Also, I think coming up with a good default will be challenging,
> and perhaps this idea is a good middle ground.

Maybe. I freely admit that I don't know exactly what the optimal value is here, and I think there is some experimentation that is needed to try to get some better intuition there. At what table size does the current system actually result in too little vacuuming, and how can we demonstrate that? Does the point at which that happens depend more on the table size in gigabytes, or more on the number of rows? These are things that someone can research and about which they can present data.

As I see it, a lot of the lack of agreement up until now is people just not understanding the math. Since I think I've got the right idea about the math, I attribute this to other people being confused about what is going to happen and would tend to phrase it as: some people don't understand how catastrophically bad it will be if you set this value too low. However, another possibility is that it is I who am misunderstanding the math. In that case, the correct phrasing is probably something like: Robert wants a completely useless and worthless value for this parameter that will be of no help to anyone. Regardless, at least some of us are confused. If we can reduce that confusion, then people's ideas about what values for this parameter might be suitable should start to come closer together.

I tend to feel like the disagreement here is not really about whether it's a good idea to increase the frequency of vacuuming on large tables by three orders of magnitude compared to what we do now, but rather about whether that's actually going to happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On 09/05/2024 at 16:58, Robert Haas wrote:
> As I see it, a lot of the lack of agreement up until now is people
> just not understanding the math. Since I think I've got the right idea
> about the math, I attribute this to other people being confused about
> what is going to happen and would tend to phrase it as: some people
> don't understand how catastrophically bad it will be if you set this
> value too low.

FWIW, I do agree with your math. I found your demonstration convincing. 500000 was picked more or less out of thin air.

Using the formula I suggested earlier:

vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
                vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);

your table of 2.56 billion tuples will be vacuumed if there are more than 10 million dead tuples (every 28 minutes).

If we want to stick with the simple formula, we should probably choose a very high default, maybe 100 million, as you suggested earlier.

However, it would be nice to have the visibility map updated more frequently than every 100 million dead tuples. I wonder if this could be decoupled from the vacuum process?
On Mon, May 13, 2024 at 11:14 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
> FWIW, I do agree with your math. I found your demonstration convincing.
> 500000 was picked more or less out of thin air.

Good to know.

> Using the formula I suggested earlier:
>
> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
>                 vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);
>
> your table of 2.56 billion tuples will be vacuumed if there are
> more than 10 million dead tuples (every 28 minutes).

Yeah, so that is about 50x what we do now (twice an hour vs. once a day). While that's a lot more reasonable than the behavior that we'd get from a 500k hard cap (every 84 seconds), I suspect it's still too aggressive.

I find these things much easier to reason about in gigabytes than in time units. In that example, the table was 320GB and was getting vacuumed after accumulating 64GB of bloat. That seems like a lot. It means that the table can grow from 320GB all the way up until 384GB before we even think about starting to vacuum it, and then we might not start right away, depending on resource availability, and we may take some time to finish, possibly considerable time, depending on the number and size of indexes and the availability of I/O resources. So actually the table might very plausibly be well above 400GB before we get done processing it, or potentially even more. I think that's not aggressive enough.

But how much would we like to push that 64GB of bloat number down for a table of this size? I would argue that if we're vacuuming the table when it's only got 1GB of bloat, or 2GB of bloat, that seems excessive. Unless the system is very lightly loaded and has no long-running transactions at all, we're unlikely to be able to vacuum aggressively enough to keep a 320GB table at a size of 321GB or 322GB. Without testing or doing any research, I'm going to guess that a realistic number is probably in the range of 10-20GB of bloat. If the table activity is very light, we might be able to get it even lower, like say 5GB, but the costs ramp up very quickly as you push the vacuuming threshold down. Also, if the table accumulates X amount of bloat during the time it takes to run one vacuum, you can never succeed in limiting bloat to a value less than X (and probably more like 1.5*X or 2*X or something).

So without actually trying anything, which I do think somebody should do and report results, my guess is that for a 320GB table, you'd like to multiply the vacuum frequency by a value somewhere between 3 and 10, and probably much closer to 3 than to 10. Maybe even less than 3. Not sure exactly. Like I say, I think someone needs to try some different workloads and database sizes and numbers of indexes, and try to get a feeling for what actually works well in practice.

> If we want to stick with the simple formula, we should probably choose a
> very high default, maybe 100 million, as you suggested earlier.
>
> However, it would be nice to have the visibility map updated more
> frequently than every 100 million dead tuples. I wonder if this could be
> decoupled from the vacuum process?

Yes, but if a page has had any non-HOT updates, it can't become all-visible again without vacuum. If it has had only HOT updates, then a HOT-prune could make it all-visible. I don't think we do that currently, but I think in theory we could.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
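For anyone who wants to reproduce the timings traded back and forth in this exchange (once per day, every 28 minutes, every 84 seconds), the following is a small self-contained sketch. The uniform-workload model and all constants are assumptions taken from the examples in this thread, not measurements:

/*
 * Assume dead tuples accumulate at a constant rate, calibrated so that
 * the current formula (~512M dead tuples at 20% of 2.56 billion rows)
 * fires once per day. The interval between vacuums under any other
 * threshold is then simply proportional to that threshold.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		reltuples = 2.56e9;	/* pgbench scale 25600, ~320GB */
	double		current = 50 + 0.2 * reltuples;	/* ~512M dead tuples */
	double		rate = current / 1440.0;	/* dead tuples per minute */
	double		cap = 500e3;	/* the originally proposed hard cap */
	double		sqrt_formula = 50 + 0.2 * sqrt(reltuples) * 1000;	/* ~10.1M */

	printf("current formula: every %.0f minutes\n", current / rate);
	printf("sqrt formula:    every %.0f minutes\n", sqrt_formula / rate);
	printf("500k hard cap:   every %.0f seconds\n", 60.0 * cap / rate);
	return 0;
}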
I didn't see a commitfest entry for this, so I created one to make sure we don't lose track of this:

https://commitfest.postgresql.org/48/5046/

-- 
nathan
On 18/06/2024 at 05:06, Nathan Bossart wrote:
> I didn't see a commitfest entry for this, so I created one to make sure we
> don't lose track of this:
>
> https://commitfest.postgresql.org/48/5046/

OK, thanks! By the way, I wonder if there were any off-list discussions after Robert's talk at PGConf.dev (and I'm waiting for the video of the talk).
I've attached a new patch to show roughly what I think this new GUC should look like. I'm hoping this sparks more discussion, if nothing else.

On Tue, Jun 18, 2024 at 12:36:42PM +0200, Frédéric Yhuel wrote:
> By the way, I wonder if there were any off-list discussions after Robert's
> talk at PGConf.dev (and I'm waiting for the video of the talk).

I don't recall any discussions about this idea, but Robert did briefly mention it in his talk [0].

[0] https://www.youtube.com/watch?v=RfTD-Twpvac

-- 
nathan
Attachment
On 8/7/24 23:39, Nathan Bossart wrote:
> I've attached a new patch to show roughly what I think this new GUC should
> look like. I'm hoping this sparks more discussion, if nothing else.

Thank you. FWIW, I would prefer a sub-linear growth, so maybe something like this:

vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
                vac_base_thresh + vac_scale_factor * pow(reltuples, 0.7) * 100);

This would give :

* 386M (instead of 5.1 billion currently) for a 25.6 billion tuples table ;
* 77M for a 2.56 billion tuples table (Robert's example) ;
* 15M (instead of 51M currently) for a 256M tuples table ;
* 3M (instead of 5M currently) for a 25.6M tuples table.

The other advantage is that you don't need another GUC.

> On Tue, Jun 18, 2024 at 12:36:42PM +0200, Frédéric Yhuel wrote:
>> By the way, I wonder if there were any off-list discussions after Robert's
>> talk at PGConf.dev (and I'm waiting for the video of the talk).
>
> I don't recall any discussions about this idea, but Robert did briefly
> mention it in his talk [0].
>
> [0] https://www.youtube.com/watch?v=RfTD-Twpvac

Very interesting, thanks!
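The numbers in that list are easy to reproduce. Here is a small, self-contained check (not part of any patch) that evaluates the proposed formula against the current one at the four table sizes above, assuming the default base threshold of 50 and scale factor of 0.2:

/* Prints the current and proposed thresholds for the table sizes above. */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	const double base = 50.0;
	const double sf = 0.2;
	const double sizes[] = {25.6e9, 2.56e9, 256e6, 25.6e6};

	for (int i = 0; i < 4; i++)
	{
		double		n = sizes[i];
		double		linear = base + sf * n;
		double		sublinear = base + sf * pow(n, 0.7) * 100.0;

		printf("%13.0f tuples: current %13.0f, proposed %13.0f\n",
			   n, linear, fmin(linear, sublinear));
	}
	return 0;
}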
Hi frederic.yhuel
> Thank you. FWIW, I would prefer a sub-linear growth, so maybe something
> like this
> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
> vac_base_thresh + vac_scale_factor * pow(reltuples, 0.7) * 100);
> This would give :
> * 386M (instead of 5.1 billion currently) for a 25.6 billion tuples table ;
> * 77M for a 2.56 billion tuples table (Robert's example) ;
> * 15M (instead of 51M currently) for a 256M tuples table ;
> * 3M (instead of 5M currently) for a 25.6M tuples table.
> The other advantage is that you don't need another GUC.
Agreed, we just need to change the calculation formula. But I prefer this formula because it produces a smoother value:
vacthresh = (float4) fmin(vac_base_thresh + vac_scale_factor * reltuples,
                          vac_base_thresh + vac_scale_factor * log2(reltuples) * 10000);
or
vacthresh = (float4) fmin(vac_base_thresh + (vac_scale_factor * reltuples),
                          sqrt(1000.0 * reltuples));
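Since it is hard to reason about these formulas in the abstract (a point raised just below), here is a small sketch evaluating both variants at the table sizes used earlier in the thread, with the default base threshold of 50 and scale factor of 0.2. Note that the sqrt(1000 * n) shape resembles the auto-update-statistics threshold that SQL Server 2016 introduced:

/* Evaluate the log2 and sqrt variants at two representative sizes.
 * The log2 variant stays below ~80k dead tuples even at 256 billion
 * rows, while the sqrt variant gives ~1.6M at 2.56 billion rows and
 * ~16M at 256 billion. */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	const double base = 50.0;
	const double sf = 0.2;
	const double sizes[] = {2.56e9, 2.56e11};

	for (int i = 0; i < 2; i++)
	{
		double		n = sizes[i];
		double		linear = base + sf * n;

		printf("%13.0f tuples: log2 variant %8.0f, sqrt variant %10.0f\n",
			   n,
			   fmin(linear, base + sf * log2(n) * 10000.0),
			   fmin(linear, sqrt(1000.0 * n)));
	}
	return 0;
}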
On Wed, Nov 06, 2024 at 08:51:07PM +0800, wenhui qiu wrote:
>> Thank you. FWIW, I would prefer a sub-linear growth, so maybe something
>> like this:
>>
>> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
>>                 vac_base_thresh + vac_scale_factor * pow(reltuples, 0.7) * 100);
>>
>> This would give :
>>
>> * 386M (instead of 5.1 billion currently) for a 25.6 billion tuples table ;
>> * 77M for a 2.56 billion tuples table (Robert's example) ;
>> * 15M (instead of 51M currently) for a 256M tuples table ;
>> * 3M (instead of 5M currently) for a 25.6M tuples table.
>>
>> The other advantage is that you don't need another GUC.
>
> Agreed, we just need to change the calculation formula. But I prefer this
> formula because it produces a smoother value:
>
> vacthresh = (float4) fmin(vac_base_thresh + vac_scale_factor * reltuples,
>                           vac_base_thresh + vac_scale_factor * log2(reltuples) * 10000);
> or
> vacthresh = (float4) fmin(vac_base_thresh + (vac_scale_factor * reltuples),
>                           sqrt(1000.0 * reltuples));

I apologize for the curt response, but I don't understand how we could decide which of these three complicated formulas to use, let alone how we could expect users to reason about the behavior.

-- 
nathan
On Sat, Nov 09, 2024 at 10:08:51PM +0800, wenhui qiu wrote:
> Sorry, I forgot to explain the reason in my last email. In fact, I
> submitted the patch to the community, and Frédéric (frederic.yhuel@dalibo.com)
> told me the same idea had already been proposed here, so let me explain
> those two formulas.
>
> About vacthresh = (float4) fmin(vac_base_thresh + (vac_scale_factor *
> reltuples), sqrt(1000.0 * reltuples)): a few days ago, I was looking at
> the SQL Server documentation and found that SQL Server optimized its
> algorithm for updating statistics in the 2016 version. I think we can
> also learn from SQL Server's implementation to address the problem of
> autovacuum triggering on large tables. The document link:
> https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
>
> About vacthresh = (float4) fmin(vac_base_thresh + vac_scale_factor *
> reltuples, vac_base_thresh + vac_scale_factor * log2(reltuples) * 10000):
> I came to the conclusion by trying to draw a function graph; I personally
> think it is a smooth formula.

AFAICT the main advantage of these formulas is that you don't need another GUC, but they also make the existing ones more difficult to configure. Plus, there's no way to go back to the existing behavior.

-- 
nathan
Hi Nathan Bossart
> AFAICT the main advantage of these formulas is that you don't need another
> GUC, but they also make the existing ones more difficult to configure.
> Plus, there's no way to go back to the existing behavior.
There is indeed this problem, but I think this formula should not have been linear in the first place. SQL Server implemented and optimized this eight years ago, and I think we can definitely draw on its experience. Many people may be worried that frequent vacuuming will affect I/O performance, but here too we can learn from SQL Server's experience with automatic statistics updates.
On 11/9/24 16:59, Nathan Bossart wrote:
> AFAICT the main advantage of these formulas is that you don't need another
> GUC, but they also make the existing ones more difficult to configure.

I wouldn't say that's the main advantage. It doesn't seem very clean to me to cap to a fixed value, because you could take Robert's demonstration with a bigger table and come to the same conclusion:

Let's compare the current situation to the situation post-Nathan's-patch with a cap of 100M. Consider a table 100 times larger than the one in Robert's previous example, so pgbench scale factor 2_560_000, size on disk 32TB. Currently, that table will be vacuumed for bloat when the number of dead tuples exceeds 20% of the table size, because that's the default value of autovacuum_vacuum_scale_factor. The table has 256 billion tuples, so that means that we're going to vacuum it when there are more than 51 billion dead tuples. Post-patch, we will vacuum when we have 100 million dead tuples. Suppose a uniform workload that slowly updates rows in the table. If we were previously autovacuuming the table once per day (1440 minutes) we're now going to try to vacuum it almost every minute (1440 minutes / 512 = 168 seconds).

(compare with every 55 min with my formula)

Of course, this is a theoretical example that is probably unrealistic. I don't know, really. I don't know if Robert's example was realistic in the first place.

In any case, we should do the tests that Robert suggested and/or come up with a good mathematical model, because we are in the dark at the moment.

> Plus, there's no way to go back to the existing behavior.

I think we should indeed provide a backward-compatible behaviour (so maybe another GUC after all).
Hi,
> In any case, we should do the tests that Robert suggested and/or come up
> with a good mathematical model, because we are in the dark at the moment.
I think SQL Server has given us great inspiration.

> I think we should indeed provide a backward-compatible behaviour (so maybe
> another GUC after all).

I am ready to implement a new GUC parameter so that database administrators can configure the appropriate calculation method (the default value would keep the original calculation formula).