Re: New GUC autovacuum_max_threshold ? - Mailing list pgsql-hackers

From Michael Banck
Subject Re: New GUC autovacuum_max_threshold ?
Date
Msg-id 662b6941.5d0a0220.ea153.4178@mx.google.com
In response to Re: New GUC autovacuum_max_threshold ?  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
Hi,

On Fri, Apr 26, 2024 at 10:18:00AM +0200, Laurenz Albe wrote:
> On Fri, 2024-04-26 at 09:35 +0200, Frédéric Yhuel wrote:
> > On 26/04/2024 at 04:24, Laurenz Albe wrote:
> > > On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
> > > > I believe that the underlying problem here can be summarized in this
> > > > way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
> > > > mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
> > > > this is simply that I can afford to waste 2MB much more easily than I
> > > > can afford to waste 2TB -- and that applies both on disk and in
> > > > memory.
> > > 
> > > I don't find that convincing.  Why are 2TB of wasted space in a 10TB
> > > table worse than 2TB of wasted space in 100 tables of 100GB each?
> > 
> > Good point, but another way of summarizing the problem would be that the 
> > autovacuum_*_scale_factor parameters work well as long as we have a more 
> > or less evenly distributed access pattern in the table.
> > 
> > Suppose my very large table gets updated only for its 1% most recent 
> > rows. We probably want to decrease autovacuum_analyze_scale_factor and 
> > autovacuum_vacuum_scale_factor for this one.
> > 
> > Partitioning would be a good solution, but IMHO postgres should be able 
> > to handle this case anyway, ideally without per-table configuration.
> 
> I agree that you may well want autovacuum and autoanalyze to treat your large
> table differently from your small tables.
> 
> But I am reluctant to accept even more autovacuum GUCs.  It's not like
> we don't have enough of them, rather the opposite.  You can slap on more
> GUCs to treat more special cases, but we will never reach the goal of
> having a default that will make everybody happy.
> 
> I believe that the defaults should work well in moderately sized databases
> with moderate usage characteristics.  If you have large tables or a high
> number of transactions per second, you can be expected to make the effort
> and adjust the settings for your case.  Adding more GUCs makes life *harder*
> for the users who are trying to understand and configure how autovacuum works.

Well, I disagree to some degree. I agree that the defaults should work
well in moderately sized databases with moderate usage characteristics.
But I also think we can do better than telling DBAs that they have to
manually fine-tune autovacuum for large tables (frequently implementing
by hand what this patch proposes, namely setting
autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to a
high number), as this is cumbersome and needs adult supervision that is
not always available. Of course, it would be great if we could just slap
some AI into the autovacuum launcher that figures things out
automagically, but I don't think we are there yet.
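
For illustration (the table name and the threshold value are made up),
that per-table workaround typically looks like this:

    -- hypothetical table; disable the size-proportional trigger and
    -- use a fixed dead-tuple count instead
    ALTER TABLE big_history_table SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold = 1000000
    );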

So this proposal (probably along with a higher default threshold than
500000, but IMO less than what Robert and Nathan suggested) sounds like
a step forward to me. DBAs can set the threshold lower if they want, or
maybe we can just turn it off by default if we cannot agree on a sane
default, but I think this (using the simplified formula from Nathan) is
a good approach that takes some pain away from autovacuum tuning and
reserves that for the really difficult cases.
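
To make the effect concrete: assuming the simplified formula just caps
the usual computation at the new GUC, i.e. roughly
least(threshold + scale_factor * reltuples, max_threshold), a
one-billion-row table with the current defaults would trigger at
(illustrative numbers only):

    -- reltuples = 1e9, autovacuum_vacuum_threshold = 50,
    -- autovacuum_vacuum_scale_factor = 0.2, proposed cap = 500000
    SELECT least(50 + 0.2 * 1000000000, 500000) AS effective_vacuum_threshold;

i.e. 500000 dead tuples instead of the 200 million that the scale
factor alone would allow.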


Michael


