Re: New GUC autovacuum_max_threshold ? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: New GUC autovacuum_max_threshold ?
Msg-id CA+TgmoZ-iiaNLBtXDLFO4MLTcDQmpyHaNd-=mQywXF8PsVVoBQ@mail.gmail.com
In response to Re: New GUC autovacuum_max_threshold ?  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: New GUC autovacuum_max_threshold ?
List pgsql-hackers
On Thu, Apr 25, 2024 at 3:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Agreed, the default should probably be on the order of 100-200M minimum.
>
> The original proposal also seems to introduce one parameter that would
> affect all three of autovacuum_vacuum_threshold,
> autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold.  Is
> that okay?  Or do we need to introduce a "limit" GUC for each?  I guess the
> question is whether we anticipate any need to have different values for
> these limits, which might be unlikely.

I don't think we should make the same limit apply to more than one of
those. I would phrase the question in the opposite way that you did:
is there any particular reason to believe that the limits should be
the same? I don't see one.

I think it would be OK to introduce limits for some and leave the
others uncapped, but I don't like the idea of reusing the same limit
for different things.
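
To make that concrete, here is a rough standalone sketch of how independent
caps could bolt onto the shape of the existing threshold arithmetic. This is
illustration code, not the actual autovacuum.c logic, and the *_max_thresh
variables stand in for hypothetical per-threshold GUCs set to the 100M
ballpark mentioned above:

    #include <stdio.h>

    #define Min(a, b) ((a) < (b) ? (a) : (b))

    int
    main(void)
    {
        double  reltuples = 1e9;    /* a billion-row table */

        /* shipped defaults for the existing GUCs */
        double  vac_base_thresh = 50, vac_scale_factor = 0.2;
        double  vac_ins_base_thresh = 1000, vac_ins_scale_factor = 0.2;
        double  anl_base_thresh = 50, anl_scale_factor = 0.1;

        /* hypothetical, independently settable caps */
        double  vac_max_thresh = 100e6;
        double  vac_ins_max_thresh = 100e6;
        double  anl_max_thresh = 100e6;

        /* same shape as today's computation, with a cap bolted on per threshold */
        double  vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
                                vac_max_thresh);
        double  vacinsthresh = Min(vac_ins_base_thresh + vac_ins_scale_factor * reltuples,
                                   vac_ins_max_thresh);
        double  anlthresh = Min(anl_base_thresh + anl_scale_factor * reltuples,
                                anl_max_thresh);

        printf("vacuum at %.0f dead tuples, insert-vacuum at %.0f inserts,\n"
               "analyze at %.0f changed tuples\n",
               vacthresh, vacinsthresh, anlthresh);
        return 0;
    }

With three separate knobs like that, capping only one of them is just a
matter of leaving the others at "no limit", which is the flexibility being
argued for here.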

My intuition is strongest for the vacuum threshold -- that's such an
expensive operation, takes so long, and has such dire consequences if
it isn't done. We need to force the table to be vacuumed before it
bloats out of control. Maybe essentially the same logic applies to the
insert threshold, namely, that we should vacuum before the number of
not-all-visible pages gets too large, but I think it's less clear.
It's just not nearly as bad if that happens. Sure, it may not be great
when vacuum eventually runs and hits a ton of pages all at once, but
it's not even close to being as catastrophic as the vacuum case.

The analyze case, I feel, is really murky.
autovacuum_analyze_scale_factor stands for the proposition that as the
table becomes larger, analyze doesn't need to be done as often. If
what you're concerned about is the frequency estimates, that's true:
an injection of a million new rows can shift frequencies dramatically
in a small table, but the effect is blunted in a large one. But a lot
of the cases I've seen have involved the histogram boundaries. If
you're inserting data into a table in increasing order, every new
million rows shifts the boundary of the last histogram bucket by the
same amount. You either need those rows included in the histogram to
get good query plans, or you don't. If you do, the frequency with
which you need to analyze does not change as the table grows. If you
don't, then it probably does. But the answer hinges not on how big the
table is already, but on your workload. So it's unclear to
me that the proposed parameter is the right idea here at all. It's
also unclear to me that the existing system is the right idea. :-)
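
To put a number on the histogram-boundary effect, here is a toy simulation
of an equi-depth histogram over a column loaded in strictly increasing order
(values 1..N). It is not what ANALYZE actually does; it just exploits the
fact that, for such a column, the boundary of bucket i sits at fraction
i/nbuckets of the way through the table, and it uses 10 buckets for
readability where a real histogram would have up to
default_statistics_target (100 by default):

    #include <stdio.h>

    /*
     * Toy equi-depth histogram: for values 1..nrows loaded in increasing
     * order, the boundary of bucket i is simply the value found at
     * fraction i/nbuckets of the way through the table.
     */
    static void
    print_boundaries(long nrows, int nbuckets)
    {
        printf("%9ld rows:", nrows);
        for (int i = 0; i <= nbuckets; i++)
            printf(" %ld", nrows * i / nbuckets);
        printf("\n");
    }

    int
    main(void)
    {
        int     nbuckets = 10;

        /* a 10M-row table, then 1M more rows appended in increasing order */
        print_boundaries(10000000L, nbuckets);
        print_boundaries(11000000L, nbuckets);

        /* a 100M-row table, same 1M rows appended */
        print_boundaries(100000000L, nbuckets);
        print_boundaries(101000000L, nbuckets);
        return 0;
    }

In both cases the top boundary moves by exactly the million rows you added,
and until the next analyze the planner has no evidence of anything above the
old boundary. Whether that hurts depends on whether your queries chase that
newest range, which is the workload point above.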

So overall I guess I'd lean toward just introducing a cap for the
"vacuum" case and leave the "insert" and "analyze" cases as ideas for
possible future consideration, but I'm not 100% sure.

--
Robert Haas
EDB: http://www.enterprisedb.com
