Re: New GUC autovacuum_max_threshold ? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: New GUC autovacuum_max_threshold ? |
Date | |
Msg-id | CA+TgmoZ-iiaNLBtXDLFO4MLTcDQmpyHaNd-=mQywXF8PsVVoBQ@mail.gmail.com |
In response to | Re: New GUC autovacuum_max_threshold ? (Nathan Bossart <nathandbossart@gmail.com>) |
Responses | Re: New GUC autovacuum_max_threshold ? |
List | pgsql-hackers |
On Thu, Apr 25, 2024 at 3:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Agreed, the default should probably be on the order of 100-200M minimum.
>
> The original proposal also seems to introduce one parameter that would
> affect all three of autovacuum_vacuum_threshold,
> autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is
> that okay? Or do we need to introduce a "limit" GUC for each? I guess the
> question is whether we anticipate any need to have different values for
> these limits, which might be unlikely.

I don't think we should make the same limit apply to more than one of those. I would phrase the question in the opposite way that you did: is there any particular reason to believe that the limits should be the same? I don't see one. I think it would be OK to introduce limits for some and leave the others uncapped, but I don't like the idea of reusing the same limit for different things.

My intuition is strongest for the vacuum threshold -- that's such an expensive operation, takes so long, and has such dire consequences if it isn't done. We need to force the table to be vacuumed before it bloats out of control.

Maybe essentially the same logic applies to the insert threshold, namely, that we should vacuum before the number of not-all-visible pages gets too large, but I think it's less clear. It's just not nearly as bad if that happens. Sure, it may not be great when vacuum eventually runs and hits a ton of pages all at once, but it's not even close to being as catastrophic as the vacuum case.

The analyze case, I feel, is really murky. autovacuum_analyze_scale_factor stands for the proposition that as the table becomes larger, analyze doesn't need to be done as often. If what you're concerned about is the frequency estimates, that's true: an injection of a million new rows can shift frequencies dramatically in a small table, but the effect is blunted in a large one. But a lot of the cases I've seen have involved the histogram boundaries. If you're inserting data into a table in increasing order, every new million rows shifts the boundary of the last histogram bucket by the same amount. You either need those rows included in the histogram to get good query plans, or you don't. If you do, the frequency with which you need to analyze does not change as the table grows. If you don't, then it probably does. But the answer doesn't really depend on how big the table is already, but on your workload. So it's unclear to me that the proposed parameter is the right idea here at all. It's also unclear to me that the existing system is the right idea. :-)

So overall I guess I'd lean toward just introducing a cap for the "vacuum" case and leave the "insert" and "analyze" cases as ideas for possible future consideration, but I'm not 100% sure.

--
Robert Haas
EDB: http://www.enterprisedb.com
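For context on the arithmetic under discussion, here is a minimal sketch of how today's dead-tuple trigger (autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, defaults 50 and 0.2) compares against a clamped version of the kind proposed in this thread. The 100M cap value is only an illustration drawn from the numbers mentioned upthread, not a settled name or default.

```sql
-- Sketch only: today's trigger vs. a hypothetical cap of 100M dead tuples.
-- 50 and 0.2 are the current defaults for autovacuum_vacuum_threshold and
-- autovacuum_vacuum_scale_factor; the cap value is illustrative.
SELECT reltuples::bigint                              AS table_rows,
       (50 + 0.2 * reltuples)::bigint                 AS uncapped_trigger,
       least(50 + 0.2 * reltuples, 100000000)::bigint AS capped_trigger
FROM (VALUES (1e6), (1e8), (5e9)) AS t(reltuples);
```

With the defaults, a 5-billion-row table doesn't qualify for vacuum until roughly a billion tuples are dead; a cap forces it to qualify far earlier, which is the "bloats out of control" scenario above.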
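On the histogram-boundary point for analyze, a small illustration of the pattern described above, with invented table and column names: rows inserted in increasing order fall beyond the last histogram bound until the next ANALYZE, so estimates for the newest range go stale at a rate set by the workload rather than by the table's size.

```sql
-- Illustration only: monotonically increasing inserts outrun the histogram.
CREATE TABLE events (id bigserial PRIMARY KEY, created_at timestamptz NOT NULL);

-- A million historical rows, then gather statistics.
INSERT INTO events (created_at)
SELECT now() - interval '30 days' + (i || ' seconds')::interval
FROM generate_series(1, 1000000) AS i;
ANALYZE events;

-- The histogram's last bound sits near the newest created_at seen at ANALYZE time.
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

-- Another million newer rows arrive, but the histogram is unchanged until the
-- next ANALYZE, so the row estimate for the newest range stays near zero.
INSERT INTO events (created_at)
SELECT now() + (i || ' seconds')::interval
FROM generate_series(1, 1000000) AS i;

EXPLAIN SELECT * FROM events WHERE created_at > now();
```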