Re: autovacuum prioritization - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: autovacuum prioritization |
Date | |
Msg-id | CAH2-WznrZC-oHkB+QZQS65o+8_Jtj6RXadjh+8EBqjrD1f8FQQ@mail.gmail.com |
In response to | Re: autovacuum prioritization (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: autovacuum prioritization |
List | pgsql-hackers |
On Tue, Jan 25, 2022 at 11:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But your broader point that we need to consider how much bloat
> represents a problem is a really good one. In the past, one rule that
> I've thought about is: if we're vacuuming a table and we're not going
> to finish before it needs to be vacuumed again, then we should vacuum
> faster (i.e. in effect, increase the cost limit on the fly).

That seems reasonable, but I doubt that that's a huge issue in practice, now that the default cost limits are more sensible.

> That might still not result in good behavior, but it would at least result
> in behavior that is less bad. However, it doesn't really answer the
> question of how we decide when to start the very first VACUUM. I don't
> really know the answer to that question. The current heuristics result
> in estimates of acceptable bloat that are too high in some cases and
> too low in others. I've seen tables that got bloated vastly beyond
> what autovacuum is configured to tolerate before they caused any real
> difficulty, and I know there are other cases where users start to
> suffer long before those thresholds are reached.

ISTM that the easiest thing that could be done to improve this is to give some consideration to page-level characteristics. For example, a page that has 5 dead heap-only tuples is vastly different to a similar page that has 5 LP_DEAD items instead -- and yet our current approach makes no distinction. Chances are very high that if the only dead tuples are heap-only tuples, then things are going just fine on that page -- opportunistic pruning is actually keeping up.

Page-level stability over time seems to be the thing that matters most -- we must make sure that the same "logical rows" that were inserted around the same time remain on the same block for as long as possible, without mixing in other unrelated tuples needlessly. In other words, preserve natural locality.
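A minimal sketch of the page-level distinction, in Python for brevity rather than as PostgreSQL code. `PageSample`, `page_needs_vacuum`, and `count_pages_needing_vacuum` are hypothetical names made up for illustration, and the rule "any LP_DEAD item makes the page a VACUUM concern" is an assumption chosen to keep the example small, not a proposed threshold:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PageSample:
    """Hypothetical per-page counts; not an actual PostgreSQL structure."""
    dead_heap_only: int  # dead heap-only tuples, removable by opportunistic pruning
    lp_dead: int         # LP_DEAD line pointers, reclaimable only by VACUUM


def page_needs_vacuum(page: PageSample) -> bool:
    """Treat a page as a VACUUM concern only if it carries LP_DEAD items.

    Dead heap-only tuples are deliberately ignored (assumption): pruning
    is expected to clean them up without VACUUM's help.
    """
    return page.lp_dead > 0


def count_pages_needing_vacuum(pages: Iterable[PageSample]) -> int:
    """Count sampled pages that fall into the LP_DEAD category."""
    return sum(1 for page in pages if page_needs_vacuum(page))


# The two pages from the example above: same number of dead items,
# but only the second one is classified as needing VACUUM.
pruning_keeps_up = PageSample(dead_heap_only=5, lp_dead=0)
needs_vacuum = PageSample(dead_heap_only=0, lp_dead=5)
print(page_needs_vacuum(pruning_keeps_up), page_needs_vacuum(needs_vacuum))
```

A purely quantitative dead-tuple count would score both pages identically, which is the lack of distinction described above.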
This is related to the direction of things, and the certain knowledge that VACUUM alone can deal with line pointer bloat. The current state of individual pages hints at the direction of things even without tracking how things change directly. But tracking the change over time in ANALYZE seems better still: if successive ANALYZE operations notice a consistent pattern where pages that had a non-zero number of LP_DEAD items last time now have a significantly higher number, then it's a good idea to err in the direction of more aggressive vacuuming. *Growing* concentrations of LP_DEAD items signal chaos. I think that placing a particular emphasis on pages with non-zero LP_DEAD items as a qualitatively distinct category of page might well make sense -- relatively few blocks with a growing number of LP_DEAD items seems like it should be enough to make autovacuum run aggressively.

As I pointed out not long ago, ANALYZE does a terrible job of accurately counting dead tuples/LP_DEAD items when they aren't uniformly distributed in the table -- which is often a hugely important factor, with a table that is append-mostly with updates and deletes. That's why I suggested bringing the visibility map into it. In general I think that the statistics that drive autovacuum are currently often quite wrong, even on their own simplistic, quantitative terms.

> I don't see why we want multiple queues. We have to answer the
> question "what should we do next?" which requires us, in some way, to
> funnel everything into a single prioritization.

Even busy production DBs should usually only be vacuuming one large table at a time. Also might make sense to strategically align the work with the beginning of a new checkpoint.

--
Peter Geoghegan
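For illustration only, the "growing LP_DEAD concentration" signal described in the message might be sketched as follows. This is Python, not PostgreSQL code. The function names, the `growth_factor` and `min_blocks` parameters, and their defaults are all hypothetical. The sketch also assumes that two successive ANALYZE runs report LP_DEAD counts for the same block numbers, which real block sampling does not guarantee:

```python
from typing import Dict, List


def growing_lp_dead_blocks(
    prev: Dict[int, int],
    curr: Dict[int, int],
    growth_factor: float = 1.5,  # hypothetical meaning of "significantly higher"
) -> List[int]:
    """Return blocks that had LP_DEAD items last time and have many more now.

    prev and curr map block number -> LP_DEAD item count, as seen by two
    successive (hypothetical) ANALYZE samples.
    """
    return sorted(
        blk
        for blk, old in prev.items()
        if old > 0 and curr.get(blk, 0) >= old * growth_factor
    )


def should_vacuum_aggressively(
    prev: Dict[int, int],
    curr: Dict[int, int],
    growth_factor: float = 1.5,
    min_blocks: int = 1,  # relatively few growing blocks are enough
) -> bool:
    """Err toward aggressive vacuuming once enough blocks show growth."""
    return len(growing_lp_dead_blocks(prev, curr, growth_factor)) >= min_blocks


# Block 10 grew from 2 to 6 LP_DEAD items; block 11 had none before, so it
# is not counted as "growing"; block 12 is unchanged.
previous = {10: 2, 11: 0, 12: 4}
current = {10: 6, 11: 3, 12: 4}
print(growing_lp_dead_blocks(previous, current))
print(should_vacuum_aggressively(previous, current))
```

Only blocks with a non-zero count in the earlier sample are considered, mirroring the emphasis on pages that already had LP_DEAD items; a real implementation would also have to cope with the non-uniform distribution problem raised in the message.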