Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? |
Date | |
Msg-id | CAH2-Wz=D6M_g+WgW5dN1Hz6=AFJmZ63j8GfZAGm90uH4ZGxiRg@mail.gmail.com Whole thread Raw |
In response to | Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
|
List | pgsql-hackers |
On Mon, Dec 6, 2021 at 6:11 PM Robert Haas <robertmhaas@gmail.com> wrote: > This doesn't seem convincing. Launching autovacuum too soon surely has > costs that someone might not want to pay. Clearly in the degenerate > case where we always autovacuum every table every time an autovacuum > worker is launched, we have gone insane. Unfortunately we already sometimes behave insanely in exactly the way that you describe: https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com That is, in addition to the problem that I'm highlighting on this thread, we also have the opposite problem: autovacuum chases its tail when it sees dead heap-only tuples that opportunistic pruning can take care of on its own. I bet that both effects sometimes cancel each other out, in weird and unpredictable ways. This effect might be protective at first, and then less protective. > So arbitrarily large moves in > that direction can't be viewed as unproblematic. I certainly wouldn't argue that they are. Just that the current approach of simply counting dead tuples in the table (or trying to, using sampling) and later launching autovacuum (when dead tuples crosses a pretty arbitrary threshold) has many problems -- problems that make us either run autovacuum too aggressively, and not aggressively enough (relative to what the docs suggest is supposed to happen). > Now, on the other hand, I *most definitely* think > autovacuum_vacuum_scale_factor is hogwash. Everything I've seen > indicates that, while you do want to wait for a larger number of dead > tuples in a large table than in a small one, it's sublinear. I don't > know whether it's proportional to sqrt(table_size) or table_size^(1/3) > or lg(table_size) or table_size^(0.729837166538), but just plain old > table_size is definitely not it. I think that it'll vary considerably, based on many factors. Making the precise threshold "open to interpretation" to some degree (by autovacuum.c) seems like it might help us with future optimizations. It's hard to define a break-even point for launching an autovacuum worker. I think it would be more productive to come up with a design that at least doesn't go completely off the rails in various specific ways. I also think that our problem is not so much that we don't have accurate statistics about dead tuples (though we surely don't have much accuracy). The main problem seems to be that there are various specific, important ways in which the distribution of dead tuples may matter (leading to various harmful biases). And so it seems reasonable to fudge how we interpret dead tuples with the intention of capturing some of that, as a medium term solution. Something that considers the concentration of dead tuples in heap pages seems promising. -- Peter Geoghegan
pgsql-hackers by date: