Thread: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
I wonder why we're counting the number of dead tuples (or LP_DEAD stub items) in the relation as a whole in ANALYZE's acquire_sample_rows() function. Wouldn't it make more sense to focus on the "live vs dead tuple properties" of heap pages that are not known to be all-visible when we generate statistics for our pgstat_report_analyze() report?

These statistics collector stats are only for the benefit of autovacuum scheduling -- and so they're *consumed* in a way that is totally different to the nearby pg_statistic stats. There is no good reason for the pgstat_report_analyze() stats to be based on the same pg_class.relpages "denominator" as the pg_statistic stats (it's just slightly easier to do it that way in acquire_sample_rows(), I suppose). On the other hand, an alternative behavior involving counting totaldeadrows against sampled not-all-visible pages (but not otherwise) has a big benefit: doing so would remove any risk that older/earlier PageIsAllVisible() pages will bias ANALYZE in the direction of underestimating the count. This isn't a theoretical benefit -- I have tied it to an issue with the BenchmarkSQL TPC-C implementation [1].

This approach just seems natural to me. VACUUM intrinsically only expects dead tuples/line pointers in not-all-visible pages. So PageIsAllVisible() pages should not be counted here -- they are simply irrelevant, because these stats are for autovacuum, and autovacuum thinks they're irrelevant. What's more, VACUUM currently uses vac_estimate_reltuples() to compensate for the fact that it skips some pages using the visibility map -- pgstat_report_vacuum() expects a whole-relation estimate. But if pgstat_report_vacuum()/pgstat_report_analyze() expected statistics about the general properties of live vs dead tuples (or LP_DEAD items) on not-all-visible pages in the first place, then we wouldn't need to compensate like this.

This new approach also buys us the ability to extrapolate a new estimated number of dead tuples using old, stale stats. The stats can be combined with the authoritative/known number of not-all-visible pages right this second, since it's cheap enough to *accurately* determine the total number of not-all-visible pages for a heap relation by calling visibilitymap_count(). My guess is that this would be much more accurate in practice: provided the original average number of dead/live tuples (tuples per not-all-visible block) was still reasonably accurate, the extrapolated "total dead tuples right now" values would also be accurate.

I'm glossing over some obvious wrinkles here, such as: what happens to totaldeadrows when 100% of all the pages ANALYZE samples are PageIsAllVisible() pages? I think that it shouldn't be too hard to come up with solutions to those problems (the extrapolation idea already hints at a solution), but for now I'd like to keep the discussion high level.

[1] https://postgr.es/m/CAH2-Wz=9R83wcwZcPUH4FVPeDM4znzbzMvp3rt21+XhQWMU8+g@mail.gmail.com

--
Peter Geoghegan
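A minimal sketch of the extrapolation idea described in the message above, written as standalone C rather than actual PostgreSQL code; the struct and function names are hypothetical, and the fresh not-all-visible page count stands in for what visibilitymap_count() would provide:

    /*
     * Illustrative sketch only (not PostgreSQL code): extrapolate a fresh
     * dead-tuple estimate from stale ANALYZE stats that were gathered over
     * not-all-visible pages only, plus an up-to-date count of
     * not-all-visible pages.  All type and field names here are made up.
     */
    #include <stdint.h>

    typedef struct AnalyzeSnapshot
    {
        uint32_t    relpages;               /* rel size at ANALYZE time */
        uint32_t    not_all_visible_pages;  /* pages the stats were sampled against */
        double      dead_tuples_per_nav_page;   /* avg dead tuples per such page */
    } AnalyzeSnapshot;

    static double
    extrapolate_dead_tuples(const AnalyzeSnapshot *stale,
                            uint32_t current_not_all_visible_pages)
    {
        /*
         * Assume the per-page concentration of dead tuples on
         * not-all-visible pages is still roughly what ANALYZE saw, and
         * scale it by the number of such pages right now.
         */
        return stale->dead_tuples_per_nav_page *
            (double) current_not_all_visible_pages;
    }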
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Robert Haas
Date:
On Sun, Dec 5, 2021 at 12:28 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I wonder why we're counting the number of dead tuples (or LP_DEAD stub items) in the relation as a whole in ANALYZE's acquire_sample_rows() function. Wouldn't it make more sense to focus on the "live vs dead tuple properties" of heap pages that are not known to be all-visible when we generate statistics for our pgstat_report_analyze() report? These statistics collector stats are only for the benefit of autovacuum scheduling -- and so they're *consumed* in a way that is totally different to the nearby pg_statistic stats.

I think this could be the right idea. I'm not certain that it is, but it does sound believable.

> This new approach also buys us the ability to extrapolate a new estimated number of dead tuples using old, stale stats. The stats can be combined with the authoritative/known number of not-all-visible pages right this second, since it's cheap enough to *accurately* determine the total number of not-all-visible pages for a heap relation by calling visibilitymap_count(). My guess is that this would be much more accurate in practice: provided the original average number of dead/live tuples (tuples per not-all-visible block) was still reasonably accurate, the extrapolated "total dead tuples right now" values would also be accurate.

So does this. If some of the table is now all-visible when it wasn't before, it's certainly a good guess that the portions that still aren't have about the same distribution of dead tuples that they did before ... although the other direction is less clear: it seems possible that newly not-all-visible pages have fewer dead tuples than ones which have been not-all-visible for a while. But you have to make some guess.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Mon, Dec 6, 2021 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> So does this. If some of the table is now all-visible when it wasn't before, it's certainly a good guess that the portions that still aren't have about the same distribution of dead tuples that they did before ... although the other direction is less clear: it seems possible that newly not-all-visible pages have fewer dead tuples than ones which have been not-all-visible for a while. But you have to make some guess.

To me, it seems natural to accept and even embrace the inherent uncertainty about the number of dead tuples. We should model our current belief about how many dead tuples are in the table as a probability density function (or something along the same lines). There is a true "sample space" here. Once we focus on not-all-visible pages, using authoritative VM info, many kinds of misestimations are clearly impossible. For example, there are only so many not-all-visible heap pages, and they can only hold so many dead tuples (up to MaxHeapTuplesPerPage). This is a certainty.

The number of dead tuples in the table is an inherently dynamic thing, which makes it totally dissimilar to the pg_statistic-based stats. And so a single snapshot of a point in time is inherently much less useful -- we ought to keep a few sets of old statistics within our new pgstat_report_analyze() -- maybe 3 or 5. Each set of statistics includes the total number of relpages at the time, the total number of not-all-visible pages (i.e. interesting pages) at the time, and the average number of live and dead tuples encountered. This is interpreted (along with a current visibilitymap_count()) to get our so-called probability density function (probably not really a PDF, probably something simpler and only vaguely similar) within autovacuum.c.

It just occurred to me that it makes zero sense that pgstat_report_vacuum() does approximately the same thing as pgstat_report_analyze() -- we make no attempt to compensate for the fact that the report is made by VACUUM specifically, and so reflects the state of each page in the table immediately after it was processed by VACUUM. ISTM that this makes it much more likely to appear as an underestimate later on -- pgstat_report_vacuum() gets the furthest possible thing from a random sample. Whereas if we had more context (specifically that there are very few or even 0 all-visible pages), it wouldn't hurt us at all, and we wouldn't need to have special pgstat_report_vacuum()-only heuristics.

--
Peter Geoghegan
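A sketch of the "keep a few sets of old statistics" idea from the message above, as a small ring of ANALYZE-time snapshots; none of these names exist in PostgreSQL, and the history size of 3 is just one of the values mentioned:

    /*
     * Illustrative sketch only: retain the last few ANALYZE-time snapshots
     * so that autovacuum can interpret them together with a current
     * not-all-visible page count.
     */
    #include <stdint.h>

    #define ANALYZE_HISTORY_SIZE 3      /* "maybe 3 or 5" */

    typedef struct AnalyzeDeadTupleStats
    {
        uint32_t    relpages;           /* rel size when ANALYZE ran */
        uint32_t    not_all_visible;    /* "interesting" pages back then */
        double      avg_live_per_page;  /* live tuples per not-all-visible page */
        double      avg_dead_per_page;  /* dead tuples per not-all-visible page */
    } AnalyzeDeadTupleStats;

    typedef struct AnalyzeHistory
    {
        int         next;               /* slot to overwrite next */
        AnalyzeDeadTupleStats snaps[ANALYZE_HISTORY_SIZE];
    } AnalyzeHistory;

    static void
    record_analyze_snapshot(AnalyzeHistory *hist, AnalyzeDeadTupleStats snap)
    {
        /* Oldest snapshot is simply overwritten in round-robin fashion. */
        hist->snaps[hist->next] = snap;
        hist->next = (hist->next + 1) % ANALYZE_HISTORY_SIZE;
    }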
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Mon, Dec 6, 2021 at 2:37 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Dec 6, 2021 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > So does this. If some of the table is now all-visible when it wasn't before, it's certainly a good guess that the portions that still aren't have about the same distribution of dead tuples that they did before ... although the other direction is less clear: it seems possible that newly not-all-visible pages have fewer dead tuples than ones which have been not-all-visible for a while. But you have to make some guess.
>
> To me, it seems natural to accept and even embrace the inherent uncertainty about the number of dead tuples.
> The number of dead tuples in the table is an inherently dynamic thing, which makes it totally dissimilar to the pg_statistic-based stats. And so a single snapshot of a point in time is inherently much less useful -- we ought to keep a few sets of old statistics within our new pgstat_report_analyze() -- maybe 3 or 5.

I just realized that I didn't really get around to explicitly connecting this to your point about newly not-all-visible pages being quite different to older ones that ANALYZE has seen -- which is definitely an important consideration. I'll do so now:

Keeping some history makes the algorithm "less gullible" (a more useful goal than making it "smarter", at least IMV). Suppose that our starting point is 2 pieces of authoritative information, which are current as of the instant we want to estimate the number of dead tuples for VACUUM:

1. total relation size (relpages), and

2. current not-all-visible-pages count (interesting/countable pages, calculated by taking the "complement" of the visibilitymap_count() value).

Further suppose we store the same 2 pieces of information in our ANALYZE stats, reported via pgstat_report_analyze() -- the same 2 pieces of information are stored alongside the actual count of dead tuples and live tuples found on not-all-visible pages.

The algorithm avoids believing silly things about dead tuples by considering the delta between each piece of information, particularly the difference between "right now" and "the last time ANALYZE ran and called pgstat_report_analyze()". For example, if item 1/relpages increased by exactly the same number of blocks as item 2/not-all-visible pages (or close enough to it), that is recognized as a pretty strong signal. The algorithm should consider the newly not-all-visible pages as likely to have very few dead tuples. At the same time, the algorithm should not change its beliefs about the concentration of dead tuples in remaining, older not-all-visible pages.

This kind of thing will still have problems, no doubt. But I'd much rather err in the direction of over-counting dead tuples like this. The impact of the problem on the workload/autovacuum is a big part of the picture here. Suppose we believe that not-all-visible pages have 20 LP_DEAD items on average, and they turn out to only have 3 or 5. Theoretically we've done the wrong thing by launching autovacuum workers sooner -- we introduce bias. But we also have lower variance over time, which might make it worth it. I also think that it might not really matter at all. It's no great tragedy if we clean up and set pages all-visible in the visibility map a little earlier on average. It might even be a positive thing.

The fact that the user expresses the dead-tuple-wise threshold using autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is based on some pretty iffy assumptions. Even if we greatly overestimate dead tuples with the new algorithm, we're only doing so under circumstances that might have caused autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker anyway. Just setting the visibility map bit has considerable value.

--
Peter Geoghegan
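A sketch of the delta-based interpretation described in the message above: if the relation grew by roughly as many blocks as the not-all-visible page count did since the last ANALYZE, treat the new pages as holding few dead tuples, and leave the belief about the older pages unchanged. Everything here (names, the 5% tolerance) is hypothetical:

    /* Illustrative sketch only, not PostgreSQL code. */
    #include <stdint.h>
    #include <stdlib.h>

    static double
    estimate_dead_tuples(uint32_t relpages_then, uint32_t nav_pages_then,
                         double dead_per_nav_page_then,
                         uint32_t relpages_now, uint32_t nav_pages_now)
    {
        uint32_t    grown_pages = (relpages_now > relpages_then) ?
            relpages_now - relpages_then : 0;
        uint32_t    nav_growth = (nav_pages_now > nav_pages_then) ?
            nav_pages_now - nav_pages_then : 0;

        if (grown_pages > 0 &&
            (uint32_t) abs((int) grown_pages - (int) nav_growth) <= grown_pages / 20)
        {
            /*
             * Relation growth explains (almost) all of the new
             * not-all-visible pages: assume they hold very few dead
             * tuples, and keep the old belief about the older pages.
             */
            return dead_per_nav_page_then * (double) nav_pages_then;
        }

        /* Otherwise, scale the old per-page estimate by the current count. */
        return dead_per_nav_page_then * (double) nav_pages_now;
    }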
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Robert Haas
Date:
On Mon, Dec 6, 2021 at 8:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Suppose we believe that not-all-visible pages have 20 LP_DEAD items on average, and they turn out to only have 3 or 5. Theoretically we've done the wrong thing by launching autovacuum workers sooner -- we introduce bias. But we also have lower variance over time, which might make it worth it. I also think that it might not really matter at all. It's no great tragedy if we clean up and set pages all-visible in the visibility map a little earlier on average. It might even be a positive thing.

This doesn't seem convincing. Launching autovacuum too soon surely has costs that someone might not want to pay. Clearly in the degenerate case where we always autovacuum every table every time an autovacuum worker is launched, we have gone insane. So arbitrarily large moves in that direction can't be viewed as unproblematic.

> The fact that the user expresses the dead-tuple-wise threshold using autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is based on some pretty iffy assumptions. Even if we greatly overestimate dead tuples with the new algorithm, we're only doing so under circumstances that might have caused autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker anyway. Just setting the visibility map bit has considerable value.

Now, on the other hand, I *most definitely* think autovacuum_vacuum_scale_factor is hogwash. Everything I've seen indicates that, while you do want to wait for a larger number of dead tuples in a large table than in a small one, it's sublinear. I don't know whether it's proportional to sqrt(table_size) or table_size^(1/3) or lg(table_size) or table_size^(0.729837166538), but just plain old table_size is definitely not it.

--
Robert Haas
EDB: http://www.enterprisedb.com
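The sublinearity point can be illustrated with a small comparison. The linear formula below mirrors the documented autovacuum trigger (base threshold plus scale factor times reltuples); the sqrt-based variant is just one of the shapes mentioned above, not an actual proposal or existing code, and its knob is invented:

    /* Illustrative sketch only. */
    #include <math.h>

    static double
    linear_vacuum_threshold(double reltuples,
                            double vac_base_thresh,   /* autovacuum_vacuum_threshold */
                            double vac_scale_factor)  /* autovacuum_vacuum_scale_factor */
    {
        /* The documented rule: threshold + scale_factor * reltuples. */
        return vac_base_thresh + vac_scale_factor * reltuples;
    }

    static double
    sublinear_vacuum_threshold(double reltuples,
                               double vac_base_thresh,
                               double sublinear_factor)    /* hypothetical knob */
    {
        /* Grows much more slowly than the linear rule for very large tables. */
        return vac_base_thresh + sublinear_factor * sqrt(reltuples);
    }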
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Mon, Dec 6, 2021 at 6:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
> This doesn't seem convincing. Launching autovacuum too soon surely has costs that someone might not want to pay. Clearly in the degenerate case where we always autovacuum every table every time an autovacuum worker is launched, we have gone insane.

Unfortunately we already sometimes behave insanely in exactly the way that you describe:

https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

That is, in addition to the problem that I'm highlighting on this thread, we also have the opposite problem: autovacuum chases its tail when it sees dead heap-only tuples that opportunistic pruning can take care of on its own. I bet that both effects sometimes cancel each other out, in weird and unpredictable ways. This effect might be protective at first, and then less protective.

> So arbitrarily large moves in that direction can't be viewed as unproblematic.

I certainly wouldn't argue that they are. Just that the current approach of simply counting dead tuples in the table (or trying to, using sampling) and later launching autovacuum (when the dead tuple count crosses a pretty arbitrary threshold) has many problems -- problems that make us either run autovacuum too aggressively, or not aggressively enough (relative to what the docs suggest is supposed to happen).

> Now, on the other hand, I *most definitely* think autovacuum_vacuum_scale_factor is hogwash. Everything I've seen indicates that, while you do want to wait for a larger number of dead tuples in a large table than in a small one, it's sublinear. I don't know whether it's proportional to sqrt(table_size) or table_size^(1/3) or lg(table_size) or table_size^(0.729837166538), but just plain old table_size is definitely not it.

I think that it'll vary considerably, based on many factors. Making the precise threshold "open to interpretation" to some degree (by autovacuum.c) seems like it might help us with future optimizations. It's hard to define a break-even point for launching an autovacuum worker. I think it would be more productive to come up with a design that at least doesn't go completely off the rails in various specific ways.

I also think that our problem is not so much that we don't have accurate statistics about dead tuples (though we surely don't have much accuracy). The main problem seems to be that there are various specific, important ways in which the distribution of dead tuples may matter (leading to various harmful biases). And so it seems reasonable to fudge how we interpret dead tuples with the intention of capturing some of that, as a medium term solution. Something that considers the concentration of dead tuples in heap pages seems promising.

--
Peter Geoghegan
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Robert Haas
Date:
On Mon, Dec 6, 2021 at 9:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Dec 6, 2021 at 6:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > This doesn't seem convincing. Launching autovacuum too soon surely has costs that someone might not want to pay. Clearly in the degenerate case where we always autovacuum every table every time an autovacuum worker is launched, we have gone insane.
>
> Unfortunately we already sometimes behave insanely in exactly the way that you describe:
>
> https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

In the same ballpark is http://rhaas.blogspot.com/2020/02/useless-vacuuming.html

> It's hard to define a break-even point for launching an autovacuum worker. I think it would be more productive to come up with a design that at least doesn't go completely off the rails in various specific ways.

I think that's a good observation. I think the current autovacuum algorithm works pretty well when things are normal. But in extreme scenarios it does not degrade gracefully. The vacuuming-over-and-over-without-doing-anything phenomenon is an example of that. Another example is the user who creates 10,000 databases, where we're happy to divide the 60s autovacuum_naptime by 10,000 and try to launch a worker every 6 ms. A third example is vacuuming the tables from pg_class in physical order on disk, so that a table that is 1 XID past the wraparound limit can result in a long delay vacuuming a table that is bloating quickly, or conversely a table that is bloating very slowly but has just crawled over the threshold for a regular vacuum gets processed before one that is threatening an imminent wraparound shutdown. I think these are all pathological cases that a well-informed human can easily recognize and handle in an intelligent manner, and it doesn't seem crazy to program those responses into the computer in some way.

> I also think that our problem is not so much that we don't have accurate statistics about dead tuples (though we surely don't have much accuracy). The main problem seems to be that there are various specific, important ways in which the distribution of dead tuples may matter (leading to various harmful biases). And so it seems reasonable to fudge how we interpret dead tuples with the intention of capturing some of that, as a medium term solution. Something that considers the concentration of dead tuples in heap pages seems promising.

I am less convinced about this part. It sort of sounds like you're saying - it doesn't really matter whether the numbers we gather are accurate, just that they produce the correct results. If the information we currently gather wouldn't produce the right results even if it were fully accurate, that to me suggests that we're gathering the wrong information, and we should gather something else. For example, we could attack the useless-vacuuming problem by having each vacuum figure out - and store - the oldest XID that could potentially be worth using as a cutoff for vacuuming that table, and refusing to autovacuum that table again until we'd be able to use a cutoff >= that value. I suppose this would be the oldest of (a) any XID that exists in the table on a tuple that we judged recently dead, (b) any XID that is currently running, and (c) the next XID.

I also accept that knowing the concentration of dead tuples on pages that contain at least 1 dead tuple could be interesting. I've felt for a while that it's a mistake to know how many dead tuples there are but not how many pages contain them, because that affects both the amount of I/O required to vacuum and also how much need we have to set VM bits. I'm not sure I would have approached gathering that information in the way that you're proposing here, but I'm not deeply against it, either. I do think that we should try to keep it as a principle that whatever we do gather, we should try our best to make accurate. If that doesn't work well, then gather different stuff instead.

--
Robert Haas
EDB: http://www.enterprisedb.com
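A sketch of the "refuse to autovacuum again until it could help" idea described above, under the simplifying assumption that XIDs can be compared as plain 64-bit integers (real XID comparisons have to handle wraparound); all names here are hypothetical:

    /* Illustrative sketch only, not PostgreSQL code. */
    #include <stdint.h>
    #include <stdbool.h>

    typedef uint64_t SimpleXid;     /* stand-in for a wraparound-aware XID */

    static SimpleXid
    min3(SimpleXid a, SimpleXid b, SimpleXid c)
    {
        SimpleXid m = (a < b) ? a : b;
        return (m < c) ? m : c;
    }

    /*
     * After a VACUUM, remember the oldest XID that could have been worth
     * using as a cutoff: the oldest of (a) the oldest XID on a tuple we
     * judged recently dead, (b) the oldest currently-running XID, and
     * (c) the next XID.
     */
    static SimpleXid
    oldest_useful_cutoff(SimpleXid oldest_recently_dead_xid,
                         SimpleXid oldest_running_xid,
                         SimpleXid next_xid)
    {
        return min3(oldest_recently_dead_xid, oldest_running_xid, next_xid);
    }

    static bool
    autovacuum_worth_it(SimpleXid stored_oldest_useful_cutoff,
                        SimpleXid cutoff_we_could_use_now)
    {
        /* Vacuuming again before this point would be "useless vacuuming". */
        return cutoff_we_could_use_now >= stored_oldest_useful_cutoff;
    }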
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Tue, Dec 7, 2021 at 7:58 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's a good observation. I think the current autovacuum algorithm works pretty well when things are normal. But in extreme scenarios it does not degrade gracefully.

+1 to all of the specific examples you go on to describe.

> > I also think that our problem is not so much that we don't have accurate statistics about dead tuples (though we surely don't have much accuracy). The main problem seems to be that there are various specific, important ways in which the distribution of dead tuples may matter (leading to various harmful biases). And so it seems reasonable to fudge how we interpret dead tuples with the intention of capturing some of that, as a medium term solution. Something that considers the concentration of dead tuples in heap pages seems promising.
>
> I am less convinced about this part. It sort of sounds like you're saying - it doesn't really matter whether the numbers we gather are accurate, just that they produce the correct results.

That's not quite it. More like: I think that it would be reasonable to define dead tuples abstractly, in a way that's more useful but nevertheless cannot diverge too much from the current definition. We should try to capture "directionality", with clear error bounds. This won't represent the literal truth, but it will be true in spirit (or much closer).

For example, why should we count dead heap-only tuples from earlier in a HOT chain, even when we see no evidence that opportunistic HOT pruning can't keep up on that page? Since we actually care about the direction of things, not just the present state of things, we'd be justified in completely ignoring those dead tuples. Similarly, it might well make sense to give more weight to concentrations of LP_DEAD items on a page -- that is a signal that things are not going well *at the level of the page*. Not so much when you have a few LP_DEAD stubs, but certainly when you have dozens of them on one page, or even hundreds. And so ISTM that the conditions of the page should influence how we interpret/count that page's dead tuples, in both directions (interpret the page as having more dead tuples, or fewer).

We all know that there isn't a sensible answer to the question "if the abstract cost units used in the optimizer say that one sequential page access is 4x cheaper than one random page access, then what's the difference between accessing 10 random pages sequentially in close succession, versus accessing the same 10 pages randomly?". But at the same time, we can say for sure that random is more expensive to *some* degree, but certainly never by multiple orders of magnitude. The model is imperfect, to be sure, but it is better than many alternative models that are also technically possible. We need *some* cost model for a cost-based optimizer, and it is better to be approximately correct than exactly wrong. Again, the truly important thing is to not be totally wrong about any one thing.

Another way of putting it is that I am willing to accept a more biased definition of dead tuples, if that lowers the variance:

https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff

We have some certainty about what is possible in a world in which we use the visibility map directly, and so our final estimate should never be wildly unreasonable -- the abstract idea of dead tuples can still be anchored to the physical reality.

As with the optimizer and its cost units, we don't have that many degrees of freedom when autovacuum.c launches a worker, and I don't see that changing -- we must ultimately decide to launch or not launch an autovacuum worker (for any given table) each time the autovacuum launcher wakes up. So we're practically forced to come up with a model that has some abstract idea of one kind of bloat/dead tuple. I would say that we already have one, in fact -- it's just not a very good one, because it doesn't take account of obviously-relevant factors like HOT. It could quite easily be less wrong.

> If the information we currently gather wouldn't produce the right results even if it were fully accurate, that to me suggests that we're gathering the wrong information, and we should gather something else.

I think that counting dead tuples is useful, but not quite sufficient on its own. At the same time, we still need something that works like a threshold -- because that's just how the autovacuum.c scheduling works. It's a go/no-go decision, regardless of how the decision is made.

> For example, we could attack the useless-vacuuming problem by having each vacuum figure out - and store - the oldest XID that could potentially be worth using as a cutoff for vacuuming that table, and refusing to autovacuum that table again until we'd be able to use a cutoff >= that value. I suppose this would be the oldest of (a) any XID that exists in the table on a tuple that we judged recently dead, (b) any XID that is currently running, and (c) the next XID.

I agree that that's a good idea, but it seems like something that only augments what I'm talking about. I suppose that it might become necessary if we get something along the lines of what we've been discussing, though.

> I also accept that knowing the concentration of dead tuples on pages that contain at least 1 dead tuple could be interesting. I've felt for a while that it's a mistake to know how many dead tuples there are but not how many pages contain them, because that affects both the amount of I/O required to vacuum and also how much need we have to set VM bits.

Right. And as I keep saying, the truly important thing is to not *completely* ignore any relevant dimension of cost. I just don't want to ever be wildly wrong -- not even once. We can tolerate being somewhat less accurate all the time (not that we necessarily have to make a trade-off), but we cannot tolerate pathological behavior. Of course I include new/theoretical pathological behaviors here (not just the ones we know about today).

> I'm not sure I would have approached gathering that information in the way that you're proposing here, but I'm not deeply against it, either. I do think that we should try to keep it as a principle that whatever we do gather, we should try our best to make accurate. If that doesn't work well, then gather different stuff instead.

It's important to be accurate, but it's also important to be tolerant of model error, which is inevitable. We should make pragmatic decisions about what kinds of errors our new model will have. And it should have at least a rudimentary ability to learn from its mistakes.

--
Peter Geoghegan
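A sketch of what a per-page "abstract bloat units" calculation might look like under the framing above: ignore dead heap-only tuples where pruning appears to be keeping up, and weight heavy concentrations of LP_DEAD items more than a few scattered ones. The weights and the cutoff of 20 items are invented for illustration; nothing here is existing PostgreSQL code:

    /* Illustrative sketch only. */
    typedef struct PageBloatSample
    {
        int     lp_dead_items;      /* LP_DEAD stub line pointers on the page */
        int     dead_heaponly;      /* dead heap-only tuples in HOT chains */
        int     pruning_keeping_up; /* nonzero if opportunistic pruning copes */
    } PageBloatSample;

    static double
    page_bloat_units(const PageBloatSample *page)
    {
        double  units = 0.0;

        /* Ignore HOT-chain members that pruning can reclaim on its own. */
        if (!page->pruning_keeping_up)
            units += page->dead_heaponly;

        /* A few LP_DEAD stubs count as-is; dozens on one page count for more. */
        if (page->lp_dead_items >= 20)
            units += page->lp_dead_items * 1.5;
        else
            units += page->lp_dead_items;

        return units;
    }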
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Robert Haas
Date:
On Tue, Dec 7, 2021 at 2:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> For example, why should we count dead heap-only tuples from earlier in a HOT chain, even when we see no evidence that opportunistic HOT pruning can't keep up on that page? Since we actually care about the direction of things, not just the present state of things, we'd be justified in completely ignoring those dead tuples. Similarly, it might well make sense to give more weight to concentrations of LP_DEAD items on a page -- that is a signal that things are not going well *at the level of the page*. Not so much when you have a few LP_DEAD stubs, but certainly when you have dozens of them on one page, or even hundreds. And so ISTM that the conditions of the page should influence how we interpret/count that page's dead tuples, in both directions (interpret the page as having more dead tuples, or fewer).

Well... I mean, I think we're almost saying the same thing, then, but I think you're saying it more confusingly. I have no objection to counting the number of dead HOT chains rather than the number of dead tuples, because that's what affects the index contents, but there's no need to characterize that as "not the literal truth." There's nothing fuzzy or untrue about it if we simply say that's what we're doing.

> Right. And as I keep saying, the truly important thing is to not *completely* ignore any relevant dimension of cost. I just don't want to ever be wildly wrong -- not even once. We can tolerate being somewhat less accurate all the time (not that we necessarily have to make a trade-off), but we cannot tolerate pathological behavior. Of course I include new/theoretical pathological behaviors here (not just the ones we know about today).

Sure, but we don't *need* to be less accurate, and I don't think we even *benefit* from being less accurate. If we do something like count dead HOT chains instead of dead tuples, let's not call that a less-accurate count of dead tuples. Let's call it an accurate count of dead HOT chains.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Tue, Dec 7, 2021 at 12:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Well... I mean, I think we're almost saying the same thing, then, but I think you're saying it more confusingly. I have no objection to counting the number of dead HOT chains rather than the number of dead tuples, because that's what affects the index contents, but there's no need to characterize that as "not the literal truth."

Works for me!

> Sure, but we don't *need* to be less accurate, and I don't think we even *benefit* from being less accurate. If we do something like count dead HOT chains instead of dead tuples, let's not call that a less-accurate count of dead tuples. Let's call it an accurate count of dead HOT chains.

Fair enough, but even then we still ultimately have to generate a final number that represents how close we are to a configurable "do an autovacuum" threshold (such as an autovacuum_vacuum_scale_factor-based threshold) -- the autovacuum.c side of this (the consumer side) fundamentally needs the model to reduce everything to a one-dimensional number (even though the reality is that there isn't just one dimension). This single number (abstract bloat units, abstract dead tuples, whatever) is a function of things like the count of dead HOT chains, perhaps the concentration of dead tuples on heap pages, whatever -- but it's not the same thing as any one of those things we count.

I think that this final number needs to be denominated in abstract units -- we need to call those abstract units *something*. I don't care what that name ends up being, as long as it reflects reality.

--
Peter Geoghegan
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Robert Haas
Date:
On Tue, Dec 7, 2021 at 3:44 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Fair enough, but even then we still ultimately have to generate a final number that represents how close we are to a configurable "do an autovacuum" threshold (such as an autovacuum_vacuum_scale_factor-based threshold) -- the autovacuum.c side of this (the consumer side) fundamentally needs the model to reduce everything to a one-dimensional number (even though the reality is that there isn't just one dimension). This single number (abstract bloat units, abstract dead tuples, whatever) is a function of things like the count of dead HOT chains, perhaps the concentration of dead tuples on heap pages, whatever -- but it's not the same thing as any one of those things we count.
>
> I think that this final number needs to be denominated in abstract units -- we need to call those abstract units *something*. I don't care what that name ends up being, as long as it reflects reality.

If we're only trying to decide whether or not to vacuum a table, we don't need units: the output is a Boolean. If we're trying to decide on an order in which to vacuum tables, then we need units. But such units can't be anything related to dead tuples, because vacuum can be needed based on XID age, or MXID age, or dead tuples. The units would have to be something like abstract vacuum-urgency units (if higher is more urgent) or abstract remaining-headroom-before-catastrophe units (if lower is more urgent).

Ignoring wraparound considerations for a moment, I think that we might want to consider having multiple thresholds and vacuuming the table if any one of them is met. For example, suppose a table qualifies for vacuum when %-of-not-all-visible-pages > some-threshold, or alternatively when %-of-index-tuples-thought-to-be-dead > some-other-threshold.

--
Robert Haas
EDB: http://www.enterprisedb.com
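A sketch of the "multiple thresholds, vacuum if any one of them is met" rule described above, leaving out wraparound pressure as the message does; the two signals and their threshold parameters are hypothetical:

    /* Illustrative sketch only. */
    #include <stdbool.h>

    typedef struct TableVacuumSignals
    {
        double  pct_not_all_visible_pages;      /* 0..100 */
        double  pct_index_tuples_thought_dead;  /* 0..100 */
    } TableVacuumSignals;

    static bool
    table_needs_autovacuum(const TableVacuumSignals *sig,
                           double nav_page_pct_threshold,
                           double dead_index_tuple_pct_threshold)
    {
        /* Any one threshold being crossed is enough to qualify the table. */
        if (sig->pct_not_all_visible_pages > nav_page_pct_threshold)
            return true;
        if (sig->pct_index_tuples_thought_dead > dead_index_tuple_pct_threshold)
            return true;
        return false;
    }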
Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
From
Peter Geoghegan
Date:
On Tue, Dec 7, 2021 at 1:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
> If we're only trying to decide whether or not to vacuum a table, we don't need units: the output is a Boolean.

I was imagining a world in which we preserve the autovacuum_vacuum_scale_factor design, but interpret it creatively (but never too creatively) -- an incremental approach seems best to me. We can even sanity check our abstract bloat unit calculation, in case the page-level sampling aggregates into a totally wild number of dead tuples (based in part on the current number of not-all-visible heap pages) -- so the abstract units are always anchored to the old idea of dead tuples. Maybe this isn't the best approach, but at least it addresses compatibility.

*Any* approach based on sampling relatively few random blocks (to look for signs of bloat) is inherently prone to hugely underestimating the extent of bloat (which is what we see in TPC-C). I am primarily concerned about compensating for the inherent limitations that go with that. To me it seems inappropriate to make statistical inferences about dead tuples based on a random snapshot of random blocks (usually only a tiny minority). It is not only possible for the picture to change utterly -- it is routine, expected, and the whole point. The entire intellectual justification for statistical sampling (that mostly works for optimizer stats) just doesn't carry over to autovacuum stats, for many reasons. At the same time, I don't have any fundamentally better starting point. That's how I arrived at the idea of probabilistic modeling based on several recent snapshots from ANALYZE.

The statistics are often rubbish, whether or not we like it, and regardless of how we decide to count things on each page. And so it's entirely reasonable to not limit the algorithm to concerns about the state of things -- the actual exposure of the system to harm (from overlooking harmful bloat) is also relevant.

> If we're trying to decide on an order in which to vacuum tables, then we need units. But such units can't be anything related to dead tuples, because vacuum can be needed based on XID age, or MXID age, or dead tuples. The units would have to be something like abstract vacuum-urgency units (if higher is more urgent) or abstract remaining-headroom-before-catastrophe units (if lower is more urgent).

I like that idea. But I wonder if they should be totally unrelated. If we're close to the "emergency" XID threshold, and also close to the "bloat units" threshold, then it seems reasonable to put our finger on the scales, and do an autovacuum before either threshold is crossed. I'm not sure how that should work, but I find the idea of interpreting the "bloat units" creatively/probabilistically appealing.

We're not actually making things up by erring in the direction of launching an autovacuum worker, because we don't actually know the number of dead tuples (or whatever) anyway -- we're just recognizing the very real role of chance and noise. That is, if the "bloat units" threshold might well not have been crossed due to random chance (noise, the phase of the moon), why should we defer to random chance? If we have better information to go on, like the thing with the XID threshold, why not prefer that? Similarly, if we see that the system as a whole is not very busy right now, why not consider that, just a little, if the only downside is that we'll ignore a demonstrably noise-level signal (from the stats)?

That's the high level intuition behind making "bloat units" a probability density function, and not just a simple expected value. Teaching the autovacuum.c scheduler to distinguish signal from noise could be very valuable, if it enables opportunistic batching of work, or off-hours work. We don't have to respect noise. The devil is in the details, of course.

--
Peter Geoghegan
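One possible reading of the "bloat units as a distribution, not a point estimate" idea, sketched with an invented error margin and an invented rule for letting XID-age pressure tip a borderline case; this is speculation consistent with the discussion above, not anything that exists in PostgreSQL:

    /* Illustrative sketch only. */
    #include <stdbool.h>

    typedef struct BloatEstimate
    {
        double  expected_units;     /* central "abstract bloat units" estimate */
        double  error_margin;       /* plausible +/- noise around the estimate */
    } BloatEstimate;

    static bool
    should_launch_autovacuum(const BloatEstimate *bloat,
                             double bloat_threshold,
                             double xid_age_fraction)   /* 0.0 .. 1.0 of the emergency limit */
    {
        /* Clearly over threshold even after subtracting the noise margin. */
        if (bloat->expected_units - bloat->error_margin >= bloat_threshold)
            return true;

        /*
         * Borderline: the threshold falls inside the noise band.  If XID
         * age is also getting close to its own limit, err on the side of
         * vacuuming now rather than twice in quick succession later.
         */
        if (bloat->expected_units + bloat->error_margin >= bloat_threshold &&
            xid_age_fraction >= 0.5)
            return true;

        return false;
    }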