Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
| Date | |
| Msg-id | CAH2-Wzk3KKS-82n5N5tR6JZRWS-XHmUE=BgiVPQA4S0BM2tNXg@mail.gmail.com |
| In response to | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation; Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
| List | pgsql-hackers |
On Wed, Jan 18, 2023 at 11:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > pgstat_report_analyze() will totally override the
> > tabentry->dead_tuples information that drives autovacuum.c, based on
> > an estimate derived from a random sample -- which seems to me to be
> > an approach that just doesn't have any sound theoretical basis.
>
> Yikes. I think we don't have a choice but to have a method to correct
> the information somehow, because AFAIK the statistics system is not
> crash-safe. But that approach does seem to carry significant risk of
> overwriting correct information with wrong information.

This situation is really quite awful, so maybe we should do something about it soon, in the scope of the Postgres 16 work on autovacuum that is already underway. In fact I think that the problem here is so bad that even something slightly less naive would be far more effective.

You're right to point out that pgstat_report_analyze() needs to update the stats in case there is a hard crash, of course. But there is plenty of context with which to make better decisions close at hand. For example, I bet that pgstat_report_analyze() already does a pretty good job of estimating live_tuples -- my spiel about statistics mostly doesn't apply to live_tuples.

Suppose that we notice that its new estimate for live_tuples approximately matches what the stats subsystem already thought about live_tuples, while dead_tuples is far, far lower. We shouldn't be so credulous as to believe the new dead_tuples estimate at that point. Perhaps we can change nothing about dead_tuples at all when this happens. Or perhaps we can set dead_tuples to a value that is scaled from the old estimate: take the old dead_tuples to live_tuples ratio, and multiply it by the new live_tuples to get the new dead_tuples value.

This is just a first pass, to see what you and others think. Even very simple heuristics seem like they could make things much better.

Another angle of attack is the PD_ALL_VISIBLE page-level bit, which acquire_sample_rows() could pay attention to -- especially in larger tables, where the difference between all pages and just the all-visible subset of pages is most likely to matter. The more sampled pages that had PD_ALL_VISIBLE set, the less credible the new dead_tuples estimate is relative to the existing information, and so the less weight pgstat_report_analyze() should give the new estimate over the old one.
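To make that concrete, here is a rough sketch of the kind of heuristic I have in mind, combining the ratio-scaling idea with the PD_ALL_VISIBLE-based weighting. The function name, the cutoffs, and the frac_all_visible parameter are all invented for illustration -- this isn't the real pgstat code, just the shape of the thing:

```c
#include <math.h>
#include <stdio.h>

/*
 * Sketch only -- invented function and thresholds, not actual pgstat
 * code.  Decide what dead_tuples value to store after ANALYZE, given
 * what the stats subsystem already believed (old_*) and what the new
 * random sample suggests (new_*).  frac_all_visible is the fraction of
 * sampled pages that had PD_ALL_VISIBLE set.
 */
static double
adjusted_dead_tuples(double old_live, double old_dead,
                     double new_live, double new_dead,
                     double frac_all_visible)
{
    double      scaled_dead = new_dead;

    /*
     * If the sample roughly agrees with the existing live_tuples
     * estimate but reports far fewer dead tuples, don't take the new
     * dead_tuples number at face value.  Carry the old dead/live ratio
     * forward by scaling it by the new live_tuples.  (The 10% and 50%
     * cutoffs are made up.)
     */
    if (old_live > 0 && old_dead > 0 &&
        fabs(new_live - old_live) / old_live <= 0.10 &&
        new_dead < old_dead * 0.50)
        scaled_dead = new_live * (old_dead / old_live);

    /*
     * Weight the sample-derived estimate against the existing one in
     * proportion to how much the sample can really tell us about dead
     * tuples: a sample full of all-visible pages says very little.
     */
    return frac_all_visible * old_dead +
        (1.0 - frac_all_visible) * scaled_dead;
}

int
main(void)
{
    /*
     * Old stats: 1 million live, 200k dead.  ANALYZE's sample: roughly
     * the same live_tuples but only 5k dead, with 80% of sampled pages
     * all-visible.
     */
    printf("adjusted dead_tuples: %.0f\n",
           adjusted_dead_tuples(1000000, 200000, 980000, 5000, 0.8));
    return 0;
}
```

With the existing stats saying 1 million live / 200k dead, and a sample that agrees on live_tuples but reports almost no dead tuples from mostly all-visible pages, a scheme like this keeps the dead_tuples estimate near 200k instead of collapsing it to a few thousand.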
We probably shouldn't change anything about pgstat_report_vacuum() as part of this effort to make pgstat_report_analyze() less terrible in the near term. It certainly has its problems (what is true for pages that VACUUM scanned at the end of VACUUM is far from representative for new pages!), but it's probably much less of a contributor to issues like those that Andres reports seeing.

BTW, one of the nice things about the insert-driven autovacuum stats is that pgstat_report_analyze() doesn't have an opinion about how many tuples were inserted since the last VACUUM ran. It does have other problems, but they seem less serious to me.

> Yep. I think what we should try to evaluate is which number is
> furthest from the truth. My guess is that the threshold is so high
> relative to what a reasonable value would be that you can't get any
> benefit out of making the dead tuple count more accurate. Like, if
> the threshold is 100x too high, or something, then who cares how
> accurate the dead tuples number is?

Right. Or if we don't make any reasonable distinction between LP_DEAD items and dead heap-only tuples, then the total number of both things together may matter very little. Better to be approximately correct than exactly wrong. Deliberately introducing a bias to lower the variance is a perfectly valid approach.

> Maybe that's even true in some scenarios, but I bet that it's never
> the issue when people have really big tables. The fact that I'm OK
> with 10MB of bloat in my 100MB table doesn't mean I'm OK with 1TB of
> bloat in my 10TB table. Among other problems, I can't even vacuum
> away that much bloat in one index pass, because autovacuum can't use
> enough work memory for that. Also, the absolute space wastage
> matters.

I certainly agree with all that.

FWIW, part of my mental model with VACUUM is that the rules kind of change in the case of a big table. We're far more vulnerable to issues such as (say) waiting for cleanup locks, because the overall cadence used by autovacuum is so infrequent relative to everything else. There are more opportunities for things to go wrong, worse consequences when they do go wrong, and greater potential for the problems to compound.

--
Peter Geoghegan