Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: decoupling table and index vacuum
Date:
Msg-id: CAH2-WznpDabB_usAvzUSbQRET-Y2z0DJeDv8jAgU1C_c3=OWgg@mail.gmail.com
In response to: Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: decoupling table and index vacuum
List: pgsql-hackers
On Fri, Sep 24, 2021 at 11:48 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Actually, I have. I've been focusing on trying to create a general
> infrastructure for conveyor belt storage. An incomplete and likely
> quite buggy version of this can be found here:
>
> https://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/conveyor

That's great news! I think that this is the right high-level direction.

> Mark Dilger has been helping me debug it, but it's still very early
> days. I was planning to wait until it was a little more baked before
> posting it to the list, but since you asked...

Reminds me of my FSM patch, in a way. It's ambitious, and still very rough, but maybe I should bite the bullet and post it as a POC soon.

> Once that infrastructure is sufficiently mature, then the next step, I
> think, would be to try to use it to store dead TIDs.

+1.

> And then after that, one has to think about how autovacuum scheduling
> ought to work in a world where table vacuuming and index vacuuming are
> decoupled.

I'm excited about the possibility of using this infrastructure as a springboard for driving autovacuum's behavior using more or less authoritative information, rather than dubious statistics that can consistently lead us down the wrong path. ANALYZE-style statistics can only work under specific conditions that take their obvious limitations into account -- and even there (even within the optimizer) it's amazing that they work as well as they do. I fear that we assumed that the statistics driving autovacuum were good enough at some point in the distant past, and never really validated that assumption. Perhaps because anti-wraparound VACUUM was *accidentally* protective.

The scheduling of autovacuum is itself a big problem for the two big BenchmarkSQL tables I'm always going on about -- though it did get a lot better with the introduction of the autovacuum_vacuum_insert_scale_factor stuff in Postgres 13. I recently noticed that those tables have *every* autovacuum driven by inserts (i.e. by the new autovacuum_vacuum_insert_scale_factor stuff), and never by updates -- even though updates obviously produce significant bloat in the two tables. BenchmarkSQL performance on Postgres was far worse a few releases ago than it is now [1], and I think that this stats business was a big factor (on top of everything else). I can clearly see that autovacuum_vacuum_insert_scale_factor is accidentally protective with BenchmarkSQL today, in a way that wasn't particularly anticipated by anybody.

The fact that the intellectual justifications for a lot of these things are so vague concerns me. For example, why do we apply autovacuum_vacuum_scale_factor to reltuples as it stood at the end of the last VACUUM? That aspect of the design will make much less sense once we have this decoupling in place.

Even with the happy accident of autovacuum_vacuum_insert_scale_factor helping BenchmarkSQL, the conventional dead-tuples-based approach to VACUUM still doesn't drive autovacuum sensibly -- we still systematically undercount LP_DEAD stubs, because (with this workload) they're concentrated in relatively few heap pages. So if this were a real app, the DBA would somehow have to work out that they should aggressively tune autovacuum_vacuum_scale_factor to clean up bloat from updates. I doubt any DBA could ever figure that out, because it doesn't make any sense.
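To make the "reltuples at the end of the last VACUUM" point concrete, the trigger test boils down to something like this -- a simplified standalone model of relation_needs_vacanalyze() in autovacuum.c, with made-up numbers and the anti-wraparound force_vacuum case left out (not the actual code):

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the autovacuum trigger test.  The key point: reltuples
 * is whatever pg_class said after the last VACUUM/ANALYZE, not a live
 * value, while the tuple counts come from the stats system.
 */
int
main(void)
{
	/* GUC defaults */
	double	vac_base_thresh = 50.0;			/* autovacuum_vacuum_threshold */
	double	vac_scale_factor = 0.2;			/* autovacuum_vacuum_scale_factor */
	double	vac_ins_base_thresh = 1000.0;	/* autovacuum_vacuum_insert_threshold */
	double	vac_ins_scale_factor = 0.2;		/* autovacuum_vacuum_insert_scale_factor */

	/* Made-up, BenchmarkSQL-like table state */
	double	reltuples = 1000000.0;	/* from pg_class, set by the last VACUUM */
	double	dead_tuples = 150000.0;	/* n_dead_tup (undercounted LP_DEAD stubs) */
	double	ins_tuples = 250000.0;	/* n_ins_since_vacuum */

	double	vacthresh = vac_base_thresh + vac_scale_factor * reltuples;
	double	vacinsthresh = vac_ins_base_thresh + vac_ins_scale_factor * reltuples;

	bool	dovacuum = (dead_tuples > vacthresh) ||
		(ins_tuples > vacinsthresh);

	printf("vacthresh=%.0f vacinsthresh=%.0f dovacuum=%d\n",
		   vacthresh, vacinsthresh, dovacuum);
	return 0;
}

With these invented numbers only the insert threshold ever fires (250,000 > 201,000, while 150,000 < 200,050) -- exactly the insert-driven pattern I described: the update-side bloat is real, but the dead-tuple counter never catches up to a threshold that scales with reltuples.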
The problem goes both ways: in addition to undercounting dead tuples, we effectively overcount, which can lead to autovacuum chasing its own tail [2].

I think that we could do *way* better than we do today without enormous effort, and I think that it really matters. Maybe we could select from a few standard models for autovacuum scheduling using Bayesian inference -- converge on the more predictive model for a given table over time, using actual outcomes for each autovacuum (see the toy sketch at the end of this email). Be sensitive to how LP_DEAD stub line pointers can become concentrated in relatively few heap pages, and stuff like that. Maybe keep a little history to work off of.

The problem with the current model is not that it might be wrong. The problem is that it might *never* be right (for a given table). The scheduling never learns any lessons, because it's fundamentally static -- it ought to be dynamic. How things change is much more informative than where things are at an arbitrary point in time.

[1] https://www.postgresql.org/message-id/flat/0265f9e2-3e32-e67d-f106-8abde596c0e4%40commandprompt.com
[2] https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

--
Peter Geoghegan
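PS -- to illustrate the "select from a few standard models using Bayesian inference" idea, here is a self-contained toy sketch. It is entirely hypothetical: both candidate growth models and all of the numbers are invented. It just shows posterior weight converging on whichever model better predicts the observed dead-tuple counts across successive autovacuums:

#include <math.h>
#include <stdio.h>

#define NMODELS 2

/*
 * Two invented models of how the visible dead-tuple count grows with
 * updates since the last VACUUM.  Model 0: linear growth (the classic
 * assumption).  Model 1: growth saturates, as when LP_DEAD stubs pile
 * up in relatively few heap pages.
 */
static double
predict(int model, double updates)
{
	if (model == 0)
		return 0.2 * updates;
	return 1000.0 * (1.0 - exp(-updates / 5000.0));
}

/*
 * Gaussian likelihood of an observation given a prediction.  The
 * normalizing constant is dropped because it cancels when the
 * posterior is renormalized below.
 */
static double
likelihood(double observed, double predicted, double sigma)
{
	double		z = (observed - predicted) / sigma;

	return exp(-0.5 * z * z);
}

int
main(void)
{
	double		weight[NMODELS] = {0.5, 0.5};	/* uniform prior */

	/* Invented history: {updates since last VACUUM, dead tuples seen} */
	double		history[][2] = {
		{2000, 350}, {4000, 560}, {8000, 820}, {16000, 960},
	};

	for (int i = 0; i < 4; i++)
	{
		double		total = 0.0;

		/* Bayes: posterior weight ~ prior weight * likelihood */
		for (int m = 0; m < NMODELS; m++)
		{
			weight[m] *= likelihood(history[i][1],
									predict(m, history[i][0]), 200.0);
			total += weight[m];
		}
		for (int m = 0; m < NMODELS; m++)
			weight[m] /= total;
	}

	printf("P(linear) = %.3f, P(saturating) = %.3f\n",
		   weight[0], weight[1]);
	return 0;
}

The saturating model wins here because the observations level off the way concentrated LP_DEAD stubs would. A real implementation would obviously need better models and real per-table history, but the updating machinery itself really is that simple.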