Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-Wzm8Uc7dbUcDvgqMmRr0EaesTdLXypGRBUruKYYMnY362w@mail.gmail.com |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Sun, Apr 3, 2022 at 12:05 PM Andres Freund <andres@anarazel.de> wrote:
> Just saw that you committed: Wee! I think this will be a substantial
> improvement for our users.

I hope so! I think that it's much more useful as the basis for future work than as a standalone thing. Users of Postgres 15 might not notice a huge difference. But it opens up a lot of new directions to take VACUUM in.

I would like to get rid of anti-wraparound VACUUMs and aggressive VACUUMs in Postgres 16. This isn't as radical as it sounds. It seems quite possible to find a way for *every* VACUUM to become aggressive progressively and dynamically. We'll still need to have autovacuum.c know about wraparound, but it should be just another threshold, not fundamentally different to the other thresholds (except that it's still used when autovacuum is nominally disabled).

The behavior around autovacuum cancellations is probably still going to be necessary when age(relfrozenxid) gets too high, but it shouldn't be conditioned on what age(relfrozenxid) *used to be*, when the autovacuum started. That could have been a long time ago. It should be based on what's happening *right now*.

> While I was writing the above I, again, realized that it'd be awfully nice to
> have some accumulated stats about (auto-)vacuum's effectiveness. For us to get
> feedback about improvements more easily and for users to know what aspects
> they need to tune.

Strongly agree. And I'm excited about the potential of the shared memory stats patch to enable more thorough instrumentation, which allows us to improve things with feedback that we just can't get right now.

VACUUM is still too complicated -- that makes this kind of analysis much harder, even for experts. You need more continuous behavior to get value from this kind of analysis. There are too many things that might end up mattering, that really shouldn't ever matter. Too much potential for strange illogical discontinuities in performance over time.

Having only one type of VACUUM (excluding VACUUM FULL) will be much easier for users to reason about. But I also think that it'll be much easier for us to reason about. For example, better autovacuum scheduling will be made much easier if autovacuum.c can just assume that every VACUUM operation will do the same amount of work. (Another problem with the scheduling is that it uses ANALYZE statistics (sampling) in a way that just doesn't make any sense for something like VACUUM, which is an inherently dynamic and cyclic process.)

None of this stuff has to rely on my patch for freezing. We don't necessarily have to make every VACUUM advance relfrozenxid to do all this. The important point is that we definitely shouldn't be putting off *all* freezing of all-visible pages in non-aggressive VACUUMs (or in VACUUMs that are not expected to advance relfrozenxid). Even a very conservative implementation could achieve all this; we need only spread out the burden of freezing all-visible pages over time, across multiple VACUUM operations. Make the behavior continuous.

> Knowing how many times a table was vacuumed doesn't really tell that much, and
> requiring to enable log_autovacuum_min_duration and then aggregating those
> results is pretty painful (and version dependent).

Yeah. Ideally we could avoid making the output of log_autovacuum_min_duration into an API, by having a real API instead. The output probably needs to evolve some more. A lot of very basic information wasn't there until recently.

> If we just collected something like:
> - number of heap passes
> - time spent heap vacuuming
> - number of index scans
> - time spent index vacuuming
> - time spent delaying

You forgot FPIs.

> - percentage of non-yet-removable vs removable tuples

I think that we should address this directly too. By "taking a snapshot of the visibility map", so we at least don't scan/vacuum heap pages that don't really need it.
This is also valuable because it makes slowing down VACUUM (maybe slowing it down a lot) have fewer downsides. At least we'll have "locked in" our scanned_pages, which we can figure out in full before we really scan even one page.

> it'd start to be a heck of a lot easier to judge how well autovacuum is
> coping.

What about the potential of the shared memory stats stuff to totally replace the use of ANALYZE stats in autovacuum.c? Possibly with help from vacuumlazy.c, and the visibility map?

I see a lot of potential for exploiting the visibility map more, both within vacuumlazy.c itself, and for autovacuum.c scheduling [1]. I'd probably start with the scheduling stuff, and only then work out how to show users more actionable information.

[1] https://postgr.es/m/CAH2-Wzkt9Ey9NNm7q9nSaw5jdBjVsAq3yvb4UT4M93UaJVd_xg@mail.gmail.com
--
Peter Geoghegan
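
The "snapshot of the visibility map" idea in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from vacuumlazy.c: the `VMBits` type and the `snapshot_vm` and `pages_to_scan` helpers are hypothetical names, and the skipping rule is simplified to "scan every page that is not all-visible in the snapshot". It shows how copying the map once, up front, lets scanned_pages be worked out in full before any heap page is read, and why later changes to the live map do not change that number.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VMBits:
    """Toy stand-in for the per-page visibility map bits (hypothetical type)."""
    all_visible: bool
    all_frozen: bool


def snapshot_vm(live_vm: List[VMBits]) -> Tuple[VMBits, ...]:
    """Copy the live map once, at the start of the (hypothetical) VACUUM."""
    return tuple(live_vm)


def pages_to_scan(vm_snapshot: Tuple[VMBits, ...]) -> List[int]:
    """Block numbers to scan: every page not all-visible in the snapshot.

    Simplifying assumption: no other skipping rules are modeled here.
    """
    return [blkno for blkno, bits in enumerate(vm_snapshot)
            if not bits.all_visible]


# Five-page table: pages 1 and 3 were modified since the last VACUUM.
live_vm = [
    VMBits(True, True),
    VMBits(False, False),
    VMBits(True, False),
    VMBits(False, False),
    VMBits(True, True),
]

vm_snapshot = snapshot_vm(live_vm)
scan_list = pages_to_scan(vm_snapshot)
scanned_pages = len(scan_list)  # known before any heap page is read

# A concurrent UPDATE clears the bits for page 0 while VACUUM is running...
live_vm[0] = VMBits(False, False)

# ...but the work "locked in" by the snapshot does not grow.
assert pages_to_scan(vm_snapshot) == scan_list
```

In this model, a VACUUM that is throttled heavily still has a fixed amount of work, because pages dirtied after the snapshot was taken are left for the next VACUUM rather than added to the current one.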