Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
Date | |
Msg-id | CAH2-Wz=EZVVihiz_sOfyaRn0b9AHNrWOtELjXfUHGeKTL=QBog@mail.gmail.com |
In response to | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Thu, Jan 19, 2023 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
> Another version of this could be to integrate analyze.c's scan more closely
> with vacuum all the time. It's a bit bonkers that we often sequentially read
> blocks, evict them from shared buffers if we read them, just to then
> afterwards do random IO for blocks we've already read. That's imo what we
> eventually should do, but clearly it's not a small project.

Very often, what we're really missing in VACUUM is high level context. That's true of what you say here, about analyze.c, as well as complaints like your vac_estimate_reltuples complaint. The problem scenarios involving vac_estimate_reltuples all involve repeating the same silly action again and again, never realizing that that's what's going on. I've found it very useful to think of one VACUUM as picking up where the last one left off for my work on freezing.

This seems related to pre-autovacuum historical details. VACUUM shouldn't really be a command in the same way that CREATE INDEX is a command. I do think that we need to retain a VACUUM command in some form, but it should be something pretty close to a command that just enqueues off-schedule autovacuums. That can do things like coalesce duplicate requests into one.

Anyway, I am generally in favor of a design that makes VACUUM and ANALYZE things that are more or less owned by autovacuum. It should be less and less of a problem to blur the distinction between VACUUM and ANALYZE under this model, in each successive release. These distinctions are quite unhelpful, overall, because they make it hard for autovacuum scheduling to work at the whole-system level.

> > This wouldn't have to happen every time, but it would happen fairly often.
>
> Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
> 0 like? Or remember scanned_pages in pgstats and re-computing

I was thinking of something very simple like that, yes.

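To make the two triggers mentioned in the quoted question concrete, here is a minimal sketch: a fixed "every Nth VACUUM" rule, and a rule based on pages scanned since the last recomputation. Everything in it is an assumption for illustration only. `RelStats`, `record_vacuum`, and the thresholds `every_n` and `fraction` are hypothetical names and values, not PostgreSQL code or actual pgstats fields.

```python
from dataclasses import dataclass


@dataclass
class RelStats:
    """Hypothetical per-relation counters (not real pgstats fields)."""
    rel_pages: int = 0                # current relation size, in pages
    vacuum_count: int = 0             # VACUUMs completed so far
    scanned_since_recompute: int = 0  # pages scanned since last recomputation


def should_recompute_by_count(stats: RelStats, every_n: int = 10) -> bool:
    # The "vacuum_count % 10 == 0" style rule from the quoted question.
    return stats.vacuum_count > 0 and stats.vacuum_count % every_n == 0


def should_recompute_by_pages(stats: RelStats, fraction: float = 0.2) -> bool:
    # Alternative rule: recompute once enough of the table has been scanned.
    if stats.rel_pages == 0:
        return False
    return stats.scanned_since_recompute >= fraction * stats.rel_pages


def record_vacuum(stats: RelStats, scanned_pages: int,
                  every_n: int = 10, fraction: float = 0.2) -> bool:
    """Update the counters after one VACUUM; return True if the cut-down
    ANALYZE-like pass should run this time."""
    stats.vacuum_count += 1
    stats.scanned_since_recompute += scanned_pages
    recompute = (should_recompute_by_count(stats, every_n)
                 or should_recompute_by_pages(stats, fraction))
    if recompute:
        stats.scanned_since_recompute = 0
    return recompute
```

With these made-up thresholds, a 1000-page table that has 10 pages scanned per VACUUM would trigger the extra pass on the tenth VACUUM via the count rule, while a single VACUUM that scans 250 pages would trigger it immediately via the page rule.
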
> I think it'd be fine to just use analyze.c and pass in an option to not
> compute column and inheritance stats.

That could be fine. Just as long as it's not duplicative in an obvious way.

> > Presumably you'll want to add the same I/O prefetching logic to this
> > cut-down version, just for example. Since without that there will be
> > no competition between it and ANALYZE proper. Besides which, isn't it
> > kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> > detoasting overhead that way. But even still.
>
> It's not just that analyze is expensive, I think it'll also be confusing if
> the column stats change after a manual VACUUM without ANALYZE.

Possibly, but it doesn't have to happen there. It's not like the rules aren't a bit different compared to autovacuum already. For example, the way TOAST tables are handled by the VACUUM command versus autovacuum.

Even if it's valuable to maintain this kind of VACUUM/autovacuum parity (which I tend to doubt), doesn't the same argument work almost as well with whatever stripped down version you come up with? It's also confusing that a manual VACUUM command will be doing an ANALYZE-like thing. Especially in cases where it's really expensive relative to the work of VACUUM, because VACUUM scanned so few pages. You just have to make some kind of trade-off.

--
Peter Geoghegan