Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id CAH2-Wz=EZVVihiz_sOfyaRn0b9AHNrWOtELjXfUHGeKTL=QBog@mail.gmail.com
Whole thread Raw
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jan 19, 2023 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
> Another version of this could be to integrate analyze.c's scan more closely
> with vacuum all the time. It's a bit bonkers that we often sequentially read
> blocks, evict them from shared buffers if we read them, just to then
> afterwards do random IO for blocks we've already read. That's imo what we
> eventually should do, but clearly it's not a small project.

Very often, what we're really missing in VACUUM is high level context.
That's true of what you say here, about analyze.c, as well as
complaints like your vac_estimate_reltuples complaint. The problem
scenarios involving vac_estimate_reltuples all involve repeating the
same silly action again and again, never realizing that that's what's
going on. I've found it very useful to think of one VACUUM as picking
up where the last one left off for my work on freezing.

This seems related to pre-autovacuum historical details. VACUUM
shouldn't really be a command in the same way that CREATE INDEX is a
command. I do think that we need to retain a VACUUM command in some
form, but it should be something pretty close to a command that just
enqueues off-schedule autovacuums. That can do things like coalesce
duplicate requests into one.

Anyway, I am generally in favor of a design that makes VACUUM and
ANALYZE things that are more or less owned by autovacuum. It should be
less and less of a problem to blur the distinction between VACUUM and
ANALYZE under this model, in each successive release. These
distinctions are quite unhelpful, overall, because they make it hard
for autovacuum scheduling to work at the whole-system level.

> > This wouldn't have to happen every time, but it would happen fairly often.
>
> Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
> 0 like? Or remember scanned_pages in pgstats and re-computing

I was thinking of something very simple like that, yes.

> I think it'd be fine to just use analyze.c and pass in an option to not
> compute column and inheritance stats.

That could be fine. Just as long as it's not duplicative in an obvious way.

> > Presumably you'll want to add the same I/O prefetching logic to this
> > cut-down version, just for example. Since without that there will be
> > no competition between it and ANALYZE proper. Besides which, isn't it
> > kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> > detoasting overhead that way. But even still.
>
> It's not just that analyze is expensive, I think it'll also be confusing if
> the column stats change after a manual VACUUM without ANALYZE.

Possibly, but it doesn't have to happen there. It's not like the rules
aren't a bit different compared to autovacuum already. For example,
the way TOAST tables are handled by the VACUUM command versus
autovacuum.

Even if it's valuable to maintain this kind of VACUUM/autovacuum
parity (which I tend to doubt), doesn't the same argument work almost
as well with whatever stripped down version you come up with? It's
also confusing that a manual VACUUM command will be doing an
ANALYZE-like thing. Especially in cases where it's really expensive
relative to the work of VACUUM, because VACUUM scanned so few pages.
You just have to make some kind of trade-off.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Use appendStringInfoSpaces more
Next
From: Greg Stark
Date:
Subject: Re: Unicode grapheme clusters