Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: decoupling table and index vacuum
Msg-id: CAH2-WznpDabB_usAvzUSbQRET-Y2z0DJeDv8jAgU1C_c3=OWgg@mail.gmail.com
In response to: Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Sep 24, 2021 at 11:48 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Actually, I have. I've been focusing on trying to create a general
> infrastructure for conveyor belt storage. An incomplete and likely
> quite buggy version of this can be found here:
>
> https://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/conveyor

That's great news! I think this is the right high-level direction.

> Mark Dilger has been helping me debug it, but it's still very early
> days. I was planning to wait until it was a little more baked before
> posting it to the list, but since you asked...

Reminds me of my FSM patch, in a way. It's ambitious, and still very
rough, but maybe I should bite the bullet and post it as a POC soon.

> Once that infrastructure is sufficiently mature, then the next step, I
> think, would be to try to use it to store dead TIDs.

+1.

> And then after that, one has to think about how autovacuum scheduling
> ought to work in a world where table vacuuming and index vacuuming are
> decoupled.

I'm excited about the possibility of using this infrastructure as a
springboard for driving autovacuum's behavior using more or less
authoritative information, rather than dubious statistics that can
consistently lead us down the wrong path. ANALYZE-style statistics
can only work under specific conditions that take their obvious
limitations into account -- and even there (even within the
optimizer) it's amazing that they work as well as they do. I fear that
we assumed that the statistics driving autovacuum were good enough at
some point in the distant past, and never really validated that
assumption. Perhaps because anti-wraparound VACUUM was *accidentally*
protective.

The scheduling of autovacuum is itself a big problem for the two big
BenchmarkSQL tables I'm always going on about -- though it did get a
lot better with the introduction of the
autovacuum_vacuum_insert_scale_factor stuff in Postgres 13. I recently
noticed that the tables have *every* autovacuum driven by inserts
(i.e. by the new autovacuum_vacuum_insert_scale_factor stuff), and
never by updates -- even though updates obviously produce significant
bloat in the two tables. BenchmarkSQL on Postgres was far worse a few
releases ago than it is now [1], and I think that this stats business
was a big factor (on top of everything else). I can clearly see that
autovacuum_vacuum_insert_scale_factor is accidentally protective with
BenchmarkSQL today, in a way that wasn't particularly anticipated by
anybody.
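
For reference, the two triggering tests boil down to a pair of simple
threshold formulas. Here is a simplified sketch in the style of
relation_needs_vacanalyze() in autovacuum.c -- the real code also
handles per-table reloptions, analyze thresholds, and wraparound, so
treat this as an approximation rather than the actual server code:

/*
 * Simplified sketch of autovacuum's two vacuum-triggering tests.
 * Loosely based on relation_needs_vacanalyze() in autovacuum.c;
 * not the actual server code.
 */
#include <stdbool.h>

static bool
needs_vacuum(float reltuples,        /* pg_class.reltuples from last VACUUM */
             float dead_tuples,      /* pgstat's n_dead_tup estimate */
             float inserted_tuples,  /* tuples inserted since last VACUUM */
             float vac_base_thresh, float vac_scale_factor,
             float ins_base_thresh, float ins_scale_factor)
{
    float   vacthresh = vac_base_thresh + vac_scale_factor * reltuples;
    float   insthresh = ins_base_thresh + ins_scale_factor * reltuples;

    /* Dead-tuple test: only as good as the n_dead_tup estimate */
    if (dead_tuples > vacthresh)
        return true;

    /*
     * Insert test (new in Postgres 13): the one that actually fires for
     * the BenchmarkSQL tables, purely as a side effect of insert volume.
     */
    return inserted_tuples > insthresh;
}

Note that both thresholds scale with the reltuples value captured by
the last VACUUM, and that the dead-tuple test inherits every problem
with the n_dead_tup estimate discussed below.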

The fact that the intellectual justifications for a lot of these
things are so vague concerns me. For example, why do we apply
autovacuum_vacuum_scale_factor based on pg_class.reltuples as it was
at the end of the last VACUUM? That aspect of the design will make
much less sense once
we have this decoupling in place. Even with the happy accident of
autovacuum_vacuum_insert_scale_factor helping BenchmarkSQL, the
conventional dead tuples based approach to VACUUM still doesn't drive
autovacuum sensibly -- we still systematically undercount LP_DEAD
stubs because (with this workload) they're heavily concentrated
in relatively few heap pages. So if this was a real app, the DBA would
somehow have to work out that they should aggressively tune
autovacuum_vacuum_scale_factor to clean up bloat from updates. I doubt
any DBA could ever figure that out, because it doesn't make any sense.

The problem goes both ways: in addition to undercounting dead tuples,
we effectively overcount, which can lead to autovacuum chasing its own
tail [2].

I think that we could do *way* better than we do today without
enormous effort, and I think that it really matters. Maybe we could
select from a few standard models for autovacuum scheduling using
Bayesian inference -- converge on the most predictive model for a
given table over time, using actual outcomes for each autovacuum (a
rough sketch of what I mean appears below). Be
sensitive to how LP_DEAD stub line pointers can become concentrated in
relatively few heap pages, and stuff like that. Maybe keep a little
history to work off of. The problem with the current model is not that
it might be wrong. The problem is that it might *never* be right (for
a given table). The scheduling never learns any lessons, because it's
fundamentally static -- it ought to be dynamic. How things change is
much more informative than where things are at an arbitrary point in
time.
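
To make that concrete, here is a purely hypothetical sketch of the
kind of thing I mean -- none of these names exist anywhere; this
illustrates the Bayesian updating step, not a design proposal. Each
table keeps a small set of candidate models that predict dead-tuple
accrual; after each VACUUM, every model's posterior weight is
multiplied by the likelihood of the observed outcome, and the weights
are renormalized:

/*
 * Hypothetical sketch only: Bayesian weighting over a few candidate
 * models of dead-tuple accrual for one table.  The scheduler would
 * come to trust whichever model has actually been predictive.
 */
#include <math.h>

typedef struct TableVacModel
{
    /* predicted dead tuples, given activity since the last VACUUM */
    double      (*predict) (double inserted, double updated);
    double      weight;         /* posterior weight; starts uniform */
} TableVacModel;

/* Gaussian likelihood of the observed outcome under one model */
static double
likelihood(double predicted, double observed, double sigma)
{
    double      err = (observed - predicted) / sigma;

    return exp(-0.5 * err * err);
}

/* Called once per completed VACUUM, with what that VACUUM found */
static void
update_weights(TableVacModel *models, int nmodels,
               double inserted, double updated,
               double observed_dead, double sigma)
{
    double      newweight[8];   /* assume nmodels <= 8 in this sketch */
    double      total = 0.0;

    for (int i = 0; i < nmodels; i++)
    {
        newweight[i] = models[i].weight *
            likelihood(models[i].predict(inserted, updated),
                       observed_dead, sigma);
        total += newweight[i];
    }

    /* If every model was badly wrong, leave the old weights in place */
    if (total <= 0.0)
        return;

    /* Renormalize so the weights remain a probability distribution */
    for (int i = 0; i < nmodels; i++)
        models[i].weight = newweight[i] / total;
}

One candidate model might assume dead tuples spread evenly across the
heap, another that LP_DEAD stubs concentrate in relatively few pages;
over successive autovacuums the weights would converge on whichever
story the table actually tells.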

[1] https://www.postgresql.org/message-id/flat/0265f9e2-3e32-e67d-f106-8abde596c0e4%40commandprompt.com
[2] https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com
--
Peter Geoghegan


