Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id CAH2-WzmTz6sC0CGSyfFosgP5GO7hffq9TznSGQLt0A7tennZLQ@mail.gmail.com
Whole thread Raw
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Jim Nasby <nasbyj@amazon.com>)
List pgsql-hackers
On Thu, Apr 14, 2022 at 4:19 PM Jim Nasby <nasbyj@amazon.com> wrote:
> > - percentage of non-yet-removable vs removable tuples
>
> This'd give you an idea how bad your long-running-transaction problem is.

VACUUM fundamentally works by removing those tuples that are
considered dead according to an XID-based cutoff established when the
operation begins. And so many very long running VACUUM operations will
see dead-but-not-removable tuples even when there are absolutely no
long running transactions (nor any other VACUUM operations). The only
long running thing involved might be our own long running VACUUM
operation.

I would like to reduce the number of non-removal dead tuples
encountered by VACUUM by "locking in" heap pages that we'd like to
scan up front. This would work by having VACUUM create its own local
in-memory copy of the visibility map before it even starts scanning
heap pages. That way VACUUM won't end up visiting heap pages just
because they were concurrently modified half way through our VACUUM
(by some other transactions). We don't really need to scan these pages
at all -- they have dead tuples, but not tuples that are "dead to
VACUUM".

The key idea here is to remove a big unnatural downside to slowing
VACUUM down. The cutoff would almost work like an MVCC snapshot, that
described precisely the work that VACUUM needs to do (which pages to
scan) up-front. Once that's locked in, the amount of work we're
required to do cannot go up as we're doing it (or it'll be less of an
issue, at least).

It would also help if VACUUM didn't scan pages that it already knows
don't have any dead tuples. The current SKIP_PAGES_THRESHOLD rule
could easily be improved. That's almost the same problem.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Next
From: Andres Freund
Date:
Subject: Re: shared-memory based stats collector - v70