Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Date
Msg-id CAH2-Wz=0nQHK3RsOuBrHoU=6hyJWcQr=RdaoYcKdRkc4P7L_uw@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
Responses Getting rid of freezing and hint bits by eagerly vacuuming aborted xacts (was: decoupling table and index vacuum)
List pgsql-hackers
On Thu, Apr 22, 2021 at 11:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > My most ambitious goal is finding a way to remove the need to freeze
> > or to set hint bits. I think that we can do this by inventing a new
> > kind of VACUUM just for aborted transactions, which doesn't do index
> > vacuuming. You'd need something like an ARIES-style dirty page table
> > to make this cheap -- so it's a little like UNDO, but not very much.
>
> I don't see how that works. An aborted transaction can have made index
> entries, and those index entries can have already been moved by page
> splits, and there can be arbitrarily many of them, so that you can't
> keep track of them all in RAM. Also, you can crash after making the
> index entries and writing them to the disk and before the abort
> happens. Anyway, this is probably a topic for a separate thread.

This is a topic for a separate thread, but I will briefly address your question.

Under the scheme I've sketched, we never do index vacuuming when
invoking an autovacuum worker (or something like it) to clean up after
an aborted transaction. We track the pages that all transactions have
modified. If a transaction commits then we quickly discard the
relevant dirty page table metadata. If a transaction aborts
(presumably a much rarer event), then we launch an autovacuum worker
that visits precisely those heap blocks that were modified by the
aborted transaction, and just prunes each page, one by one. We have a
cutoff that works a little like relfrozenxid, except that it tracks
the point in the XID space before which we know any XIDs (any XIDs
that we can read from extant tuple headers) must be committed.
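
To make the bookkeeping concrete, here is a toy sketch in Python. It is
only an illustration of the idea, not a proposal for actual code: every
name in it (DirtyPageTable, note_modified, prune_block, the LP_DEAD
string) is hypothetical, pages are plain lists, and crash safety and
memory limits are ignored entirely. It also assumes that pruning turns
an aborted inserter's tuple into an LP_DEAD item rather than removing
it, since index entries may still point at it and no index vacuuming
takes place.

```python
from collections import defaultdict

LP_DEAD = "LP_DEAD"  # stand-in for a dead line pointer


class DirtyPageTable:
    """Toy record of which heap blocks each transaction modified."""

    def __init__(self):
        # xid -> set of (relation name, block number) pairs
        self._blocks = defaultdict(set)

    def note_modified(self, xid, rel, blkno):
        self._blocks[xid].add((rel, blkno))

    def on_commit(self, xid):
        # Common case: discard the metadata without touching the heap.
        self._blocks.pop(xid, None)

    def on_abort(self, xid):
        # Rare case: return exactly the blocks that need pruning.
        return sorted(self._blocks.pop(xid, set()))


def prune_block(items, aborted_xid):
    """Return a copy of one page with the aborted xact's effects undone.

    Each item is either LP_DEAD or a dict with 'xmin' and 'xmax' keys.
    Indexes are never touched.
    """
    pruned = []
    for item in items:
        if item == LP_DEAD:
            pruned.append(item)
        elif item["xmin"] == aborted_xid:
            # Inserted by the aborted xact: the tuple becomes a dead item.
            pruned.append(LP_DEAD)
        elif item["xmax"] == aborted_xid:
            # Deleted/updated by the aborted xact: the delete never happened.
            pruned.append({**item, "xmax": None})
        else:
            pruned.append(item)
    return pruned
```

The point of the sketch is that commit costs one dictionary removal,
while abort costs work proportional to the blocks that one transaction
touched, not to the size of the table or its indexes.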

The idea of a "dirty page table" is standard ARIES. It'd be tricky to
get it working, but still quite possible.

The overall goal of this design is for the system to be able to reason
about committed-ness inexpensively (to obviate the need for hint bits
and per-tuple freezing). We want to be able to say for sure that
almost all heap blocks in the database only contain heap tuples whose
headers contain only committed XIDs, or LP_DEAD items that are simply
dead (the exact provenance of these LP_DEAD items is not a concern,
just like today). The XID cutoff for committed-ness could be kept
quite recent because aborted transactions are naturally rare, and
because we can do relatively little work to "logically roll back"
aborted transactions.
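
A rough sketch of how such a cutoff might be computed and used, again
in Python and again purely illustrative. The function names are
hypothetical, and the rule for advancing the cutoff (the oldest XID
that is either still in progress or aborted but not yet pruned) is my
reading of the description above rather than a worked-out design. XID
wraparound is ignored; XIDs are treated as plain integers.

```python
def committed_cutoff(next_xid, in_progress, aborted_unpruned):
    """Oldest XID that might still appear in a tuple header uncommitted.

    Assumption: only in-progress xacts and aborted xacts whose pages have
    not been pruned yet can hold the cutoff back.
    """
    candidates = list(in_progress) + list(aborted_unpruned)
    return min(candidates, default=next_xid)


def known_committed(xid, cutoff):
    # Below the cutoff, no commit-log lookup or hint bit is needed.
    return xid < cutoff
```

With in-progress XIDs {150, 180} and an unpruned aborted XID 120, the
cutoff is held back at 120; once the pages touched by 120 are pruned,
it can advance to 150.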

Note that a heap tuple whose xmin and xmax are committed might also be
dead under this scheme, since of course it might have been updated or
deleted by an xact that committed. We've effectively decoupled things
by making aborted transactions special, and subject to very eager
cleanup.

I'm sure that there are significant challenges with making something
like this work. But to me this design seems like roughly the right
combination of radical and conservative.

-- 
Peter Geoghegan