Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: decoupling table and index vacuum |
Date | |
Msg-id | CAH2-Wz=0nQHK3RsOuBrHoU=6hyJWcQr=RdaoYcKdRkc4P7L_uw@mail.gmail.com |
In response to | Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>) |
Responses | Getting rid of freezing and hint bits by eagerly vacuuming aborted xacts (was: decoupling table and index vacuum) |
List | pgsql-hackers |
On Thu, Apr 22, 2021 at 11:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > My most ambitious goal is finding a way to remove the need to freeze
> > or to set hint bits. I think that we can do this by inventing a new
> > kind of VACUUM just for aborted transactions, which doesn't do index
> > vacuuming. You'd need something like an ARIES-style dirty page table
> > to make this cheap -- so it's a little like UNDO, but not very much.
>
> I don't see how that works. An aborted transaction can have made index
> entries, and those index entries can have already been moved by page
> splits, and there can be arbitrarily many of them, so that you can't
> keep track of them all in RAM. Also, you can crash after making the
> index entries and writing them to the disk and before the abort
> happens. Anyway, this is probably a topic for a separate thread.

This is a topic for a separate thread, but I will briefly address your question.

Under the scheme I've sketched, we never do index vacuuming when invoking an autovacuum worker (or something like it) to clean up after an aborted transaction. We track the pages that all transactions have modified. If a transaction commits then we quickly discard the relevant dirty page table metadata. If a transaction aborts (presumably a much rarer event), then we launch an autovacuum worker that visits precisely those heap blocks that were modified by the aborted transaction, and just prune each page, one by one. We have a cutoff that works a little like relfrozenxid, except that it tracks the point in the XID space before which we know any XIDs (any XIDs that we can read from extant tuple headers) must be committed.

The idea of a "dirty page table" is standard ARIES. It'd be tricky to get it working, but still quite possible.

The overall goal of this design is for the system to be able to reason about committed-ness inexpensively (to obviate the need for hint bits and per-tuple freezing).
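
To make the bookkeeping above concrete, here is a toy in-memory sketch in Python. It is only an illustration, not PostgreSQL code: the class name, the method names, and the way the cutoff is derived (the oldest XID that still has tracked pages) are all assumptions made for this example, and it ignores crash safety, memory limits, and concurrency entirely.

```python
from collections import defaultdict


class DirtyPageTable:
    """Toy per-transaction dirty page table (all names are hypothetical)."""

    def __init__(self):
        # xid -> set of (relation, heap block) pairs modified by that xact
        self._pages = defaultdict(set)
        # Log of (xid, relation, block) prune visits, for inspection only
        self.pruned = []

    def record_modification(self, xid, relation, block):
        """Remember that transaction `xid` dirtied a heap block."""
        self._pages[xid].add((relation, block))

    def commit(self, xid):
        """Commit is the common case: just discard the metadata."""
        self._pages.pop(xid, None)

    def abort(self, xid):
        """Abort is the rare case: visit exactly the blocks the xact touched.

        Returns the sorted list of blocks a cleanup worker would prune.
        No index vacuuming happens here, matching the scheme in the text.
        """
        blocks = sorted(self._pages.pop(xid, set()))
        for relation, block in blocks:
            self._prune(xid, relation, block)
        return blocks

    def _prune(self, xid, relation, block):
        # Stand-in for pruning one heap page; a real system would remove
        # the aborted transaction's tuples from that page.
        self.pruned.append((xid, relation, block))

    def committed_cutoff(self, next_xid):
        """Assumed cutoff: XIDs before this value are known committed.

        In this toy model that is the oldest XID still tracked, or
        `next_xid` when nothing is tracked.
        """
        return min(self._pages, default=next_xid)
```

For example, after recording writes for two transactions, committing the older one moves the cutoff up to the younger one, and aborting the younger one returns only the blocks it touched.
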
We want to be able to say for sure that almost all heap blocks in the database only contain heap tuples whose headers contain only committed XIDs, or LP_DEAD items that are simply dead (the exact provenance of these LP_DEAD items is not a concern, just like today). The XID cutoff for committed-ness could be kept quite recent due to the fact that aborted transactions are naturally rare, and because we can do relatively little work to "logically roll back" aborted transactions.

Note that a heap tuple whose xmin and xmax are committed might also be dead under this scheme, since of course it might have been updated or deleted by an xact that committed. We've effectively decoupled things by making aborted transactions special, and subject to very eager cleanup.

I'm sure that there are significant challenges with making something like this work. But to me this design seems roughly the right combination of radical and conservative.

--
Peter Geoghegan