Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: should vacuum's first heap pass be read-only?
Msg-id CA+TgmoZc=1TRoL1m2v6Uz25TzatNitAfXH6fK06SnhKz7_5wuQ@mail.gmail.com
In response to Re: should vacuum's first heap pass be read-only?  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Fri, Feb 4, 2022 at 3:05 PM Greg Stark <stark@mit.edu> wrote:
> Whatever happened to the idea to "rotate" the work of vacuum. So all
> the work of the second pass would actually be deferred until the first
> pass of the next vacuum cycle.
>
> That would also have the effect of eliminating the duplicate work,
> both the writes with the WAL generation as well as the actual scan.
> The only heap scan would be "remove line pointers previously cleaned
> from indexes and prune dead tuples recording them to clean from
> indexes in future". The index scan would remove line pointers and
> record them to be removed from the heap in a future heap scan.

I vaguely remember previous discussions of this, but only vaguely, so
if there are threads on the list, feel free to send pointers. It seems to
me that in order to do this, we'd need some kind of way of storing the
TIDs that were found to be dead in one VACUUM so that they can be
marked unused in the next VACUUM - and the conveyor belt patches on
which Dilip's work is based provide exactly that machinery, which his
patches then leverage to do exactly that thing. But it feels like a
big, sudden change from the way things work now, and I'm trying to
think of ways to make it more incremental, and thus hopefully less
risky.
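
To make the "rotated" scheme concrete, here is a toy sketch of the
bookkeeping it implies. It is only an illustration under simplifying
assumptions (a single index, one in-memory set standing in for whatever
on-disk TID storage would be used), and every name in it is
hypothetical; none of this is PostgreSQL code or the conveyor belt API.

```python
from dataclasses import dataclass, field

# Toy model only: these names are hypothetical and are not PostgreSQL APIs.
NORMAL, DEAD_TUPLE, LP_DEAD, LP_UNUSED = "normal", "dead_tuple", "lp_dead", "lp_unused"


@dataclass
class Table:
    heap: dict   # TID -> line pointer state
    index: set   # TIDs still referenced by the (single) index
    deferred: set = field(default_factory=set)  # dead TIDs saved for the next VACUUM


def rotated_vacuum(t: Table) -> None:
    # Step 1: TIDs recorded by the previous VACUUM are already gone from
    # the index, so their line pointers can now be marked unused.
    for tid in t.deferred:
        assert tid not in t.index
        t.heap[tid] = LP_UNUSED
    t.deferred.clear()

    # Step 2: prune, turning dead tuples into dead line pointers.
    newly_dead = {tid for tid, state in t.heap.items() if state == DEAD_TUPLE}
    for tid in newly_dead:
        t.heap[tid] = LP_DEAD

    # Step 3: index vacuuming removes the index entries for those TIDs.
    t.index -= newly_dead

    # Step 4: remember them so that the next VACUUM can perform step 1.
    t.deferred = newly_dead
```

In this model a dead tuple's line pointer only becomes reusable on the
VACUUM after the one that pruned it, which is the extra latency, and the
`deferred` set is the extra persistent state, that the rest of this
message is concerned with.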

> The downside would mainly be in the latency before the actual tuples
> get cleaned up from the table. That is not so much of an issue as far
> as space these days with tuple pruning but is more and more of an
> issue with xid wraparound. Also, having to record the line pointers
> that have been cleaned from indexes somewhere on disk for the
> subsequent vacuum would be extra state on disk and we've learned that
> means extra complexity.

I don't think there's any XID wraparound issue here. Phase 1 does a
HOT prune, after which only dead line pointers remain, not dead
tuples. And those contain no XIDs. Phase 2 is only setting those dead
line pointers back to unused.
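
As a rough illustration of why the deferred part holds no XIDs, the
following toy model tracks only line pointer states and the XIDs stored
with live tuples. The class and function names are hypothetical, and
details such as redirect line pointers and freezing are left out.

```python
from dataclasses import dataclass
from typing import Optional


# Toy model only; class and function names are hypothetical, not PostgreSQL code.
@dataclass
class LinePointer:
    state: str                    # "normal", "dead", or "unused"
    xmin: Optional[int] = None    # only a "normal" item has tuple storage,
    xmax: Optional[int] = None    # and therefore XIDs


def prune(lp: LinePointer, tuple_is_dead: bool) -> LinePointer:
    # Phase 1: a dead tuple loses its storage; only a stub line pointer remains.
    if lp.state == "normal" and tuple_is_dead:
        return LinePointer("dead")
    return lp


def mark_unused(lp: LinePointer) -> LinePointer:
    # Phase 2: a dead line pointer becomes reusable.
    return LinePointer("unused") if lp.state == "dead" else lp


def xids_on_page(page):
    # XIDs that wraparound protection would have to care about.
    return [x for lp in page for x in (lp.xmin, lp.xmax) if x is not None]
```

After `prune`, the set of XIDs on the page is already what it will be
after `mark_unused`, so in this model delaying phase 2 does not hold
back any XID.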

As for the other part, that's pretty much exactly the complexity that
I'm worrying about.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


