Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs

From Noah Misch
Subject Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date
Msg-id 20240108182125.f8.nmisch@google.com
In response to Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
On Mon, Jan 08, 2024 at 12:02:01PM -0500, Peter Geoghegan wrote:
> On Sat, Jan 6, 2024 at 5:44 PM Noah Misch <noah@leadboat.com> wrote:
> > Tied to that decision is the choice of semantics when the xmin horizon moves
> > backward during one VACUUM, e.g. when a new walsender xmin does so.  Options:
> >
> > 1. Continue to remove tuples based on the OldestXmin from VACUUM's start.  We
> >    could have already removed some of those tuples, so the walsender xmin
> >    won't achieve a guarantee anyway.  (VACUUM would want ratchet-like behavior
> >    in GlobalVisState, possibly by sharing OldestXmin with pruneheap like you
> >    say.)
> >
> > 2. Move OldestXmin backward, to reflect the latest xmin horizon.  (Perhaps
> >    VACUUM would just pass GlobalVisState to a function that returns the
> >    compatible OldestXmin.)
> >
> > Which way is better?
> 
> I suppose that a hybrid of these two approaches makes the most sense.
> A design that's a lot closer to your #1 than to your #2.
> 
> Under this scheme, pruneheap.c would be explicitly aware of
> OldestXmin, and would promise to respect the exact invariant that we
> need to avoid getting stuck in lazy_scan_prune's loop (or avoid
> confused NewRelfrozenXid tracking on HEAD, which no longer has this
> loop). But it'd be limited to that exact invariant; we'd still avoid
> unduly imposing any requirements on pruning-away deleted tuples whose
> xmax was >= OldestXmin. lazy_scan_prune/vacuumlazy.c shouldn't care if
> we prune away any "extra" heap tuples, just because we can (or just
> because it's convenient to the implementation). Andres has in the past
> placed a lot of emphasis on being able to update the
> GlobalVisState-wise bounds on the fly. Not sure that it's really that
> important that VACUUM does that, but there is no reason to not allow
> it. So we can keep that property (as well as the aforementioned
> high-level OldestXmin immutability property).
> 
> More importantly (at least to me), this scheme allows vacuumlazy.c to
> continue to treat OldestXmin as an immutable cutoff for both pruning
> and freezing -- the high level design doesn't need any revisions. We
> already "freeze away" multixact member XIDs >= OldestXmin in certain
> rare cases (i.e. we remove lockers that are determined to no longer be
> running in FreezeMultiXactId's "second pass" slow path), so there is
> nothing fundamentally novel about the idea of removing some extra XIDs
> >= OldestXmin in passing, just because it happens to be convenient to
> some low-level piece of code that's external to vacuumlazy.c.
> 
> What do you think of that general approach?

That all sounds good to me.
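
To make the agreed-on invariant concrete, here is a toy Python model of it. This is not pruneheap.c code; the function names (xid_precedes, must_prune, may_prune) and the live_horizon parameter are made up for illustration, and the comparison ignores the special permanent XIDs. The idea: anything deleted before VACUUM's fixed OldestXmin is always removable, even if the live horizon has since moved backward, and the live horizon may only add "extra" removals on top of that.

```python
XID_MASK = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b, modulo 2^32.

    Simplified model of a wraparound-aware comparison; it assumes both
    arguments are normal XIDs and does not handle permanent XIDs.
    """
    return ((a - b) & XID_MASK) >= HALF_RANGE


def must_prune(xmax: int, oldest_xmin: int) -> bool:
    """The invariant vacuumlazy.c relies on: a tuple deleted by an XID
    older than the immutable OldestXmin is always removable."""
    return xid_precedes(xmax, oldest_xmin)


def may_prune(xmax: int, oldest_xmin: int, live_horizon: int) -> bool:
    """Hybrid rule: honor the OldestXmin invariant, and additionally allow
    removal of tuples that only the current (hypothetical) live horizon
    considers dead.  A live horizon that moved backward cannot veto a
    removal that OldestXmin already permits."""
    return must_prune(xmax, oldest_xmin) or xid_precedes(xmax, live_horizon)


if __name__ == "__main__":
    oldest_xmin = 1000
    # Horizon moved backward to 900 after VACUUM started; the tuple deleted
    # by XID 950 is still removable because 950 precedes OldestXmin.
    print(may_prune(950, oldest_xmin, live_horizon=900))
    # Horizon advanced to 1100; XID 1050 is an allowed "extra" removal.
    print(may_prune(1050, oldest_xmin, live_horizon=1100))
```

In this model the effective cutoff behaves like a ratchet: it is never older than OldestXmin, which matches option 1, while a newer live horizon can still be used opportunistically.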

> I see no reason why it
> matters if OldestXmin goes backwards across two VACUUM operations, so
> I haven't tried to avoid that.

That may be fully okay, or we may want to clamp OldestXmin to be no older than
relfrozenxid.  I don't feel great about the system moving relfrozenxid
backward unless it observed an older XID, and observing an older XID would be
a corruption signal.  I can't point to a specific way a non-monotonic
relfrozenxid breaks things, though.
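
A rough sketch of the clamp, again as a toy Python model rather than actual PostgreSQL code. The names xid_precedes and clamp_oldest_xmin are invented here, and the comparison ignores the special permanent XIDs:

```python
XID_MASK = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b, modulo 2^32.

    Simplified: assumes both arguments are normal XIDs.
    """
    return ((a - b) & XID_MASK) >= HALF_RANGE


def clamp_oldest_xmin(oldest_xmin: int, relfrozenxid: int) -> int:
    """Return an OldestXmin that is never older than the table's current
    relfrozenxid, so a later VACUUM cannot end up moving relfrozenxid
    backward merely because the xmin horizon went backward."""
    if xid_precedes(oldest_xmin, relfrozenxid):
        return relfrozenxid
    return oldest_xmin


if __name__ == "__main__":
    # The horizon regressed to 900, but the table is already frozen up to
    # 1000, so the clamped cutoff stays at 1000.
    print(clamp_oldest_xmin(900, 1000))
    # The usual case: the horizon is newer than relfrozenxid and is kept.
    print(clamp_oldest_xmin(1100, 1000))
```

With such a clamp, the only way to see an XID older than relfrozenxid would be to observe one in the table itself, which is the corruption signal mentioned above.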


