Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: xid wraparound danger due to INDEX_CLEANUP false
Date
Msg-id CAH2-Wzmn3xjWBJ7=4Pmq2skEvHSn7CPiC3abPaqWahaBhp7GPQ@mail.gmail.com
Whole thread Raw
In response to Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Apr 29, 2020 at 2:56 PM Andres Freund <andres@anarazel.de> wrote:
> I'm not sure I see the advantage. Only doing so in the freezing case
> seems unlikely to cause additional conflicts, but I'm less sure about
> doing it always. btpo.xact will often be quite recent, right? So it'd
> likely cause more conflicts.

btpo.xact values come from a call to ReadNewTransactionId(). There
pretty much has to be one call to ReadNewTransactionId() per page
deletion (see comments about it within _bt_unlink_halfdead_page()). So
yes, I'd say that it could be very recent.

I don't mind continuing to do the conflicts in _bt_getbuf(), which
would delay it until the point where we really need the page --
especially if we could do that in a way that captures temporal
locality (e.g. your recyclable page chaining idea).

> I don't really see the problem with the check in _bt_getbuf()? I'd
> actually like to be *more* aggressive about putting pages in the FSM (or
> whatever), and that'd probably require checks like this. E.g. whenever
> we unlink a page, we should put it into the FSM (or something different,
> see below). And then do all the necessary checks in _bt_getbuf().

Basically, I would like to have a page state that represents "it
should be impossible for any scan to land on this page, except for
btvacuumscan()". Without relying on 32-bit XIDs, and ideally without
relying on any other state to interpret what it really means. In
principle we can set a deleted page to that state at the earliest
possible point when that becomes true, without it meaning anything
else, or requiring that we do anything else at the same time (e.g.
actually using it for the right half of a page in a page split,
generating recovery conflicts).

> It's pretty sad that one right now basically needs to vacuum twice to
> reuse pages in nbtree (once to delete the page, once to record it in the
> fsm). Usually the xmin horizon should advance much more quickly than
> that, allowing reuse earlier.

Yes, that's definitely bad. I like the general idea of making us more
aggressive with recycling. Marking pages as "recyclable" (not
"recycled") not too long after they first get deleted in VACUUM, and
then separately recycling them in _bt_getbuf() is a cleaner design.
Separation of concerns. That would build confidence in a more
aggressive approach -- we could add lots of cross-checks against
landing on a recyclable page. Note that we have had a bug in this
exact mechanism in the past -- see commit d3abbbebe52.

If there was a bug then we might still land on the page after it gets
fully recycled, in which case the cross-checks won't detect the bug.
But ISTM that we always have a good chance of landing on the page
before that happens, in which case the cross-check complains and we
get a log message, and possibly even a bug report. We don't have to be
truly lucky to see when our approach is buggy when we go on to make
page deletion more aggressive (in whatever way). And we get the same
cross-checks on standbys.

> > It makes sense that the FSM is not crash safe, I suppose, but we're
> > talking about relatively large amounts of free space here. Can't we
> > just do it properly/reliably?
>
> What do you mean by that? To have the FSM be crash-safe?

That, or the equivalent (pretty much your chaining idea) may well be a
good idea. But what I really meant was an explicit "recyclable" page
state. That's all. We may or may not also continue to rely on the FSM
in the same way.

I suppose that we should try to get rid of the FSM in nbtree. I see
the advantages. It's not essential to my proposal, though.

> It could make sense to just not have the FSM, and have a linked-list
> style stack of pages reachable from the meta page.  That'd be especially
> advantageous if we kept xids associated with the "about to be
> recyclable" pages in the metapage, so it'd be cheap to evaluate.

I like that idea. But doesn't that also argue for delaying the
conflicts until we actually recycle a "recyclable" page?

> There's possibly some not-entirely-trivial locking concerns around such
> a list. Adding entries seems easy enough, because we currently only
> delete pages from within vacuum. But popping entries could be more
> complicated, I'm not exactly sure if there are potential lock nesting
> issues (nor am I actually sure there aren't existing ones).

A "recyclable" page state might help with this, too. _bt_getbuf() is a
bag of tricks, even leaving aside generating recovery conflicts.

If we are going to be more eager, then the cost of dirtying the page a
second time to mark it "recyclable" might mostly not matter.
Especially if we chain the pages. That is, maybe VACUUM recomputes
RecentGlobalXmin at the end of its first btvacuumscan() scan (or at
the end of the whole VACUUM operation), when it notices that it is
already possible to mark many pages as "recyclable". Perhaps we won't
write out the page twice much of the time, because it won't have been
that long since VACUUM dirtied the page in order to delete it.

Yeah, we could be a lot more aggressive here, in a bunch of ways. As
I've said quite a few times, it seems like our implementation of "the
drain technique" is way more conservative than it needs to be (i.e. we
use ReadNewTransactionId() without considering any of the specifics of
the index). But if we mess up, we can't expect amcheck to detect the
problems, which would be totally transient. We're talking about
incredibly subtle concurrency bugs. So IMV it's just not going to
happen until the whole thing becomes way less scary.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?
Next
From: Tomas Vondra
Date:
Subject: Re: Remove unnecessary relabel stripping