Re: xid wraparound danger due to INDEX_CLEANUP false

From: Andres Freund
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-04-29 21:55:59
Msg-id: 20200429215559.yuzhunlyyqkmxhws@alap3.anarazel.de
In response to: Re: xid wraparound danger due to INDEX_CLEANUP false (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: xid wraparound danger due to INDEX_CLEANUP false (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Hi,

On 2020-04-29 13:40:55 -0700, Peter Geoghegan wrote:
> On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <andres@anarazel.de> wrote:
> > > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
> > > btpo.xact) when it recycles deleted pages. It simply puts them in the
> > > FSM without changing anything about the page itself. This means
> > > surprisingly little in the context of nbtree: the
> > > _bt_page_recyclable() XID check that takes place in btvacuumpage()
> > > also takes place in _bt_getbuf(), at the point where the page actually
> > > gets recycled by the client. That's not great.
> >
> > I think it's quite foolish for btvacuumpage() to not freeze xids. If we
> > only do so when necessary (i.e. older than a potential new relfrozenxid,
> > and only when the vacuum didn't yet skip pages), the costs are pretty
> > minuscule.
>
> I wonder if we should just bite the bullet and mark pages that are
> recycled by VACUUM as explicitly recycled, with WAL-logging and
> everything (this is like freezing, but stronger). That way, the
> _bt_page_recyclable() call within _bt_getbuf() would only be required
> to check that state (while btvacuumpage() would use something like a
> _bt_page_eligible_for_recycling(), which would do the same thing as
> the current _bt_page_recyclable()).

I'm not sure I see the advantage. Only doing so in the freezing case seems
unlikely to cause additional recovery conflicts, but I'm less sure about
doing it always: btpo.xact will often be quite recent, right? So WAL-logging
the recycled state would likely cause more conflicts.
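
For reference, the recyclability test that consults btpo.xact looks roughly
like this (paraphrased from nbtpage.c around this time; details may differ
by branch):

bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /* An all-zeroes page (e.g. from a crashed extension) can be reclaimed */
    if (PageIsNew(page))
        return true;

    /*
     * Otherwise, recycle only if the page is deleted and no running
     * transaction could still be interested in it.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;

    return false;
}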


> I'm not sure how low the costs would be, but at least we'd only have
> to do it once per already-deleted page (i.e. only the first time a
> VACUUM operation found _bt_page_eligible_for_recycling() returned true
> for the page and marked it recycled in a crash-safe manner). That
> design would be quite a lot simpler, because it expresses the problem
> in terms that make sense to the nbtree code. _bt_getbuf() should not
> have to make a distinction between "possibly recycled" versus
> "definitely recycled".

I don't really see the problem with the check in _bt_getbuf()? I'd
actually like to be *more* aggressive about putting pages in the FSM (or
whatever), and that'd probably require checks like this. E.g. whenever
we unlink a page, we should put it into the FSM (or something different,
see below). And then do all the necessary checks in _bt_getbuf().
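
For illustration, the reuse path in _bt_getbuf() already does roughly the
following today (paraphrased; debug logging and the relation-extension
fallback elided):

/* Try to grab a recyclable page recorded in the FSM */
for (;;)
{
    blkno = GetFreeIndexPage(rel);
    if (blkno == InvalidBlockNumber)
        break;                  /* FSM is empty; extend the relation */

    buf = ReadBuffer(rel, blkno);
    if (ConditionalLockBuffer(buf))
    {
        page = BufferGetPage(buf);
        if (_bt_page_recyclable(page))
        {
            /* WAL-log the reuse, to generate hot-standby conflicts */
            if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
            {
                BTPageOpaque opaque =
                    (BTPageOpaque) PageGetSpecialPointer(page);

                _bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
            }

            /* Okay to reuse: re-initialize and hand the page back */
            _bt_pageinit(page, BufferGetPageSize(buf));
            return buf;
        }
        _bt_relbuf(rel, buf);   /* not recyclable after all */
    }
    else
        ReleaseBuffer(buf);     /* couldn't get the lock; drop the pin */
}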

It's pretty sad that, right now, one basically needs to vacuum twice to
reuse pages in nbtree: once to delete the page, and once more to record it
in the FSM. Usually the xmin horizon should advance much more quickly than
that, allowing earlier reuse.
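
Concretely, the deleting VACUUM stamps btpo.xact with a then-current XID,
so its own recyclability test necessarily fails; only a later VACUUM can
move the page to the FSM. Roughly (paraphrasing btvacuumpage()):

if (_bt_page_recyclable(page))
{
    /* Deleted by some earlier VACUUM; okay to recycle now */
    RecordFreeIndexPage(rel, blkno);
    vstate->totFreePages++;
    stats->pages_deleted++;
}
else if (P_ISDELETED(opaque))
{
    /*
     * Deleted (possibly just now, by this very VACUUM), but btpo.xact
     * is still too recent -- a future VACUUM has to visit the page
     * again before it can go into the FSM.
     */
    stats->pages_deleted++;
}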

As far as I can tell, even just adding pages to the FSM when setting
ISDELETED would be advantageous. There's obviously the risk that we'd cause
backends to visit a lot of pages that can't actually be reused yet... But
if we did what I suggest below, that danger could probably be largely
avoided.


> It makes sense that the FSM is not crash-safe, I suppose, but we're
> talking about relatively large amounts of free space here. Can't we
> just do it properly/reliably?

What do you mean by that? To have the FSM be crash-safe?

It could make sense to just not have the FSM, and instead have a
linked-list-style stack of pages reachable from the metapage.  That'd be
especially advantageous if we kept the xids associated with the "about to
be recyclable" pages in the metapage itself, so recyclability would be
cheap to evaluate.

There are possibly some not-entirely-trivial locking concerns around such
a list. Adding entries seems easy enough, because we currently only
delete pages from within vacuum. But popping entries could be more
complicated, I'm not exactly sure if there are potential lock nesting
issues (nor am I actually sure there aren't existing ones).

Greetings,

Andres Freund


