Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: xid wraparound danger due to INDEX_CLEANUP false
Date
Msg-id CAH2-WzmAq5hWPEHb4s5cO+c+vYNpn-Ez3ZExUPV2FCD7TuorCA@mail.gmail.com
Whole thread Raw
In response to Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
Responses Re: xid wraparound danger due to INDEX_CLEANUP false  (Peter Geoghegan <pg@bowt.ie>)
Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <andres@anarazel.de> wrote:
> > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
> > bpto.xact) when it recycles deleted pages. It simply puts them in the
> > FSM without changing anything about the page itself. This means
> > surprisingly little in the context of nbtree: the
> > _bt_page_recyclable() XID check that takes place in btvacuumpage()
> > also takes place in _bt_getbuf(), at the point where the page actually
> > gets recycled by the client. That's not great.
>
> I think it's quite foolish for btvacuumpage() to not freeze xids. If we
> only do so when necessary (i.e. older than a potential new relfrozenxid,
> and only when the vacuum didn't yet skip pages), the costs are pretty
> miniscule.

I wonder if we should just bite the bullet and mark pages that are
recycled by VACUUM as explicitly recycled, with WAL-logging and
everything (this is like freezing, but stronger). That way, the
_bt_page_recyclable() call within _bt_getbuf() would only be required
to check that state (while btvacuumpage() would use something like a
_bt_page_eligible_for_recycling(), which would do the same thing as
the current _bt_page_recyclable()).

I'm not sure how low the costs would be, but at least we'd only have
to do it once per already-deleted page (i.e. only the first time a
VACUUM operation found _bt_page_eligible_for_recycling() returned true
for the page and marked it recycled in a crash safe manner). That
design would be quite a lot simpler, because it expresses the problem
in terms that make sense to the nbtree code. _bt_getbuf() should not
have to make a distinction between "possibly recycled" versus
"definitely recycled".

It makes sense that the FSM is not crash safe, I suppose, but we're
talking about relatively large amounts of free space here. Can't we
just do it properly/reliably?

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Setting min/max TLS protocol in clientside libpq
Next
From: Corey Huinker
Date:
Subject: Re: Proposing WITH ITERATIVE