Re: B-tree page deletion boundary cases - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: B-tree page deletion boundary cases
Date
Msg-id CA+U5nM+Ra=3X269jC6x4fr4dVYj=rBAOS=y9FwO_kmhBuh5eCA@mail.gmail.com
Whole thread Raw
In response to B-tree page deletion boundary cases  (Noah Misch <noah@leadboat.com>)
Responses Re: B-tree page deletion boundary cases  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Sat, Apr 21, 2012 at 5:52 PM, Noah Misch <noah@leadboat.com> wrote:

> As I mentioned[1] peripherally back in November, that algorithm has been
> insufficient since the introduction of non-XID-bearing transactions in
> PostgreSQL 8.3.  Such transactions do not restrain RecentXmin.  If no running
> transaction has an XID, RecentXmin == ReadNewTransactionId() and the page
> incorrectly becomes available for immediate reuse.

Good observation.

> The fix is to compare the stored XID to RecentGlobalXmin, not RecentXmin.  We
> already use RecentGlobalXmin when wal_level = hot_standby.  If no running
> transaction has an XID and all running transactions began since the last
> transaction that did bear an XID, RecentGlobalXmin == ReadNewTransactionId().
> Therefore, the correct test is btpo.xact < RecentGlobalXmin, not btpo.xact <=
> RecentGlobalXmin as we have today.  This also cleanly removes the need for the
> bit of code in _bt_getbuf() that decrements btpo.xact before sending it down
> for ResolveRecoveryConflictWithSnapshot().  I suggested[2] that decrement on
> an unprincipled basis; it was just masking the off-by-one of using "<=
> RecentGlobalXmin" instead of "< RecentGlobalXmin" in _bt_page_recyclable().

Looks like the right fix. I'll apply this to 9.0/9.1/HEAD.

> This change makes empty B-tree pages wait through two generations of running
> transactions before reuse, so some additional bloat will arise.

Could arise, in some circumstances. But that assumes VACUUMs are
fairly frequent and that they would be delayed/rendered less effective
by this, which I don't think will be the case.

I note that we don't take any account of the number of pages that may
be reused when we VACUUM, so when HOT avoids a VACUUM we may
accumulate pages for a longer period. Looks like there is more work to
be done yet in cleaning indexes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Boszormenyi Zoltan
Date:
Subject: Re: PL/PGSQL bug in handling composite types
Next
From: Michael Meskes
Date:
Subject: Re: ECPG FETCH readahead