On Tue, Apr 24, 2012 at 09:08:36AM +0100, Simon Riggs wrote:
> On Sat, Apr 21, 2012 at 5:52 PM, Noah Misch <noah@leadboat.com> wrote:
> > The fix is to compare the stored XID to RecentGlobalXmin, not RecentXmin. ?We
> > already use RecentGlobalXmin when wal_level = hot_standby. ?If no running
> > transaction has an XID and all running transactions began since the last
> > transaction that did bear an XID, RecentGlobalXmin == ReadNewTransactionId().
> > Therefore, the correct test is btpo.xact < RecentGlobalXmin, not btpo.xact <=
> > RecentGlobalXmin as we have today. ?This also cleanly removes the need for the
> > bit of code in _bt_getbuf() that decrements btpo.xact before sending it down
> > for ResolveRecoveryConflictWithSnapshot(). ?I suggested[2] that decrement on
> > an unprincipled basis; it was just masking the off-by-one of using "<=
> > RecentGlobalXmin" instead of "< RecentGlobalXmin" in _bt_page_recyclable().
>
> Looks like the right fix. I'll apply this to 9.0/9.1/HEAD.
Thanks. 8.3 and 8.4 need it, too, albeit simplified due to some of the
affected code not existing yet.
> > This change makes empty B-tree pages wait through two generations of running
> > transactions before reuse, so some additional bloat will arise.
>
> Could arise, in some circumstances. But that assumes VACUUMs are
> fairly frequent and that they would be delayed/rendered less effective
> by this, which I don't think will be the case.
>
> I note that we don't take any account of the number of pages that may
> be reused when we VACUUM, so when HOT avoids a VACUUM we may
> accumulate pages for a longer period. Looks like there is more work to
> be done yet in cleaning indexes.
Agreed. Taking one VACUUM visit to mark the page BTP_DELETED and another to
add it to the FSM is fairly pessimistic; perhaps we can do better.
nm