On Thu, Aug 02, 2012 at 02:49:45PM -0400, Tom Lane wrote:
> I believe that the SPGiST issue should be fixed as follows:
>
> * It's okay for transactions inserting redirection tuples to use their
> own XID as the marker.
>
> * In spgvacuum.c, the test in vacuumRedirectAndPlaceholder to decide if
> a redirection tuple is recyclable should use
> TransactionIdPrecedes(dt->xid, RecentGlobalXmin)
> rather than comparing against OldestXmin as it does now. (This allows
> some consequent simplification since the module need not pass around
> an OldestXmin parameter.)
>
> * vacuumRedirectAndPlaceholder must also compute the newest XID among
> the redirection tuples it removes from the page, and store that in
> a new field of XLOG_SPGIST_VACUUM_REDIRECT WAL records.
>
> * In spgRedoVacuumRedirect, call ResolveRecoveryConflictWithSnapshot
> with the newest-redirect XID.
All the above looks reasonable, as does the committed patch.
There's an obsolete comment in spg_redo().
> I had first thought that we could avoid a change in WAL contents by
> having spgRedoVacuumRedirect compute the cutoff XID for itself by
> examining the removed tuples, but that doesn't work: XLogInsert might
> for instance have decided to store a full-page image, in which case the
> information isn't available by inspection of the old page contents.
> But we still have to enforce the interlock against hot standby xacts.
We achieve that division of labor for XLOG_BTREE_DELETE by examining the old
contents before RestoreBkpBlocks(). This is safe, I think, because we only
examine the page when the system has running hot standby backends, and we only
allow hot standby connections when recovery has proceeded far enough to fix
all torn and ahead-of-EndRecPtr pages. Note that this decision is a larger
win for XLOG_BTREE_DELETE than it would be for XLOG_SPGIST_VACUUM_REDIRECT.
For XLOG_BTREE_DELETE, finding the cutoff XID could require fetching and
examining hundreds of heap pages. For XLOG_SPGIST_VACUUM_REDIRECT, doing so
just adds a few cycles to an existing page scan. I think the simpler way
you've implemented it is better for the long term.
> In principle we should bump the XLOG page magic number for this change
> in WAL contents, but I'm inclined not to because it'd cause pain for
> beta testers, and there are probably very few people who'd have live
> XLOG_SPGIST_VACUUM_REDIRECT records in play when they switch to beta3.
Seems fair.
Thanks,
nm