On Fri, 2008-12-19 at 10:49 +0200, Heikki Linnakangas wrote:
> Whenever a B-tree index scan fetches a heap tuple that turns out to be
> dead, the B-tree item is marked as killed by calling _bt_killitems. When
> the page gets full, all the killed items are removed by calling
> _bt_vacuum_one_page.
>
> That's a problem for hot standby. If any of the killed b-tree items
> point to a tuple that is still visible to a running read-only
> transaction, we have the same situation as with vacuum, and have to
> either wait for the read-only transaction to finish before applying the
> WAL record or kill the transaction.
>
> It looks like there's some cosmetic changes related to that in the
> patch, the signature of _bt_delitems is modified, but there's no actual
> changes that would handle that situation. I didn't see it on the TODO on
> the hot standby wiki either. Am I missing something, or the patch?

ResolveRedoVisibilityConflicts() describes the current patch's position
on this point, which, on review, I agree is wrong.

It looks like I assumed that _bt_delitems is only called during VACUUM,
which I knew was not the case. At one point I intended to split
XLOG_BTREE_VACUUM into two record types, one for delete and one for
vacuum, but in the end I didn't. Anyhow, it's wrong.

We already have the infrastructure in place to make this work correctly;
we just need to add a latestRemovedXid field to xl_btree_vacuum. So that
part is easily solved.

Thanks for spotting it. More like that, please!

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support