On Mon, 2008-11-03 at 10:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > VACUUM with a btree index proceeds like this:
> > 1. Scan table
> > 2. Remove rows from btree identified in (1)
> > 3. Remove rows from heap identified in (1)
>
> > The purpose of the additional locking requirements during (2) for btrees
> > is to ensure that we do not fail to find the rows identified in (1),
> > because the rows can move after (1) and during (2) because of block
> > splits.
>
> No, you are missing the point. One purpose of the additional locking
> requirements is to ensure that there is not a concurrent process that
> has read a btree index entry just before you removed it but arrives at
> the heap page only after you removed the heap entry (and, perhaps,
> replaced it with some other row that doesn't match the index entry at
> all). This is clearly still a risk in a hot-standby environment.

OK, I think I get it now. Thanks for putting me straight.
I will implement the locking-every-page approach discussed upthread: keep
note of the blocks touched, in exactly the order they were touched, and
store that information in the WAL records.
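
A minimal sketch of that bookkeeping, not the actual patch: the primary appends each block number as vacuum touches it, and replay walks the same list in the same order. Every name here (VacuumTouchedBlocks, touched_append, touched_replay, the fixed MAX_TOUCHED cap) is hypothetical and chosen only for illustration; a real WAL record would be variable-length and would go through the normal WAL insertion code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's BlockNumber type. */
typedef uint32_t BlockNumber;

#define MAX_TOUCHED 64          /* arbitrary cap, for the sketch only */

/* Hypothetical WAL payload: blocks in the order vacuum touched them. */
typedef struct VacuumTouchedBlocks
{
    uint32_t    nblocks;
    BlockNumber blocks[MAX_TOUCHED];
} VacuumTouchedBlocks;

/* Primary side: remember a block as it is touched. Returns 0 when full. */
static int
touched_append(VacuumTouchedBlocks *rec, BlockNumber blk)
{
    if (rec->nblocks >= MAX_TOUCHED)
        return 0;
    rec->blocks[rec->nblocks++] = blk;
    return 1;
}

/* Standby side: callback applied to each block during replay. */
typedef void (*visit_fn) (BlockNumber blk, void *arg);

/* Standby side: visit the blocks in exactly the recorded order. */
static void
touched_replay(const VacuumTouchedBlocks *rec, visit_fn visit, void *arg)
{
    for (uint32_t i = 0; i < rec->nblocks; i++)
        visit(rec->blocks[i], arg);
}

/* Example callback: records the visit order so it can be inspected. */
typedef struct VisitTrace
{
    uint32_t    n;
    BlockNumber seen[MAX_TOUCHED];
} VisitTrace;

static void
trace_visit(BlockNumber blk, void *arg)
{
    VisitTrace *trace = arg;

    assert(trace->n < MAX_TOUCHED);
    trace->seen[trace->n++] = blk;
}
```

In a real replay routine the callback would take the page lock that the primary took, rather than just recording the visit.
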
Are you happy with my optimisation that if a page needs to be read in,
we just skip it and pretend we did read-pin-unpin on it? I would
implement that as a new ReadBuffer mode (in Heikki's new API
terminology).
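
To make the proposed mode concrete, here is a toy model of a buffer cache with a "skip if not cached" read mode. It is only a sketch under stated assumptions: the mode names, the SketchCache structure and the sketch_* functions are all hypothetical and are not the bufmgr API. The assumption behind the optimisation, as modelled here, is that a page absent from the cache cannot be pinned by any concurrent scan, so there is nothing to interlock with and no reason to pay for the I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* hypothetical stand-in */

#define NBUFFERS        4       /* tiny cache, for the sketch only */
#define INVALID_BUFFER  (-1)

/* Hypothetical mode names; not PostgreSQL's real ReadBuffer modes. */
typedef enum SketchReadMode
{
    SKETCH_READ_NORMAL,         /* on a miss, read the page in */
    SKETCH_SKIP_IF_NOT_CACHED   /* on a miss, return without any I/O */
} SketchReadMode;

typedef struct SketchBuffer
{
    bool        valid;
    BlockNumber blk;
    int         pin_count;
} SketchBuffer;

typedef struct SketchCache
{
    SketchBuffer bufs[NBUFFERS];
    int          disk_reads;    /* counts simulated reads from disk */
} SketchCache;

/* Find the buffer holding blk, or INVALID_BUFFER if it is not cached. */
static int
sketch_lookup(const SketchCache *cache, BlockNumber blk)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (cache->bufs[i].valid && cache->bufs[i].blk == blk)
            return i;
    return INVALID_BUFFER;
}

/*
 * Return a pinned buffer for blk.  In skip mode a cache miss returns
 * INVALID_BUFFER without touching "disk".  The sketch has no eviction,
 * so a full cache also returns INVALID_BUFFER.
 */
static int
sketch_read_buffer(SketchCache *cache, BlockNumber blk, SketchReadMode mode)
{
    int         idx = sketch_lookup(cache, blk);

    if (idx == INVALID_BUFFER)
    {
        if (mode == SKETCH_SKIP_IF_NOT_CACHED)
            return INVALID_BUFFER;

        for (int i = 0; i < NBUFFERS; i++)
        {
            if (!cache->bufs[i].valid)
            {
                idx = i;
                break;
            }
        }
        if (idx == INVALID_BUFFER)
            return INVALID_BUFFER;

        cache->bufs[idx].valid = true;
        cache->bufs[idx].blk = blk;
        cache->bufs[idx].pin_count = 0;
        cache->disk_reads++;    /* simulated I/O */
    }

    cache->bufs[idx].pin_count++;
    return idx;
}

/* Drop one pin on a buffer previously returned by sketch_read_buffer. */
static void
sketch_release_buffer(SketchCache *cache, int idx)
{
    assert(idx >= 0 && idx < NBUFFERS);
    assert(cache->bufs[idx].pin_count > 0);
    cache->bufs[idx].pin_count--;
}
```

The caller would treat INVALID_BUFFER in skip mode as "pretend we did read-pin-unpin" and move on to the next block. The sketch deliberately leaves out locking and waiting for other pins to go away, which is the part that matters for correctness.
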
If you know/can see any other missing correctness requirements please
let me know. I've not had trouble understanding any of the other
correctness requirements, but I'll leave it to review to judge whether
I've implemented them all correctly.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support