Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/06/2014 03:48 PM, Andres Freund wrote:
>> There just was another case of this reported on IRC by MatheusOl and for
>> some reason in his case I noticed the pertinent details and it quickly
>> clicked:
>> * page 14833 is the one with the error
>> * we're actually vacuuming page 38538
>> * lastBlockVacuumed is 0
>>
>> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
>> and the page vacuumed and acquire a cleanup lock on it. But there isn't
>> any guarantee that the intermediate pages are valid, filled pages,
>> afaics.
>
> Hmm. So the problem arises if there's an uninitialized page in the
> middle of the b-tree relation for some reason. It's unusual for an
> uninitialized page to be left in the middle of the relation, but it's
> certainly possible, e.g. if you crash just after extending the relation.

Right. This diagnosis is incomplete in itself, because if the slave has a
zeroed page there, shouldn't the master have one too? If the master does
have a zeroed page there, how come vacuum didn't fail on the master? The
answer is that btvacuumpage will skip over all-zero pages without doing
anything more than noting them as free in FSM. When btree_xlog_vacuum
rescans the relation, it will also skip over all-zero pages without doing
anything --- but XLogReadBufferExtended logs such a page as invalid, and
then complains later when it doesn't see the page dropped or truncated away.

>> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.
>
> That'd be horrendously dangerous. It would silently zap any page with
> any error on it. But we could add a new ReadBufferMode that returns
> InvalidBuffer on error, without zeroing the page.

The important point is not just that it not damage the page, but that
it not log it as invalid. I concur that the right fix requires a
new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
I think the spec for this should be that if the page doesn't exist or
contains zeroes, we return InvalidBuffer without logging the page number
as invalid. The doesn't-exist case is justified by the expectation that
there will be a later RBM_NORMAL call for a larger page number, which will
result in a suitable complaint if the page range isn't there.
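
To make the proposed semantics concrete, here is a rough standalone model in C. It is only a sketch and not PostgreSQL code: every Toy/toy_ name is hypothetical, a NULL return stands in for InvalidBuffer, and a small array stands in for the invalid-page table. It assumes the behavior described above, namely that the existing normal mode logs a missing or all-zero page as invalid, while the new mode returns "no buffer" for the same pages without logging anything.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE   8192
#define TOY_MAX_INVALID 16

/* Hypothetical stand-ins for the two read modes under discussion. */
typedef enum
{
    TOY_RBM_NORMAL,          /* logs missing/zeroed pages as invalid */
    TOY_RBM_NORMAL_ZERO_OK   /* proposed: same result, but no logging */
} ToyReadMode;

/* A toy relation: some pages plus a list of block numbers logged invalid. */
typedef struct
{
    unsigned char (*pages)[TOY_PAGE_SIZE];
    unsigned    nblocks;
    unsigned    invalid[TOY_MAX_INVALID];
    size_t      ninvalid;
} ToyRelation;

static bool
toy_page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

static void
toy_log_invalid_page(ToyRelation *rel, unsigned blkno)
{
    if (rel->ninvalid < TOY_MAX_INVALID)
        rel->invalid[rel->ninvalid++] = blkno;
}

/*
 * Return the page, or NULL (standing in for InvalidBuffer) if the block
 * does not exist or is all zeroes.  Only TOY_RBM_NORMAL records the block
 * as invalid; the page contents are never modified in either mode.
 */
static unsigned char *
toy_read_page(ToyRelation *rel, unsigned blkno, ToyReadMode mode)
{
    bool        missing = (blkno >= rel->nblocks);

    if (missing || toy_page_is_all_zero(rel->pages[blkno]))
    {
        if (mode == TOY_RBM_NORMAL)
            toy_log_invalid_page(rel, blkno);
        return NULL;
    }
    return rel->pages[blkno];
}

/*
 * Model of the replay-side scan: visit every block strictly between the
 * last vacuumed block and the block being vacuumed now.
 */
static void
toy_replay_vacuum_scan(ToyRelation *rel, unsigned last_vacuumed,
                       unsigned target, ToyReadMode mode)
{
    for (unsigned blk = last_vacuumed + 1; blk < target; blk++)
    {
        unsigned char *page = toy_read_page(rel, blk, mode);

        /* Real code would take and release a cleanup lock here. */
        (void) page;
    }
}
```

With a zeroed page sitting between the last vacuumed block and the target, a scan in TOY_RBM_NORMAL leaves that block in the invalid list, which is the complaint seen on the standby; the same scan in TOY_RBM_NORMAL_ZERO_OK leaves the list empty. A later TOY_RBM_NORMAL read of a block past the end of the relation still gets logged, which is the backstop for the doesn't-exist case.
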
Will go fix this if there are no objections to that plan.

regards, tom lane