On Tue, 2009-06-09 at 16:17 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > A couple of people in recent years have had a problem with "page X is
> > uninitialised -- fixing" messages.
>
> > I have a case now with 569357 consecutive pages that required fixing in
> > pg_attribute. We looked at pages by hand and they really are
> > uninitialised, but otherwise what we would expect for size, name etc..
>
> > Clearly this is way too many pages to be easily explainable.
>
> It's probably too late to tell now, but I wonder if those pages actually
> existed or were just a "hole" in the file. A perhaps-plausible
> mechanism for them to appear is that the FSM spits out some ridiculously
> large page number as being the next place to insert something into
> pg_attribute, the system plops down a new tuple into that page, and
> behold you have a large hole that reads as zeroes.
>
> Another interesting question is whether the range began or ended at a
> 1GB segment boundary, in which case something in or around the
> segmenting logic could be at fault. (Hmm ... actually 1GB is only
> 131072 pages anyway, so your "hole" definitely spanned several segments.
> That seems like the next place to look.)
The "hole" started about 0.75GB in file 0 and spanned 4 complete 1GB
segments before records started again in file 5. The "hole" segments
were all 1GB in size, and the pages either size of the hole were
undamaged.
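
For what it's worth, a quick way to tell whether those segments hold
real zero-filled blocks or are just filesystem holes is to compare the
logical size with what is actually allocated. Rough standalone sketch
(nothing PostgreSQL-specific; run it against each 1GB segment file):

#include <stdio.h>
#include <sys/stat.h>

/*
 * If st_blocks * 512 is far smaller than st_size, the file is sparse,
 * i.e. the zero-reading pages are a hole rather than blocks that were
 * ever written out.
 */
int
main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2 || stat(argv[1], &st) != 0)
    {
        perror("stat");
        return 1;
    }

    printf("%s: logical %lld bytes, allocated %lld bytes\n",
           argv[1], (long long) st.st_size,
           (long long) st.st_blocks * 512);
    return 0;
}
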
A corrupted block number in a WAL record would do this in
XLogReadBuffer() if we had full page writes enabled. But it would have
had to become corrupt between being set correctly and the CRC check on
the WAL record, which is a fairly small window of believability.
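
Just to illustrate the mechanism: a single 8K write at a bogus offset
is enough to leave gigabytes of pages that read back as zeroes, since
the filesystem fills the gap with a hole. Standalone sketch only
(it ignores the 1GB segmenting md.c would actually do; the file name
and block number are made up):

#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(void)
{
    char    page[BLCKSZ];
    off_t   bogus_blkno = 700000;   /* pretend this came from a corrupt record */
    int     fd;

    memset(page, 0, sizeof(page));

    fd = open("demo.seg", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return 1;

    /* One write far past EOF: every page before it now reads as zeroes. */
    if (pwrite(fd, page, BLCKSZ, bogus_blkno * (off_t) BLCKSZ) != BLCKSZ)
        return 1;

    close(fd);
    return 0;
}
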
Should there be a sanity check on how far a relation can be extended in
recovery?
Not sure whether that would work for normal-mode ReadBuffer() - it
should fail somewhere in smgr or in bufmgr.
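
Something along these lines is what I have in mind - a sketch only;
the cap, the function name and where it would be called from are all
made up, not existing code:

#include "postgres.h"
#include "storage/block.h"

/* Made-up cap: one 1GB segment's worth of 8K blocks. */
#define MAX_RECOVERY_EXTEND_BLOCKS  ((BlockNumber) 131072)

/*
 * Hypothetical check, sketch only: complain if a WAL record wants a
 * block implausibly far past the relation's current length.
 */
static void
check_recovery_extension(BlockNumber current_nblocks,
                         BlockNumber target_blkno)
{
    if (target_blkno >= current_nblocks &&
        target_blkno - current_nblocks > MAX_RECOVERY_EXTEND_BLOCKS)
        elog(WARNING,
             "WAL record references block %u but relation has only %u blocks",
             target_blkno, current_nblocks);
}
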
-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support