Re: Fwd: index corruption in PG 8.3.13 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Fwd: index corruption in PG 8.3.13
Date
Msg-id AANLkTinUaGCL491fu==7h5FYJ-=hV7LqORP8QhHbZLgK@mail.gmail.com
Whole thread Raw
In response to Re: Fwd: index corruption in PG 8.3.13  (Greg Stark <gsstark@mit.edu>)
Responses Re: Fwd: index corruption in PG 8.3.13
List pgsql-hackers
On Wed, Mar 9, 2011 at 7:14 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Mar 9, 2011 at 11:28 PM, Nikhil Sontakke
> <nikhil.sontakke@enterprisedb.com> wrote:
>> "Other peculiarity in the index file is that we found a lot of zeroed
>> out pages. Blocks from #279 to #518 are all completely zeroed out
>> without any signs of even a page header. Any ideas on how we can get
>> so many zeroed out blocks? Apart from the extend code path, I fail to
>> see any other. And this is an unusually large number of zero pages"
>>
>
> What does stat say for the index data file? Are the Size and Blocks
> values the same (modulo block size)? Or are these blocks actually not
> allocated?
>
> Postgres always forces blocks to be allocated but if they were lost
> due to filesystem corruption maybe they're not allocated any more.

Hmm, that gives me an idea.  What if the index relation were extended
repeatedly, but the blocks are still all zero on disk because they've
never been written out of shared_buffers?  Then, before the next
checkpoint, the system crashes.  All those blocks are now zero on
disk.  In theory, WAL replay should fix most of it, but there could be
portions of the WAL that haven't been flushed; or there could be some
other problem in WAL replay.  For example, it strikes me that this
situation would be consistent with:

1. Somebody inserts a bunch of new tuples into the relation, causing
growth in the index.
2. Before the blocks are written to disk, system crash.
3. pg_resetxlog

But it wouldn't have to be exactly that.  Like what if during replay,
you hit a WAL record that had a bit flipped so it failed the checksum.I think the system would just treat that as
end-of-WALand start up.
 
Now you could have some corruption, and a big chunk of zero blocks in
the index.  Now you go along in normal running, maybe not realizing
that those zero blocks are there, and start extending the relation as
you need to update the index...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Nikhil Sontakke
Date:
Subject: Re: Fwd: index corruption in PG 8.3.13
Next
From: Robert Haas
Date:
Subject: Re: Update of replication/README