Re: Version 7.2.3 unrecoverable crash on missing pg_clog - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Version 7.2.3 unrecoverable crash on missing pg_clog
Msg-id 29844.1042126052@sss.pgh.pa.us
In response to Re: Version 7.2.3 unrecoverable crash on missing pg_clog  (Andy Osborne <andy@sift.co.uk>)
List pgsql-bugs
Andy Osborne <andy@sift.co.uk> writes:
> Tom Lane wrote:
>>> FATAL 2:  open of /u0/pgdata/pg_clog/0726 failed: No such file or directory
>> What range of file names do you actually see in pg_clog?

> Currently 0000 to 00D6. I don't know what it was last night.

Not any greater, for sure.  (FYI, each segment covers one million
transactions.)
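To make the "one million transactions per segment" figure concrete, here is a minimal sketch that maps a transaction ID to the pg_clog file holding its commit status. It assumes the 7.2-era defaults: 2 status bits per transaction, 8 kB pages and 32 pages per segment file, which gives 2^20 = 1,048,576 transactions per segment, with files named by segment number in four hex digits. The helper names are hypothetical, not PostgreSQL functions.

```python
from typing import Tuple

# Assumed 7.2-era pg_clog layout (defaults, not read from a live server).
CLOG_BITS_PER_XACT = 2
BLCKSZ = 8192
PAGES_PER_SEGMENT = 32

XACTS_PER_PAGE = BLCKSZ * 8 // CLOG_BITS_PER_XACT        # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT   # 1048576


def clog_segment_name(xid: int) -> str:
    """Hypothetical helper: pg_clog file name (4 hex digits) covering xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def segment_xid_range(name: str) -> Tuple[int, int]:
    """Hypothetical helper: first and last xid stored in segment `name`."""
    first = int(name, 16) * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


print(segment_xid_range("00D6"))  # newest segment on disk
print(segment_xid_range("0726"))  # segment the failing backend asked for
```

Under these assumptions, 00D6 ends at XID 225,443,839 while 0726 would start at XID 1,918,894,080, so the requested segment lies far beyond any transaction the cluster has run. That is what you would expect if the XID was read from a garbage tuple header rather than from a real transaction.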

> the next backup was running when the database crashed.  Any
> attempt to access the table crashed it again.  I don't know if
> it helps, but a select * from news where (conditional on a field
> with an index) was ok but if the where was not indexed and resulted
> in a table scan, it crashed it.

This is consistent with one page of the table being corrupted.

> While I wouldn't rule out data corruption, the kernel message
> ring has no errors for the md driver, scsi host adapter or the
> disks, which I would expect if we had bad blocks appearing on a
> disk or somesuch.

Some of the cases that I've seen look like completely unrelated data
(not even Postgres stuff, just bits of text files) was written into
a page of a Postgres table.  This could possibly be a kernel bug,
along the lines of getting confused about which buffer belongs to
which file.  But with no way to reproduce it, it's hard to pin blame.

>> You didn't happen to make a physical copy of the news table before
>> dropping it, did you?  It'd be interesting to examine the remains.

> Sadly, no I didn't.  This is one of our live database servers
> and I was under a lot of pressure to get it back quickly.  If
> it does it again, what can I do to provide the most useful
> feedback?

If the database isn't unreasonably large, perhaps you could take a
tarball dump of the whole $PGDATA directory tree while the postmaster
is stopped?  That would document the situation for examination at leisure.
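A minimal sketch of taking such a snapshot with Python's standard `tarfile` module follows. The function name is hypothetical. The `postmaster.pid` check is an assumed safety heuristic: a running postmaster keeps that file in `$PGDATA`, but a stale one left behind by a crash will also trip the check and must be dealt with by hand. Write the output file outside the data directory.

```python
import os
import tarfile


def snapshot_pgdata(pgdata: str, out_path: str) -> None:
    """Hypothetical helper: write a gzip-compressed tarball of $PGDATA.

    Assumes the postmaster has already been stopped so that no files
    change while they are being archived.
    """
    # Heuristic guard (assumption): refuse if a postmaster.pid file exists.
    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        raise RuntimeError("postmaster.pid present; stop the postmaster first")
    # Store entries under the directory's own name, e.g. "pgdata/pg_clog/0000".
    top = os.path.basename(os.path.normpath(pgdata))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(pgdata, arcname=top)
```

Archiving the whole tree, rather than only the damaged table file, keeps pg_clog and the WAL alongside the table so the transaction status lookups can be replayed later.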

            regards, tom lane
