Re: fatal error in database - Mailing list pgsql-general

From Tom Lane
Subject Re: fatal error in database
Msg-id 21238.1038426130@sss.pgh.pa.us
In response to fatal error in database  ("Johnson, Shaunn" <SJohnson6@bcbsm.com>)
List pgsql-general
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> Running PostgreSQL 7.2.1 on RedHat Linux 7.2.

> I'm having a problem trying to identify some of the causes
> for the following errors:

> [snip]
> test=> select count (*) from t_testob;
> FATAL 2:  open of /raid/pgsql/data/pg_clog/0373 failed: No such file or
> directory
> server closed the connection unexpectedly

This probably means corrupted data in your t_testob table: the system
is trying to determine the commit status of a bogus transaction number
(I'm assuming that the file names present in pg_clog/ are nowhere near
0373?).  This is frequently the first visible failure when trying to
read a completely trashed disk page.
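
To judge how far off that transaction number is, it helps to work out which transaction IDs a pg_clog segment covers. The following is a rough sketch, not taken from the server source: it assumes the commit log stores 2 status bits per transaction on 8K pages with 32 pages per segment file (0x100000 transactions per segment), and that the file name is the segment number in hexadecimal. The function names are made up for illustration.

```python
# Assumption: 4 xacts per byte * 8192 bytes per page * 32 pages per segment.
CLOG_XACTS_PER_SEGMENT = 0x100000


def clog_segment_xid_range(segment_name):
    """Return (first_xid, last_xid) covered by a pg_clog segment file name."""
    segment_number = int(segment_name, 16)  # file names are hexadecimal
    first_xid = segment_number * CLOG_XACTS_PER_SEGMENT
    return first_xid, first_xid + CLOG_XACTS_PER_SEGMENT - 1


def clog_segment_for_xid(xid):
    """Return the pg_clog file name that would hold the status of xid."""
    return format(xid // CLOG_XACTS_PER_SEGMENT, "04X")


if __name__ == "__main__":
    print(clog_segment_xid_range("0373"))  # (925892608, 926941183)
```

Under those assumptions, a request for 0373 means a row claims a transaction ID above 925 million. If the installation has only ever run a few million transactions, the ID was almost certainly read from garbage.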

As to how the page got trashed, it could have been Postgres' fault,
but I'm inclined to suspect a disk-hardware or kernel mistake instead.
Are you up2date on your kernel version?

You can probably find the broken page by looking through t_testob
with a tool like pg_filedump (should be available from
http://sources.redhat.com/rhdb --- would give you an exact URL except
I can't seem to reach that site right now).  Look for pages that have
page header or item header fields obviously different from the rest;
you don't usually have to know much about what you're reading to spot
the one that ain't like the others.
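
If pg_filedump is not at hand, a crude scan can narrow the search. The sketch below is not pg_filedump and does not decode rows. It only checks, for each 8K page, that three header offsets are in a sane order (header size <= pd_lower <= pd_upper <= pd_special <= page size). The byte offset of pd_lower (12), the 20-byte header size, and little-endian byte order are assumptions about the on-disk layout for this release on x86; adjust them if your version differs. The function name is hypothetical.

```python
import struct

PAGE_SIZE = 8192        # assumed default block size
PD_LOWER_OFFSET = 12    # ASSUMED byte offset of pd_lower within the page header
HEADER_SIZE = 20        # ASSUMED total size of the page header


def suspicious_pages(path, page_size=PAGE_SIZE,
                     lower_offset=PD_LOWER_OFFSET, header_size=HEADER_SIZE):
    """Return page numbers whose header offsets are not in a sane order."""
    bad = []
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(page_size)
            if not page:
                break
            if len(page) < page_size:
                bad.append(page_no)      # trailing partial page
                break
            if page.count(0) != page_size:   # all-zero pages count as empty
                # Three consecutive little-endian uint16 fields (assumed).
                lower, upper, special = struct.unpack_from("<HHH", page, lower_offset)
                if not (header_size <= lower <= upper <= special <= page_size):
                    bad.append(page_no)
            page_no += 1
    return bad
```

A page that passes this check can still be damaged further in, so treat the result as a list of places to look at first, not as a clean bill of health.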

If the page seems to be completely trashed, which is the usual situation
in the cases I've looked at personally, your best bet is to zero it out;
this loses the rows that were on that page but makes the rest of the
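table usable again.  You can do that with something like
dd bs=8K seek=<target page number> count=1 conv=notrunc if=/dev/zero of=<target file>
while the postmaster is shut down.  (The conv=notrunc matters: without
it, dd cuts the file off right after the zeroed page, and every later
page is lost.  I strongly advise making a backup copy of the target
file first, in case you make a mistake ...)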
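
For anyone who would rather script the step than type a dd command by hand, here is a minimal sketch of the same operation. It assumes 8K pages, takes the backup before touching the file, and opens the file in "r+b" mode so the write happens in place and the pages after the target are kept. The function name and the ".bak" suffix are illustrative choices, and, as with dd, the postmaster must be shut down first.

```python
import os
import shutil

PAGE_SIZE = 8192  # assumed default block size


def zero_page(path, page_number, page_size=PAGE_SIZE):
    """Overwrite one page of a table file with zeros; return the backup path."""
    offset = page_number * page_size
    if page_number < 0 or offset + page_size > os.path.getsize(path):
        raise ValueError("page %d is outside %s" % (page_number, path))

    backup = path + ".bak"          # hypothetical naming convention
    shutil.copy2(path, backup)      # back up before modifying anything

    # "r+b" updates the file in place without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"\0" * page_size)
    return backup
```

Try it on a scratch copy of the file first and confirm that the file size is unchanged afterwards.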

BTW, you should definitely upgrade to 7.2.3.  There are serious known
bugs in 7.2.1 (that's why we put out update releases).

            regards, tom lane
