Re: what could cause this PANIC on enterprise 7.3.4 db? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: what could cause this PANIC on enterprise 7.3.4 db?
Msg-id 18379.1068473964@sss.pgh.pa.us
In response to Re: what could cause this PANIC on enterprise 7.3.4 db?  (Andriy Tkachuk <ant@imt.com.ua>)
List pgsql-hackers
Andriy Tkachuk <ant@imt.com.ua> writes:
> On Fri, 7 Nov 2003, Tom Lane wrote:
>> Andriy Tkachuk <ant@imt.com.ua> writes:
>>> Nov  5 20:22:42 monstr postgres[16071]: [3] PANIC:  open of /usr/local/pgsql/data/pg_clog/0040 failed: No such file or directory
>> 
>> Could we see ls -l /usr/local/pgsql/data/pg_clog/

> [10:49]/2:ant@monstr:~>sudo ls -al /usr/local/pgsql/data/pg_clog
> total 40
> drwx------    2 pgsql    postgres     4096 Nov  7 03:28 .
> drwx------    6 pgsql    root         4096 Oct 23 10:45 ..
> -rw-------    1 pgsql    postgres    32768 Nov 10 10:47 000D

Okay, given that the file the code was trying to access is nowhere near
the current or past set of valid transaction numbers, it's pretty clear
that what you have is a corrupted transaction number in some tuple's
header.  The odds are that not only the transaction number is affected;
usually when we see something like this, anywhere from dozens to
hundreds of bytes have been replaced by garbage data.
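
To make the "nowhere near" point concrete, here is a rough sketch of how a pg_clog segment file name relates to a range of transaction numbers. The layout constants are assumptions, not something stated in this thread: 2 status bits per transaction, the default 8192-byte block size, and 32 pages per segment file. The helper names are made up for illustration.

```python
# Sketch only: maps pg_clog segment file names to transaction ID ranges.
# ASSUMED layout (not confirmed in this thread): 2 status bits per
# transaction, default 8192-byte pages, 32 pages per segment file.
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4            # 8 bits / 2 bits per transaction
PAGES_PER_SEGMENT = 32

XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT


def segment_xid_range(name):
    """Return (first_xid, last_xid) covered by a segment file such as '000D'."""
    segment_number = int(name, 16)          # file names are hexadecimal
    first = segment_number * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


def segment_for_xid(xid):
    """Return the segment file name that would hold the status of xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


if __name__ == "__main__":
    for name in ("000D", "0040"):
        first, last = segment_xid_range(name)
        print(name, first, last)
```

Under those assumptions, 000D covers transaction numbers 13631488 through 14680063, while 0040 would start at 67108864, roughly five times beyond anything the installation has issued so far. A tuple header asking for 0040 therefore cannot hold a genuine transaction number.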

In the cases I've been able to study in the past, the cause seemed to
be faulty hardware or possibly kernel bugs --- for instance someone
recently reported a case where a whole kilobyte of a Postgres file had
been replaced with what seemed to be part of a mail message.  I'd
ascribe that to either a disk drive writing a sector at the wrong place,
or the kernel getting confused about which buffer held which file.
So I'd recommend running some hardware diagnostics and checking to see
if there are errata available for your kernel.

As far as cleaning up the immediate damage is concerned, you'll probably
want to use pg_filedump or some such tool to get a better feeling for
the extent of the damage.  There are descriptions of this process in the
archives --- try looking for recent references to pg_filedump.
        regards, tom lane