Thread: Unable to explain DB error

Unable to explain DB error

From: Steven Rosenstein

Postgres V7.3.9-2.

While executing a query in psql, the following error was generated:

vsa=# select * from vsa.dtbl_logged_event_20050318 where id=2689472;
PANIC:  open of /vsa/db/pg_clog/0FC0 failed: No such file or directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#

I checked in the /vsa/db/pg_clog directory, and the files have monotonically increasing filenames starting with 0000.
The most recent names are:

-rw-------    1 postgres postgres   262144 Jul 25 21:39 04CA
-rw-------    1 postgres postgres   262144 Jul 26 01:10 04CB
-rw-------    1 postgres postgres   262144 Jul 26 05:39 04CC
-rw-------    1 postgres postgres   262144 Jul 28 00:01 04CD
-rw-------    1 postgres postgres   237568 Jul 28 11:31 04CE

Any idea why Postgres would be looking for a clog file name 0FC0 when the most recent filename is 04CE?

Any help and suggestions for recovery are appreciated.

--- Steve
___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt


Re: Unable to explain DB error

From: Tom Lane

Steven Rosenstein <srosenst@us.ibm.com> writes:
> Any idea why Postgres would be looking for a clog file name 0FC0 when the most recent filename is 04CE?

Corrupt data --- specifically a bad transaction number in a tuple
header.  (In practice, this is the first field looked at in which
we can readily detect an error, so you tend to see this symptom for
any serious data corruption situation.  The actual fault may well
be something like a corrupt page header causing the code to follow
"tuple pointers" that point to garbage.)
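
To make the file names concrete, here is a rough sketch of how a transaction ID maps to a pg_clog segment name. It assumes a default build (8 kB blocks, 2 status bits per transaction, 32 pages per segment), which is consistent with the 262144-byte files in the listing above. The function and constant names are only illustrative, not the server's own code.

```python
# Sketch: map transaction IDs to pg_clog segment file names.
# Assumptions (default build): 8 kB pages, 2 status bits per transaction
# (4 transactions per byte), 32 pages per segment file.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_name(xid):
    """Return the four-hex-digit segment file name that would hold xid's status."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def xid_range_for_segment(name):
    """Return (first_xid, last_xid) covered by the named segment file."""
    first = int(name, 16) * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


if __name__ == "__main__":
    # A full segment holds this many bytes; compare with the directory listing.
    print("bytes per full segment:", XACTS_PER_SEGMENT // CLOG_XACTS_PER_BYTE)
    for name in ("04CE", "0FC0"):
        first, last = xid_range_for_segment(name)
        print(name, "covers xids", first, "to", last)
```

Under those assumptions, 04CE covers transaction IDs starting at 1289748480, while 0FC0 would start at 4227858432. The tuple header being read therefore claims a transaction number billions ahead of anything the server has actually issued, which is why it looks like garbage rather than a missing file.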

See the PG list archives for past discussions of dealing with corrupt
data.  pgsql-performance is pretty off-topic for this.

BTW, PG 7.4 and up handle this sort of thing much more gracefully ...
they can't resurrect corrupt data of course, but they tend not to
panic.

            regards, tom lane