Re: FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed - Mailing list pgsql-admin

From Tom Lane
Subject Re: FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed
Msg-id 10148.1045510092@sss.pgh.pa.us
In response to FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed  (Martins Zarins <mark@vestnesis.lv>)
List pgsql-admin
Martins Zarins <mark@vestnesis.lv> writes:
> FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0943 failed: No such file or
> directory

You evidently have a row with a corrupted transaction number in that
table.  The system is trying to look up the status of that transaction,
and it's failing because the number is far beyond the actually valid
range of transaction numbers in your database.

Frequently, this failure is just the first detectable symptom of a
completely-corrupted page.  But it might just be the one row that's bad.

If you want to try to narrow down where the corruption is, you can
experiment with commands like
    select ctid,* from big_table offset N limit 1;
This will fail with the clog-open error for all N greater than some
critical value, which you can home in on by trial and error.  Once you
know the largest safe N, the ctid reported for that N tells you a block
number just before the broken tuple or page.  Armed with that, you can
look for trouble using a hex editor or pg_filedump (but I recommend
pg_filedump --- see http://sources.redhat.com/rhdb/tools.html).
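The trial-and-error search described above is an ordinary binary search over the offset. A minimal sketch of that bisection (the `probe` callback, table size, and break point here are illustrative stand-ins, not from the thread; in practice `probe(n)` would issue `SELECT ctid,* FROM big_table OFFSET n LIMIT 1` through psql or a driver and report whether it hit the clog-open error):

```python
def largest_safe_offset(probe, row_count):
    """Binary-search for the largest N for which probe(N) still succeeds.

    probe(n) must return True if the query at OFFSET n succeeds and
    False if it fails with the clog-open error.  Assumes probe(0) is True.
    """
    lo, hi = 0, row_count            # invariant: probe(lo) is True
    while lo < hi:
        mid = (lo + hi + 1) // 2     # round up so the loop terminates
        if probe(mid):
            lo = mid                 # mid is still safe; search higher
        else:
            hi = mid - 1             # mid hits the broken tuple; search lower
    return lo

# Simulated table where every offset past 41 trips the clog error:
print(largest_safe_offset(lambda n: n < 42, 1000))  # -> 41
```

The ctid reported at that largest safe N is the block number to start from with the hex editor or pg_filedump.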

If you aren't interested in investigating, you could recover by just
dropping the table and recreating it from backup.  (I hope you have a
backup, as you have certainly lost at least one row and possibly several
pages' worth.)

In any case, it'd be a good idea to run some memory and disk diagnostics
to try to determine what caused the data corruption.

            regards, tom lane
