Thread: FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed

FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed

From

Martins Zarins

Date:

17 February 2003, 13:59:09

I have this error in the postgres log:
FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0943 failed: No such file or
directory
When it happened I was indexing a big table (~32M rows).
Small selects, inserts, and updates all seem to work fine, but when I try to
do select count(*) from the big table or index it, the server crashes and
automatically restarts (the file name in the error message changes from time to time).
There is about 3.5GB of free disk space.
Postgres version is 7.2.1.
There are no other unusual messages in the log.

What happened?
How can I fix this problem?
Google didn't help much :(

Mark
P.S.
Additional info:
last files in /usr/local/pgsql/data/pg_clog are
-rw-------    1 postgres daemon     262144 Jan 31 16:17 0099
-rw-------    1 postgres daemon     262144 Feb  3 09:11 009A
-rw-------    1 postgres daemon     262144 Feb  3 16:26 009B
-rw-------    1 postgres daemon     262144 Feb  4 13:54 009C
-rw-------    1 postgres daemon     262144 Feb  5 11:11 009D
-rw-------    1 postgres daemon     262144 Feb  5 21:32 009E
-rw-------    1 postgres daemon     262144 Feb  7 10:52 009F
-rw-------    1 postgres daemon     262144 Feb  7 14:35 00A0
-rw-------    1 postgres daemon     262144 Feb  9 21:57 00A1
-rw-------    1 postgres daemon     262144 Feb 10 16:29 00A2
-rw-------    1 postgres daemon     262144 Feb 11 13:06 00A3
-rw-------    1 postgres daemon     262144 Feb 11 23:22 00A4
-rw-------    1 postgres daemon     262144 Feb 12 15:53 00A5
-rw-------    1 postgres daemon     262144 Feb 13 13:52 00A6
-rw-------    1 postgres daemon     262144 Feb 14 10:33 00A7
-rw-------    1 postgres daemon     262144 Feb 15 13:27 00A8
-rw-------    1 postgres daemon     262144 Feb 17 12:44 00A9
-rw-------    1 postgres daemon     237568 Feb 17 20:44 00AA
That's all

Re: FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed

From

Tom Lane

Date:

17 February 2003, 14:29:05

Martins Zarins <mark@vestnesis.lv> writes:
> FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0943 failed: No such file or
> directory

You evidently have a row with a corrupted transaction number in that
table.  The system is trying to look up the status of that transaction,
and it's failing because the number is far beyond the actually valid
range of transaction numbers in your database.
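
To see how far out of range the number is, the file name can be mapped back to a range of transaction numbers. The sketch below assumes a clog layout that is not stated in this thread: 2 status bits per transaction (4 transactions per byte) and 262144-byte segment files named by a hexadecimal segment number. The constants and function names are illustrative only.

```python
# Assumed clog layout (not stated in the thread): 2 status bits per
# transaction, so 4 transactions per byte, in 262144-byte segment files.
XACTS_PER_BYTE = 4
SEGMENT_BYTES = 262144
XACTS_PER_SEGMENT = XACTS_PER_BYTE * SEGMENT_BYTES  # 1,048,576


def segment_xid_range(segment_name):
    """Return (first_xid, last_xid) covered by a full clog segment file."""
    seg = int(segment_name, 16)  # segment file names are hexadecimal
    first = seg * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


def highest_xid_in(segment_name, size_bytes):
    """Highest transaction number a partly filled segment can describe."""
    first, _ = segment_xid_range(segment_name)
    return first + size_bytes * XACTS_PER_BYTE - 1


if __name__ == "__main__":
    print(segment_xid_range("0943"))       # segment named in the error
    print(highest_xid_in("00AA", 237568))  # newest file in the listing
```

Under these assumptions, segment 0943 would start near transaction 2.49 billion, while the newest existing file (00AA, 237568 bytes) reaches only about 179 million, which fits the diagnosis of a corrupted transaction number rather than a missing file.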

Frequently, this failure is just the first detectable symptom of a
completely-corrupted page.  But it might just be the one row that's bad.

If you want to try to narrow down where the corruption is, you can
experiment with commands like
    select ctid,* from big_table offset N limit 1;
This will fail with the clog-open error for all N greater than some
critical value, which you can home in on by trial and error.  Once you
know the largest safe N, the ctid reported for that N tells you a block
number just before the broken tuple or page.  Armed with that, you can
look for trouble using a hex editor or pg_filedump (but I recommend
pg_filedump --- see http://sources.redhat.com/rhdb/tools.html).
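
The trial-and-error search can be done as a binary search, which needs roughly 25 probes for a table of about 32M rows. This is only a sketch: `probe` is a hypothetical callable that you would have to write yourself; it runs the query above for a given N and returns True on success and False on the clog-open error. The helper names are not from any PostgreSQL tool.

```python
def largest_safe_offset(probe, upper):
    """Binary-search the largest N in [0, upper] for which probe(N) is True.

    probe(N) is a hypothetical callable that runs
        select ctid,* from big_table offset N limit 1;
    and returns True if it succeeds, False on the clog-open error.
    Assumes every N beyond some critical value fails.
    Returns -1 if even N = 0 fails.
    """
    lo, hi = -1, upper  # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            lo = mid       # mid is safe; the answer is mid or larger
        else:
            hi = mid - 1   # mid fails; the answer is smaller
    return lo


def block_number(ctid):
    """Extract the block number from a ctid string such as '(1234,5)'."""
    block, _offset = ctid.strip("()").split(",")
    return int(block)
```

Since each failing probe makes the server crash and restart, a real `probe` would probably need to reconnect before every attempt. `upper` should stay below the table's row count, because an offset past the end returns no row rather than an error.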

If you aren't interested in investigating, you could recover by just
dropping the table and recreating it from backup.  (I hope you have a
backup, as you have certainly lost at least one row and possibly several
pages' worth.)

In any case, it'd be a good idea to run some memory and disk diagnostics
to try to determine what caused the data corruption.

            regards, tom lane

Re: FATAL 2: open of /usr/local/pgsql/data/pg_clog/0943 failed

From

"Björn Metzdorf"

Date:

17 February 2003, 15:31:15

You really need to upgrade to 7.2.3/7.2.4. This issue is fixed in 7.2.3.

To fix this problem, stop the postmaster, create a dummy clog file filled with
zeros (from /dev/zero), and start the postmaster again. To be safe, you can then
dump all data (you may need to repeat the dummy clog step) and reload it from scratch.
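
A minimal sketch of the dummy clog step follows. The file size is an assumption: 262144 bytes matches the full segment files in the directory listing from the original post, but is not stated in this message. The function name is hypothetical. Run something like this only with the postmaster stopped, as the user that owns the data directory.

```python
import os

# Assumed size: matches the full segment files in the original listing.
SEGMENT_BYTES = 262144


def make_dummy_clog(clog_dir, segment_name, size=SEGMENT_BYTES):
    """Create a zero-filled clog segment; refuse to overwrite an existing file."""
    path = os.path.join(clog_dir, segment_name)
    # Mode "xb" raises FileExistsError instead of clobbering a real segment.
    with open(path, "xb") as f:
        f.write(b"\0" * size)
    os.chmod(path, 0o600)  # same -rw------- mode as the existing segments
    return path
```

For the error reported here the call would be `make_dummy_clog("/usr/local/pgsql/data/pg_clog", "0943")`. The zeros only give the server a file to read; they do not recover the damaged rows, so the dump and reload is still advisable.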

Regards,
Bjoern

