Where art thou pg_clog? - Mailing list pgsql-general

From Casey Duncan
Subject Where art thou pg_clog?
Date
Msg-id 3D5E68F2-13C1-4F18-8200-85CF4D8D583B@pandora.com
Responses Re: Where art thou pg_clog?  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Where art thou pg_clog?  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-general
We have a production system with multiple identical database
instances on the same hardware, with the same configuration, running
databases with the exact same schema. They each have different data,
but the database sizes and load patterns are almost exactly the same.

We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh
well ;^) and since then we have noticed the following error on two of
the servers:

2007-02-15 00:35:03.324 PST ERROR:  could not access status of transaction 2565134864
2007-02-15 00:35:03.325 PST DETAIL:  could not open file "pg_clog/098E": No such file or directory
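
For reference, the segment file named in the DETAIL line follows directly from the transaction ID in the ERROR line. The sketch below shows that arithmetic. It assumes a default PostgreSQL build (8 kB pages, 2 status bits per transaction, 32 pages per segment file), and the function names are illustrative, not part of PostgreSQL.

```python
# Sketch: map a transaction ID to the pg_clog segment file holding its
# commit status. The constants assume a default PostgreSQL build.
BLCKSZ = 8192                # bytes per page (assumed default)
CLOG_XACTS_PER_BYTE = 4      # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32  # pages per segment file

# Number of transactions whose status fits in one segment file.
XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Return the segment file name (4 uppercase hex digits) for xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


def clog_byte_offset(xid: int) -> int:
    """Return the byte offset inside the segment file for xid's status."""
    return (xid % XACTS_PER_SEGMENT) // CLOG_XACTS_PER_BYTE


print(clog_segment_name(2565134864))  # 098E
```

Under those assumptions, transaction 2565134864 maps to segment 098E, which matches the file the server reports as missing.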

The first time this happened, I chalked it up to some kind of disk
corruption, based on the mailing list archives. So I dumped the
databases, did a fresh initdb, forced an fsck (these run with a jfs
data partition and an ext2 wal partition), which found no problems,
and then reloaded the databases.

Now, about a week later, I see the same problem on a different server.
We never saw this problem running 8.1.3 on these same machines over
many months, so I'm beginning to suspect that something we changed
since running 8.1.3 is to blame. Before the upgrade these systems ran
postgres 8.1.3 and slony 1.1.5. Now they run postgres 8.1.5 and slony
1.2.6 (I don't know that the slony version is important, I add it
here for completeness). Nothing else important has changed on these
boxes. I see that 8.1.8 is out now, though nothing I see in the
release notes seems relevant to this issue.

Here are some specific things I'd like to know:

1. Is it possible to "fix" this problem without a dumpall/initdb/
restore? That takes many hours and can only be done when I'm supposed
to be at home relaxing (yeah right) ;^) FWIW, the system is
functioning fine right now from what I can tell, save the above
errors in the log every few minutes.

2. What more info can I give to figure out the "cause" of this? Are
there files I can inspect to find out more?

3. Is it possible that this is a side effect of the upgrade to 8.1.5?

Thanks for any insights,

-Casey
