Thread: pg_clog corrupt, can't start postgres
Hi,

I need some help bringing this db back, please. The partition ran out of space during an import process. I cleared up the space and attempted to start the postgres service again, but it doesn't start, and I get the following in the message log:

HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: checkpoint record is at 1B/27F23A6C
LOG: redo record is at 1B/27751714; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 45279762; next OID: 43062083
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 1B/27751714
PANIC: could not access status of transaction 45514755
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/002B" at offset 106496: Success
LOG: startup process (PID 23991) was terminated by signal 6
LOG: aborting startup due to startup process failure

Postgres is 7.4.1 on this machine. I saw some previous posts on this subject, and so far the solution seems to be to initialize and restore the databases from the dumps. I can live with some aborted transactions, if it's possible to recover somehow.

$ psql
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

# ls -l pg_xlog/
total 131232
-rw------- 1 postgres postgres 16777216 Feb 19 13:30 0000001B00000026
-rw------- 1 postgres postgres 16777216 Feb 19 13:34 0000001B00000027
-rw------- 1 postgres postgres 16777216 Feb 19 13:44 0000001B00000028
-rw------- 1 postgres postgres 16777216 Feb 19 13:15 0000001B00000029
-rw------- 1 postgres postgres 16777216 Feb 19 13:12 0000001B0000002A
-rw------- 1 postgres postgres 16777216 Feb 19 13:18 0000001B0000002B
-rw------- 1 postgres postgres 16777216 Feb 19 13:26 0000001B0000002C
-rw------- 1 postgres postgres 16777216 Feb 19 13:22 0000001B0000002D

# ls -l pg_clog/
total 628
-rw------- 1 postgres postgres 262144 Feb 19 04:31 0029
-rw------- 1 postgres postgres 262144 Feb 19 11:55 002A
-rw------- 1 postgres postgres 106496 Feb 20 22:34 002B

Thanks,
Anjan
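
The PANIC message and the pg_clog listing above are consistent with each other, and a little arithmetic shows why. The sketch below assumes the usual layout of the commit log: 2 status bits per transaction (4 transactions per byte), 8192-byte pages, and 32 pages per segment file. Those constants are assumptions taken from the stock PostgreSQL source and are not stated anywhere in this thread; the function name clog_location is made up for illustration.

```python
# Assumed constants (stock PostgreSQL build; not confirmed in this thread).
BLCKSZ = 8192              # bytes per clog page
CLOG_XACTS_PER_BYTE = 4    # 2 status bits per transaction
PAGES_PER_SEGMENT = 32     # pages per pg_clog segment file

XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE            # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT   # 1048576


def clog_location(xid):
    """Return (segment file name, byte offset of the page that holds xid)."""
    segment = xid // XACTS_PER_SEGMENT
    page_in_segment = (xid % XACTS_PER_SEGMENT) // XACTS_PER_PAGE
    # Segment files are named with four uppercase hex digits, e.g. 002B.
    return "%04X" % segment, page_in_segment * BLCKSZ


if __name__ == "__main__":
    name, offset = clog_location(45514755)
    print(name, offset)  # prints: 002B 106496
```

Under those assumptions, transaction 45514755 lives on the page that starts at byte 106496 of segment 002B. The listing shows 002B is exactly 106496 bytes long, so that page was never written, which would explain a short read that reports "Success" instead of a real error code.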
"Anjan Dave" <adave@vantage.com> writes: > The partition ran out of space from an import process. I cleared up the space and attempted to start the postgres serviceagain, but it doesn't start and i get following in the message log. > PANIC: could not access status of transaction 45514755 > DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/002B" at offset 106496: Success > LOG: startup process (PID 23991) was terminated by signal 6 > LOG: aborting startup due to startup process failure > Postgres is 7.4.1 on this machine. > I saw some previous posts on this subject and so far the solution seems to be initialize and restore databases from thedumps. Before that, try updating to 7.4.7 (or at least 7.4.2) --- this looks like the same bug fixed here: 2004-01-26 14:16 tgl * src/backend/access/transam/varsup.c (REL7_4_STABLE): Repair incorrect order of operations in GetNewTransactionId(). We must complete ExtendCLOG() before advancing nextXid, so that if that routine fails, the next incoming transaction will try it again. Per trouble report from Christopher Kings-Lynne. You might also go back to the mail list archives from that time and see what advice was given to Chris about getting out of the problem he found himself in. I *think* that something along the line of forcibly appending a page of zeroes to that clog file might be the best solution, but this was more than a year ago and I don't recall for sure. regards, tom lane
Tom,

You're the man! I zeroed out the troubled pg_clog file and the db started up fine!

Here's the link to the discussion, and a detailed explanation of the issue by Tom:

http://groups-beta.google.com/group/comp.databases.postgresql.hackers/browse_thread/thread/c97c853f640b9ac1/d6bc3c75eed6c2a4?q=could+not+access+status+of+transaction#d6bc3c75eed6c2a4

Tom, is the issue resolved after 7.4.1?

Thanks,
Anjan

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Mon 2/21/2005 11:42 AM
To: Anjan Dave
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_clog corrupt, can't start postgres
"Anjan Dave" <adave@vantage.com> writes: > Tom, is the issue resolved after 7.4.1? Yes, that's why I told you to update. regards, tom lane