Thread: Another database recovery problem

Another database recovery problem

From
"Chris White (cjwhite)"
Date:
My 7.2.1 database was in the middle of inserting a large object when the power went down. When I restart the database I getting the following messages:

postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  database system was interrupted at 2003-10-08 09:29:12 PDT
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  checkpoint
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  redo record is at 0/115A3C; undo record is at 0/0; shutdown TRUE
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  next transaction id: 95; next oid: 16557
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  database system was not properly shut down; automatic recovery in progress
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  redo starts at 0/115A7C
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  ReadRecord: record with zero length at 0/157F4C
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  redo done at 0/157F28
postgres: startup subprocess   :   DEBUG sql sql sql DEBUG:  database system is ready
 
 
When I vacuum the database to remove used entries and to compact the db, I get the following errors:

postgres: voicemail aesop 127.0.0.1 VACUUM:   DEBUG sql sql sql DEBUG --Relation pg_largeobject--
postgres: voicemail aesop 127.0.0.1 VACUUM:    INFO sql sql sql NOTICE:  Rel pg_largeobject: TID 1/1: InsertTransactionInProgress 0 - can't shrink relation
postgres: voicemail aesop 127.0.0.1 VACUUM:    INFO sql sql sql NOTICE:  Rel pg_largeobject: TID 1/2: InsertTransactionInProgress 0 - can't shrink relation
postgres: voicemail aesop 127.0.0.1 VACUUM:    INFO sql sql sql NOTICE:  Rel pg_largeobject: TID 1/4: InsertTransactionInProgress 0 - can't shrink relation
postgres: voicemail aesop 127.0.0.1 VACUUM:   CRASH sql sql sql FATAL2:  open of /data/sql/pg_clog/09C6 failed: No such file or directory
postmaster:   DEBUG sql sql sql DEBUG:  server process (pid 1393) exited with exit code 2
postmaster:   DEBUG sql sql sql DEBUG:  terminating any other active server processes
postmaster:   ERROR sql sql sql FATAL 1:  The database system is in recovery mode
 
and from then on any time we try to connect we get a message saying database is in recovery mode and it never comes out. How can I recover this database or do I have to restore a prior backup?
 
Chris White


Re: Another database recovery problem

From
Tom Lane
Date:
"Chris White (cjwhite)" <cjwhite@cisco.com> writes:
> My 7.2.1 database was in the middle of inserting a large object when the
> power went down.

You really, really, really should be on 7.2.4 (if not 7.3.*).  We don't
make dot-releases for amusement's sake --- there are some critical bug
fixes between 7.2.1 and 7.2.4, including some possibly relevant to your
problem.

> postgres: voicemail aesop 127.0.0.1 VACUUM:   CRASH sql sql sql FATAL2:
> open of /data/sql/pg_clog/09C6 failed: No such file or directory

This looks like a corrupted page in (probably) pg_largeobject --- do you
get the same if you try, say, "select count(*) from pg_largeobject"?

> and from then on any time we try to connect we get a message saying
> database is in recovery mode and it never comes out. How can I recover
> this database or do I have to restore a prior backup?

I can't think of any reasonable fix in 7.2.* other than truncating
pg_largeobject, after which you'd need to reload the large objects
somehow.  If you are lucky, the corruption doesn't affect any of the
valid data --- have you tried pg_dumping the database?  A dump and
reload of the particular database would be the most reliable solution.

            regards, tom lane