Thread: 8.1.3 Crash/Corruption Problem

8.1.3 Crash/Corruption Problem

From
"Chris Hoover"
Date:
Over the weekend, I suffered from major database corruption on my 8.1.3 database.  Fortunetly, this was my development/non-production box so I did not have to spend my entire weekend getting it back online, but I did spend a major portion of it.  I ended up having to do a pitr recovery of the system.  I don't understand what happened to the system.  Please help explain what happened.  Also, would there have been another option to get the system back online w/o the full recovery?

Thanks,

Chris


Here is the log of the crash:

,19684,,2007-01-06 03:20:04.542 EST,,70594,,,,DEBUG:  server process (PID 9375) exited with exit code 0
,19684,,2007-01-06 03:20:04.547 EST,,70595,,,,DEBUG:  forked new backend, pid=9380 socket=8
[unknown],9380,[unknown],2007-01-06 03:20:04.547 EST,459f5bb4.24a4,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9380,postgres,2007-01-06 03:20:04.547 EST,459f5bb4.24a4 ,2,2007-01-06 03:20:04 EST,0,authentication,LOG:  connection authorized: user=postgres database=dba_stats
dba_stats,9380,postgres,2007-01-06 03:20:04.549 EST,459f5bb4.24a4,3,2007-01-06 03:20:04 EST,322692053,idle,LOG:  statement: insert into db_transactions values (' 192.168.1.73','d45','0','79443','5','1769025','14411089','01/06/2007 03:20:01');
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,4,2007-01-06 03:20:04 EST,0,INSERT,LOG:  duration: 1.974 ms
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,5,2007-01-06 03:20:04 EST,0,INSERT,LOG:  duration: 1.974 ms  statement: insert into db_transactions values (' 192.168.1.73','d45','0','79443','5','1769025','14411089','01/06/2007 03:20:01');
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,6,2007-01-06 03:20:04 EST,0,idle,LOG:  disconnection: session time: 0:00:00.00 user=postgres database=dba_stats host=[local]
,19684,,2007-01-06 03:20:04.552 EST,,70596,,,,DEBUG:  server process (PID 9380) exited with exit code 0
[unknown],9385,[unknown],2007-01-06 03:20:04.557 EST,459f5bb4.24a9,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9385,postgres,2007-01-06 03:20:04.557 EST,459f5bb4.24a9 ,2,2007-01-06 03:20:04 EST,0,authentication,LOG:  connection authorized: user=postgres database=dba_stats
,19684,,2007-01-06 03:20:04.558 EST,,70597,,,,DEBUG:  forked new backend, pid=9385 socket=8
dba_stats,9385,postgres,2007-01-06 03:20: 04.560 EST,459f5bb4.24a9,3,2007-01-06 03:20:04 EST,322692055,idle,LOG:  statement: insert into db_transactions values ('192.168.1.73','d46','0','116796','112','7246902','23447082','01/06/2007 03:20:01');
dba_stats,9385,postgres,2007-01-06 03:20:04.561 EST,459f5bb4.24a9,4,2007-01-06 03:20:04 EST,322692055,INSERT,PANIC:  right sibling's left-link doesn't match
dba_stats,9385,postgres,2007-01-06 03:20:04.561 EST, 459f5bb4.24a9,5,2007-01-06 03:20:04 EST,322692055,INSERT,STATEMENT:  insert into db_transactions values ('192.168.1.73','d46','0','116796','112','7246902','23447082','01/06/2007 03:20:01');
,19684,,2007-01-06 03:20:04.562 EST,,70598,,,,DEBUG:  server process (PID 9385) was terminated by signal 6
,19684,,2007-01-06 03:20:04.562 EST,,70599,,,,LOG:  server process (PID 9385) was terminated by signal 6
,19684,,2007-01-06 03:20:04.562 EST,,70600,,,,LOG:  terminating any other active server processes
,19684,,2007-01-06 03:20:04.562 EST,,70601,,,,DEBUG:  sending SIGQUIT to process 8277
,19684,,2007-01-06 03:20:04.562 EST,,70602,,,,DEBUG:  sending SIGQUIT to process 8008
,19684,,2007-01-06 03:20:04.562 EST,,70603,,,,DEBUG:  sending SIGQUIT to process 8227
,19684,,2007-01-06 03:20:04.562 EST,,70604,,,,DEBUG:  sending SIGQUIT to process 8275
sys_logon,8277,postgres,2007-01-06 03:20:04.562 EST,459f5ad5.2055,13,2007-01-06 03:16:21 EST,0,idle,WARNING:  terminating connection because of crash of another server process
sys_logon,8277,postgres,2007-01-06 03:20: 04.562 EST,459f5ad5.2055,14,2007-01-06 03:16:21 EST,0,idle,DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
sys_logon,8277,postgres,2007-01-06 03:20:04.562 EST,459f5ad5.2055,15,2007-01-06 03:16:21 EST,0,idle,HINT:  In a moment you should be able to reconnect to the database and repeat your command.
,19684,,2007-01-06 03:20: 04.564 EST,,70605,,,,DEBUG:  server process (PID 8277) exited with exit code 1
,19684,,2007-01-06 03:20:04.564 EST,,70606,,,,LOG:  all server processes terminated; reinitializing
,9390,,2007-01-06 03:20:04.603 EST,,1,,,,LOG:  database system was interrupted at 2007-01-06 03:15:04 EST
,9390,,2007-01-06 03:20:04.603 EST,,2,,,,LOG:  checkpoint record is at 299/3856B2DC
,9390,,2007-01-06 03:20:04.603 EST,,3,,,,LOG:  redo record is at 299/3856B2DC; undo record is at 0/0; shutdown TRUE
,9390,,2007-01-06 03:20: 04.603 EST,,4,,,,LOG:  next transaction ID: 322691436; next OID: 79454488
,9390,,2007-01-06 03:20:04.603 EST,,5,,,,LOG:  next MultiXactId: 9; next MultiXactOffset: 17
,9390,,2007-01-06 03:20:04.603 EST,,6,,,,LOG:  database system was not properly shut down; automatic recovery in progress
[unknown],9391,[unknown],2007-01-06 03:20:04.604 EST,459f5bb4.24af,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9391,postgres,2007-01-06 03:20:04.604 EST,459f5bb4.24af ,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.604 EST,,70607,,,,DEBUG:  forked new backend, pid=9391 socket=8
,19684,,2007-01-06 03:20:04.605 EST,,70608,,,,DEBUG:  server process (PID 9391) exited with exit code 0
,9390,,2007-01-06 03:20:04.607 EST,,7,,,,LOG:  redo starts at 299/3856B320
[unknown],9396,[unknown],2007-01-06 03:20:04.612 EST,459f5bb4.24b4,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9396,postgres,2007-01-06 03:20:04.612 EST,459f5bb4.24b4,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.614 EST,,70609,,,,DEBUG:  forked new backend, pid=9396 socket=8
,19684,,2007-01-06 03:20:04.614 EST,,70610,,,,DEBUG:  server process (PID 9396) exited with exit code 0
[unknown],9401,[unknown],2007-01-06 03:20:04.620 EST,459f5bb4.24b9,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9401,postgres,2007-01-06 03:20:04.620 EST,459f5bb4.24b9,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.622 EST,,70611,,,,DEBUG:  forked new backend, pid=9401 socket=8
,19684,,2007-01-06 03:20:04.622 EST,,70612,,,,DEBUG:  server process (PID 9401) exited with exit code 0
,9390,,2007-01-06 03:20:04.622 EST,,8,,,,LOG:  record with zero length at 299/386E998C
,9390,,2007-01-06 03:20: 04.622 EST,,9,,,,LOG:  redo done at 299/386E9964
[unknown],9406,[unknown],2007-01-06 03:20:04.628 EST,459f5bb4.24be,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9406,postgres,2007-01-06 03:20: 04.629 EST,459f5bb4.24be,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.630 EST,,70613,,,,DEBUG:  forked new backend, pid=9406 socket=8
,19684,,2007-01-06 03:20: 04.630 EST,,70614,,,,DEBUG:  server process (PID 9406) exited with exit code 0
[unknown],9411,[unknown],2007-01-06 03:20:04.636 EST,459f5bb4.24c3,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9411,postgres,2007-01-06 03:20:04.636 EST,459f5bb4.24c3,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.638 EST,,70615,,,,DEBUG:  forked new backend, pid=9411 socket=8
,19684,,2007-01-06 03:20:04.638 EST,,70616,,,,DEBUG:  server process (PID 9411) exited with exit code 0
[unknown],9416,[unknown],2007-01-06 03:20:04.643 EST,459f5bb4.24c8,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9416,postgres,2007-01-06 03:20:04.644 EST,459f5bb4.24c8,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.645 EST,,70617,,,,DEBUG:  forked new backend, pid=9416 socket=8
,19684,,2007-01-06 03:20:04.645 EST,,70618,,,,DEBUG:  server process (PID 9416) exited with exit code 0
[unknown],9421,[unknown],2007-01-06 03:20:04.652 EST,459f5bb4.24cd,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9421,postgres,2007-01-06 03:20:04.652 EST,459f5bb4.24cd,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.655 EST,,70619,,,,DEBUG:  forked new backend, pid=9421 socket=8
,19684,,2007-01-06 03:20:04.655 EST,,70620,,,,DEBUG:  server process (PID 9421) exited with exit code 0
[unknown],9426,[unknown],2007-01-06 03:20:04.660 EST,459f5bb4.24d2,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9426,postgres,2007-01-06 03:20:04.661 EST,459f5bb4.24d2,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,19684,,2007-01-06 03:20:04.662 EST,,70621,,,,DEBUG:  forked new backend, pid=9426 socket=8
,19684,,2007-01-06 03:20:04.662 EST,,70622,,,,DEBUG:  server process (PID 9426) exited with exit code 0
[unknown],9431,[unknown],2007-01-06 03:20:04.668 EST,459f5bb4.24d7,1,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,LOG:  connection received: host=[local]
dba_stats,9431,postgres,2007-01-06 03:20:04.668 EST,459f5bb4.24d7,2,2007-01-06 03:20:04 EST,0,/usr/bin/postmaster,FATAL:  the database system is starting up
,9390,,2007-01-06 03:20:04.669 EST,,10,,,,PANIC:  failed to re-find parent key in "3000108689"
,19684,,2007-01-06 03:20:04.669 EST,,70623,,,,DEBUG:  forked new backend, pid=9431 socket=8
,19684,,2007-01-06 03:20:04.669 EST,,70624,,,,DEBUG:  server process (PID 9431) exited with exit code 0
,19684,,2007-01-06 03:20: 04.673 EST,,70625,,,,LOG:  startup process (PID 9390) was terminated by signal 6
,19684,,2007-01-06 03:20:04.673 EST,,70626,,,,LOG:  aborting startup due to startup process failure
,19685,,2007-01-06 03:20:04.681 EST,,1,,,,LOG:  logger shutting down


Here is the log of the first try at starting the database:

,2960,,2007-01-06 08:59:13.996 EST,,1,,,,LOG:  database system was interrupted while in recovery at 2007-01-06 03:20:04 EST
,2960,,2007-01-06 08:59: 13.996 EST,,2,,,,HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
,2960,,2007-01-06 08:59:13.996 EST,,3,,,,LOG:  checkpoint record is at 299/3856B2DC
,2960,,2007-01-06 08:59: 13.996 EST,,4,,,,LOG:  redo record is at 299/3856B2DC; undo record is at 0/0; shutdown TRUE
,2960,,2007-01-06 08:59:13.996 EST,,5,,,,LOG:  next transaction ID: 322691436; next OID: 79454488
,2960,,2007-01-06 08:59:13.996 EST,,6,,,,LOG:  next MultiXactId: 9; next MultiXactOffset: 17
,2960,,2007-01-06 08:59:13.996 EST,,7,,,,LOG:  database system was not properly shut down; automatic recovery in progress
,2960,,2007-01-06 08:59:14.192 EST,,8,,,,LOG:  redo starts at 299/3856B320
,2960,,2007-01-06 08:59:14.205 EST,,9,,,,LOG:  record with zero length at 299/386E998C
,2960,,2007-01-06 08:59:14.205 EST,,10,,,,LOG:  redo done at 299/386E9964
,2960,,2007-01-06 08:59: 14.243 EST,,11,,,,PANIC:  failed to re-find parent key in "3000108689"
,2956,,2007-01-06 08:59:14.246 EST,,2,,,,LOG:  startup process (PID 2960) was terminated by signal 6
,2956,,2007-01-06 08:59:14.246 EST,,3,,,,LOG:  aborting startup due to startup process failure
,2959,,2007-01-06 08:59:14.254 EST,,1,,,,LOG:  logger shutting down

Re: 8.1.3 Crash/Corruption Problem

From
Tom Lane
Date:
"Chris Hoover" <revoohc@gmail.com> writes:
> Over the weekend, I suffered from major database corruption on my
> 8.1.3database.  Fortunetly, this was my development/non-production box
> so I did
> not have to spend my entire weekend getting it back online, but I did spend
> a major portion of it.  I ended up having to do a pitr recovery of the
> system.  I don't understand what happened to the system.  Please help
> explain what happened.  Also, would there have been another option to get
> the system back online w/o the full recovery?

This looks like a problem we fixed in 8.1.6 ... consider updating.

            regards, tom lane