Thread: CRITICAL HELP NEEDED! DEAD DB!

CRITICAL HELP NEEDED! DEAD DB!

From
Cott Lang
Date:
Sep 24 10:22:37 snafu postgres[18306]: [2-1] LOG:  database system was
interrupted while in recovery at 2004-09-24 10:21:41 MST
Sep 24 10:22:37 snafu postgres[18306]: [2-2] HINT:  This probably means
that some data is corrupted and you will have to use the last backup for
recovery.
Sep 24 10:22:37 snafu postgres[18306]: [3-1] LOG:  checkpoint record is
at 9A/C2022368
Sep 24 10:22:37 snafu postgres[18306]: [4-1] LOG:  redo record is at
9A/C2022368; undo record is at 0/0; shutdown FALSE
Sep 24 10:22:37 snafu postgres[18306]: [5-1] LOG:  next transaction ID:
197841225; next OID: 715436086
Sep 24 10:22:37 snafu postgres[18306]: [6-1] LOG:  database system was
not properly shut down; automatic recovery in progress
Sep 24 10:22:37 snafu postgres[18306]: [7-1] LOG:  redo starts at
9A/C20223B0
Sep 24 10:22:37 snafu postgres[18306]: [8-1] PANIC:  btree_insert_redo:
failed to add item
Sep 24 10:22:37 snafu postgres[18299]: [2-1] LOG:  startup process (PID
18306) was terminated by signal 6
Sep 24 10:22:37 snafu postgres[18299]: [3-1] LOG:  aborting startup due
to startup process failure


Any suggestions to recover?!  I'm dead in the water!  Please!!!





Re: CRITICAL HELP NEEDED! DEAD DB!

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Cott Lang
> Sent: Friday, September 24, 2004 10:21 AM
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] CRITICAL HELP NEEDED! DEAD DB!
>
>
> Sep 24 10:22:37 snafu postgres[18306]: [2-1] LOG:  database system was
> interrupted while in recovery at 2004-09-24 10:21:41 MST
> Sep 24 10:22:37 snafu postgres[18306]: [2-2] HINT:  This probably means
> that some data is corrupted and you will have to use the last backup for
> recovery.
> Sep 24 10:22:37 snafu postgres[18306]: [3-1] LOG:  checkpoint record is
> at 9A/C2022368
> Sep 24 10:22:37 snafu postgres[18306]: [4-1] LOG:  redo record is at
> 9A/C2022368; undo record is at 0/0; shutdown FALSE
> Sep 24 10:22:37 snafu postgres[18306]: [5-1] LOG:  next transaction ID:
> 197841225; next OID: 715436086
> Sep 24 10:22:37 snafu postgres[18306]: [6-1] LOG:  database system was
> not properly shut down; automatic recovery in progress
> Sep 24 10:22:37 snafu postgres[18306]: [7-1] LOG:  redo starts at
> 9A/C20223B0
> Sep 24 10:22:37 snafu postgres[18306]: [8-1] PANIC:  btree_insert_redo:
> failed to add item
> Sep 24 10:22:37 snafu postgres[18299]: [2-1] LOG:  startup process (PID
> 18306) was terminated by signal 6
> Sep 24 10:22:37 snafu postgres[18299]: [3-1] LOG:  aborting startup due
> to startup process failure
>
>
> Any suggestions to recover?!  I'm dead in the water!  Please!!!

When did you do your last backup?

This message is a clue:
"HINT:  This probably means that some data is corrupted and you will
have to use the last backup for recovery."

If you do a restore from your last backup, you will lose the data
between that time and the time of the problem.  Any other solution will
be fraught with peril, I think.

Otherwise, maybe something here will help:
http://svana.org/kleptog/pgsql/pgfsck.html


Re: CRITICAL HELP NEEDED! DEAD DB!

From
Tom Lane
Date:
Cott Lang <cott@internetstaff.com> writes:
> Sep 24 10:22:37 snafu postgres[18306]: [2-1] LOG:  database system was
> interrupted while in recovery at 2004-09-24 10:21:41 MST
> Sep 24 10:22:37 snafu postgres[18306]: [2-2] HINT:  This probably means
> that some data is corrupted and you will have to use the last backup for
> recovery.
> Sep 24 10:22:37 snafu postgres[18306]: [3-1] LOG:  checkpoint record is
> at 9A/C2022368
> Sep 24 10:22:37 snafu postgres[18306]: [4-1] LOG:  redo record is at
> 9A/C2022368; undo record is at 0/0; shutdown FALSE
> Sep 24 10:22:37 snafu postgres[18306]: [5-1] LOG:  next transaction ID:
> 197841225; next OID: 715436086
> Sep 24 10:22:37 snafu postgres[18306]: [6-1] LOG:  database system was
> not properly shut down; automatic recovery in progress
> Sep 24 10:22:37 snafu postgres[18306]: [7-1] LOG:  redo starts at
> 9A/C20223B0
> Sep 24 10:22:37 snafu postgres[18306]: [8-1] PANIC:  btree_insert_redo:
> failed to add item
> Sep 24 10:22:37 snafu postgres[18299]: [2-1] LOG:  startup process (PID
> 18306) was terminated by signal 6
> Sep 24 10:22:37 snafu postgres[18299]: [3-1] LOG:  aborting startup due
> to startup process failure

> Any suggestions to recover?!  I'm dead in the water!  Please!!!

I think your only chance is pg_resetxlog.  Be aware that you won't
necessarily have a consistent database afterwards --- in particular,
whichever index that failure is about is certainly broken.  I'd
recommend a dump and reload, plus as much manual verification of data
consistency as you can manage.

How did you get into this state, anyway?
        regards, tom lane
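
The sequence suggested above (reset the WAL, then dump and reload) could be laid out roughly as follows. This is only a sketch: it builds the command strings and runs nothing. The paths, the file-level copy taken first, and the choice of a fresh cluster via initdb are assumptions for illustration and are not stated in the thread. pg_resetxlog may refuse to proceed without -f, and the dump may fail on damaged tables, so treat the list as a checklist rather than a script.

```python
import shlex


def recovery_plan(data_dir, new_data_dir, dump_file):
    """Return the recovery steps as shell command strings; nothing is executed.

    All three arguments are hypothetical paths chosen by the operator.
    """
    d = shlex.quote(data_dir)
    backup = shlex.quote(data_dir + ".broken")
    n = shlex.quote(new_data_dir)
    f = shlex.quote(dump_file)
    return [
        # Assumption: keep a file-level copy before touching anything.
        f"cp -a {d} {backup}",
        # -n only prints the control values pg_resetxlog would use.
        f"pg_resetxlog -n {d}",
        # Discards the WAL; the database may be inconsistent afterwards.
        f"pg_resetxlog {d}",
        f"pg_ctl -D {d} -w start",
        # Dump everything while the damaged cluster is up.
        f"pg_dumpall > {f}",
        f"pg_ctl -D {d} stop",
        # Reload into a freshly initialized cluster.
        f"initdb -D {n}",
        f"pg_ctl -D {n} -w start",
        f"psql -f {f} template1",
    ]


if __name__ == "__main__":
    for step in recovery_plan("/var/lib/pgsql/data",
                              "/var/lib/pgsql/data_new",
                              "/tmp/all.sql"):
        print(step)
```

After the reload, the manual consistency checks against the last backup still apply; the reload rebuilds every index, including the one the redo failure was about.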


Re: CRITICAL HELP NEEDED! DEAD DB!

From
Cott Lang
Date:
On Fri, 2004-09-24 at 11:43, Tom Lane wrote:
> 
> I think your only chance is pg_resetxlog.  Be aware that you won't
> necessarily have a consistent database afterwards --- in particular,
> whichever index that failure is about is certainly broken.  I'd
> recommend a dump and reload, plus as much manual verification of data
> consistency as you can manage.

That's what I've done; so far so good, although we are still checking
consistency against the last backup.  Thanks for the info. Luckily this
was one of our smaller databases ...

> How did you get into this state, anyway?

I wish I knew - this is what appeared to start it:

Sep 24 10:19:41 snafu postgres[18176]: [464-1] ERROR:  could not open
segment 1 of relation "idx_ordl_id" (target block 1719234412): No such
file or
Sep 24 10:19:41 snafu postgres[18176]: [464-2]  directory

I can't figure out what the exact problem is; there were no I/O errors
or any other relevant messages at the time, the box was empty, and
nothing remarkable was going on.  :(

thanks,
Cott

PS: No, I don't think it's a PG problem. :)
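
A quick bit of arithmetic on that error message may help show why it points at corruption rather than a genuinely missing file. The sketch below assumes the default build settings (8192-byte blocks, 131072 blocks per 1 GB segment file); a custom build could differ. Under those assumptions, target block 1719234412 would sit in segment 13116, which implies an index of roughly 12.8 TiB. That is implausible for a small database, so the block number itself is most likely garbage. The message names segment 1 probably because segments are opened in order and that was the first one not found.

```python
# Assumed PostgreSQL defaults; a custom build may use other values.
BLCKSZ = 8192          # bytes per block
RELSEG_SIZE = 131072   # blocks per segment file (1 GB)


def locate_block(block_number, block_size=BLCKSZ,
                 blocks_per_segment=RELSEG_SIZE):
    """Map a relation block number to its segment file and implied size."""
    segment, block_in_segment = divmod(block_number, blocks_per_segment)
    return {
        "segment": segment,
        "offset_in_segment_bytes": block_in_segment * block_size,
        # The relation must be at least this large to contain the block.
        "implied_min_size_bytes": (block_number + 1) * block_size,
    }


if __name__ == "__main__":
    info = locate_block(1719234412)  # target block from the log line
    print("segment file:", info["segment"])
    print("implied size (TiB): %.1f" % (info["implied_min_size_bytes"] / 2**40))
```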






Re: CRITICAL HELP NEEDED! DEAD DB!

From
Cott Lang
Date:
Does pgfsck work on 7.4.x?


> 
> Otherwise, maybe something here will help:
> http://svana.org/kleptog/pgsql/pgfsck.html



Re: CRITICAL HELP NEEDED! DEAD DB!

From
Joe Conway
Date:
Cott Lang wrote:
> I wish I knew - this is what appeared to start it:
> 
> Sep 24 10:19:41 snafu postgres[18176]: [464-1] ERROR:  could not open
> segment 1 of relation "idx_ordl_id" (target block 1719234412): No such
> file or
> Sep 24 10:19:41 snafu postgres[18176]: [464-2]  directory
> 
> I can't figure out what the exact problem is; there were no I/O errors
> or any other relevant messages at the time, the box was empty, and
> nothing remarkable was going on.  :(

I saw that exact error message, with no logged I/O system errors, when
using SAN-attached storage a month or so ago. It turned out to be the
SAN silently corrupting files. We did eventually start to see SCSI
errors, but not at the beginning.

Joe


Re: CRITICAL HELP NEEDED! DEAD DB!

From
"Matthew T. O'Connor"
Date:
For starters, a little more detail would be helpful. For example:

What version of PostgreSQL?  What OS? What compiler?  What happened that
caused this? A server crash?

Matthew


Cott Lang wrote:

>Sep 24 10:22:37 snafu postgres[18306]: [2-1] LOG:  database system was
>interrupted while in recovery at 2004-09-24 10:21:41 MST
>Sep 24 10:22:37 snafu postgres[18306]: [2-2] HINT:  This probably means
>that some data is corrupted and you will have to use the last backup for
>recovery.
>Sep 24 10:22:37 snafu postgres[18306]: [3-1] LOG:  checkpoint record is
>at 9A/C2022368
>Sep 24 10:22:37 snafu postgres[18306]: [4-1] LOG:  redo record is at
>9A/C2022368; undo record is at 0/0; shutdown FALSE
>Sep 24 10:22:37 snafu postgres[18306]: [5-1] LOG:  next transaction ID:
>197841225; next OID: 715436086
>Sep 24 10:22:37 snafu postgres[18306]: [6-1] LOG:  database system was
>not properly shut down; automatic recovery in progress
>Sep 24 10:22:37 snafu postgres[18306]: [7-1] LOG:  redo starts at
>9A/C20223B0
>Sep 24 10:22:37 snafu postgres[18306]: [8-1] PANIC:  btree_insert_redo:
>failed to add item
>Sep 24 10:22:37 snafu postgres[18299]: [2-1] LOG:  startup process (PID
>18306) was terminated by signal 6
>Sep 24 10:22:37 snafu postgres[18299]: [3-1] LOG:  aborting startup due
>to startup process failure
>
>
>Any suggestions to recover?!  I'm dead in the water!  Please!!!