Re: postmaster fails to start - Mailing list pgsql-general

From Dweck Nir
Subject Re: postmaster fails to start
Date
Msg-id 68382F2B929CEB4FAE828088C1BF0672139510@tbs-ex1.tadirantele.com
In response to postmaster fails to start  ("Dweck Nir" <Nir.Dweck@tadirantele.com>)
List pgsql-general
Hi,
1) When the postmaster was started the first time, it was just a matter of the .pid file not having been erased, since
the machine was restarted. There was no other postmaster running.
2) all the WAL configurations are as default:
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true            # turns forced synchronization on or off
#wal_sync_method = fsync    # the default varies across platforms:
                # fsync, fdatasync, open_sync, or open_datasync
#wal_buffers = 8        # min 4, 8KB each
#commit_delay = 0        # range 0-100000, in microseconds
#commit_siblings = 5        # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3    # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300    # range 30-3600, in seconds
#checkpoint_warning = 30    # 0 is off, in seconds

# - Archiving -

#archive_command = ''        # command to use to archive a logfile segment

3) I have the data backed up in other databases (not as a file backup), so I am not really concerned about losing
the data (in this specific case). The problem is that the postmaster isn't starting, so I can't even restore the data.
Most importantly, I would like to learn from this case what to do the next time this problem happens to me in the field.

Regards,
Nir.

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Wednesday, May 25, 2005 11:51 AM
To: Dweck Nir
Cc: postgreSQL mailing list (E-mail)
Subject: Re: [GENERAL] postmaster fails to start


I've taken the liberty of rearranging your email slightly.

Dweck Nir wrote:
> The sequence of events was as follows: 1) computer was shut down
> without stopping postmaster.

OK - not good. Some crucial questions:
1. Do you have fsync enabled or disabled in the postgresql.conf file?
2. Do you know whether your drives are flushing write-cache properly?
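
A minimal sketch of how question 1 could be checked mechanically, assuming the usual postgresql.conf convention that a line starting with `#` is a comment and leaves the built-in default in effect. The function name `explicit_settings` is hypothetical, the sample combines two lines from the excerpt above with one invented uncommented line, and the parser deliberately ignores edge cases such as a `#` inside a quoted value.

```python
import re

def explicit_settings(conf_text):
    """Return {name: value} for settings that are actually set.

    Commented-out lines are skipped, so anything missing from the
    result is running with its built-in default.
    """
    settings = {}
    for line in conf_text.splitlines():
        # Simplification: treat everything after the first '#' as a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = re.match(r"(\w+)\s*=?\s*(.*)", line)
        if match:
            name, value = match.groups()
            settings[name] = value.strip().strip("'")
    return settings

# Two lines from the excerpt above plus one hypothetical uncommented line.
sample = """
#fsync = true            # turns forced synchronization on or off
#wal_sync_method = fsync
checkpoint_segments = 3
"""
explicit = explicit_settings(sample)
```

If `fsync` does not appear in the result, the file is not overriding it, which matches the "all defaults" description of the excerpt above.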

> 2) postmaster was started, but because of an error that there might
> be another postmaster running, the postmaster was started again.

Was this just a matter of deleting the .pid file and did you check there
wasn't another postmaster running?
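
As a rough illustration of that check, the sketch below reads the process ID from the first line of a postmaster.pid file and probes whether a process with that ID still exists. It assumes a Unix-like system; the function names and the sample file contents are hypothetical. A live process with that ID is not necessarily a postmaster, since process IDs can be reused after a reboot, so this only narrows things down.

```python
import os

def read_pid(pid_file_text):
    """Parse the process ID from the first line of postmaster.pid."""
    first_line = pid_file_text.splitlines()[0].strip()
    return int(first_line)

def process_exists(pid):
    """Probe a PID with signal 0 (Unix only); no signal is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process: the .pid file is stale
    except PermissionError:
        return True           # process exists but belongs to another user
    return True

# Hypothetical file contents, for illustration only.
pid = read_pid("1234\n/path/to/data\n")
```

Only if `process_exists(pid)` is false, and no postmaster shows up in the process list, is it reasonable to treat the .pid file as left over from the unclean shutdown.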

> 3) since then each time I try to start the postmaster I get the same
> error.


 > LOG:  redo starts at 1/A500075C
 > PANIC:  btree_delete_page_redo: lost target page
 > LOG:  startup process (PID 4409) was terminated by signal 6

OK - well, this error message is in backend/access/nbtree/nbtxlog.c
where it is replaying the write-ahead-log files for btrees (I'm no
hacker, I just searched the source for the error message and read the
comments).

So - it looks like you might have a corrupted WAL. That shouldn't be
possible if you were running with fsync enabled and drives that flushed
cache like they should, so I'm guessing that wasn't the case.

It might be possible to recover to a state before this point, but that's
not something I'm going to be able to advise on. There are two steps you
should take immediately though.

1. Take a file-backup of your entire data directory and keep it safe.
You might well be making repeated attempts to recover this.
2. Check your most recent database backup and restore it to another
machine - it may be quicker to restore that than fix your file corruption.
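
Step 1 could be sketched as below: a small helper that packs a whole data directory into a compressed tar archive. This is an illustration under assumptions, not a recommended procedure; the function name and any paths passed to it are hypothetical, and it should only be run while no postmaster is using the directory (which is the situation here, since it will not start).

```python
import tarfile
from pathlib import Path

def archive_data_directory(data_dir, archive_path):
    """Pack the entire data directory into a gzip-compressed tar file."""
    data_dir = Path(data_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname keeps paths in the archive relative to the directory name.
        archive.add(str(data_dir), arcname=data_dir.name)
    return Path(archive_path)
```

Working from a copy like this means each recovery attempt can start again from the same untouched files.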

--
   Richard Huxton
   Archonet Ltd
