Re: production server down - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: production server down
Msg-id 200412150342.iBF3got12800@candle.pha.pa.us
In response to: production server down (Joe Conway <mail@joeconway.com>)
List pgsql-hackers
Joe Conway wrote:
> This is a SuSE 9, 8-way Xeon IBM x445, with nfs mounted Network 
> Appliance for database storage, postgresql-7.4.5-36.4.
> 
> The server experienced a hang (as yet unexplained) yesterday and was 
> restarted at 2004-12-13 16:38:49 according to syslog. I'm told by the 
> network admin that there was a problem with the network card on restart, 
> so the nfs mount most probably disappeared and then reappeared 
> underneath a quiescent postgresql at some point between 2004-12-13 
> 16:39:55 and 2004-12-14 15:36:20 (but much closer to the former than the 
> latter).

Well, my first reaction is that if the file system storage was not
always 100% reliable, then there is no way to know the data is correct
except by restoring from backup.  The startup failure indicates that
there were surely storage problems in the past.  There is no way to know
how far that corruption goes.

You can use pg_resetxlog to clear out the write-ahead log and then look
at the data to see how accurate it is, but there is no way to be sure.
Before doing that, though, I would back up the file system with the
server down, in case you want to make more serious recovery attempts
later.
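As a rough illustration of "back up with the server down", here is a minimal Python sketch that copies a data directory aside before anything like pg_resetxlog touches it. The function name `backup_data_directory` and the paths are hypothetical. Treating the presence of `postmaster.pid` as "server still running" is an assumption and only a heuristic: a crash can leave a stale pid file behind, so confirm by hand that no postmaster process is running.

```python
import os
import shutil


def backup_data_directory(pgdata: str, dest: str) -> str:
    """Copy a PostgreSQL data directory aside before any recovery attempt.

    Hypothetical helper: refuses to run while postmaster.pid exists, because
    a file-level copy is only consistent when the server is shut down.
    """
    pid_file = os.path.join(pgdata, "postmaster.pid")
    if os.path.exists(pid_file):
        # Heuristic only: a stale pid file from a crash also lands here.
        raise RuntimeError(f"{pid_file} exists; stop the server first")
    if os.path.exists(dest):
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")
    # Default symlinks=False follows links (e.g. a relocated pg_xlog), so the
    # copy holds the actual WAL files rather than links back to the originals.
    shutil.copytree(pgdata, dest)
    return dest
```

Called as `backup_data_directory("/path/to/data", "/path/to/data.before-resetxlog")`, it leaves the original directory untouched, so the copy can be restored if pg_resetxlog makes things worse.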

The Freenode IRC channel can probably walk you through more details of
the recovery process.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073