Re: production server down - Mailing list pgsql-hackers

From Joe Conway
Subject Re: production server down
Msg-id 41BFC81E.3050706@joeconway.com
In response to Re: production server down  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> 
>>I've got a down production server (will not restart) with the following 
>>tail to its log file:
> 
> Please show the output of pg_controldata, or a hex dump of pg_control
> if pg_controldata fails.

OK, will do shortly.

> 
>>The server experienced a hang (as yet unexplained) yesterday and was 
>>restarted at 2004-12-13 16:38:49 according to syslog. I'm told by the 
>>network admin that there was a problem with the network card on restart, 
>>so the nfs mount most probably disappeared and then reappeared 
>>underneath a quiescent postgresql at some point between 2004-12-13 
>>16:39:55 and 2004-12-14 15:36:20 (but much closer to the former than the 
>>latter).
> 
> I've always felt that running a database across NFS was a Bad Idea ;-)

Yeah, I knew I had that coming :-)


>>Any help would be much appreciated. Is our only option pg_resetxlog?
> 
> Possibly, but let's try to dig first.  I suppose the DB is too large
> to save an image aside for forensics later?
> 

Actually, although the database is about 400 GB, we do have room and are 
in the process of saving an image now.

Joe

