Re: production server down - Mailing list pgsql-hackers

From Tom Lane
Subject Re: production server down
Msg-id 7244.1103415151@sss.pgh.pa.us
In response to Re: production server down  (Joe Conway <mail@joeconway.com>)
Responses Re: production server down
List pgsql-hackers
Joe Conway <mail@joeconway.com> writes:
> The Tue Nov  2 17:05:32 2004 seems to be related to the *previous* 
> restart; from /var/log/messages:

> Nov  2 17:04:20 csdfds1 syslogd 1.4.1: restart.
> ...
> Nov  2 17:05:22 csdfds1 su: pam_unix2: session started for user postgres, service su

> ...
> Nov  2 17:05:33 csdfds1 su: (to postgres) root on /dev/pts/5
> Nov  2 17:05:33 csdfds1 su: pam_unix2: session started for user postgres, service su
> Nov  2 17:05:33 csdfds1 su: pam_unix2: session finished for user postgres, service su

I'm betting that the "su" at :33 is the invocation of the postmaster.
The fact that it took the script 11 seconds to get to that step is
suggestive to say the least.  Are you using one of the scripts that
does an auto initdb if it doesn't see a valid PGDATA?  11 seconds might
be about right for that.
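
The 11-second figure is the gap between the "su" entries at 17:05:22 and
17:05:33 in the quoted log. A minimal sketch of that arithmetic follows.
The helper name syslog_gap_seconds is hypothetical, and the year is
supplied by hand because classic syslog timestamps do not carry one.

```python
from datetime import datetime


def syslog_gap_seconds(earlier: str, later: str, year: int = 2004) -> float:
    """Return seconds elapsed between two syslog-style timestamps.

    Assumes both stamps fall in the same (assumed) year and use English
    month abbreviations, e.g. "Nov  2 17:05:22".
    """
    def parse(stamp: str) -> datetime:
        # Collapse the padding syslog puts before single-digit days.
        normalized = " ".join(stamp.split())
        return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")

    return (parse(later) - parse(earlier)).total_seconds()


if __name__ == "__main__":
    # The two "su" timestamps from the quoted /var/log/messages excerpt.
    print(syslog_gap_seconds("Nov  2 17:05:22", "Nov  2 17:05:33"))
```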

One problem with this theory is how come you didn't get screwed during
*that* boot cycle.  It seems to require assuming that the NFS mount came
online just after the initdb finished (else initdb would have
overwritten the on-NFS pg_control) but before the regular postmaster
started (else this same scenario would have played out then).  That's
not a very wide window.
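
To illustrate the hazard under discussion, here is a rough sketch of the
decision a start script could make before touching PGDATA. It is not
taken from any actual init script; the function name startup_action and
the mount_point argument are assumptions made for the example. The only
PostgreSQL-specific detail relied on is that an initialized data
directory contains a PG_VERSION file.

```python
import os
from typing import Optional


def startup_action(pgdata: str, mount_point: Optional[str] = None) -> str:
    """Decide what a start script should do with PGDATA.

    Returns "refuse" if the expected mount is not present, "start" if
    PGDATA already looks initialized, and "initdb" otherwise.
    """
    # If the data lives on a separate filesystem (for example NFS) and that
    # filesystem is not mounted yet, an empty PGDATA proves nothing:
    # running initdb here would create a cluster on the bare mount point.
    if mount_point is not None and not os.path.ismount(mount_point):
        return "refuse"

    # An initialized cluster has a PG_VERSION file at the top of PGDATA.
    if os.path.isfile(os.path.join(pgdata, "PG_VERSION")):
        return "start"

    return "initdb"
```

A script without the mount check would fall through to "initdb" whenever
the mount is late, which is the scenario hypothesized above.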
        regards, tom lane

