PG service restart failure (start getting ahead of stop?) - Mailing list pgsql-general

From George Pavlov
Subject PG service restart failure (start getting ahead of stop?)
Date
Msg-id 8C5B026B51B6854CBE88121DBF097A86AB2F07@ehost010-33.exch010.intermedia.net
Responses Re: PG service restart failure (start getting ahead of stop?)  (Brent Wood <b.wood@niwa.co.nz>)
Re: PG service restart failure (start getting ahead of stop?)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
We have a nightly restart of one PG database. Today it failed, and I
can't work out why it failed or how to prevent this in the future (nor
can I reproduce the problem).

We have a line in a shell script that calls "/etc/init.d/postgresql
restart". In the shell script's log from this invocation I have:

  Stopping postgresql service: [FAILED]
  Starting postgresql service: [  OK  ]

The outcome of this was that the service was not running (despite the
"[  OK  ]" on the second line). The query log from that run ends with:

  2007-04-23 03:03:59 PDT [23265] LOG:  received fast shutdown request
  2007-04-23 03:03:59 PDT [23265] LOG:  aborting any active transactions
  2007-04-23 03:03:59 PDT [26749] FATAL:  terminating connection due to administrator command
  <... snipped more lines like the one above ...>
  2007-04-23 03:03:59 PDT [24090] LOG:  could not send data to client: Broken pipe
  2007-04-23 03:03:59 PDT [26700] FATAL:  terminating connection due to administrator command
  <... snipped more lines like the one above ...>
  2007-04-23 03:04:13 PDT [26820] FATAL:  the database system is shutting down
  <... snipped more lines like the one above ...>
  2007-04-23 03:06:10 PDT [23269] LOG:  shutting down
  2007-04-23 03:06:10 PDT [23269] LOG:  database system is shut down
  2007-04-23 03:06:13 PDT [23267] LOG:  logger shutting down
  <end of log>

So it looks like the STOP of the service actually succeeded, although it
took a while: about two minutes from the shutdown request at 03:03:59 to
"database system is shut down" at 03:06:10 (more sessions open than
usual?). The START is the part that actually failed (is that because the
STOP was still in progress?). The question is why. In a RESTART,
shouldn't the START part wait for the STOP part to complete (regardless
of how long it takes)?

Also, what can we do to avoid this in the future? We can issue a separate
STOP and then a START X minutes later, but how long should X be? It would
seem that a RESTART is really what I want...
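
For the separate STOP/START approach, one alternative to guessing X
might be to poll until the postmaster is really gone. The following is
only a rough sketch, not something we run: the helper name
wait_for_pidfile_gone, the 600-second limit and the data directory path
are all made up for illustration. It relies on postmaster.pid being
removed from the data directory on a clean shutdown.

```shell
#!/bin/sh
# Hypothetical helper: wait until the postmaster's pid file disappears.
#   $1 = path to postmaster.pid (location is an assumption)
#   $2 = maximum number of seconds to wait
# Returns 0 once the file is gone, 1 if it is still there after the limit.
wait_for_pidfile_gone() {
    pidfile=$1
    remaining=$2
    while [ -e "$pidfile" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1    # still present after the timeout
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage sketch (path and limit are assumptions; adjust for the installation):
#   /etc/init.d/postgresql stop
#   if wait_for_pidfile_gone /var/lib/pgsql/data/postmaster.pid 600; then
#       /etc/init.d/postgresql start
#   else
#       echo "postmaster still shutting down after 600s" >&2
#   fi
```

That would at least make the START conditional on the STOP having
finished, rather than on a fixed delay.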

TIA,

George
