Re: postmaster recovery and automatic restart suppression - Mailing list pgsql-hackers

From Kolb, Harald (NSN - DE/Munich)
Subject Re: postmaster recovery and automatic restart suppression
Date
Msg-id 8F6635BC27831E4BB0923D8A55136C26018DB6BB@DEMUEXC005.nsn-intra.net
In response to Re: postmaster recovery and automatic restart suppression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: postmaster recovery and automatic restart suppression
List pgsql-hackers
Hi

> -----Original Message-----
> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 09, 2009 9:20 PM
> To: Kolb, Harald (NSN - DE/Munich)
> Cc: Robert Haas; Greg Stark; Simon Riggs; Fujii Masao;
> pgsql-hackers@postgresql.org; Czichy, Thoralf (NSN - FI/Helsinki)
> Subject: Re: [HACKERS] postmaster recovery and automatic
> restart suppression
>
> "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> writes:
> > If you don't want to see this option as a GUC parameter, would it be
> > acceptable to have it as a new postmaster cmd line option ?
>
> That would make two kluges, not one (we don't do options that are
> settable in only one way).  And it does nothing whatever to address
> my objection to the concept.
>
>             regards, tom lane
>

First point is understood.
Second point needs further discussion:
The recovery and restart feature is an excellent solution if the db is
running in a standalone environment, and I understand that this should
not be weakened. But in a configuration where the db is only one
resource among others and where you have a central supervisor, it's
problematic. There the central instance observes all the resources and
services and decides what to do in case of problems. It's not up to the
resource/service to make its own decision, because it's only a piece of
the cake and doesn't have a complete view of the whole situation.
E.g. the behaviour might be different if the problems occur during an
overload situation, or if you already have hints of HW-related problems,
or if you are in an upgrade procedure and the initial start fails. An
uncontrolled and undetected automatic restart may complicate the
situation and increase the outage time.
Thus it would be helpful to have the possibility of very fast failure
detection (SIGCHLD in the controlling instance) and to avoid wasteful
cleanup procedures.
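
As an illustration of the supervisor model described above, here is a
minimal sketch in Python. It assumes a POSIX system, and everything in it
is hypothetical: the function names, the policy labels and the child
command are stand-ins, not PostgreSQL or cluster-software interfaces. For
brevity it blocks in wait() instead of installing a SIGCHLD handler; both
react to the same event, the child's termination being reported to its
parent.

```python
import subprocess
import sys
import time


def supervise_once(argv):
    """Start one child process and block until it terminates.

    Returns (kind, value, seconds): kind is "exit" with the exit status,
    or "signal" with the number of the signal that killed the child.
    """
    started = time.monotonic()
    child = subprocess.Popen(argv)
    returncode = child.wait()  # returns once the child has terminated
    elapsed = time.monotonic() - started
    if returncode < 0:
        # On POSIX, Popen reports death by signal N as returncode -N.
        return ("signal", -returncode, elapsed)
    return ("exit", returncode, elapsed)


def decide(kind, value, hardware_suspect=False, upgrade_in_progress=False):
    """Hypothetical central policy: the supervisor, not the child, chooses."""
    if kind == "exit" and value == 0:
        return "none"        # clean shutdown, nothing to do
    if hardware_suspect:
        return "failover"    # do not restart on a node that looks broken
    if upgrade_in_progress:
        return "hold"        # leave the failed start for the upgrade procedure
    return "restart"


if __name__ == "__main__":
    # Stand-in for a managed resource that fails with exit status 3.
    kind, value, seconds = supervise_once(
        [sys.executable, "-c", "import sys; sys.exit(3)"]
    )
    print(kind, value, decide(kind, value))
```

The point of the split is that supervise_once() only reports what
happened, while decide() holds the knowledge about overload, hardware
hints or a running upgrade that the supervised process itself does not
have.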
If the db is embedded in a management (High Availability) environment,
this option will be helpful in general, regardless of whether you have a
cluster or a single node.
But in a cluster environment it is even more important to have this
switch, because you will always have this management instance: the
cluster software. And of course the main purpose of a cluster is to
switch over when it makes sense to do so. One good reason to really
do it is when a central instance like the db on the primary side
crashes. At least the user should be able to decide this,
but that requires that PostgreSQL constructively supports this
situation.

Regards, Harald.

