Re: postmaster recovery and automatic restart suppression - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: postmaster recovery and automatic restart suppression
Msg-id 3f0b79eb0906170036j13f643afjf53c9b134453b3c0@mail.gmail.com
In response to Re: postmaster recovery and automatic restart suppression  ("Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf.czichy@nsn.com>)
List pgsql-hackers
Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN -
FI/Helsinki)<thoralf.czichy@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a
> STONITH - mainly as there's also other software running on the same
> node that we'd not want to automatically switchover (e.g. because it
> takes longer to do or the functionality is more critical or less
> critical). Also we generally trust the HW, OS kernel and cluster
> middleware to behave correctly. These functions also follow the
> principle of fail-fast-and-safe. This trust might be an assumption
> that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's
> a user-space problem - the default assumption is that isolation can be
> implemented on OS-level and that's a guarantee that the clusterware
> gives (using a separate Quorum mechanism to avoid split-brain
> situations).

HW-level STONITH seems to be too much for your case. How about having
your HA middleware shut down the dying postgres before the switchover,
using (for example) "pg_ctl -mi stop"? That way, other software can
keep running on the original node after the switchover.
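
As a rough sketch of that idea (not taken from any real clusterware),
the middleware's failover hook could look something like the Python
below. The function names, the data directory argument and the
promote_standby callback are all hypothetical; the only PostgreSQL
piece is the pg_ctl immediate-mode stop, written here in its long form
"pg_ctl stop -D <datadir> -m immediate". It assumes pg_ctl is on the
PATH and that the hook runs as the OS user that owns the data directory.

```python
import subprocess


def build_stop_command(data_dir, pg_ctl="pg_ctl"):
    """Return the argv for an immediate-mode shutdown of one instance."""
    # Long form of "pg_ctl -mi stop", with the data directory given explicitly.
    return [pg_ctl, "stop", "-D", data_dir, "-m", "immediate"]


def fence_then_switch_over(data_dir, promote_standby, run=subprocess.run):
    """Stop the failing postgres first; switch over only if that worked.

    promote_standby is a hypothetical callback supplied by the HA
    middleware. run is injectable so the logic can be exercised without
    a real PostgreSQL installation.
    """
    result = run(build_stop_command(data_dir), check=False)
    if result.returncode != 0:
        # pg_ctl could not confirm the shutdown, so the old instance may
        # still be alive. Leave the decision (e.g. escalate to STONITH)
        # to the caller instead of switching over.
        return False
    promote_standby()
    return True
```

If the function returns False the instance has not been isolated, and
that is the case where a stronger measure would still be needed.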

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

