Re: postmaster recovery and automatic restart suppression - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: postmaster recovery and automatic restart suppression
Date	June 17, 2009 07:36:41
Msg-id	3f0b79eb0906170036j13f643afjf53c9b134453b3c0@mail.gmail.com
In response to	Re: postmaster recovery and automatic restart suppression ("Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf.czichy@nsn.com>)
List	pgsql-hackers

Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki)
<thoralf.czichy@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a
> STONITH - mainly as there's also other software running on the same
> node that we'd not want to automatically switchover (e.g. because it
> takes longer to do or the functionality is more critical or less
> critical). Also we generally trust the HW, OS kernel and cluster
> middleware to behave correctly. These functions also follow the
> principle of fail-fast-and-safe. This trust might be an assumption
> that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's
> a user-space problem - the default assumption is that isolation can be
> implemented on OS-level and that's a guarantee that the clusterware
> gives (using a separate Quorum mechanism to avoid split-brain
> situations).

HW-level STONITH seems to be too much for your case. How about making
your HA middleware shut the dying postgres down before doing the
switchover, for example with "pg_ctl -mi stop"? In this case, other
software can keep running on the original node after the switchover.
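
To make the ordering concrete, here is a minimal sketch of such a
middleware hook. It is only an illustration under assumptions: the
function names, the data directory argument and the switch_over callback
are hypothetical and not part of PostgreSQL or any particular
clusterware. The only real command involved is pg_ctl, called with
"-m immediate", the long form of "-mi".

```python
import subprocess
from typing import Callable, List, Sequence


def build_stop_command(data_dir: str) -> List[str]:
    """Return the pg_ctl invocation for an immediate-mode shutdown."""
    # "-m immediate" is the long spelling of the "-mi" option.
    # data_dir is a placeholder for the failing instance's data directory.
    return ["pg_ctl", "stop", "-D", data_dir, "-m", "immediate"]


def isolate_then_switch_over(
    data_dir: str,
    switch_over: Callable[[], None],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Stop the local postgres first; switch over only if the stop succeeded.

    switch_over is a hypothetical callback supplied by the HA middleware.
    run is injectable so the ordering can be exercised without pg_ctl.
    """
    exit_code = run(build_stop_command(data_dir))
    if exit_code != 0:
        # The old instance could not be confirmed down. What happens next
        # (retry, escalate to STONITH, ...) is left to the caller.
        return False
    switch_over()
    return True
```

The point of the sketch is only the order of operations: the old
instance is stopped before the switchover starts, and nothing else on
the node is touched. How a failed stop should be handled is a policy
decision for the clusterware and is deliberately left open here.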

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
