Re: Standalone synchronous master - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Standalone synchronous master
Date
Msg-id CAOuzzgr5762f1g_14R3R-+jx6rsVvT095hzXZxyMmLAic8H_Ng@mail.gmail.com
In response to Re: Standalone synchronous master  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Standalone synchronous master
List pgsql-hackers
Andres,

On Friday, January 10, 2014, Andres Freund wrote:
On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > > If a synchronous slave goes down, the master continues to operate. That is
> > > all. I don't care if it is configurable (I would be fine with that). I don't
> > > care if it is not automatic (e.g., slave goes down and we have to tell the
> > > master to continue).
> >
> > Would you please explain, as precisely as possible, what the advantages of
> > using a synchronous standby would be in such a scenario?
>
> In a degraded/failure state, things continue to *work*.  In a
> non-degraded/failure state, you're able to handle a system failure and
> know that you didn't lose any transactions.

Why do you know that you didn't lose any transactions? Trivial network
hiccups, a restart of a standby, IO overload on the standby all can
cause very short interruptions in the walsender connection - leading
to degradation.

You know that you haven't *lost* any because the master is still up. The case you describe is a double-failure scenario: the link between the master and slave has to go away AND the master must accept a transaction and then fail independently.
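
To make the double-failure argument concrete, here is a minimal sketch of a toy model. It is an illustration only, not PostgreSQL code: the `Cluster` class and all of its fields and methods are hypothetical names, and the model assumes the master keeps accepting commits on its own whenever the link to the standby is down.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of one master and one synchronous standby (hypothetical)."""
    master_up: bool = True
    link_up: bool = True
    master_commits: int = 0    # transactions acknowledged by the master
    standby_commits: int = 0   # transactions that also reached the standby

    def commit(self) -> bool:
        """Acknowledge one transaction; replicate it only if the link is up."""
        if not self.master_up:
            return False
        self.master_commits += 1
        if self.link_up:
            # Assumption: the standby is fully caught up whenever the link works.
            self.standby_commits = self.master_commits
        return True

    def lost_transactions(self) -> int:
        """Acknowledged transactions that no surviving node still holds."""
        if self.master_up:
            return 0  # the master still has everything it acknowledged
        return self.master_commits - self.standby_commits


# Single failure: the link drops, but the master stays up.
link_only = Cluster(link_up=False)
link_only.commit()

# Single failure: the master dies while the link was healthy.
master_only = Cluster()
master_only.commit()
master_only.master_up = False

# Double failure: the link drops, the master commits alone, then dies.
double = Cluster(link_up=False)
double.commit()
double.master_up = False
```

In this model only the third case reports a nonzero `lost_transactions()`; either failure on its own leaves every acknowledged transaction on at least one node.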
 
> As pointed out by someone
> previously, that's how RAID-1 works (which I imagine quite a few of us
> use).

I don't think that argument makes much sense. Raid-1 isn't safe
as-is. It's only safe if you use some sort of journaling or similar
on top. If you issued a write during a crash you normally will just get
either the version from before or the version after the last write back,
depending on the state on the individual disks and which disk is treated
as authoritative by the raid software.

Uh, you need a decent RAID controller then, and we're talking about the state after a transaction commit/sync.

And even if you disregard that, there's not much outside influence that
can lead to losing connection to a disk drive inside a raid outside an
actually broken drive. Any network connection is normally kept *outside*
the level at which you build raids.

This is a fair point, and perhaps we should have the timeout or jitter GUC that was proposed elsewhere. But the notion that this configuration is completely unreasonable is not accurate, so having it would be a benefit overall.
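
As a rough sketch of how such a timeout could address the short-interruption concern: the function below is purely hypothetical (no such GUC exists in this thread's proposals under this name, and `commit_mode` and `degrade_timeout_s` are invented for illustration). It assumes the master waits for the standby's acknowledgement up to a configured limit before continuing alone.

```python
def commit_mode(ack_delay_s: float, degrade_timeout_s: float) -> str:
    """Classify one commit under a hypothetical degrade timeout.

    ack_delay_s:       how long the standby takes to acknowledge the commit
    degrade_timeout_s: how long the master waits before continuing alone
    """
    if ack_delay_s <= degrade_timeout_s:
        return "synchronous"  # the standby acknowledged in time
    return "degraded"         # the timeout expired; the master continues alone


# A brief network hiccup stays synchronous; a long outage degrades.
hiccup = commit_mode(ack_delay_s=0.2, degrade_timeout_s=5.0)
outage = commit_mode(ack_delay_s=60.0, degrade_timeout_s=5.0)
```

With a limit like this, a trivial hiccup or standby restart shorter than the timeout would only delay a commit rather than drop the master into standalone mode.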

Thanks,

Stephen
