Re: High Availability: Hot Standby vs. Warm Standby - Mailing list pgsql-admin

From Brad Nicholson
Subject Re: High Availability: Hot Standby vs. Warm Standby
Msg-id 1278941155.29426.44.camel@bnicholson-desktop
In response to Re: High Availability: Hot Standby vs. Warm Standby  (Thomas Kellerer <spam_eater@gmx.net>)
Responses Re: High Availability: Hot Standby vs. Warm Standby
List pgsql-admin
On Mon, 2010-07-12 at 08:58 +0200, Thomas Kellerer wrote:
> Greg Smith, 10.07.2010 14:44:
> >> Is there a difference in how much data could potentially be lost in
> >> case of a failover? E.g. because 9.0 replicates the changes quicker than 8.4?
> >
> > There's nothing that 9.0 does that you can't do with 8.4 and the right
> > software to aggressively ship partial files around. In practice though,
> > streaming shipping is likely to result in less average data loss simply
> > because it will do the right thing to ship new transactions
> > automatically. Getting the same reaction time and resulting low amount
> > of lag out of an earlier version requires a level of external script
> > configuration that few sites ever actually manage to obtain. You can
> > think of the 9.0 features as mainly reducing the complexity of
> > installation needed to achieve low latency significantly. I would bet
> > that if you tried to set up 8.4 to achieve the same quality level in
> > terms of quick replication, your result would be more fragile and buggy
> > than just using 9.0--the bugs would just be in your own code rather
> > than in the core server.
> >
>
> Greg and Rob,
>
> thanks for the answers.
>
> I didn't "plan" (or expect) to get the same level of reliability from a
> "standard" 8.4 HA installation, so I don't think I would go that way. If we
> do need that level, we'd go for 9.0 or for some other solution.
>
> The manual lists three possible solutions to HA: shared disk failover, file
> system replication and Warm/Hot Standby. I'm not an admin (nor a DBA), so my
> question might sound a bit stupid: from my point of view solutions using
> shared disk failover or file system replication seem to be more reliable in
> terms of how much data can get lost (and possibly the switchover lag)

With Shared Disk failover, you don't use filesystem replication.  Your
disk resources are available to a secondary server, and in the event of
a failure of the primary server, your secondary takes ownership of the
disk resources.

The nice thing about shared disk solutions is that you won't lose any
committed data if a server fails.

The downsides are that shared disk can be really tough to set up
properly, and your storage is still a single point of failure.  You need
to make sure that it's reliable, and you will most likely still want
alternate means to protect against failure of the storage.

Warm/Hot Standby is a lot easier to set up, but there is a window for
data loss on failure.  This can be minimized/eliminated by using some
sort of block level synchronous replication (DRBD, file system, array or
SAN based) if you can afford the overhead.  I don't have any first-hand
experience with the sync based stuff, so I can't comment much further
than that.
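
As a rough illustration of that data loss window for file-based WAL
shipping, here is a minimal sketch.  It assumes a simple model that is
not from the original post: a WAL segment is forced out at least every
archive_timeout seconds (the PostgreSQL setting, where 0 disables the
timeout), and then takes some time to reach the standby.  The function
name and the ship_latency_s parameter are hypothetical.

```python
import math


def worst_case_loss_window(archive_timeout_s: float, ship_latency_s: float) -> float:
    """Rough upper bound, in seconds, on how much recent work could be lost.

    Assumed model: a WAL segment is handed to the archiver at least every
    archive_timeout_s seconds and needs ship_latency_s seconds to arrive
    on the standby.
    """
    if archive_timeout_s < 0 or ship_latency_s < 0:
        raise ValueError("durations must be non-negative")
    if archive_timeout_s == 0:
        # With the timeout disabled, a quiet server may hold a partly
        # filled segment indefinitely, so this model gives no bound.
        return math.inf
    return archive_timeout_s + ship_latency_s
```

With archive_timeout set to 60 seconds and about 5 seconds to copy a
segment, this model puts the window at roughly 65 seconds.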

Switchover times are really going to vary.

For shared clusters, there is some overhead in dealing with the low
level disk stuff, but I find it's not that bad.

The bigger issue on switchover is whether or not you have time to call a
fast shutdown instead of having the server do a hard crash.  If it's a
hard crash (which it usually is), you'll start up in recovery mode on
the secondary server and have to replay through WAL.  If you have a lot
of WAL files to replay on startup, the switchover time can be
quite long.
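
To put a number on that, here is a back-of-the-envelope sketch.  It
assumes the default 16 MB WAL segment size and a replay rate you have
measured yourself; the function name and the example figures are
hypothetical.  How many segments are pending depends on how much WAL has
accumulated since the last checkpoint.

```python
WAL_SEGMENT_MB = 16  # PostgreSQL's default WAL segment size


def estimated_replay_seconds(pending_segments: int, replay_mb_per_s: float) -> float:
    """Estimate crash-recovery replay time from the WAL backlog.

    pending_segments: WAL segments to replay since the last checkpoint.
    replay_mb_per_s: replay throughput observed on your own hardware.
    """
    if pending_segments < 0:
        raise ValueError("pending_segments must be non-negative")
    if replay_mb_per_s <= 0:
        raise ValueError("replay_mb_per_s must be positive")
    return pending_segments * WAL_SEGMENT_MB / replay_mb_per_s
```

For example, 300 pending segments at 20 MB/s works out to 240 seconds
before the secondary accepts connections.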

Warm/Hot Standby tends to be faster on failover as long as you are
applying the WAL files at a reasonable rate.

One further thing to mention - all of these solutions are based on
making the physical blocks available (actually, I'm not sure about
Streaming replication in 9.0).  As such, it is possible for corruption
to hit the master at the block level and get replicated through the
chain.

Logical solutions like Slony/Bucardo/Londiste do give some additional
protection against this.
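
The difference can be shown with a toy model.  This is only a sketch
under loud assumptions: "blocks" are plain byte strings, block-level
mirroring copies them verbatim, and the logical replica replays
row-level changes captured at write time.  It is not how any of the
tools above are implemented, and all names are hypothetical.

```python
def mirror_blocks(primary_blocks: list[bytes]) -> list[bytes]:
    """Block-level mirroring: copy bytes verbatim, valid or not."""
    return list(primary_blocks)


def apply_logical(changes: list[tuple], replica_rows: dict) -> dict:
    """Logical replication: replay captured row-level changes."""
    for op, key, value in changes:
        if op == "upsert":
            replica_rows[key] = value
        elif op == "delete":
            replica_rows.pop(key, None)
    return replica_rows


def demo():
    primary_blocks = [b"id=1 name=alice", b"id=2 name=bob"]
    # Row changes captured when the rows were written (toy change log).
    captured = [("upsert", 1, "alice"), ("upsert", 2, "bob")]
    # Storage-level corruption hits the primary after the writes.
    primary_blocks[1] = b"id=2 name=\x00\x00\x00"
    mirrored = mirror_blocks(primary_blocks)   # carries the damage along
    replica = apply_logical(captured, {})      # rebuilt from row changes
    return mirrored, replica
```

In this model the mirrored copy ends up with the damaged block while the
logical replica still holds both rows.  The protection is partial: if a
damaged row is read and rewritten on the master, a logical system would
replicate the bad value too.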

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


