Thread: High Availability: Hot Standby vs. Warm Standby
Hi,

we are contemplating the possibilities for a Postgres HA installation.

As the rollout is targeted towards the end of the year, 9.0 and its new features might be an option for us.

Now from a HA point of view, what is the major difference between 9.0's Hot Standby and 8.x's Warm Standby?

I am aware that I can use the 9.0 standby server for read-only queries, but that is (currently) not something we need.

I'm wondering about the differences when the failover situation occurs. From reading the docs, I get the impression that 9.0's streaming replication might be faster than 8.4's WAL shipping, but otherwise offers the same level of data protection.

Is there a difference in how much data could potentially be lost in case of a failover? E.g. because 9.0 replicates the changes quicker than 8.4?

If there is no (big) difference in reliability (or potential data loss), I would rather go for 8.4 than 9.0, simply because the feature is so new in 9.0 and might not be 100% reliable at the beginning.

Any input is highly appreciated.

Thanks in advance
Thomas
On Fri, Jul 9, 2010 at 4:51 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Hi,
>
> we are contemplating the possibilities for a Postgres HA installation.
>
> As the rollout is targeted towards the end of the year, 9.0 and its new
> features might be an option for us.
>
> Now from a HA point of view, what is the major difference between 9.0's
> Hot Standby and 8.x's Warm Standby?

You can run queries on the standby...

> I am aware that I can use the 9.0 standby server for read-only queries,
> but that is (currently) not something we need.

Taking SQL backups without impacting the master might be something to consider.

> I'm wondering about the differences when the failover situation occurs.
> From reading the docs, I get the impression that 9.0's streaming
> replication might be faster than 8.4's WAL shipping, but otherwise offers
> the same level of data protection.
>
> Is there a difference in how much data could potentially be lost in case
> of a failover?

9.0 has streaming replication, so much less data would likely be lost.
WAL segments are generally 16 MB and are often only shipped once completed.

> E.g. because 9.0 replicates the changes quicker than 8.4?
>
> If there is no (big) difference in reliability (or potential data loss), I
> would rather go for 8.4 than 9.0, simply because the feature is so new in
> 9.0 and might not be 100% reliable at the beginning.
>
> Any input is highly appreciated.
>
> Thanks in advance
> Thomas

--
Rob Wultsch
wultsch@gmail.com
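For reference, the difference in mechanism looks roughly like this in configuration terms. This is a minimal sketch, not a complete setup; hostnames and paths are made up, not taken from the thread:

    # 8.4-style file shipping: postgresql.conf on the master.
    # A WAL segment is only handed off once it is complete (or archive_timeout expires).
    archive_mode    = on
    archive_command = 'rsync %p standby:/var/lib/pgsql/wal_archive/%f'   # example command
    archive_timeout = 60          # force a segment switch at least every 60 seconds

    # 9.0 streaming replication: the standby pulls WAL over a libpq connection
    # as it is generated, instead of waiting for whole 16 MB segments.

    # postgresql.conf on the master (9.0)
    wal_level       = hot_standby
    max_wal_senders = 1
    # (an archive_command or wal_keep_segments is normally configured as well)

    # recovery.conf on the standby (9.0)
    standby_mode     = 'on'
    primary_conninfo = 'host=master port=5432 user=replication'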
Hi,

Rob Wultsch wrote on 09.07.2010 18:14:
>> I am aware that I can use the 9.0 standby server for read-only queries,
>> but that is (currently) not something we need.
>
> Taking SQL backups without impacting the master might be something to consider.

Interesting point. Thanks for mentioning that.

>> I'm wondering about the differences when the failover situation occurs.
>> From reading the docs, I get the impression that 9.0's streaming
>> replication might be faster than 8.4's WAL shipping, but otherwise offers
>> the same level of data protection.
>>
>> Is there a difference in how much data could potentially be lost in case
>> of a failover?
>
> 9.0 has streaming replication, so much less data would likely be lost.
> WAL segments are generally 16 MB and are often only shipped once completed.

So my assumption is correct that streaming replication means that, in case of a failover, fewer transactions are lost?

Regards
Thomas
Thomas Kellerer <spam_eater@gmx.net> wrote:
> So my assumption is correct that streaming replication means that, in
> case of a failover, fewer transactions are lost?

Yes, that is correct.

-Kevin
On Fri, 2010-07-09 at 18:31 +0200, Thomas Kellerer wrote:
> Hi,
>
> Rob Wultsch wrote on 09.07.2010 18:14:
> >> I am aware that I can use the 9.0 standby server for read-only queries,
> >> but that is (currently) not something we need.
> >
> > Taking SQL backups without impacting the master might be something to consider.
>
> Interesting point. Thanks for mentioning that.

There is an issue with running your backup from a standby that you need to be aware of.

There is the potential for a long-running query on the standby to conflict with the application of WAL records. In that case, you have a choice: either terminate the query and let WAL records continue to be applied, or delay the application of WAL until the query completes.

Considering that you are looking at HA and asking about the difference in lost transactions between streaming replication and 8.4 PITR, I doubt that letting the standby lag for the duration of the pg_dump is going to be something that interests you.

Full details are here:
http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
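In 9.0 that trade-off is controlled per standby with two settings. A minimal sketch; the values are illustrative, not a recommendation:

    # postgresql.conf on the standby (9.0)
    max_standby_archive_delay   = 30s   # how long replay of archived WAL may wait for conflicting queries
    max_standby_streaming_delay = 30s   # the same, for WAL received via streaming replication
    # -1 lets the standby wait (and lag) indefinitely; 0 cancels conflicting queries immediately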
Since the topic is very relevant to me right now I would like to ask if anyone is running a HA solution in the cloud?

Thanks,
Kasia
On Fri, 2010-07-09 at 10:25 -0700, Kasia Tuszynska wrote:
> Since the topic is very relevant to me right now I would like to ask if
> anyone is running a HA solution in the cloud?

While running databases in a cloud does come with its own set of issues/challenges, I don't see what would be materially different about implementing (most) HA solutions for Postgres in the cloud vs. standalone servers.

If you have specific questions or concerns, perhaps we could be of assistance.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Thomas Kellerer wrote:
> I'm wondering about the differences when the failover situation occurs.
> From reading the docs, I get the impression that 9.0's streaming
> replication might be faster than 8.4's WAL shipping, but otherwise offers
> the same level of data protection.
> Is there a difference in how much data could potentially be lost in case
> of a failover?
> E.g. because 9.0 replicates the changes quicker than 8.4?

There's nothing that 9.0 does that you can't do with 8.4 and the right software to aggressively ship partial files around. In practice, though, streaming replication is likely to result in less average data loss, simply because it will do the right thing to ship new transactions automatically. Getting the same reaction time and resulting low amount of lag out of an earlier version requires a level of external script configuration that few sites ever actually manage to obtain. You can think of the 9.0 features as mainly reducing the complexity of installation needed to achieve low latency significantly. I would bet that if you tried to set up 8.4 to achieve the same quality level in terms of quick replication, your result would be more fragile and buggy than just using 9.0--the bugs would just be in your own code rather than in the core server.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
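For context, the 8.4 warm standby side of such a setup typically looks something like the line below (directory names are made up); the archiving, monitoring and trigger scripting around it is the part that tends to get complicated:

    # recovery.conf on an 8.4 warm standby, using the pg_standby contrib program
    restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/wal_archive %f %p %r'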
Greg Smith, 10.07.2010 14:44:
>> Is there a difference in how much data could potentially be lost in
>> case of a failover? E.g. because 9.0 replicates the changes quicker than 8.4?
>
> There's nothing that 9.0 does that you can't do with 8.4 and the right
> software to aggressively ship partial files around. In practice, though,
> streaming replication is likely to result in less average data loss,
> simply because it will do the right thing to ship new transactions
> automatically. Getting the same reaction time and resulting low amount
> of lag out of an earlier version requires a level of external script
> configuration that few sites ever actually manage to obtain. You can
> think of the 9.0 features as mainly reducing the complexity of
> installation needed to achieve low latency significantly. I would bet
> that if you tried to set up 8.4 to achieve the same quality level in
> terms of quick replication, your result would be more fragile and buggy
> than just using 9.0--the bugs would just be in your own code rather
> than in the core server.

Greg and Rob,

thanks for the answers.

I didn't "plan" (or expect) to get the same level of reliability from a "standard" 8.4 HA installation, so I don't think I would go that way. If we do need that level, we'd go for 9.0 or for some other solution.

The manual lists three possible solutions to HA: shared disk failover, file system replication and Warm/Hot Standby. I'm not an admin (nor a DBA), so my question might sound a bit stupid: from my point of view, solutions using shared disk failover or file system replication seem to be more reliable in terms of how much data can get lost (and possibly the switchover lag).

Regards
Thomas
On Mon, 2010-07-12 at 08:58 +0200, Thomas Kellerer wrote:
> I didn't "plan" (or expect) to get the same level of reliability from a
> "standard" 8.4 HA installation, so I don't think I would go that way. If
> we do need that level, we'd go for 9.0 or for some other solution.
>
> The manual lists three possible solutions to HA: shared disk failover,
> file system replication and Warm/Hot Standby. I'm not an admin (nor a
> DBA), so my question might sound a bit stupid: from my point of view,
> solutions using shared disk failover or file system replication seem to
> be more reliable in terms of how much data can get lost (and possibly
> the switchover lag).

With shared disk failover, you don't use filesystem replication. Your disk resources are available to a secondary server, and in the event of a failure of the primary server, the secondary takes ownership of the disk resources.

The nice thing about shared disk solutions is that you won't lose any committed data if a server fails. The downsides are that shared disk can be really tough to set up properly. Your storage is still a single point of failure, so you need to make sure that it's reliable, and most likely still use alternate means to protect against failure of the storage.

Warm/Hot Standby is a lot easier to set up, but there is a window for data loss on failure. This can be minimized/eliminated by using some sort of block-level synchronous replication (DRBD, array or SAN based) if you can afford the overhead. I don't have any first-hand experience with the sync-based stuff, so I can't comment much further than that.

Switchover times are really going to vary. For shared clusters, there is some overhead in dealing with the low-level disk stuff, but I find it's not that bad. The bigger issue on switchover is whether or not you have time to call a fast shutdown instead of having the server do a hard crash. If it's a hard crash (which it usually is), you'll start up in recovery mode on the secondary server and have to replay through WAL. If you have a lot of WAL files to replay on startup, the switchover time can be quite long.

Warm/Hot Standby tends to be faster on failover, as long as you are applying the WAL files at a reasonable rate.
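For what it's worth, failing over a warm or hot standby is usually just a matter of creating the trigger file it has been told to watch for; a minimal sketch, with a made-up path:

    # additional line in recovery.conf on a 9.0 streaming standby
    # (with 8.4's pg_standby, the equivalent is its -t option)
    trigger_file = '/var/lib/pgsql/failover.trigger'

    # to fail over, run this on the standby:
    touch /var/lib/pgsql/failover.trigger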
One further thing to mention - all of these solutions are based on making the physical blocks available (actually, I'm not sure about streaming replication in 9.0). As such, it is possible for corruption to hit the master at the block level and get replicated through the chain. Logical solutions like Slony/Bucardo/Londiste do give some additional protection against this.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Thomas Kellerer wrote:
> The manual lists three possible solutions to HA: shared disk failover,
> file system replication and Warm/Hot Standby. I'm not an admin (nor a
> DBA), so my question might sound a bit stupid: from my point of view,
> solutions using shared disk failover or file system replication seem
> to be more reliable in terms of how much data can get lost (and
> possibly the switchover lag).

Yes, but if you try you'll discover that actually getting any shared disk or file system replication solution set up so that you really do achieve less failover loss than the file shipping approach will be expensive, complicated, fragile in its own way, and just generally a pain to pull off. The fundamental problem with shared storage, for example, is how to keep a node that has failed from trying to reassume being the master when it comes back. Doing that well requires hardware support aimed at that specific use case.

Meanwhile, file shipping for Warm Standby use requires nothing special at all except some modest software setup. It's comparatively simple to set up, validate, and keep going on any hardware capable of running the database. This is why shared storage and the like isn't the only obvious solution even though it's technically capable of losing fewer transactions; you'll discover that keeping from losing that last little bit of data when there's a crash turns out to be quite expensive. Whether it's worth it or not depends on the value of your data and whether it can be retained at some higher level when this happens instead.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Brad Nicholson wrote:
> One further thing to mention - all of these solutions are based on
> making the physical blocks available (actually, I'm not sure about
> streaming replication in 9.0).

You're right here; the SR feature in 9.0 is essentially near real-time partial WAL file shipping, and the WAL contains physical disk block change data. If your master has data blocks corrupted, the next time you do a base backup against it that corruption will be mirrored to the standby too.

I've mentioned on one of these lists recently that I like to schedule a periodic pg_dump even if log shipping is the main backup mechanism for a database, just so that corruption in the underlying files is caught as early as possible by trying to read every block and confirm it has valid data.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
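A minimal sketch of such a scheduled dump (database name and path are made up; the point is simply to force every block to be read on a regular basis):

    # example weekly cron entry on the master
    0 3 * * 0  pg_dump -Fc mydb > /var/backups/mydb_$(date +\%Y\%m\%d).dump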
Greg Smith, 12.07.2010 18:11:
> Yes, but if you try you'll discover that actually getting any shared
> disk or file system replication solution set up so that you really do
> achieve less failover loss than the file shipping approach will be
> expensive, complicated, fragile in its own way, and just generally a
> pain to pull off. The fundamental problem with shared storage, for
> example, is how to keep a node that has failed from trying to reassume
> being the master when it comes back. Doing that well requires hardware
> support aimed at that specific use case.
>
> Meanwhile, file shipping for Warm Standby use requires nothing special
> at all except some modest software setup. It's comparatively simple to
> set up, validate, and keep going on any hardware capable of running the
> database. This is why shared storage and the like isn't the only obvious
> solution even though it's technically capable of losing fewer
> transactions; you'll discover that keeping from losing that last little
> bit of data when there's a crash turns out to be quite expensive.
> Whether it's worth it or not depends on the value of your data and
> whether it can be retained at some higher level when this happens
> instead.

Thanks Greg, I really appreciate your answer.

I'm nowhere near being a DB or system admin, but I'm promoting Postgres from a user's point of view, and this makes people ask me whenever there is a question around Postgres (in this case replacing the existing Oracle Data Guard solution with PG).

Basically the requirement is that we cannot afford to lose any committed data. But if that comes at too high a cost, it might be possible to negotiate that. I will also need to check how busy the server actually is; maybe setting archive_timeout to a very low value (one or two seconds) would be enough.

The time it takes to switch over to the standby server is not relevant (it doesn't matter if it's seconds or minutes, but it should not be hours ;) )

Using Warm Standby would mean that if the server fails after a transaction commits, but before the WAL can be shipped to the standby, that transaction will be lost, right? At least on the standby server.

It would still be available on the primary server, but that wouldn't help much, because we would need to back up and restore the data from the standby to the primary after the problems have been fixed, which would essentially overwrite that transaction in the WAL.

Currently our "favorite" is file system replication between two nodes. The target system will be Solaris using a ZFS file system. Are there any Solaris/ZFS specific features that would help here?

Thanks again for the input!

Regards
Thomas
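The archive_timeout idea mentioned above would amount to something like the following; the value is only illustrative:

    # postgresql.conf on the 8.4 master
    archive_timeout = 2   # switch to a new WAL segment at most 2 seconds after the last one
    # note: each forced segment is still archived as a full 16 MB file,
    # so a very low value generates a lot of mostly-empty archive traffic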