Thread: Hosting PG on AWS in 2013

Hosting PG on AWS in 2013

From: David Boreham
First I need to say that I'm asking this question on behalf of "a
friend", who asked me what I thought on the subject -- I host all the
databases important to me and my livelihood, on physical machines I own
outright. That said, I'm curious as to the current thinking on a)
whether it is wise, and b) if so how to deploy, PG servers on AWS. As I
recall, a couple years ago it just wasn't a wise plan because Amazon's
I/O performance and reliability wasn't acceptable. Perhaps that's no
longer the case..

Just to set the scene -- the application is a very high traffic web
service where any down time is very costly, processing a few hundred
transactions/s.

Scanning through the latest list of AWS instance types, I can see two
plausible approaches:

1. High I/O Instances:  (regular AWS instance but with SSD local
storage) + some form of replication. Replication would be needed because
(as I understand it) any AWS instance can be "vanished" at any time due
to Amazon screwing something up, maintenance on the host, etc (I believe
the term of art is "ephemeral").

2. EBS-Optimized Instances: these allow the use of EBS storage (SAN-type
service) from regular AWS instances. Assuming that EBS is maintained to
a high level of availability and performance (it doesn't, afaik, feature
the vanishing property of AWS machines), this should in theory work out
much the same as a traditional cluster of physical machines using a
shared SAN, with the appropriate voodoo to fail over between nodes.

Any thoughts, wisdom, and especially from-the-trenches experience, would
be appreciated.

In the Googlesphere I found this interesting presentation:
http://www.pgcon.org/2012/schedule/attachments/256_pg-aws.pdf which
appears to support option #2 with s/w (obviously) RAID on the PG hosts,
but with replication rather than SAN cluster-style failover, or perhaps
in addition to it.

Note that I'm not looking for recommendations on PG hosting providers
(in fact my friend is looking to transition off one of them, to bare-AWS
machines, for a variety of reasons).

Thanks.





Re: Hosting PG on AWS in 2013

From: Tomas Vondra
Hi David,

On 7.4.2013 03:51, David Boreham wrote:
>
> First I need to say that I'm asking this question on behalf of "a
> friend", who asked me what I thought on the subject -- I host all the
> databases important to me and my livelihood, on physical machines I own
> outright. That said, I'm curious as to the current thinking on a)
> whether it is wise, and b) if so how to deploy, PG servers on AWS. As I
> recall, a couple years ago it just wasn't a wise plan because Amazon's
> I/O performance and reliability wasn't acceptable. Perhaps that's no
> longer the case..

That depends on what you mean by reliability and (poor) performance.

Amazon says the AFR (annual failure rate) for EBS is 0.1-0.5% (under some
conditions, see http://aws.amazon.com/ebs/). I have no reason not to trust
them in this case. Maybe it was much worse a few years ago, but I wasn't
working with AWS back then, so I can't compare.

As for performance, AFAIK EBS volumes have always had, and probably always
will have, a 32 MB/s limit. Thanks to caching built into EBS, performance
may seem much better initially (say, twice as good), but after a sustained
write workload (say, 15-30 minutes) you're back at the 32 MB/s per volume.

The main problem with regular EBS is the variability - the numbers above
are for cases where everything operates fine. When something goes wrong,
you can get 1 MB/s for a period of time. And when you create 10 volumes,
each will have a bit different performance.

There are ways to handle this, though - the "old way" is to build a
RAID10 array on top of regular EBS volumes, the "new way" is to use EBS
with Provisioned IOPS (possibly with RAID0).
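
For illustration only, provisioning such volumes is easy to script. A rough
boto 2.x sketch (going from memory on the exact calls; region, sizes, IOPS,
instance id and device names are all just placeholders):

# Rough sketch (boto 2.x, from memory): create two Provisioned IOPS volumes
# to stripe together later (e.g. with mdadm). All identifiers are placeholders.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

volumes = []
for device in ('/dev/sdf', '/dev/sdg'):
    # 'io1' is the Provisioned IOPS volume type; 2000 IOPS ~ 32 MB/s at 16kB
    vol = conn.create_volume(size=200, zone='us-east-1a',
                             volume_type='io1', iops=2000)
    # In real code you'd wait for the volume to become 'available' first.
    conn.attach_volume(vol.id, 'i-0123abcd', device)
    volumes.append(vol.id)

print(volumes)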

> Just to set the scene -- the application is a very high traffic web
> service where any down time is very costly, processing a few hundred
> transactions/s.

What "high traffic" means for the database? Does that mean a lot of
reads or writes, or something else?

> Scanning through the latest list of AWS instance types, I can see two
> plausible approaches:
>
> 1. High I/O Instances:  (regular AWS instance but with SSD local
> storage) + some form of replication. Replication would be needed because
> (as I understand it) any AWS instance can be "vanished" at any time due
> to Amazon screwing something up, maintenance on the host, etc (I believe
> the term of art is "ephemeral").

Yes. You'll get great I/O performance with these SSD-based instances
(easily ~1 GB/s), so you'll probably hit CPU bottlenecks instead.

You're right that to handle the instance / ephemeral failures, you'll
have to use some sort of replication - it might be your own
application-specific replication, or some sort of built-in solution
(async/sync streaming, log shipping, Slony, Londiste, whatever suits
your needs ...).

If you really value availability, you should deploy the replica in a
different availability zone or data center.
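
As a trivial illustration of keeping an eye on such a replica (assuming
built-in streaming replication; the host and credentials below are made up),
checking the replay lag on the standby is as simple as:

# Minimal sketch: report streaming-replication replay lag on a hot standby
# (PostgreSQL 9.1+). Connection details are made up.
import psycopg2

conn = psycopg2.connect(host='standby.example.com', dbname='postgres',
                        user='monitor')
cur = conn.cursor()

cur.execute("SELECT pg_is_in_recovery()")
if not cur.fetchone()[0]:
    raise SystemExit("not a standby - has a failover already happened?")

cur.execute("SELECT now() - pg_last_xact_replay_timestamp()")
print("replay lag:", cur.fetchone()[0])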

> 2. EBS-Optimized Instances: these allow the use of EBS storage (SAN-type
> service) from regular AWS instances. Assuming that EBS is maintained to
> a high level of availability and performance (it doesn't, afaik, feature
> the vanishing property of AWS machines), this should in theory work out
> much the same as a traditional cluster of physical machines using a
> shared SAN, with the appropriate voodoo to fail over between nodes.

No, that's not what "EBS Optimized" instances are for. All AWS instance
types can use EBS, over a SHARED network link. That means that e.g.
HTTP or SSH traffic influences EBS performance, because they use the
same Ethernet link. "EBS Optimized" means the instance has a network
link dedicated to EBS traffic, with guaranteed throughput.

That is not going to fix the variability of EBS performance, though ...

What you're looking for is called "Provisioned IOPS" (PIOPS), which
guarantees the EBS volume performance in terms of IOPS with 16kB blocks.
For example, you may create an EBS volume with 2000 IOPS, which is
~32 MB/s (with 16kB blocks). That's not much per volume, but it's easy
to build a RAID0 array on top of several such volumes. We're using this
for some of our databases and are very happy with it.
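
To make the arithmetic explicit (back-of-the-envelope numbers only):

# PIOPS are counted in 16kB operations, so sustained throughput per volume
# is roughly iops * 16kB. Striping N such volumes (RAID0) multiplies that.
def piops_throughput_mb_s(iops, block_kb=16):
    return iops * block_kb / 1024.0

single  = piops_throughput_mb_s(2000)      # ~31 MB/s for one 2000-IOPS volume
striped = 4 * piops_throughput_mb_s(2000)  # ~125 MB/s for a 4-volume RAID0,
                                           # roughly the m2.4xlarge cap below
print(single, striped)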

Obviously, you want to use PIOPS with EBS Optimized instances. I don't
see much point in using only one of them.

But still, it depends on the required I/O performance - you can't really
get above 125 MB/s (m2.4xlarge) or 250 MB/s (cc2.8xlarge).

And you can't really rely on this if you need quick failover to a
different availability zone or data center, because it's quite likely
that EBS will be hit by the same issue (read the analysis of the AWS
outage from April 2011: http://aws.amazon.com/message/65648/).

> Any thoughts, wisdom, and especially from-the-trenches experience, would
> be appreciated.

My recommendation is to plan for zone/datacenter failures first. That
means build a failover replica in a different zone/datacenter.

You might be able to handle isolated EBS failures using e.g. snapshots
and/or backups and similar recovery procedures, but that may mean
unpredictable downtime (e.g. while we don't see failed EBS volumes very
frequently, we do see EBS volumes stuck in "attaching" much more
frequently).
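
For illustration, scripting the snapshot itself is trivial (a boto 2.x
sketch from memory; the volume id is a placeholder) -- the hard part is
the restore procedure, not taking the snapshot:

# Rough sketch of a scripted EBS snapshot (boto 2.x, from memory).
# Note: a snapshot of a live PostgreSQL data volume is only crash-consistent
# unless you wrap it in pg_start_backup()/pg_stop_backup().
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
snap = conn.create_snapshot('vol-0123abcd', 'nightly pgdata snapshot')
print(snap.id)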

To handle I/O performance, you may either use EBS with PIOPS (which will
also give you more reliability) or the SSD instances (but then you'll have
to either set up a local replica to handle instance failures, or fail over
to the other cluster).

> In the Googlesphere I found this interesting presentation :
> http://www.pgcon.org/2012/schedule/attachments/256_pg-aws.pdf which
> appears to support option #2 with s/w (obviously) RAID on the PG hosts,
> but with replication rather than SAN cluster-style failover, or perhaps
> in addition to.

Christophe's talk is definitely a valuable source, although maybe a bit
difficult to follow without his commentary (just like any other talk). And
I don't see any mention of "Provisioned IOPS" in it, probably because it
was prepared before Amazon started offering that feature.

Tomas


Re: Hosting PG on AWS in 2013

From: David Boreham
Thanks very much for your detailed response. A few answers below, inline:

On 4/7/2013 9:38 AM, Tomas Vondra wrote:
> As for the performance, AFAIK the EBS volumes always had, and probably
> will have, a 32 MB/s limit. Thanks to caching, built into the EBS, the
> performance may seem much better initially (say twice as good), but
> after a sustained write workload (say 15-30 minutes), you're back at
> the 32 MB/s per volume. The main problem with regular EBS is the
> variability - the numbers above are for cases where everything
> operates fine. When something goes wrong, you can get 1 MB/s for a
> period of time. And when you create 10 volumes, each will have a bit
> different performance. There are ways to handle this, though - the
> "old way" is to build a RAID10 array on top of regular EBS volumes,
> the "new way" is to use EBS with Provisioned IOPS (possibly with RAID0).
>> Just to set the scene -- the application is a very high traffic web
>> service where any down time is very costly, processing a few hundred
>> transactions/s.
> What "high traffic" means for the database? Does that mean a lot of
> reads or writes, or something else?

I should have been more clear: the transactions/s figure above is all
writes. The read load is effectively cached. My assessment is that the
load is high enough that careful attention must be paid to I/O
performance, but not so high that sharding/partitioning is required (yet).
Part of the site is already using RDS with PIOPS, and runs at a constant
500 w/s as viewed in CloudWatch. I don't know for sure how the PG-based
elements compare to this in load -- they back different functional areas
of the site.
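
(For what it's worth, this is roughly how I pull that number out of
CloudWatch -- a boto 2.x sketch from memory, with a made-up instance
identifier:)

# Rough sketch: average WriteIOPS for an RDS instance over the last hour
# (boto 2.x, from memory; the instance identifier is made up).
from datetime import datetime, timedelta
import boto.ec2.cloudwatch

cw = boto.ec2.cloudwatch.connect_to_region('us-east-1')
end = datetime.utcnow()
points = cw.get_metric_statistics(
    period=300,
    start_time=end - timedelta(hours=1),
    end_time=end,
    metric_name='WriteIOPS',
    namespace='AWS/RDS',
    statistics=['Average'],
    dimensions={'DBInstanceIdentifier': 'my-rds-instance'})
for p in sorted(points, key=lambda p: p['Timestamp']):
    print(p['Timestamp'], p['Average'])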
>
>> Scanning through the latest list of AWS instance types, I can see two
>> plausible approaches:
>>
>> 1. High I/O Instances:  (regular AWS instance but with SSD local
>> storage) + some form of replication. Replication would be needed because
>> (as I understand it) any AWS instance can be "vanished" at any time due
>> to Amazon screwing something up, maintenance on the host, etc (I believe
>> the term of art is "ephemeral").
> Yes. You'll get great I/O performance with these SSD-based instances
> (easily ~1 GB/s), so you'll probably hit CPU bottlenecks instead.
>
> You're right that to handle the instance / ephemeral failures, you'll
> have to use some sort of replication - it might be your own
> application-specific replication, or some sort of built-in solution
> (async/sync streaming, log shipping, Slony, Londiste, whatever suits
> your needs ...).
>
> If you really value availability, you should deploy the replica in a
> different availability zone or data center.
>
>> 2. EBS-Optimized Instances: these allow the use of EBS storage (SAN-type
>> service) from regular AWS instances. Assuming that EBS is maintained to
>> a high level of availability and performance (it doesn't, afaik, feature
>> the vanishing property of AWS machines), this should in theory work out
>> much the same as a traditional cluster of physical machines using a
>> shared SAN, with the appropriate voodoo to fail over between nodes.
> No, that's not what "EBS Optimized" instances are for. All AWS instance
> types can use EBS, using a SHARED network link. That means that e.g.
> HTTP or SSH traffic influences EBS performance, because they use the
> same ethernet link. The "EBS Optimized" says that the instance has a
> network link dedicated for EBS traffic, with guaranteed throughput.

Ah, thanks for clarifying that. I knew about PIOPS, but hadn't realized
that EBS Optimized meant a dedicated SAN cable. Makes sense...

>
> That is not going to fix the variability of EBS performance, though ...
>
> What you're looking for is called "Provisioned IOPS" (PIOPS), which
> guarantees the EBS volume performance in terms of IOPS with 16kB blocks.
> For example, you may create an EBS volume with 2000 IOPS, which is
> ~32 MB/s (with 16kB blocks). That's not much per volume, but it's easy
> to build a RAID0 array on top of several such volumes. We're using this
> for some of our databases and are very happy with it.
>
> Obviously, you want to use PIOPS with EBS Optimized instances. I don't
> see much point in using only one of them.
>
> But still, it depends on the required I/O performance - you can't really
> get above 125 MB/s (m2.4xlarge) or 250 MB/s (cc2.8xlarge).

I don't foresee this application being limited by bulk data throughput
(MB/s). It will be limited more by writes/s, due to the small-transaction,
OLTP-type workload.

>
> And you can't really rely on this if you need quick failover to a
> different availability zone or data center, because it's quite likely
> the EBS is going to be hit by the issue (read the analysis of AWS outage
> from April 2011: http://aws.amazon.com/message/65648/).

Right, I assume that there can be cascading and correlated failures. I'm
not sure I could ever convince myself that a cloud-hosted solution is
really safe, because honestly I don't trust Amazon to design out their
single points of failure and thermal-runaway problems. However, in the
industry now there seems to be wide acceptance of the view that if
you're shafted by Amazon, that's OK (you don't get fired). I'm looking
at this project from that perspective. "Netflix-reliable", something
like that ;)

>
>> Any thoughts, wisdom, and especially from-the-trenches experience, would
>> be appreciated.
> My recommendation is to plan for zone/datacenter failures first. That
> means build a failover replica in a different zone/datacenter.
>
> You might be able to handle isolated EBS failures e.g. using snapshots
> and/or backups and similar recovery procedures, but it may require
> unpredictable downtimes (e.g. while we don't see failed EBS volumes very
> frequently, we do see EBS volumes stuck in "attaching" much more
> frequently).
>
> To handle I/O performance, you may either use EBS with PIOPS (which will
> also give you more reliability) or SSD instances (but you'll have to
> either setup a local replica to handle the instance failures or do the
> failover to the other cluster).
>

Thanks again for your thoughts -- most helpful.




Re: Hosting PG on AWS in 2013

From: Ben Chobot

On Apr 6, 2013, at 6:51 PM, David Boreham wrote:

> First I need to say that I'm asking this question on behalf of "a
> friend", who asked me what I thought on the subject -- I host all the
> databases important to me and my livelihood, on physical machines I own
> outright. That said, I'm curious as to the current thinking on a)
> whether it is wise, and b) if so how to deploy, PG servers on AWS. As I
> recall, a couple years ago it just wasn't a wise plan because Amazon's
> I/O performance and reliability wasn't acceptable. Perhaps that's no
> longer the case..

Tomas gave you a pretty good run-down, but I should just emphasize that you need to view AWS instances as disposable, if only because that's how Amazon views them. You have multiple AZs in every region... use them for replication, because it's only a matter of time before your master DB goes offline (or the entire AZ it's in does). So script up your failover and have it ready to run, because you will need to do it. Also, copy data to another region and have a DR plan to fail over to it, because history shows AZs aren't always as independent as Amazon intends.

Of course, these are things you should do regardless of whether you're in AWS or not, but AWS makes them more necessary. (Which arguably pushes you to have a more resilient service.)

Also, if you go the route of CC-sized instances, you don't need to bother with EBS optimization, because the CC instances have 10Gb network links already. 

Also, if you go the ephemeral instance route, be aware that an instance stop/start (not reboot) means you lose your data. There are still too many times when we've found an instance needs to be restarted, so you need to be really, really OK with your failover if you want those local SSDs. I would say synchronous replication would be mandatory.
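
(As a minimal sketch of the kind of sanity check I'd bake into that
failover tooling -- assuming built-in streaming replication, with made-up
connection details:)

# Minimal sketch: verify a synchronous standby is actually attached before
# trusting local-SSD storage on the master (PostgreSQL 9.1+; connection
# details are made up).
import psycopg2

conn = psycopg2.connect(host='master.example.com', dbname='postgres',
                        user='monitor')
cur = conn.cursor()
cur.execute("SELECT application_name, state, sync_state "
            "FROM pg_stat_replication")
standbys = cur.fetchall()

if not any(sync_state == 'sync' for _, _, sync_state in standbys):
    raise SystemExit("no synchronous standby connected - writes are at risk")

for name, state, sync_state in standbys:
    print(name, state, sync_state)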


Overall I won't say that you can get amazing DB performance inside AWS, but you can certainly get reasonable performance with enough PIOPs volumes and memory, and while the on-demand cost is absurd compared to what you can build with bare metal, the reserved-instance cost is more reasonable (even if not cheap). 

Re: Hosting PG on AWS in 2013

From: Tomas Vondra
On 7.4.2013 19:43, David Boreham wrote:
>
> Thanks very much for your detailed response. A few answers below, inline:
>
> On 4/7/2013 9:38 AM, Tomas Vondra wrote:
>> As for the performance, AFAIK the EBS volumes always had, and probably
>> will have, a 32 MB/s limit. Thanks to caching, built into the EBS, the
>> performance may seem much better initially (say twice as good), but
>> after a sustained write workload (say 15-30 minutes), you're back at
>> the 32 MB/s per volume. The main problem with regular EBS is the
>> variability - the numbers above are for cases where everything
>> operates fine. When something goes wrong, you can get 1 MB/s for a
>> period of time. And when you create 10 volumes, each will have a bit
>> different performance. There are ways to handle this, though - the
>> "old way" is to build a RAID10 array on top of regular EBS volumes,
>> the "new way" is to use EBS with Provisioned IOPS (possibly with RAID0).
>>> Just to set the scene -- the application is a very high traffic web
>>> service where any down time is very costly, processing a few hundred
>>> transactions/s.
>> What "high traffic" means for the database? Does that mean a lot of
>> reads or writes, or something else?
>
> I should have been more clear: the transactions/s figure above is all
> writes. The read load is effectively cached. My assessment is that the
> load is high enough that careful attention must be paid to I/O
> performance, but not so high that sharding/partitioning is required (yet).
> Part of the site is already using RDS with PIOPS, and runs at a constant
> 500 w/s as viewed in CloudWatch. I don't know for sure how the PG-based
> elements compare to this in load -- they back different functional areas
> of the site.

That's 500 * 16kB of writes, i.e. ~8 MB/s. Not a big deal, IMHO,
especially if only part of that is writes from PostgreSQL.

>> But still, depends on the required I/O performance - you can't really
>> get above 125MB/s (m2.4xlarge) or 250MB/s (cc2.8xlarge).
>
> I don't foresee this application being limited by bulk data throughput
> (MB/s). It will be limited more by writes/s, due to the small-transaction,
> OLTP-type workload.

There's not much difference between random and sequential I/O on EBS.
You can probably get a bit better sequential performance thanks to
coalescing of smaller requests (PIOPS works with 16kB blocks, while
PostgreSQL uses 8kB), but we don't see that in practice.

And the writes to the WAL are sequential anyway.

>> And you can't really rely on this if you need quick failover to a
>> different availability zone or data center, because it's quite likely
>> the EBS is going to be hit by the issue (read the analysis of AWS outage
>> from April 2011: http://aws.amazon.com/message/65648/).
>
> Right, assume that there can be cascading and correlated failures. I'm
> not sure I could ever convince myself that a cloud-hosted solution is
> really safe, because honestly I don't trust Amazon to design out their
> single failure points and thermal-runaway problems. However in the
> industry now there seems to be wide acceptance of the view that if
> you're shafted by Amazon, that's ok (you don't get fired). I'm looking
> at this project from that perspective. "Netflix-reliable", something
> like that ;)

Well, even if you could prevent all those failures, there's still the
possibility of human error (as in 2011) or Godzilla eating the data
center (and it's not going to eat just a single availability zone).

I believe Amazon is working hard on this and I trust their engineers,
but this simply is not a matter of trust. Mistakes and unexpected
failures do happen all the time. Anyone who believes that moving to
Amazon somehow magically makes them disappear is naive.

The only good thing is that when such a crash happens, half of the
internet goes down, so no one really notices the smaller sites. If you
can't watch funny cat pictures on reddit, it's all futile anyway.

Tomas



Re: Hosting PG on AWS in 2013

From: Vincent Veyron
On Sunday, April 7, 2013 at 11:19 -0700, Ben Chobot wrote:

>
> Overall I won't say that you can get amazing DB performance inside
> AWS, but you can certainly get reasonable performance with enough
> PIOPs volumes and memory, and while the on-demand cost is absurd
> compared to what you can build with bare metal, the reserved-instance
> cost is more reasonable (even if not cheap).

Indeed.

Could someone explain to me the point of using an AWS instance in the
case of the OP, whose site is apparently very busy, versus renting a
bare metal server in a datacenter?

As an example, the site in my sig, which admittedly has much lower
requirements since I only have a handful of users, has been hosted very
reliably for the past two years on online.net's smallest rented server,
for 15 euros/month. Average load sits at 0.3%.

Using this feels like having a machine in your facility, only better
protected. I use several of those for redundancy.

Is there something I'm missing?


--
Salutations, Vincent Veyron
http://marica.fr/site/demonstration
Software for managing legal disputes and insurance claims



Re: Hosting PG on AWS in 2013

From: Ben Chobot
On Apr 8, 2013, at 2:15 AM, Vincent Veyron wrote:

> Could someone explain to me the point of using an AWS instance in the
> case of the OP, whose site is apparently very busy, versus renting a
> bare metal server in a datacenter?

Well, at least in my experience, you don't go to AWS because the databases there are awesome. You go to AWS because you have highly cyclical load patterns, can't predict your future capacity needs, tend to have very large batch jobs, etc. So then you have most of your servers living in AWS, and if you need low latencies to your database (which most people do) then it often makes sense to try to make your database live in AWS as well, instead of putting it a VPN hop away.

I wouldn't claim that AWS is the best place to run a database, but for running a service, of which a database is just one part, you could do a lot worse if you do it right.

Re: Hosting PG on AWS in 2013

From: David Boreham
On 4/8/2013 3:15 AM, Vincent Veyron wrote:
> Could someone explain to me the point of using an AWS instance in the
> case of the OP, whose site is apparently very busy, versus renting a
> bare metal server in a datacenter?

I am the OP, but I can't provide a complete answer, since personally
(i.e. for all the servers my income depends on) I do not use cloud
hosting. However, some reasons I have heard mentioned include:

1. The rest of the site is already hosted in AWS, so deploying a DB
outside AWS adds network costs and latency, and yet more moving parts
and things to worry about.
2. These days many companies just do not have the capability to deploy
bare metal. The people who understand how to do it are gone, management
feels it is outside their comfort zone compared to the cloud, and so on.
Conversely, there are plenty of people you can hire who are familiar
with AWS, its deployment tools, management, monitoring and so on.
3. Peer pressure to use AWS (from finance people, VCs, industry pundits,
etc).
4. AWS is the new IBM Mainframe (nobody ever got fired for buying
one...). If half the Internet is on AWS and something breaks, then
well... you can point to the half of the Internet that's also down.