Re: Using AWS ephemeral SSD storage for production database workload? - Mailing list pgsql-general

From Pritam Barhate
Subject Re: Using AWS ephemeral SSD storage for production database workload?
Date
Msg-id CALpo98Ufx4hYZeJy2Ae59mBrxtuC25LKx-aPbTU6ijR4RrD9ng@mail.gmail.com
In response to Re: Using AWS ephemeral SSD storage for production database workload?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Using AWS ephemeral SSD storage for production database workload?  (Steven Lembark <lembark@wrkhors.com>)
List pgsql-general
>> Both log shipping and async replication are ancient features, and should
>> be well understood. What exactly is unclear?

I know about these, and I know how to operate them too. The only part I am concerned about is the ephemeral storage: the risk appetite around it, and the steps people take to ensure that no "serious" data is lost when both the primary and the standby are lost (very unlikely when they are in different AZs, but still possible). I was just wondering if there is any secret sauce to it, some wisdom that comes only from operating a real-world deployment. Even Heroku seems to be using PIOPS (provisioned IOPS) EBS volumes (https://devcenter.heroku.com/articles/heroku-postgres-production-tier-technical-characterization), and they are the people who created WAL-E. Anyway, I did learn some new things from Manuel's response.

In short, I am just trying to learn from other people's experience. 

Thanks for all the information.

Pritam.


On Mon, Jan 29, 2018 at 11:02 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> Hi everyone, 
>
> As you may know, EBS volumes, though durable, are very costly when you
> need provisioned IOPS. By contrast, the ephemeral SSD storage attached
> to an AWS instance is very fast but isn't durable.
>
> I have come across some ideas on the Internet where people hinted at
> running production PostgreSQL workloads on AWS ephemeral SSD
> storage. Generally, this involves continuously shipping WAL files to
> S3 and keeping an async read replica in another AWS availability
> zone. The worst-case scenario in such a deployment is a few seconds
> of data loss. Beyond this, however, the details are sketchy.
>

Both log shipping and async replication are ancient features, and should
be well understood. What exactly is unclear?
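
To put a rough number on the "few seconds" figure for the case where both instances are lost: only WAL that has already reached the archive (S3 here) survives, and PostgreSQL hands a WAL segment to `archive_command` only once the segment is full (16 MB by default) or `archive_timeout` forces a segment switch. The following is a back-of-the-envelope sketch, not a measurement; the WAL rate, timeout and upload latency are hypothetical inputs:

```python
# Rough upper bound on recent work lost if both the primary and the async
# replica disappear and only the WAL archive (e.g. S3) survives.
# Hypothetical assumptions: steady WAL rate and a fixed upload latency.

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # PostgreSQL's default wal_segment_size


def archive_loss_window_s(wal_bytes_per_s: float,
                          archive_timeout_s: float,
                          upload_latency_s: float) -> float:
    """Worst-case seconds of committed work not yet safely archived."""
    # A segment is handed to archive_command when it fills up...
    if wal_bytes_per_s > 0:
        fill_time_s = WAL_SEGMENT_BYTES / wal_bytes_per_s
    else:
        fill_time_s = float("inf")
    # ...or when archive_timeout (0 disables it) forces a segment switch.
    if archive_timeout_s > 0:
        switch_interval_s = min(fill_time_s, archive_timeout_s)
    else:
        switch_interval_s = fill_time_s
    # Worst case: the crash hits just before the previous segment finishes
    # uploading, so that segment and everything written since are lost.
    return float(switch_interval_s + upload_latency_s)


if __name__ == "__main__":
    # Busy database: 1 MiB/s of WAL fills a 16 MiB segment every 16 s.
    print(archive_loss_window_s(1024 * 1024, 60, 2))  # 18.0
    # Quiet database: archive_timeout caps the window instead.
    print(archive_loss_window_s(100 * 1024, 60, 2))  # 62.0
```

In this model a busy database is bounded by the segment fill time and a quiet one by `archive_timeout`. It only covers segment-based archiving; streaming WAL continuously to the archive would narrow the window.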

> Have you come across such a deployment? What are some best practices
> that need to be followed to pull this off without significant
> data loss? Even though WAL is being shipped to S3, if both instances
> are lost, the restore time is going to be quite long for databases
> of a few hundred GB.
>
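
The restore-time concern quoted above can be estimated the same way. A point-in-time restore has to download the latest base backup and then fetch and replay every WAL segment archived since that backup. A rough sketch, assuming `restore_command` fetches segments one at a time so that fetching and replay do not overlap; all throughput figures are hypothetical and would need to be measured on your own instances:

```python
# Back-of-the-envelope restore time from a WAL archive such as S3:
# download the base backup, then fetch and replay WAL archived since then.
# All throughput figures are hypothetical inputs, not measured values.

MIB = 1024 ** 2
GIB = 1024 ** 3


def restore_time_s(base_backup_bytes: float,
                   wal_since_backup_bytes: float,
                   download_bytes_per_s: float,
                   replay_bytes_per_s: float) -> float:
    """Seconds until the restored server has caught up with the archive."""
    fetch_backup_s = base_backup_bytes / download_bytes_per_s
    # Assumes segments are fetched one by one and replayed before the next
    # fetch, so WAL download and replay times simply add up.
    fetch_wal_s = wal_since_backup_bytes / download_bytes_per_s
    replay_wal_s = wal_since_backup_bytes / replay_bytes_per_s
    return fetch_backup_s + fetch_wal_s + replay_wal_s


if __name__ == "__main__":
    # 300 GiB base backup, 20 GiB of WAL, 200 MiB/s download, 50 MiB/s replay.
    t = restore_time_s(300 * GIB, 20 * GIB, 200 * MIB, 50 * MIB)
    print(f"{t:.0f} s (~{t / 60:.0f} min)")  # 2048 s (~34 min)
```

With these made-up figures the base backup download dominates; taking base backups more frequently mainly shortens the replay part.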

Pretty much everyone who is serious about HA is running such a cluster.
If they can't afford any data loss, they use synchronous replicas
instead. That's a basic latency-durability trade-off.
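
The latency-durability trade-off maps onto the `synchronous_commit` setting (used together with `synchronous_standby_names`). Below is a simplified model, assuming commit latency is just the sum of the local WAL flush, one round trip to the standby, and whatever the standby must do before acknowledging; the millisecond figures are hypothetical. It also encodes why, with ephemeral storage, a local flush alone does not protect an acknowledged commit against losing the whole instance:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitCost:
    local_flush_ms: float    # fsync of WAL on the primary
    rtt_ms: float            # network round trip to the synchronous standby
    standby_write_ms: float  # standby writes WAL to its OS (no fsync yet)
    standby_flush_ms: float  # standby fsyncs WAL
    standby_apply_ms: float  # standby replays the commit record


def commit_latency_ms(level: str, c: CommitCost) -> float:
    """Approximate time until COMMIT returns, assuming synchronous_standby_names
    names a reachable standby (otherwise remote levels behave like 'local')."""
    if level == "off":
        return 0.0  # returns before the local WAL flush
    latency = c.local_flush_ms
    if level == "local":
        return latency
    latency += c.rtt_ms + c.standby_write_ms
    if level == "remote_write":
        return latency
    latency += c.standby_flush_ms
    if level == "on":
        return latency
    if level == "remote_apply":
        return latency + c.standby_apply_ms
    raise ValueError(f"unknown synchronous_commit level: {level!r}")


def survives_primary_instance_loss(level: str) -> bool:
    """On ephemeral storage the local disk vanishes with the instance, so only
    levels that wait for the standby keep an acknowledged commit.
    ('remote_write' still loses it if the standby's OS crashes before fsync.)"""
    return level in ("remote_write", "on", "remote_apply")
```

Since `synchronous_commit` can also be changed per transaction (e.g. with `SET LOCAL`), one option is to pay the round trip only for transactions that cannot be lost.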

> Just to be clear, I am not planning anything like this anytime soon
> :-) But I am curious about the trade-offs of such a deployment. Any
> concrete information on this would be much appreciated.
>

Pretty much everyone is using such an architecture (primary + streaming
replicas) nowadays, so it's a reasonably well-understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time you have spent reading about this.
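
With a primary and asynchronous streaming replicas, the data at risk at any moment is the replication lag. PostgreSQL reports WAL positions as LSNs of the form `X/Y` (the high and low 32 bits of a byte position, in hex), e.g. in `pg_stat_replication`, and the server-side `pg_wal_lsn_diff()` (`pg_xlog_location_diff()` before PostgreSQL 10) computes the difference. A minimal client-side equivalent, assuming you have already fetched the primary's current LSN and the standby's flush LSN as strings:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby has not yet confirmed, i.e. roughly what an
    asynchronous replica would miss if the primary vanished right now."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)
```

Dividing the byte lag by the WAL generation rate gives a rough time-based figure; PostgreSQL 10 also reports `write_lag`, `flush_lag` and `replay_lag` directly in `pg_stat_replication`.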

There is quite a bit of information in the official docs, although maybe
at a somewhat too low level: it gives you the building blocks rather
than a complete solution. There are also books, for example [1].

And finally, there are tools that help with managing such clusters, for
example [2]. Not only is it a rather bad idea to implement this on your
own (bugs, unnecessary effort), but the tools also show how to do things.

[1] https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition

[2] https://repmgr.org/

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
