Thread: Opinions on SSDs

Opinions on SSDs

From
"David F. Skoll"
Date:
Hi,

We run a fairly write-intensive workload and are looking at upgrading
our Pg servers.  (PostgreSQL 9.1; no practical way to upgrade to 9.2 for
a while because we use what's packaged with Debian.)

I'm considering the following configuration:

   Dual 4-core Intel CPU (E5620 at 2.4GHz)

   192GB of RAM

   Sixteen 240GB Intel SSD 520 series drives arranged using Linux
   RAID-10.  The RAID 10 array will use eight mirrored stripes using the
   "offset-copies" RAID10 scheme.
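For concreteness, I'd assemble the array with mdadm roughly like this
(device names and filesystem choices here are placeholders, not our
actual layout):

```shell
# Sketch only: sixteen members, RAID-10 with the "offset" layout and
# two copies of each chunk (--layout=o2).
mdadm --create /dev/md0 --level=10 --layout=o2 \
      --raid-devices=16 /dev/sd[b-q]

# Filesystem and mount options are illustrative as well:
mkfs.xfs /dev/md0
mount -o noatime /dev/md0 /var/lib/postgresql
```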

Questions:

1) Has anyone had experience with Intel 520 SSDs?  Are they reliable?
When they fail, do they fail nicely (ie, failure detected and bad drive
removed from RAID array) or horribly (data silently corrupted...) ?

2) Is this RAID arrangement sensible?  I've seen reports where some
people keep the pg_xlog directory on a separate RAID array, but I
don't really see the advantage with SSDs.

3) Our current workload peaks at about 5000 transactions per second;
you can assume about one-third to one-half of those are writes.  Do
you think we can get away with 16 10Krpm SATA drives instead of the
SSDs?

Regards,

David.


Re: Opinions on SSDs

From
"Joshua D. Drake"
Date:
On 08/12/2013 08:28 AM, David F. Skoll wrote:
>
> Hi,
>
> We run a fairly write-intensive workload and are looking at upgrading
> our Pg servers.  (PostgreSQL 9.1; no practical way to upgrade to 9.2 for
> a while because we use what's packaged with Debian.)

apt.postgresql.org

>
> I'm considering the following configuration:
>
>     Dual 4-core Intel CPU (E5620 at 2.4GHz)
>
>     192GB of RAM
>
>     Sixteen 240GB Intel SSD 520 series drives arranged using Linux
>     RAID-10.  The RAID 10 array will use eight mirrored stripes using the
>     "offset-copies" RAID10 scheme.
>
> Questions:
>
> 1) Has anyone had experience with Intel 520 SSDs?  Are they reliable?
> When they fail, do they fail nicely (ie, failure detected and bad drive
> removed from RAID array) or horribly (data silently corrupted...) ?

I don't recall if the 520s have powerloss protection but you will want
to check that.

>
> 2) Is this RAID arrangement sensible?  I've seen reports where some
> people keep the pg_xlog directory on a separate RAID array, but I
> don't really see the advantage with SSDs.

Put the pg_xlog on spindles, they are more than fast enough and won't
eat up the write life of your SSDs.
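(A rough sketch of the move, assuming a Debian-packaged 9.1 cluster;
the spindle mount point is a placeholder:)

```shell
# Relocate pg_xlog to a spindle-backed mount and symlink it back.
# The cluster must be stopped first; paths assume Debian's 9.1 layout.
pg_ctlcluster 9.1 main stop
mv /var/lib/postgresql/9.1/main/pg_xlog /mnt/spindles/pg_xlog
ln -s /mnt/spindles/pg_xlog /var/lib/postgresql/9.1/main/pg_xlog
pg_ctlcluster 9.1 main start
```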

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
    a rose in the deeps of my heart. - W.B. Yeats


Re: Opinions on SSDs

From
Craig James
Date:
On Mon, Aug 12, 2013 at 8:28 AM, David F. Skoll <dfs@roaringpenguin.com> wrote:

3) Our current workload peaks at about 5000 transactions per second;
you can assume about one-third to one-half of those are writes.  Do
you think we can get away with 16 10Krpm SATA drives instead of the
SSDs?

pgbench peaks out at 5K-7K transactions per second on my server, which uses just ten 7Krpm SATA drives:

  WAL: RAID1 (2 disks)
  Data: RAID10 (8 disks)
  3Ware RAID controller with BBU
  2x4 core Intel CPUs
  12 GB memory

I don't know how pgbench compares to your workload, but I suspect 16 10K SATA drives would be pretty fast if you combine them with a BBU RAID controller.
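For reference, a run like the one above looks roughly like this (the
scale factor and client counts here are illustrative, not the exact
numbers I used):

```shell
# Initialize a pgbench database at scale factor 300 (roughly 4-5GB),
# then run the default TPC-B-like mix for five minutes, 32 clients.
createdb pgb
pgbench -i -s 300 pgb
pgbench -c 32 -j 4 -T 300 pgb
```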

On the other hand, I swore this would be the last server I buy with spinning storage.

Craig

Re: Opinions on SSDs

From
Scott Whitney
Date:
When you say "16 10K drives," do you mean:

a) RAID 0 with 16 drives?
b) RAID 1 with 8+8 drives?
c) RAID 5 with 12 drives?
d) RAID 1 with 7+7 drives and 2 hotspares?

We moved from a 14 FC drive (15k RPM) array (6+6 with 2 hotspares) to a 6 SSD array (2+2 with 2 hotspares) because our iops would max out regularly on the spinning drives. The SSD solution I put in has shown significant speed improvements, to say the very least.

The short answer is unless you're going with option a (which has no redundancy), you're going to have some I/O wait at 5k tps.

Now, there is a LOT to understand about drive iops. You could start here, if you would like to read a bit about it:

Basically, just assume that you're getting 130 iops per drive. Sixteen drives in a RAID 0 will max you out at 2100ish iops, which is low for your stated peak usage (but probably within range for your average usage). However, once you start looking at 8+8 or 7+7 with hotspares, you're cutting that in half, and you're going to see I/O wait, period. Of course, if you were to use a non-recommended RAID 5, you'd be taking an even bigger hit on writes.
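The arithmetic behind those numbers, as a quick sketch (the 130
iops/drive figure is the rule of thumb above, not a measured value):

```python
def spindle_iops(drives, per_drive=130):
    """Rough random-iops ceiling for an array of spinning drives."""
    return drives * per_drive

raw = spindle_iops(16)   # 2080 -- the "2100ish" RAID 0 ceiling
mirrored = raw // 2      # 1040 -- RAID 10/1 write ceiling: every write
                         # must land on both halves of a mirror pair
print(raw, mirrored)
```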

Now, I do _not_ want to open a can of worms here and start a war about SSD versus spindles and "perceived performance vs real," or any such thing. I attended Greg Smith's (excellent) "Seeking Postgres" talk at PG Day in Austin, and I had also personally amassed quite a bit of data on such comparisons myself.  Unfortunately, some of that talk did not compare apples to apples (3-disk RAID 0 versus a single Intel 520 SSD), and I quite simply find that the benchmarks do not really reflect real-world usage.

Source: months and months of real-world stress-testing specifically 10k drives (SAS) against SSD (SATA) drives in the same configuration on the same machine using the same tests plus over a year (total) of production deployment among 3 servers thusly configured.

So far as personal experience with the Intel drives goes, I don't have any. I'm using Crucial, and I'm pretty happy with those. The _problem_ with SSDs is that there is no "put it in the freezer" magic bullet. When they fail, they fail, and they're gone. So, IMO (and there are MANY MANY valid opinions on this), use slightly cheaper drives and proactively replace them every 9 months or a year.






Re: Opinions on SSDs

From
"David F. Skoll"
Date:
On Mon, 12 Aug 2013 11:01:09 -0500 (CDT)
Scott Whitney <scott@journyx.com> wrote:

> When you say "16 10K drives," do you mean:

I mean 8 RAID-1 pairs with data striped across the pairs.  The Linux
software RAID "offset" scheme is described here:


http://www.ilsistemista.net/index.php/linux-a-unix/35-linux-software-raid-10-layouts-performance-near-far-and-offset-benchmark-analysis.html?start=1

> The SSD solution I put in has shown significant speed improvements,
> to say the very least.

OK; thanks.

> Basically, just assume that you're getting 130 iops per drive. Well,
> 16 drives in a RAID 0 is going to max you out at 2100ish iops,

With our RAID-10 array, we're looking at about 1050 iops.  Our server
currently is holding up OK with a decidedly non-optimal arrangement
(four RAID-1 volumes with pg_xlog on one, most DB files on a second,
a couple of tablespaces with hand-placed tables and indexes on the
other two, and only 7200 RPM disks).

So I think 16 10Krpm spinning disks will probably suffice; the failure mode
of SSDs makes me nervous.

Regards,

David.


Re: Opinions on SSDs

From
Bruce Momjian
Date:
On Mon, Aug 12, 2013 at 08:33:04AM -0700, Joshua D. Drake wrote:
> >1) Has anyone had experience with Intel 520 SSDs?  Are they reliable?
> >When they fail, do they fail nicely (ie, failure detected and bad drive
> >removed from RAID array) or horribly (data silently corrupted...) ?
>
> I don't recall if the 520s have powerloss protection but you will
> want to check that.

I am pretty sure they don't.  The only options are the Intel 320 and
710, I think.  Here is a blog about it:

    http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/

Look for "Enhanced Power Loss Data Protection".  Intel does not make it
easy to find all drives that have it --- you have to look at each spec
sheet.

> >2) Is this RAID arrangement sensible?  I've seen reports where some
> >people keep the pg_xlog directory on a separate RAID array, but I
> >don't really see the advantage with SSDs.
>
> Put the pg_xlog on spindles, they are more than fast enough and
> won't eat up the write life of your SSDs.

Given its small size and need for fast fsync, I have WAL on SSDs, and
the data on magnetic disk.  You are right the WAL can generate a lot of
writes, but just use smartctl monitoring and replace the SSD when
needed.
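(For the Intel drives, the wear attribute shows up in the SMART table
something like this; the device name is a placeholder:)

```shell
# Intel SSDs expose attribute 233, "Media_Wearout_Indicator", which
# starts at 100 and counts down; replace the drive as it approaches
# the threshold.  Other vendors use different attribute names.
smartctl -A /dev/sda | grep -i media_wearout
```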

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Opinions on SSDs

From
Lonni J Friedman
Date:
On Mon, Aug 12, 2013 at 1:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug 12, 2013 at 08:33:04AM -0700, Joshua D. Drake wrote:
>> >1) Has anyone had experience with Intel 520 SSDs?  Are they reliable?
>> >When they fail, do they fail nicely (ie, failure detected and bad drive
>> >removed from RAID array) or horribly (data silently corrupted...) ?
>>
>> I don't recall if the 520s have powerloss protection but you will
>> want to check that.
>
> I am pretty sure they don't.  The only options are the Intel 320 and
> 710, I think.  Here is a blog about it:
>
>         http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/
>
> Look for "Enhanced Power Loss Data Protection".  Intel does not make it
> easy to find all drive that have it --- you have to look at each spec
> sheet.

The S3700 series also has power loss data protection:
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html


Re: Opinions on SSDs

From
"David F. Skoll"
Date:
On Mon, 12 Aug 2013 16:05:04 -0400
Bruce Momjian <bruce@momjian.us> wrote:

> On Mon, Aug 12, 2013 at 08:33:04AM -0700, Joshua D. Drake wrote:

> > Put the pg_xlog on spindles, they are more than fast enough and
> > won't eat up the write life of your SSDs.

> Given its small size and need for fast fsync, I have WAL on SSDs, and
> the data on magnetic disk.  You are right the WAL can generate a lot
> of writes, but just use smartctl monitoring and replace the SSD when
> needed.

:-)  Nothing like consensus from Pg luminaries. :)

Anyway, I've decided to go with spinning disks for this server upgrade.  The
factors behind my decision:

o A server with 16 10Krpm disks will be substantially better than what we
  have now, and what we have now is managing to keep up with only occasional
  over-busy periods.

o My supplier cannot easily get the 320, 710 or S3700 Intel SSD disks; he
  claims a shortage of Intel SSD drives.  I know too little about non-Intel
  SSDs to feel comfortable with them.

o One of our two database servers is 200km away, so swapping out SSDs on
  a prophylactic basis is not particularly appetizing.

Regards,

David.


Re: Opinions on SSDs

From
Bruce Momjian
Date:
On Mon, Aug 12, 2013 at 04:41:48PM -0400, David F. Skoll wrote:
> On Mon, 12 Aug 2013 16:05:04 -0400
> Bruce Momjian <bruce@momjian.us> wrote:
>
> > On Mon, Aug 12, 2013 at 08:33:04AM -0700, Joshua D. Drake wrote:
>
> > > Put the pg_xlog on spindles, they are more than fast enough and
> > > won't eat up the write life of your SSDs.
>
> > Given its small size and need for fast fsync, I have WAL on SSDs, and
> > the data on magnetic disk.  You are right the WAL can generate a lot
> > of writes, but just use smartctl monitoring and replace the SSD when
> > needed.
>
> :-)  Nothing like consensus from Pg luminaries. :)
>
> Anyway, I've decided to go with spinning disks for this server upgrade.  The
> factors behind my decision:
>
> o A server with 16 10Krpm disks will be substantially better than what we
>   have now, and what we have now is managing to keep up with only occasional
>   over-busy periods.
>
> o My supplier cannot easily get the 320, 710 or S3700 Intel SSD disks; he
>   claims a shortage of Intel SSD drives.  I know too little about non-Intel
>   SSDs to feel comfortable with them.
>
> o One of our two database servers is 200km away, so swapping out SSDs on
>   a prophylactic basis is not particularly appetizing.

Good point on the travel distance.  For my server, I didn't go with a
RAID controller with a battery-backed unit (BBU), so SSDs for WAL made
complete sense for me.  FYI, your CPUs exactly match mine:

    http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Opinions on SSDs

From
Charles Sprickman
Date:
On Aug 12, 2013, at 4:15 PM, Lonni J Friedman wrote:

> On Mon, Aug 12, 2013 at 1:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Mon, Aug 12, 2013 at 08:33:04AM -0700, Joshua D. Drake wrote:
>>>> 1) Has anyone had experience with Intel 520 SSDs?  Are they reliable?
>>>> When they fail, do they fail nicely (ie, failure detected and bad drive
>>>> removed from RAID array) or horribly (data silently corrupted...) ?
>>>
>>> I don't recall if the 520s have powerloss protection but you will
>>> want to check that.
>>
>> I am pretty sure they don't.  The only options are the Intel 320 and
>> 710, I think.  Here is a blog about it:
>>
>>        http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/
>>
>> Look for "Enhanced Power Loss Data Protection".  Intel does not make it
>> easy to find all drive that have it --- you have to look at each spec
>> sheet.
>
> The S3700 series also has power loss data protection:
> http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html

And the much more affordable S3500 series:

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3500-series.html

The 320 and 710 are still available, but the prices are totally jacked up, which I assume means it's all old stock and people that need to get exact replacements are the main market at this point.

We run 4 320s on an SSD-only box and it's been amazing.  We're using ZFS so no hardware RAID, but it does allow us to pull members of each mirrored pair out one by one to take care of both pre-emptive replacement and array growth (started with 160GB drives, on the first refresh moved to 250GB on one pair).  Wear indicator on the replaced drives was at 98%, so those got moved to another box for some quick scratch storage.  The next replacement we'll probably cycle the old SSDs in as ZIL on other (non-db) servers and bring in these new Intel S3500s.

Another non-traditional and cheap option is to combine some decent spinny drives with SSDs.  The slave to our main all-SSD box is a hybrid with 4 10K raptors paired with two small Intel 320s as ZFS ZIL.  The ZIL "absorbs" the sync writes, so we get ssd-like random write performance but with the data also safe on the traditional spinny drives.  pgbench on that setup did something like 15K TPS, I've got graphs of that laying around somewhere if anyone's interested.

The budget hybrid SSD approach is apparently an odd setup, as I've not seen anyone else discuss it. :)

Charles




Re: Opinions on SSDs

From
Scott Whitney
Date:
I tested the hybrid approach during my months-long testing and performance stuff, and I was a bit underwhelmed.

That said, what _I_ personally really needed was an increase in peak iops. Using spindles for "static" data (OS, some logs, and such) worked fine, but no matter how I split up the pg stuff (logs, data, etc.), I couldn't get anywhere above about 15% of the speed of the true SSD solution.

And, on my end, I'm using IBM ServeRAID cards (8 and 10 series) with battery backup, so I didn't bother with the EPL stuff. Maybe I should have, but with that many moving parts, I have my doubts that the EPL stuff would do anything when under the control of a BBU-equipped _true_ hardware RAID card (as opposed to kernel-based software-RAID-on-a-chip).

Also, so far I haven't seen any "marketing BS." I mean, my drives are rated either 30k (write) and 40k (read) iops or maybe 35/45. I forget. Somewhere in that range. So, IN THEORY (think "in marketing speak"), with a 6-drive config (RAID 10 2+2 plus 2 hot spare), IN THEORY I'm running 70,000ish iops on those 2 online drives. From what I've seen, while I'm sure this is a bit overblown on the marketing, it's well within 20% of the marketed iops.

Again, just for math's sake, I WAS running 3-4k REQUESTED* iops about 20% of the time, and sometimes I'd spike to 7k. My old RAID array was capable of about 1,050 iops or so, and I was having I/O wait like mad. I have not had that problem a single time since migrating my servers to pure SSD solution.

* When I say "requested iops," understand that if I have a hard limit of 1,050 iops that my drives can actually handle, the 1,051st request enters a wait state. If another one comes in, I have 1,052 (not really, but you get the point for simplification) I/O requests needing service. If they stack up to a reported 3k, that doesn't mean I'm actually requesting 3,000 iops at any given time (I might only NEED 1,200 right now); it means there are 3,000 in the queue waiting to be serviced.  There's a bit of calculus that would go on there if you truly cared about determining the actual wait queue at any given second.
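A toy model of that stacking effect (the 1,050 ceiling is my old
array's limit from above; the demand numbers are made up purely for
illustration):

```python
CAPACITY = 1050  # what the old spindle array could actually service/sec

def backlog_after(demand_per_sec, seconds):
    """Queue depth after sustained demand above (or below) capacity."""
    backlog = 0
    for _ in range(seconds):
        backlog = max(0, backlog + demand_per_sec - CAPACITY)
    return backlog

# Needing only 1,200 iops for 10 seconds still leaves 1,500 requests
# waiting, so monitoring reports far more "requested iops" than the
# workload ever actually needed at any one moment.
print(backlog_after(1200, 10))   # 1500
print(backlog_after(1000, 10))   # 0 -- under capacity, no queue builds
```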









--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin