Thread: SSD Drives
Any opinions/comments on using SSD drives with postgresql?
On 04/02/2014 02:37 PM, Bret Stern wrote:
> Any opinions/comments on using SSD drives with postgresql?

Using SSDs with PostgreSQL is fine, provided they have an onboard capacitor to ensure data integrity. The main concern with SSD drives is that they essentially lie about their sync status: there is an inherent race between the drive acknowledging a write and the data actually surviving the drive's write balancing and commit overhead. Most common drives only have a volatile RAM chip that acts as buffer space while writes are synced to the flash. Without a capacitor backing it, the contents of that buffer are lost on power failure, leaving you with a corrupt database. There are upcoming technologies which may solve this (see ReRAM), but for now a capacitor is a requirement for any sane system.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
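A quick way to spot a drive that is acknowledging flushes from volatile cache is PostgreSQL's own pg_test_fsync. This is only a sketch (the path and per-test duration are illustrative, and it is no substitute for an actual pull-the-plug test such as diskchecker.pl):

$ pg_test_fsync -f /ssd/pgdata/fsync_test.out -s 5
# An fsync rate wildly higher than the medium could plausibly sustain usually
# means the flush is being absorbed by volatile cache; a capacitor-backed SSD
# (or BBU-backed controller) reports a rate consistent with its real flush latency.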
have you seen this?
http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
Brent Wood
Brent Wood |
Principal Technician - GIS and Spatial Data Management Programme Leader - Environmental Information Delivery |
+64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | www.niwa.co.nz |
________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Bret Stern [bret_stern@machinemanagement.com]
Sent: Thursday, April 3, 2014 8:37 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] SSD Drives
Any opinions/comments on using SSD drives with postgresql?
On 04/02/2014 02:50 PM, Brent Wood wrote:
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html

While interesting, these results are extremely out of date compared to current drives. Current chips and firmware regularly put out 2-10 times better performance than even the best graphs on this page, depending on what you buy.

We moved all of our performance-critical servers to NVRAM-based storage years ago. For us, it was well worth the added expense.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
Care to share the SSD hardware you're using?

I've used none to date, and have some critical data I would like to put on a development server to test with.

Regards,

Bret Stern

On Wed, 2014-04-02 at 15:31 -0500, Shaun Thomas wrote:
> We moved all of our performance-critical servers to NVRAM-based storage
> years ago. For us, it was well worth the added expense.
On 04/02/2014 04:55 PM, Bret Stern wrote:
> Care to share the SSD hardware you're using?

We use these:

http://www.fusionio.com/products/iodrive2/

The older versions of these cards can read faster than a RAID-10 of 80x15k RPM SAS drives, based on our tests from a couple of years ago. Writes aren't *quite* as fast, but still much better than even a large RAID array.

They ain't cheap, though. You can expect to pay around $15k USD per TB, I believe. There are other similar products from other vendors which may have different cost/performance ratios, but I can only vouch for stuff I've personally tested.

Our adventure with these cards was a presentation at Postgres Open in 2011. Slides are here:

https://wiki.postgresql.org/images/c/c5/Nvram_fun_profit.pdf

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
On Wed, Apr 2, 2014 at 4:09 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> We use these:
>
> http://www.fusionio.com/products/iodrive2/

Where I work we use the MLC-based FusionIO cards and they are quite fast. It's actually hard to push them to their max with only 24 or 32 cores in a fast machine. My favorite thing about them is their fantastic support.
We used 4x OCZ Deneva 2 in a RAID configuration. Worked well for us for over 2 years with no hardware issues. We switched to SSD because we had a very write-intensive application (30 million rows/day) that spinning disks just couldn't keep up with.

On 4/2/2014 6:09 PM, Shaun Thomas wrote:
> The older versions of these cards can read faster than a RAID-10 of
> 80x15k RPM SAS drives, based on our tests from a couple of years ago.

--
Guy Rouillier
While I have two friends who work at FusionIO, and have great confidence in their products, we like to deploy more conventional SATA SSDs at present in our servers. We have been running various versions of Intel's enterprise and data center SSDs in production for several years now and couldn't be happier with their performance. The oldest in service at present are 710 series that have been subjected to a ~500wtps PG load 7*24 for the past 28 months. They still show zero wearout indication in the SMART stats. As others have mentioned, power-fail protection (supercap) is the thing to look for, and also some sort of concrete specification for drive write endurance unless you have made a deliberate decision to trade off endurance vs. cost in the context of your deployment.
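For anyone wanting to watch the same wear counters, a rough sketch (attribute names and numbers vary by vendor; Media_Wearout_Indicator is what Intel's drives expose, and the device name is just an example):

$ smartctl -A /dev/sda | grep -i -E 'wear|writes'
# On Intel SSDs, Media_Wearout_Indicator starts at 100 and counts down as the
# rated write cycles are consumed; the host-writes attribute lets you compare
# actual write volume against the drive's endurance spec.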
On Wed, Apr 2, 2014 at 12:37 PM, Bret Stern <bret_stern@machinemanagement.com> wrote:
Any opinions/comments on using SSD drives with postgresql?
Related, anyone have any thoughts on using postgresql on Amazon's EC2 SSDs? Been looking at http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance
On 4/3/2014 9:26 AM, Joe Van Dyk wrote:
> Related, anyone have any thoughts on using postgresql on Amazon's EC2
> SSDs? Been looking at
> http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance

If your data isn't very important, by all means, keep it on someone else's virtualized infrastructure with no performance or reliability guarantees.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
On Apr 3, 2014, at 12:47 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 4/3/2014 9:26 AM, Joe Van Dyk wrote:
>> Related, anyone have any thoughts on using postgresql on Amazon's EC2 SSDs?
>
> if your data isn't very important, by all means, keep it on someone else's
> virtualized infrastructure with no performance or reliability guarantees.
Well that's not quite fair. AWS guarantees performance for those instances (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/i2-instances.html#i2-instances-diskperf). They also guarantee their instances will fail sooner or later, with or without warning (at which point you will lose all your data unless you've been putting copies onto a different system).
On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern <bret_stern@machinemanagement.com> wrote:
> Any opinions/comments on using SSD drives with postgresql?

Here's a single S3700 smoking an array of 16 15k drives (the poster didn't realize that; he was too focused on synthetic numbers):

http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd

merlin
On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Here's a single S3700 smoking an array of 16 15k drives (the poster didn't
> realize that; he was too focused on synthetic numbers):
> http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd

I just ran a quick test earlier this week on an old Dell 2970 (2 Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs are about 4-6x faster using pgbench and a scaling factor of 1100. Some sort of MegaRAID controller according to lspci, with BBU. TPS numbers below are approximate.

RAID10 disk array:
8 clients: 350 tps
16 clients: 530 tps
32 clients: 800 tps

RAID1 SSD array:
8 clients: 2100 tps
16 clients: 2500 tps
32 clients: 3100 tps

So yeah, even the slower, cheaper S3500 SSDs are way fast. If your write workload isn't too high, the S3500 can work well. We'll see how the SMART drive lifetime numbers do once we get into production, but right now we estimate they should last at least 5 years, and from what we've seen SSDs seem to wear much better than expected. If not, we'll pony up and go for the S3700, or perhaps move the xlog back on to spinning disks.

-Dave
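For anyone wanting to reproduce that kind of comparison, the general shape of such a run looks like this; the exact flags, durations and thread counts here are illustrative, not necessarily what David used:

$ createdb pgbench
$ pgbench -i -s 1100 pgbench            # scale 1100 is roughly 16-17 GB, i.e. larger than 16 GB of RAM
$ pgbench -c 8  -j 4 -T 600 pgbench     # 8 clients for 10 minutes
$ pgbench -c 16 -j 8 -T 600 pgbench
$ pgbench -c 32 -j 8 -T 600 pgbench
# Repeat per storage configuration and compare the reported "tps" lines.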
On Thu, Apr 3, 2014 at 12:44 PM, Brent Wood <Brent.Wood@niwa.co.nz> wrote: > Does the RAID 1 array give any performance benefits over a single drive? I'd guess > that writes may be slower, reads may be faster (if balanced) but data security is improved. Unfortunately I didn't test a single drive as that's not a configuration we would run our systems in. I expect that it would reduce read performance and thus pgbench results some, but I can't tell you how much in this case. -Dave
On Thu, Apr 3, 2014 at 1:32 PM, David Rees <drees76@gmail.com> wrote:
> RAID1 SSD array:
> 8 clients: 2100 tps
> 16 clients: 2500 tps
> 32 clients: 3100 tps
>
> So yeah, even the slower, cheaper S3500 SSDs are way fast.

On a machine with 16 cores with HT (appears as 32 cores) and 8 of the 3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I was able to get 6300 to 7500 tps on a decent sized pgbench db (-s1000).
On Thu, 2014-04-03 at 12:32 -0700, David Rees wrote:
> So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
> write workload isn't too high, the S3500 can work well.

Is a write cycle anywhere on the drive different than a re-write? Or is a write a write!

The feedback/comments are awesome. I'm shopping.
On 4/3/2014 12:32 PM, David Rees wrote: > So yeah, even the slower, cheaper S3500 SSDs are way fast. If your > write workload isn't too high, the S3500 can work well. We'll see how > the SMART drive lifetime numbers do once we get into production, but > right now we estimate they should last at least 5 years and from what > we've seen it seems that SSDs seem to wear much better than expected. > If not, we'll pony up and go for the S3700 or perhaps move the xlog > back on to spinning disks. an important thing in getting decent wear leveling life with SSDs is to keep them under about 70% full. -- john r pierce 37N 122W somewhere on the middle of the left coast
On 4/3/2014 2:00 PM, John R Pierce wrote: > > an important thing in getting decent wear leveling life with SSDs is > to keep them under about 70% full. > This depends on the drive : drives with higher specified write endurance already have significant overprovisioning, before the user sees the space.
On Thu, Apr 3, 2014 at 2:53 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On a machine with 16 cores with HT (appears as 32 cores) and 8 of the > 3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I > was able to get 6300 to 7500 tps on a decent sized pgbench db > (-s1000). Did you happen to grab any 'select only' numbers? merlin
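(For anyone following along, "select only" refers to pgbench's built-in read-only script; a sketch, with illustrative client counts:)

$ pgbench -S -c 32 -j 8 -T 300 pgbench
# -S runs only SELECTs against pgbench_accounts, so the result reflects read
# throughput rather than commit/flush speed.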
Hi David,
Does the RAID 1 array give any performance benefits over a single drive? I'd guess that writes may be slower, reads may be faster (if balanced) but data security is improved.
Brent Wood
Brent Wood |
Principal Technician - GIS and Spatial Data Management Programme Leader - Environmental Information Delivery |
+64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | www.niwa.co.nz |
________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of David Rees [drees76@gmail.com]
Sent: Friday, April 4, 2014 8:32 AM
To: Merlin Moncure
Cc: bret_stern@machinemanagement.com; PostgreSQL General
Subject: Re: [GENERAL] SSD Drives
On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
> <bret_stern@machinemanagement.com> wrote:
>> Any opinions/comments on using SSD drives with postgresql?
>
> Here's a single S3700 smoking an array of 16 15k drives (poster didn't
> realize that; was to focused on synthetic numbers):
> http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd
I just ran a quick test earlier this week on an old Dell 2970 (2
Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS
disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs
are about 4-6x faster using pgbench and a scaling factor of 1100. Some
sort of MegaRAID controller according to lspci and has BBU. TPS
numbers below are approximate.
RAID10 disk array:
8 clients: 350 tps
16 clients: 530 tps
32 clients: 800 tps
RAID1 SSD array:
8 clients: 2100 tps
16 clients: 2500 tps
32 clients: 3100 tps
So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
write workload isn't too high, the S3500 can work well. We'll see how
the SMART drive lifetime numbers do once we get into production, but
right now we estimate they should last at least 5 years and from what
we've seen it seems that SSDs seem to wear much better than expected.
If not, we'll pony up and go for the S3700 or perhaps move the xlog
back on to spinning disks.
-Dave
On Thu, Apr 3, 2014 at 3:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Apr 3, 2014 at 2:53 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On a machine with 16 cores with HT (appears as 32 cores) and 8 of the >> 3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I >> was able to get 6300 to 7500 tps on a decent sized pgbench db >> (-s1000). > > Did you happen to grab any 'select only' numbers? Darnit. Nope. I'll try to grab some on a spare box if I get one again. Now they're all in production so running pgbench is kind of frowned upon.
On Thu, Apr 3, 2014 at 1:44 PM, Brent Wood <Brent.Wood@niwa.co.nz> wrote:
> Does the RAID 1 array give any performance benefits over a single drive? I'd guess
> that writes may be slower, reads may be faster (if balanced) but data security is improved.

I did some testing on machines with 3x MLC FusionIO Drive2s with 1.2TB. Comparing 1 drive and 2 drives in RAID-1, the difference in performance was minimal. However, a 3-drive mirror was noticeably slower. This was all with Ubuntu 12.04 using a 3.8.latest kernel and software RAID. RAID-0 was by far the fastest, about 30% faster than either a single drive or a pair of drives in RAID-1.
On 04/03/2014 12:44 PM, Brent Wood wrote:
Hi David,
Does the RAID 1 array give any performance benefits over a single drive? I'd guess that writes may be slower, reads may be faster (if balanced) but data security is improved.
I've been looking into upgrading to SSD and wondering about RAID and where to apply $$$ as well. In particular I'm curious about any real-world PostgreSQL-oriented performance and data-protections advice in the following areas:
1. With SSDs being orders of magnitude faster than spinning media, when does the RAID controller rather than the storage become the bottleneck?
2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one? Which one? I'm suspecting capacitor on the SSD and write-through on the RAID.
2. Current thoughts on hardware vs. software RAID - especially since many of the current SSD solutions plug straight into the bus.
3. Potential issues or conflicts with SSD-specific requirements like TRIM.
4. Manufacturers, models or technologies to seek out or avoid.
5. At what point do we consider the RAID controller an additional SPOF that decreases instead of increases reliability?
6. Thoughts on "best bang for the buck?" For example, am I better off dropping the RAID cards and additional drives and instead adding another standby server?
Cheers,
Steve
It would be useful to know more details -- how much storage space you need, for example.

fwiw I considered all of these issues when we first deployed SSDs and decided to not use RAID controllers. There have not been any reasons to re-think that decision since. However, it depends on your specific needs I think.

We prefer to think in terms of a single machine as the unit of service failure -- a machine is either working, or not working, and we ensure state is replicated to several machines for durability. Therefore a storage solution on each machine that is more reliable than the machine itself is not useful.

In our deployments we can't max out even one SSD, so there isn't anything a RAID controller can add in terms of performance, but your case could be different. You might also want to consider the power dissipated by the RAID controller: I was quite surprised by how much heat they generate, but this was a couple of years ago. Possibly there are lower power controllers available now.

You need the capacitor on the SSD -- a RAID controller with BBU will not fix a non-power-fail-safe SSD.

On 4/4/2014 10:04 AM, Steve Crawford wrote:
> I've been looking into upgrading to SSD and wondering about RAID and
> where to apply $$$ as well.
On Fri, Apr 4, 2014 at 11:04 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> Does the RAID 1 array give any performance benefits over a single drive? I'd
> guess that writes may be slower, reads may be faster (if balanced) but data
> security is improved.

My take: probably not so much for SSD drives. Read and write performance are very unbalanced in SSDs, and RAID1 doesn't help with writes.

> 1. With SSDs being orders of magnitude faster than spinning media, when does
> the RAID controller rather than the storage become the bottleneck?

SSDs (at least the good ones) are maybe an order of magnitude faster on writes. Can be less or more depending on the application's write particulars. SSDs are 2-3 orders faster for reads.

> 2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one?
> Which one? I'm suspecting capacitor on the SSD and write-through on the
> RAID.

You need both. The capacitor protects the drive, the BBU protects the raid controller.

> 2. Current thoughts on hardware vs. software RAID - especially since many of
> the current SSD solutions plug straight into the bus.

IMNSHO, software raid is a better bet. The advantages are compelling: cost, TRIM support, etc., and the SSD drives do not benefit as much from the write cache. But hardware controllers offer very fast burst write performance, which is nice.

> 3. Potential issues or conflicts with SSD-specific requirements like TRIM.

TRIM is not essential but does help. Pretty much all hardware raid controllers do not support TRIM. I've been waiting for a controller that manages TRIM and other SSD stuff (like consolidated wear leveling) across an entire array, but so far nothing has really materialized. If it does happen it will probably come from intel.

> 4. Manufacturers, models or technologies to seek out or avoid.

Avoid consumer grade/enthusiast stuff, and anything that does not have a capacitor. Intel offerings tend to be the benchmark.

> 5. At what point do we consider the RAID controller an additional SPOF that
> decreases instead of increases reliability?
>
> 6. Thoughts on "best bang for the buck?" For example, am I better off
> dropping the RAID cards and additional drives and instead adding another
> standby server?

This is going to depend a lot on write patterns. If you don't do much writing, you can gear up accordingly. For all-around performance, the S3700 ($2.50/GB) IMO held the crown for most of 2013 and I think is still the one to buy. The S3500 ($1.25/GB) came out and also looks like a pretty good deal, and there are some decent competitors (600 Pro for example). If you're willing to spend more, there are a lot of other options. I don't think it's reasonable to spend less for a write heavy application.

merlin
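For concreteness, a minimal sketch of the software-RAID route being described; the device names, filesystem and mount point are assumptions, not a recommendation for any particular layout:

$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
$ mkfs.ext4 /dev/md0
$ mount -o noatime,discard /dev/md0 /srv/pgdata
# "discard" only reaches the SSDs through md on kernel 3.8 or newer; on older
# kernels, drop it and rely on extra overprovisioning (or periodic fstrim once
# the stack supports it) instead.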
On 4/4/2014 10:15 AM, Merlin Moncure wrote:
>> 2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one?
>> Which one? I'm suspecting capacitor on the SSD and write-through on the
>> RAID.
>
> You need both. The capacitor protects the drive, the BBU protects the
> raid controller.
note BBU's on raid cards are being replaced by 'flash-back' which is a supercap and flash memory backup for the raid card's write-back cache.
-- john r pierce 37N 122W somewhere on the middle of the left coast
On Fri, Apr 4, 2014 at 11:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Apr 4, 2014 at 11:04 AM, Steve Crawford > <scrawford@pinpointresearch.com> wrote: >> On 04/03/2014 12:44 PM, Brent Wood wrote: >> 2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one? >> Which one? I'm suspecting capacitor on the SSD and write-through on the >> RAID. > > You need both. The capacitor protects the drive, the BBU protects the > raid controller. You don't technically need the BBU / flashback memory IF the controller is in write through. My experience has been that the BBU helps a lot on write heavy applications or to get maximum performance for your money. On most cards, it's < $100 so unless you can definitively show no real performance loss without one, get one. OTOH it's worth testing to be sure. But the BBU does a lot to reorder writes and such and flattens out bursty write performance very well. It also speeds up checkpointing if / when it has to occur.
On 4/4/2014 12:08 PM, Scott Marlowe wrote: > You don't technically need the BBU / flashback memory IF the > controller is in write through. if you HAVE the BBU/flash why would you put the controller in write through?? the whole POINT of bbu/flashback is that you can safely enable writeback caching. my testing with postgresql OLTP benchmarks on Linux, I've found virtually identical performance using mdraid vs hardware raid in the same caching mode. its the writeback cache that gives raid cards like the LSI Megaraid SAS2 series, or HP P420, or whatever, their big advantage vs a straight JBOD configuration. -- john r pierce 37N 122W somewhere on the middle of the left coast
On Fri, Apr 4, 2014 at 10:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > For all around performance, the > S3700 (2.5$/gb) IMO held the crown for most of 2013 and I think is > still the one to buy. The s3500 (1.25$/gb) came out and also looks > like a pretty good deal The S3500 can be had for $1.00/GB now these days. If you don't need the write durability or the all-out write performance of the S3700, it's a great deal. I do have to wonder if hardware RAID with a BBU can help with write amplification when on SSDs. Though since RHEL/CentOS 6.5 supports trim in software raid, that could be a bigger win. -Dave
On Fri, Apr 4, 2014 at 1:18 PM, John R Pierce <pierce@hogranch.com> wrote: > On 4/4/2014 12:08 PM, Scott Marlowe wrote: >> >> You don't technically need the BBU / flashback memory IF the >> controller is in write through. > > > if you HAVE the BBU/flash why would you put the controller in write > through?? the whole POINT of bbu/flashback is that you can safely enable > writeback caching. > > my testing with postgresql OLTP benchmarks on Linux, I've found virtually > identical performance using mdraid vs hardware raid in the same caching > mode. its the writeback cache that gives raid cards like the LSI Megaraid > SAS2 series, or HP P420, or whatever, their big advantage vs a straight JBOD > configuration. I'm not sure you read / got the whole conversation. The OP was asking if he COULD use a RAID controller with no BBU in write through with SSDs. It's a valid question. My main point was in answer to this response: On Fri, Apr 4, 2014 at 11:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Apr 4, 2014 at 11:04 AM, Steve Crawford >> 2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one? >> Which one? I'm suspecting capacitor on the SSD and write-through on the >> RAID. > > You need both. The capacitor protects the drive, the BBU protects the > raid controller. Context is king here. You do not have to have a BBU as long as you are in write through as the OP mentioned. With no BBU, in write-through, with supercaps, you should be safe. It's not a sensible configuration for most applications. OTOH, most HW RAIDs have auto spare promotion and easy swap out of dead drives with auto-rebuild. So if you're building 1000 units for the government that just plug in and work, you want the poor guy on the other end to just unplug bad drives and replace them. The cost of a service call could be way more than a HW RAID card. So, there are plenty of reasons you might want to test or even run without a BBU. That wasn't my point. My point was you're SAFE (or should be) with a HW RAID no BBU and supercapped SSDs.
On Friday, April 4, 2014, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Context is king here. You do not have to have a BBU as long as you are
> in write through as the OP mentioned. With no BBU, in write-through,
> with supercaps, you should be safe. It's not a sensible configuration
> for most applications.
>
> [...]
>
> So, there are plenty of reasons you might want to test or even run
> without a BBU. That wasn't my point. My point was you're SAFE (or
> should be) with a HW RAID no BBU and supercapped SSDs.
Agreed on all points. At the end of the day, though, HW RAID is debatable in terms of value with SSDs.

I like mdadm more than most utilities, too.
merlin
On 04/04/2014 10:15 AM, Merlin Moncure wrote:
>> 2. Do I need both BBU on the RAID *and* capacitor on the SSD or just on one?
>> Which one? I'm suspecting capacitor on the SSD and write-through on the
>> RAID.
> You need both. The capacitor protects the drive, the BBU protects the
> raid controller.

?? In write-through the controller shouldn't return success until it gets it from the drive, so no BBU should be required. One LSI slide deck recommends write-back as the optimum policy for SSDs. But I could be wrong, which is why I ask.

>> 6. Thoughts on "best bang for the buck?" For example, am I better off
>> dropping the RAID cards and additional drives and instead adding another
>> standby server?
> This is going to depend a lot on write patterns. If you don't do much
> writing, you can gear up accordingly.

FWIW, the workload is somewhat over 50% writes and currently peaks at ~1,600 queries/second after excluding "set" statements. This is currently spread across four 15k SATA drives in RAID 10. Judicious archiving allows us to keep our total OS+data storage requirements under 100GB. Usually. So we should be able to easily stay in the $500/drive price range (200GB S3700) and still have plenty of headroom for wear-leveling.

One option I'm considering is no RAID at all but spend the savings from the controllers and extra drives toward an additional standby server.

Cheers,
Steve
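For anyone doing similar sizing, both figures are easy to pull from the database itself; the database name below is just an example:

$ psql -d proddb -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
$ psql -d proddb -c "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"
# Sampling the second query twice, N seconds apart, and dividing the delta by N
# gives an approximate cluster-wide transactions/second rate (not identical to
# the per-statement query rate quoted above, but close enough for drive sizing).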
On 4/4/2014 3:57 PM, Steve Crawford wrote: > Judicious archiving allows us to keep our total OS+data storage > requirements under 100GB. Usually. So we should be able to easily stay > in the $500/drive price range (200GB S3700) and still have plenty of > headroom for wear-leveling. > > One option I'm considering is no RAID at all but spend the savings > from the controllers and extra drives toward an additional standby > server. This very similar to our workload. We use a single 200G or 300G Intel SSD per machine, directly attached to the motherboard SATA controller. No RAID controller. We run 7 servers at present in this configuration in a single cluster. Roughly 120W per box peak (8-core, 64G RAM).
On 04/02/2014 02:55 PM, Bret Stern wrote:
> Care to share the SSD hardware you're using?
>
> I've used none to date, and have some critical data I would like
> to put on a development server to test with.

SSDs are ridiculously cheap when you consider the performance difference. We saw at *least* a 10x improvement in performance going with SATA SSDs vs. 10k SAS drives in a messy, read/write environment (most of our tests were 20x or more). It's a no-brainer for us.

It might be tempting to use a consumer-grade SSD due to the significant cost savings, but the money saved is vapor. They may be OK for a dev environment, but you *will* pay in downtime in a production environment. Unlike regular hard drives, where the difference between consumer and enterprise drives is performance and a few features, SSDs are different animals.

SSDs wear something like a salt-shaker. There's a fairly definite number of writes that they are good for, and when they are gone, the drive will fail. Like a salt shaker, when the salt is gone, you won't get salt any more no matter how you shake it.

So, spend the money and get the enterprise class SSDs. They have come down considerably in price over the last year or so. Although on paper the Intel enterprise SSDs tend to trail the performance numbers of the leading consumer drives, they have wear characteristics that mean you can trust them as much as you can any other drive for years, and they still leave spinning rust far, far behind.

Our production servers are 4x 1U rackmounts with 32 cores, 128 GB of ECC RAM, and SW RAID1 400 GB SSDs in each. We back up all our databases hourly, with peak volume around 200-300 QPS/server, a write ratio of perhaps 40%, and an iostat disk utilization of about 10-20% in 5 second intervals.

-Ben
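The utilization figure quoted above comes from extended iostat output, something like this (device name is an example):

$ iostat -dxm 5 /dev/sda
# %util is the fraction of each 5-second interval the device had I/O in
# flight; w/s and wMB/s show the write load the SSD is actually absorbing.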
> SSDs wear something like a salt-shaker. There's a fairly definite number
> of writes that they are good for, and when they are gone, the drive will
> fail. Like a salt shaker, when the salt is gone, you won't get salt any
> more no matter how you shake it.

In theory, SMART is supposed to be a reliable indicator of impending "salt exhaustion". Have you had any drives "run out of salt" where SMART did not let you know in advance? If SMART does actually perform as expected there should be no downtime, just a swap of the drive in the array and a wait for the rebuild. I'd expect the cheapest consumer drives to fail suddenly and without warning, but I've never had cause to find out so far...

James
On Fri, Apr 4, 2014 at 5:29 PM, Lists <lists@benjamindsmith.com> wrote:
> SSDs wear something like a salt-shaker. There's a fairly definite number of
> writes that they are good for, and when they are gone, the drive will fail.

The real danger with consumer drives is they don't have supercaps and can and will therefore corrupt your data on power failure. The actual write cycles aren't a big deal for many uses, as now even consumer drives have very long write cycle lives.
On Fri, Apr 4, 2014 at 5:20 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > The real danger with consumer drives is they don't have supercaps and > can and will therefore corrupt your data on power failure. The actual > write cycles aren't a big deal for many uses, as now even consumer > drives have very long write cycle lives. Don't forget about the Crucial M500, M550 and Samsung 840 Pro - those all have power loss protection, though have other drawbacks. The Crucial drives in particular don't expose any sort of wear status through SMART. -Dave
On 4/4/2014 5:29 PM, Lists wrote: > So, spend the money and get the enterprise class SSDs. They have come > down considerably in price over the last year or so. Although on paper > the Intel Enterprise SSDs tend to trail the performance numbers of the > leading consumer drives, they have wear characteristics that mean you > can trust them as much as you can any other drive for years, and they > still leave spinning rust far, far behind. Another issue to bear in mind is that SSD performance may not be consistent over time. This is because the software on the drive that manages where data lives in the NAND chips has to perform operations similar to garbage collection. Drive performance may slowly decrease over the lifetime of the drive, or worse : Consumer drives may be designed such that this GC-like activity is expected to take place "when the drive is idle", which it may well be for much of the time, in a laptop. However, in a server subject to a constant load, there may never be "idle time". As a result the drive may all of a sudden decide to stop processing host I/O operations while it reshuffles its blocks. Enterprise drives are designed to address this problem and are specified for longevity under a constant high workload. Performance is similarly specified over worst-case lifetime conditions (which could explain why consumer drives appear to be faster, at least initially).
On Sat, Apr 5, 2014 at 9:13 AM, David Boreham <david_list@boreham.org> wrote:
> Another issue to bear in mind is that SSD performance may not be consistent
> over time. [...] As a result the drive may all of a sudden decide to stop
> processing host I/O operations while it reshuffles its blocks.

Good points as well. This brings us to the area of TRIM support. TRIM support is fairly common on most modern-ish Linux kernels. There were some nasty data corruption bugs if you added discard to your mount options in older kernels (the 2.6 series etc.) and one or two found and squashed since then. But the real issue is that mdraid doesn't pass down the TRIM commands from discard until kernel version 3.8. If you're running on an older kernel you get no TRIM support with SATA SSDs and mdraid arrays. ext3 doesn't support TRIM, and there are also some known bugs for filesystems converted from ext3 to ext4. On top of that, most RAID controllers don't support any form of TRIM. All of these things need to be considered when implementing SSD storage.

FusionIO drives, btw, DO support / pass TRIM when mounted with the discard option and running a fs that supports it like ext4.

Overprovisioning regular SSDs on either a RAID controller or older kernels with mdraid is usually enough to keep performance up over the life of the drive, but performance monitoring can let you know if the drives are slowly getting slower as they're used month after month.
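A sketch of the corresponding checks; device and mount point are examples:

$ uname -r                              # md TRIM pass-through needs 3.8+
$ hdparm -I /dev/sda | grep -i trim     # drive should report "Data Set Management TRIM supported"
$ fstrim -v /var/lib/pgsql              # one-off batch TRIM of free space on a mounted filesystem
# Many deployments skip the "discard" mount option entirely and instead run
# fstrim from a weekly cron job, avoiding per-delete TRIM latency.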
On 4/5/2014 8:13 AM, David Boreham wrote: > On 4/4/2014 5:29 PM, Lists wrote: >> So, spend the money and get the enterprise class SSDs. They have come >> down considerably in price over the last year or so. Although on >> paper the Intel Enterprise SSDs tend to trail the performance numbers >> of the leading consumer drives, they have wear characteristics that >> mean you can trust them as much as you can any other drive for years, >> and they still leave spinning rust far, far behind. > > Another issue to bear in mind is that SSD performance may not be > consistent over time. This is because the software on the drive that > manages where data lives in the NAND chips has to perform operations > similar to garbage collection. Drive performance may slowly decrease > over the lifetime of the drive, or worse : Consumer drives may be > designed such that this GC-like activity is expected to take place > "when the drive is idle", which it may well be for much of the time, > in a laptop. However, in a server subject to a constant load, there > may never be "idle time". As a result the drive may all of a sudden > decide to stop processing host I/O operations while it reshuffles its > blocks. Enterprise drives are designed to address this problem and are > specified for longevity under a constant high workload. Performance is > similarly specified over worst-case lifetime conditions (which could > explain why consumer drives appear to be faster, at least initially). My experience has been, consumer SSDs used in a high usage desktop type environment are about twice as slow after a year as they were brand new. I note my current desktop system has written 15TB total onto my 250GB drive after about 16 months. The SMART wear leveling count suggests the drive has 91% of its useful life left. -- john r pierce 37N 122W somewhere on the middle of the left coast
On Thu, Apr 3, 2014 at 4:00 PM, John R Pierce <pierce@hogranch.com> wrote:
> an important thing in getting decent wear leveling life with SSDs is to keep
> them under about 70% full.

You have to do that at provisioning time in the drive. I.e., once you layer a file system on it, the drive doesn't know what's "empty" and what's not; you have to tell it beforehand to only show X% to the system, and keep the rest for wear leveling. I don't know the tools for doing it, as my vendor takes care of that for me.
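On a fresh (or secure-erased) drive, the simplest way to do that is to leave part of it unpartitioned; a sketch, with the device name and percentage as assumptions:

$ parted /dev/sdb mklabel gpt
$ parted /dev/sdb mkpart primary 1MiB 70%
# The last ~30% is never written, so the controller can treat it as extra
# spare area for wear leveling and garbage collection. Some vendor tools can
# also shrink the advertised capacity directly (e.g. via a host protected
# area), but unpartitioned space is the low-tech route.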