Thread: BBU still needed with SSD?
Hi,

Is BBU still needed with SSD?

SSD has its own cache. And in certain models such as the Intel 320 that cache is backed by capacitors. So in a sense that cache acts as a BBU that's backed by capacitors instead of batteries.

In this case is a BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?

Thanks.

Andy
On 18/07/2011 9:43 AM, Andy wrote:
> Hi,
>
> Is BBU still needed with SSD?

You *need* an SSD with a supercapacitor or on-board battery backup for its cache. Otherwise you *will* lose data.

Consumer SSDs are like a hard disk attached to a RAID controller with write-back caching enabled and no BBU. In other words: designed to eat your data.

> In this case is a BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?

You don't need write-back caching for fsync() performance if your SSDs have big enough caches. I don't know enough to say whether there are other benefits to having them on a BBU HW RAID controller or whether SW RAID is fine.

--
Craig Ringer
POST Newspapers
http://www.postnewspapers.com.au/
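(As an aside: one way to see whether a SATA drive's volatile write cache is currently enabled under Linux is hdparm. This is only a sketch; the device path is a placeholder for whatever your SSD shows up as.)

    hdparm -W /dev/sdX       # report the current write-cache setting
    hdparm -W0 /dev/sdX      # switch the volatile write cache off (safe, but usually a big hit for fsync-heavy loads)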
On 2011-07-18 03:43, Andy wrote:
> Hi,
>
> Is BBU still needed with SSD?
>
> SSD has its own cache. And in certain models such as the Intel 320 that cache is backed by capacitors. So in a sense that cache acts as a BBU that's backed by capacitors instead of batteries.
>
> In this case is a BBU still needed? If I put 2 SSDs

+with supercap?

> in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?

The biggest drawback of 2 SSDs with supercap in hardware RAID 1 is that if they are both new and of the same model/firmware, they'd probably reach the end of their write cycles at the same time, thereby failing simultaneously. You'd have to start the software RAID setup with two SSDs that have different amounts of remaining life.

regards,
Yeb
On Sun, Jul 17, 2011 at 7:30 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 18/07/2011 9:43 AM, Andy wrote:
>> Is BBU still needed with SSD?
>
> You *need* an SSD with a supercapacitor or on-board battery backup for its cache. Otherwise you *will* lose data.
>
> Consumer SSDs are like a hard disk attached to a RAID controller with write-back caching enabled and no BBU. In other words: designed to eat your data.

No you don't. Greg Smith pulled the power on an Intel 320 series drive without suffering any data loss, thanks to the 6 regular old caps it has. Look for his post in a long thread titled "Intel SSDs that may not suck".

>> In this case is a BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?

What will perform better will vary greatly depending on the exact SSDs, rotating disks, RAID BBU controller and application. But certainly a couple of Intel 320s in RAID 1 seem to be an inexpensive way of getting very good performance while maintaining reliability.

-Dave
Andy wrote:
> SSD has its own cache. And in certain models such as the Intel 320 that cache is backed by capacitors. So in a sense that cache acts as a BBU that's backed by capacitors instead of batteries.

Tests I did on the 320 series say it works fine:
http://archives.postgresql.org/message-id/4D9D1FC3.4020207@2ndQuadrant.com

And there's a larger discussion of this topic at
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
that answers this question in a bit more detail.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
--- On Mon, 7/18/11, David Rees <drees76@gmail.com> wrote:
> >> In this case is a BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
>
> What will perform better will vary greatly depending on the exact SSDs, rotating disks, RAID BBU controller and application. But certainly a couple of Intel 320s in RAID 1 seem to be an inexpensive way of getting very good performance while maintaining reliability.

I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with BBU though. I'm just comparing SSDs with or without BBU. I'm going to get a couple of Intel 320s, just want to know if a BBU makes sense for them.
Andy wrote:
> --- On Mon, 7/18/11, David Rees <drees76@gmail.com> wrote:
> > >> In this case is a BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
> >
> > What will perform better will vary greatly depending on the exact SSDs, rotating disks, RAID BBU controller and application. But certainly a couple of Intel 320s in RAID 1 seem to be an inexpensive way of getting very good performance while maintaining reliability.
>
> I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with BBU though. I'm just comparing SSDs with or without BBU. I'm going to get a couple of Intel 320s, just want to know if a BBU makes sense for them.

Yes, it certainly does, even if you have a RAID BBU.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
> > I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with BBU though. I'm just comparing SSDs with or without BBU. I'm going to get a couple of Intel 320s, just want to know if a BBU makes sense for them.
>
> Yes, it certainly does, even if you have a RAID BBU.

"even if you have a RAID BBU"? Can you elaborate?

I'm talking about after I get 2 Intel 320s, should I spend the extra money on a RAID BBU? Adding a RAID BBU in this case wouldn't improve reliability, but does it improve performance? If so, how much improvement can it bring?
* Yeb Havinga:

> The biggest drawback of 2 SSDs with supercap in hardware RAID 1 is that if they are both new and of the same model/firmware, they'd probably reach the end of their write cycles at the same time, thereby failing simultaneously.

I thought so too, but I've got two Intel 320s (I suppose; the reported device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after about a month of testing, one is down to 89 on the media wearout indicator, and the other is still at 96. Both devices are deteriorating, but one at a significantly faster rate.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH   http://www.bfk.de/
On 07/18/2011 11:56 PM, Andy wrote:
> I'm talking about after I get 2 Intel 320s, should I spend the extra money on a RAID BBU? Adding a RAID BBU in this case wouldn't improve reliability, but does it improve performance? If so, how much improvement can it bring?

It won't improve performance enough that I would bother. The main benefit of adding a BBU RAID card to traditional disks is that you can commit much, much faster to the card's RAM than the disks can spin. You can go from 100 commits/second to 10,000 commits/second that way (in theory--actually getting >2000 at the database level is harder).

Since the Intel 320 drives can easily hit 2000 to 4000 commits/second on their own, using the cache that's built into the drive, the advantage of adding a RAID card on top of that is pretty minimal. Adding a RAID cache will help some, because that layer will be faster than the SSD at absorbing writes, and putting another cache layer into a system always helps with burst performance. But you'd probably be better off using the same money to add more RAM, or more/bigger SSD drives. The fundamental thing that BBU RAID units do--speed up commits--you will only see minimal benefit from with these SSDs.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
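(If you'd rather measure the commit-rate ceiling of your own hardware than take the numbers above on faith, a rough way is a small, write-heavy pgbench run. The scale factor, client count, duration and database name below are only example values.)

    # initialize a small test database, then run a short read/write test;
    # with a tiny scale factor and few clients the reported tps is dominated
    # by how fast the storage can fsync commits
    pgbench -i -s 10 testdb
    pgbench -c 4 -T 60 testdb

(Re-running the same test with synchronous_commit=off in postgresql.conf shows how much of the gap is purely the commit/fsync path.)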
On 2011-07-19 09:56, Florian Weimer wrote:
> * Yeb Havinga:
>
>> The biggest drawback of 2 SSDs with supercap in hardware RAID 1 is that if they are both new and of the same model/firmware, they'd probably reach the end of their write cycles at the same time, thereby failing simultaneously.
>
> I thought so too, but I've got two Intel 320s (I suppose; the reported device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after about a month of testing, one is down to 89 on the media wearout indicator, and the other is still at 96. Both devices are deteriorating, but one at a significantly faster rate.

That's great news if this turns out to be generally true. Is it on mdadm software RAID?

I searched a bit in the mdadm manual for reasons this could be the case. It isn't the occasional check (echo check > /sys/block/md0/md/sync_action), since that seems to do two reads and compare. Another idea was that the layout of the mirror might be different, but the manual says that the --layout configuration directive is only for RAID 5, 6 and 10, not RAID 1. Then my eye caught --write-behind, the maximum number of outstanding writes; it has a non-zero default value, but write-behind is only done if a drive is marked write-mostly.

Maybe it is caused by the initial build of the array? But then a 7% difference seems like an awful lot. It would be interesting to see if the drives also show total xyz written, and if that differs a lot too.

regards,
Yeb Havinga
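(In case it helps narrow this down: the relevant md state can be inspected directly. /dev/md0 is a placeholder for the actual array device.)

    cat /proc/mdstat             # member list; write-mostly devices are flagged with (W)
    mdadm --detail /dev/md0      # array level, bitmap and per-device state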
* Yeb Havinga:

> On 2011-07-19 09:56, Florian Weimer wrote:
>> * Yeb Havinga:
>>
>>> The biggest drawback of 2 SSDs with supercap in hardware RAID 1 is that if they are both new and of the same model/firmware, they'd probably reach the end of their write cycles at the same time, thereby failing simultaneously.
>>
>> I thought so too, but I've got two Intel 320s (I suppose; the reported device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after about a month of testing, one is down to 89 on the media wearout indicator, and the other is still at 96. Both devices are deteriorating, but one at a significantly faster rate.
>
> That's great news if this turns out to be generally true. Is it on mdadm software RAID?

Yes, it is. It's a mixed blessing, because judging by the values, one of the drives wears down pretty quickly.

> Maybe it is caused by the initial build of the array? But then a 7% difference seems like an awful lot.

Both drives are supposedly fresh from the factory, and they started with the wearout indicator at 100. The initial build should write just zeros, and I would expect the drive firmware to recognize that. I've got a second system against which I could run the same test. I wonder if it is reproducible.

> It would be interesting to see if the drives also show total xyz written, and if that differs a lot too.

Do you know how to check that with smartctl?

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH   http://www.bfk.de/
On 2011-07-19 12:47, Florian Weimer wrote:
>> It would be interesting to see if the drives also show total xyz written, and if that differs a lot too.
>
> Do you know how to check that with smartctl?

smartctl -a /dev/<your disk> should show all values. If it shows something that looks like garbage, it means that the database of smartmontools doesn't have the correct information yet for these new drives. I know that for the fairly new OCZ Vertex 2 and 3 SSDs you need at least 5.40 or 5.41, and that's pretty new stuff. (I just happened to install Fedora 15 today and that has smartmontools 5.41, whereas e.g. Scientific Linux 6 has 5.39.)

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
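(To narrow the output down to just the counters being discussed in this thread, something like the following works. The device path is a placeholder, and the attribute names are the ones smartmontools reports for these Intel drives.)

    smartctl -a /dev/sdX                                         # full SMART report
    smartctl -a /dev/sdX | grep -E 'Media_Wearout|Total_LBAs'    # just the wear and lifetime-I/O counters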
* Yeb Havinga:

> On 2011-07-19 12:47, Florian Weimer wrote:
>>> It would be interesting to see if the drives also show total xyz written, and if that differs a lot too.
>>
>> Do you know how to check that with smartctl?
>
> smartctl -a /dev/<your disk> should show all values. If it shows something that looks like garbage, it means that the database of smartmontools doesn't have the correct information yet for these new drives. I know that for the fairly new OCZ Vertex 2 and 3 SSDs you need at least 5.40 or 5.41, and that's pretty new stuff. (I just happened to install Fedora 15 today and that has smartmontools 5.41, whereas e.g. Scientific Linux 6 has 5.39.)

Is this "Total_LBAs_Written"? The values appear to be far too low:

241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       188276
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       116800

241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       189677
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       92509

The second set of numbers is from the drive which wears more quickly. The read asymmetry is not unusual for RAID 1 configurations (depending on the implementation; few do "read both and compare", as originally envisioned, but prefer the primary block device instead). Reduced read traffic could translate to increased fragmentation and wear if the drive defragments on read. I don't know if the Intel 320s do this.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH   http://www.bfk.de/
On 2011-07-19 13:37, Florian Weimer wrote:
> Is this "Total_LBAs_Written"?

I got the same name, "Total_LBAs_Written", on a 5.39 smartmontools, which was renamed to "241 Lifetime_Writes_GiB" after an upgrade to 5.42. Note that this is smartmontools' new interpretation of the values, which happens to match the OCZ tool's interpretation (241: SSD lifetime writes from host -- number of bytes written to SSD: 448 GB). So for the Intels it's probably also lifetime writes in GB, but you'd have to check with an Intel SMART values reader to be absolutely sure.

> The values appear to be far too low:
>
> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       188276
> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       116800
>
> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       189677
> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       92509

Hmm, that would mean 188TB written. Does that value seem right for your use case? If you wrote 100MB/s sustained, it would take 22 days to reach 188TB.

> The second set of numbers is from the drive which wears more quickly.

It's strange that there's such a large difference in lifetime left when the lifetime writes are so similar. Maybe there are more small md metadata updates on the second disk, but without digging into md's internals it's impossible to say anything constructive about it.

Off-topic: new cool tool in smartmontools 5.4x: /usr/sbin/update-smart-drivedb :-)

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
Yeb Havinga wrote:
> So for the Intels it's probably also lifetime writes in GB, but you'd have to check with an Intel SMART values reader to be absolutely sure.

With my 320 series drive, the LBA units are pretty clearly 32MB each. Watch this:

root@toy:/ssd/data# smartctl --version
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
...
root@toy:/ssd/data# du -skh pg_xlog/
4.2G    pg_xlog/
root@toy:/ssd/data# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18128
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10375
root@toy:/ssd/data# cat pg_xlog/* > /dev/null
root@toy:/ssd/data# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18128
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508

That's an increase of 133 after reading 4.2GB of data, which makes each LBA turn out to be 32MB in size. Let's try to confirm that by doing a write:

root@toy:/ssd/gsmith# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18159
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508
root@toy:/ssd/gsmith# dd if=/dev/zero of=test_file.0 bs=32M count=25 && sync
25+0 records in
25+0 records out
838860800 bytes (839 MB) copied, 5.95257 s, 141 MB/s
root@toy:/ssd/gsmith# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18184
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508

18184 - 18159 = 25; exactly the count I used in 32MB blocks.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
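(If that 32MB unit also applies to the two drives quoted earlier in the thread -- which hasn't been verified on those particular units -- the counters work out to a much more plausible total than 188TB:)

    echo $(( 188276 * 32 / 1024 )) GiB    # ~5883 GiB, roughly 5.7 TiB written
    echo $(( 189677 * 32 / 1024 )) GiB    # ~5927 GiB, roughly 5.8 TiB written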
Have you also created your partitions with a reasonably new fdisk (or equivalent) with -c -u as options? Your partitions should start somewhere around sector 2048, I guess (let the software figure that out). The fast degradation of the one disk might indicate bad partitioning? (Maybe recheck with a grml.iso or something similar, http://www.grml.org/.)

Also, did you know that any unused space on the disk is used as bad-block 'replacement'? So just leave 1-2 GB of space unallocated at the end of your disk to make use of this 'feature'.

Otherwise, mdadm supports RAID 1 with more than 2 drives. I haven't seen this configuration much, but it makes absolute sense on drives where you expect failure. (I am not speaking of a spare, but really RAID 1 with > 2 drives.) I like this setup; with SSD drives it might be the solution to decay.

regs,
klaus
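(For concreteness, a three-way mirror like that is just a matter of passing three members at creation time. The device names and array name below are only placeholders.)

    # three active mirror members (not 2 devices + a spare); every member receives every write
    mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

    # confirm all three members are active, and check partition start offsets in sectors
    cat /proc/mdstat
    fdisk -l -u /dev/sda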