Thread: BBU still needed with SSD?

From:
Andy
Date:

Hi,

Is BBU still needed with SSD?

SSDs have their own cache. And in certain models such as the Intel 320, that cache is backed by capacitors. So in a sense
that cache acts as a BBU that's backed by capacitors instead of batteries.

In this case is BBU still needed? If I put 2 SSDs in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1
with BBU? What are the pros and cons?

Thanks.

Andy

From:
Craig Ringer
Date:

On 18/07/2011 9:43 AM, Andy wrote:
> Hi,
>
> Is BBU still needed with SSD?
You *need* an SSD with a supercapacitor or on-board battery backup for
its cache. Otherwise you *will* lose data.

Consumer SSDs are like a hard disk attached to a RAID controller with
write-back caching enabled and no BBU. In other words: designed to eat
your data.
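
(A rough illustration, with the device name as a placeholder: on most
drives you can query that volatile write cache, and disable it at a
large performance cost, with hdparm:

  hdparm -W /dev/sdX      # query the volatile write-cache state
  hdparm -W 0 /dev/sdX    # disable it on a cap-less drive

Disabling the cache is usually the only way to make a cap-less drive
safe for a database, and it tends to make write performance dreadful.)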

> In this case is BBU still needed? If I put 2 SSDs in software RAID 1, would
> that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
>
You don't need write-back caching for fsync() performance if your SSDs
have big enough caches. I don't know enough to say whether there are
other benefits to having them on a BBU HW raid controller or whether SW
RAID is fine.
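
(If you want numbers rather than my hand-waving, PostgreSQL's contrib
test_fsync tool -- renamed pg_test_fsync in 9.1 -- gives a rough ceiling
on commits/second for a given device; the path here is just an example:

  pg_test_fsync -f /ssd/data/test.out

The ops/sec it reports for your wal_sync_method is roughly the best
commit rate you can hope for from that device.)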

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088     Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/

From:
Yeb Havinga
Date:

On 2011-07-18 03:43, Andy wrote:
> Hi,
>
> Is BBU still needed with SSD?
>
> SSDs have their own cache. And in certain models such as the Intel 320, that cache is backed by capacitors. So in a sense
> that cache acts as a BBU that's backed by capacitors instead of batteries.
>
> In this case is BBU still needed? If I put 2 SSDs
(with supercap?)
> in software RAID 1, would that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
The biggest drawback of 2 SSDs with supercap in hardware RAID 1 is
that if they are both new and of the same model/firmware, they'd
probably reach the end of their write cycles at the same time, thereby
failing simultaneously. You'd have to start with two SSDs with
different remaining life in the software RAID setup.
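
A low-tech way to stagger them, with device names as placeholders, is
to check the wear attribute before deploying and pre-age one drive:

  smartctl -A /dev/sdX | grep -i wearout
  smartctl -A /dev/sdY | grep -i wearout

If both report the same media wearout value, writing a few hundred GB
to one of them before building the array puts some distance between
their likely failure dates.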

regards,
Yeb



From:
David Rees
Date:

On Sun, Jul 17, 2011 at 7:30 PM, Craig Ringer
<> wrote:
> On 18/07/2011 9:43 AM, Andy wrote:
>> Is BBU still needed with SSD?
>
> You *need* an SSD with a supercapacitor or on-board battery backup for its
> cache. Otherwise you *will* lose data.
>
> Consumer SSDs are like a hard disk attached to a RAID controller with
> write-back caching enabled and no BBU. In other words: designed to eat your
> data.

No you don't.  Greg Smith pulled the power on an Intel 320 series drive
without suffering any data loss, thanks to the 6 regular old caps it
has.  Look for his post in a long thread titled "Intel SSDs that may
not suck".

>> In this case is BBU still needed? If I put 2 SSDs in software RAID 1, would
>> that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and
>> cons?

What will perform better will vary greatly depending on the exact
SSDs, rotating disks, RAID BBU controller and application.  But
certainly a couple of Intel 320s in RAID1 seem to be an inexpensive
way of getting very good performance while maintaining reliability.

-Dave

From:
Greg Smith
Date:

Andy wrote:
> SSDs have their own cache. And in certain models such as the Intel 320, that cache is backed by capacitors. So in a sense
> that cache acts as a BBU that's backed by capacitors instead of batteries.
>

Tests I did on the 320 series say it works fine:
http://archives.postgresql.org/message-id/

And there's a larger discussion of this topic at
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
that answers this question in a bit more detail.

--
Greg Smith   2ndQuadrant US       Baltimore, MD



From:
Andy
Date:


--- On Mon, 7/18/11, David Rees <> wrote:

> >> In this case is BBU still needed? If I put 2 SSDs in software RAID 1, would
> >> that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
>
> What will perform better will vary greatly depending on the exact
> SSDs, rotating disks, RAID BBU controller and application.  But
> certainly a couple of Intel 320s in RAID1 seem to be an inexpensive
> way of getting very good performance while maintaining reliability.

I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with BBU though. I'm just comparing SSDs with or
without BBU. I'm going to get a couple of Intel 320s, just want to know if BBU makes sense for them.

From:
Bruce Momjian
Date:

Andy wrote:
>
>
> --- On Mon, 7/18/11, David Rees <> wrote:
>
> > >> In this case is BBU still needed? If I put 2 SSDs in software RAID 1, would
> > >> that be any slower than 2 SSDs in HW RAID 1 with BBU? What are the pros and cons?
> >
> > What will perform better will vary greatly depending on the exact
> > SSDs, rotating disks, RAID BBU controller and application.  But
> > certainly a couple of Intel 320s in RAID1 seem to be an inexpensive
> > way of getting very good performance while maintaining reliability.
>
> I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with
> BBU though. I'm just comparing SSDs with or without BBU. I'm going to
> get a couple of Intel 320s, just want to know if BBU makes sense for
> them.

Yes, it certainly does, even if you have a RAID BBU.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

From:
Andy
Date:

> > I'm not comparing SSDs in SW RAID with rotating disks in HW RAID with
> > BBU though. I'm just comparing SSDs with or without BBU. I'm going to
> > get a couple of Intel 320s, just want to know if BBU makes sense for
> > them.
>
> Yes, it certainly does, even if you have a RAID BBU.

"even if you have a RAID BBU"? Can you elaborate?

I'm talking about after I get 2 Intel 320s, should I spend the extra money on a RAID BBU? Adding RAID BBU in this case
wouldn't improve reliability, but does it improve performance? If so, how much improvement can it bring?

From:
Florian Weimer
Date:

* Yeb Havinga:

> The biggest drawback of 2 SSD's with supercap in hardware raid 1, is
> that if they are both new and of the same model/firmware, they'd
> probably reach the end of their write cycles at the same time, thereby
> failing simultaneously.

I thought so too, but I've got two Intel 320s (I suppose; the reported
device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after
about a month of testing, one is down to 89 on the media wearout
indicator, and the other is still at 96.  Both devices are
deteriorating, but one at a significantly faster rate.

--
Florian Weimer                <>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From:
Greg Smith
Date:

On 07/18/2011 11:56 PM, Andy wrote:
> I'm talking about after I get 2 Intel 320s, should I spend the extra
> money on a RAID BBU? Adding RAID BBU in this case wouldn't improve
> reliability, but does it improve performance? If so, how much
> improvement can it bring?

It won't improve performance enough that I would bother.  The main
benefit of adding a RAID controller with BBU to traditional disks is
that you can commit much, much faster to the card's RAM than the disks
can spin.  You can go from 100 commits/second to 10,000 commits/second
that way (in theory--actually getting >2000 at the database level is
harder).

Since the Intel 320 drives can easily hit 2000 to 4000 commits/second on
their own, using the cache that's built into the drive, the advantage of
adding a RAID card on top of that is pretty minimal.  Adding a RAID
cache will help some, because that layer will be faster than the SSD at
absorbing writes, and putting another cache layer into a system always
helps with burst performance.  But you'd probably be better off using
the same money to add more RAM, or more/bigger SSD drives.  The
fundamental thing that RAID BBU units do--speed up commits--is something
you will see only minimal benefit from with these SSDs.
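
(A ballpark way to see that commit ceiling at the database level, rather
than the raw fsync rate, is a short write-heavy pgbench run; scale and
client counts here are arbitrary:

  pgbench -i -s 100 bench         # initialize a test database
  pgbench -c 4 -j 2 -T 60 bench   # 4 clients, 60 seconds of writes

With synchronous_commit=on, the reported tps is close to your real
commits/second at that concurrency.)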

--
Greg Smith   2ndQuadrant US       Baltimore, MD



From:
Yeb Havinga
Date:

On 2011-07-19 09:56, Florian Weimer wrote:
> * Yeb Havinga:
>
>> The biggest drawback of 2 SSD's with supercap in hardware raid 1, is
>> that if they are both new and of the same model/firmware, they'd
>> probably reach the end of their write cycles at the same time, thereby
>> failing simultaneously.
> I thought so too, but I've got two Intel 320s (I suppose; the reported
> device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after
> about a month of testing, one is down to 89 on the media wearout
> indicator, and the other is still at 96.  Both devices are
> deteriorating, but one at a significantly faster rate.
That's great news if this turns out to be generally true. Is it on mdadm
software raid?

I searched a bit in the mdadm manual for reasons this could be the case.
It isn't the occasional check (echo check >
/sys/block/md0/md/sync_action), since that seems to do two reads and a
compare. Another idea was that the layout of the mirror might be
different, but the manual says that the --layout configuration directive
is only for RAID 5, 6 and 10, not RAID 1. Then my eye caught
--write-behind, which sets the maximum number of outstanding writes; it
has a non-zero default value, but write-behind is only done for drives
marked write-mostly.
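
A quick way to rule that out, with the array name as an example, is to
check whether either member is flagged write-mostly:

  mdadm --detail /dev/md0
  cat /proc/mdstat

A write-mostly member shows up with a (W) marker in /proc/mdstat; if
neither drive has one, --write-behind can't be the explanation.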

Maybe it is caused by the initial build of the array? But then a 7%
difference seems like an awful lot.

It would be interesting to see if the drives also show total xyz
written, and if that differs a lot too.

regards,
Yeb Havinga


From:
Florian Weimer
Date:

* Yeb Havinga:

> On 2011-07-19 09:56, Florian Weimer wrote:
>> * Yeb Havinga:
>>
>>> The biggest drawback of 2 SSD's with supercap in hardware raid 1, is
>>> that if they are both new and of the same model/firmware, they'd
>>> probably reach the end of their write cycles at the same time, thereby
>>> failing simultaneously.
>> I thought so too, but I've got two Intel 320s (I suppose; the reported
>> device model is "SSDSA2CT040G3") in a RAID 1 configuration, and after
>> about a month of testing, one is down to 89 on the media wearout
>> indicator, and the other is still at 96.  Both devices are
>> deteriorating, but one at a significantly faster rate.
> That's great news if this turns out to be generally true. Is it on
> mdadm software raid?

Yes, it is.

It's a mixed blessing because judging by the values, one of the drives
wears down pretty quickly.

> Maybe it is caused by the initial build of the array? But then a 7%
> difference seems like an awful lot.

Both drives are supposedly fresh from the factory, and they started with
the wearout indicator at 100.  The initial build should write just
zeros, and I would expect the drive firmware to recognize that.

I've got a second system against which I could run the same test.  I
wonder if it is reproducible.

> It would be interesting to see if the drives also show total xyz
> written, and if that differs a lot too.

Do you know how to check that with smartctl?

--
Florian Weimer                <>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From:
Yeb Havinga
Date:

On 2011-07-19 12:47, Florian Weimer wrote:
>
>> It would be interesting to see if the drives also show total xyz
>> written, and if that differs a lot too.
> Do you know how to check that with smartctl?
smartctl -a /dev/<your disk> should show all values. If it shows
something that looks like garbage, it means that the database of
smartmontools doesn't have the correct information yet for these new
drives. I know that for the recently released OCZ Vertex 2 and 3 SSDs
you need at least 5.40 or 5.41, and that's pretty new stuff. (I just
happened to install Fedora 15 today and that has smartmontools 5.41,
whereas e.g. Scientific Linux 6 has 5.39.)

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


From:
Florian Weimer
Date:

* Yeb Havinga:

> On 2011-07-19 12:47, Florian Weimer wrote:
>>
>>> It would be interesting to see if the drives also show total xyz
>>> written, and if that differs a lot too.
>> Do you know how to check that with smartctl?

> smartctl -a /dev/<your disk> should show all values. If it shows
> something that looks like garbage, it means that the database of
> smartmontools doesn't have the correct information yet for these new
> drives. I know that for the recently released OCZ Vertex 2 and 3 SSDs
> you need at least 5.40 or 5.41, and that's pretty new stuff. (I just
> happened to install Fedora 15 today and that has smartmontools 5.41,
> whereas e.g. Scientific Linux 6 has 5.39.)

Is this "Total_LBAs_Written"?  The values appear to be far too low:

241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       188276
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       116800

241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       189677
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       92509

The second set of numbers is from the drive which wears more quickly.

The read asymmetry is not unusual for RAID-1 configurations (depending
on the implementation; few do "read both and compare", as originally
envisioned, but prefer the primary block device instead).  Reduced read
traffic could translate to increased fragmentation and wear if the drive
defragments on read.  I don't know if the Intel 320s do this.
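
(The asymmetry is easy to watch live, device names being placeholders:

  iostat -dxk sdX sdY 5

Comparing the read columns for the two mirror halves over a few minutes
shows how the reads are actually being distributed.)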

--
Florian Weimer                <>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From:
Yeb Havinga
Date:

On 2011-07-19 13:37, Florian Weimer wrote:
> Is this "Total_LBAs_Written"?
I got the same name, "Total_LBAs_Written", on a 5.39 smartmontools; it
was renamed to "241 Lifetime_Writes_GiB" after an upgrade to 5.42. Note
that this is smartmontools' new interpretation of the values, which
happens to match the OCZ tool's interpretation ("241: SSD Lifetime
writes from host -- Number of bytes written to SSD: 448 G"). So for the
Intels it's probably also lifetime writes in GB, but you'd have to check
with an Intel SMART values reader to be absolutely sure.
>    The values appear to be far too low:
>
> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       188276
> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       116800
>
> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       189677
> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       92509
Hmm, that would mean 188TB written. Does that value seem right for your
use case? If you write 100MB/s sustained, it would take 22 days to
reach 188TB.
> The second set of numbers is from the drive which wears more quickly.
It's strange that there's such a large difference in lifetime left when
lifetime writes are so similar. Maybe there are more small md metadata
updates on the second disk, but without digging into md's internals it's
impossible to say anything constructive about it.

Off-topic: new cool tool in smartmontools-5.4x:
/usr/sbin/update-smart-drivedb :-)

--

Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


From:
Greg Smith
Date:

Yeb Havinga wrote:
> So for the Intels it's probably also lifetime writes in GB but you'd
> have to check with an Intel smart values reader to be absolutely sure.

With my 320 series drive, the LBA units are pretty clearly 32MB each.
Watch this:

root@toy:/ssd/data# smartctl --version
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
...

root@toy:/ssd/data# du -skh pg_xlog/
4.2G    pg_xlog/

root@toy:/ssd/data# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18128
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10375

root@toy:/ssd/data# cat pg_xlog/* > /dev/null

root@toy:/ssd/data# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18128
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508

That's an increase of 133 after reading 4.2GB of data, which makes each
LBA turn out to be 32MB in size.  Let's try to confirm that by doing a
write:

root@toy:/ssd/gsmith# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18159
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508
root@toy:/ssd/gsmith# dd if=/dev/zero of=test_file.0 bs=32M count=25 && sync
25+0 records in
25+0 records out
838860800 bytes (839 MB) copied, 5.95257 s, 141 MB/s
root@toy:/ssd/gsmith# smartctl -a /dev/sdg1 | grep LBAs
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18184
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10508

18184 - 18159 = 25; exactly the count I used in 32MB blocks.

--
Greg Smith   2ndQuadrant US       Baltimore, MD



From:
Klaus Ita
Date:

Have you also created your partitions with a reasonably new fdisk (or
equivalent) with -c -u as options?

Your partitions should be starting somewhere at sector 2048, I guess
(let the software figure that out). The fast degradation of the one disk
might indicate bad partition alignment? (Maybe recheck with a grml.iso
or something alike, http://www.grml.org/ .)
Also, did you know that any unused space on the disk is used as bad
block 'replacement'? So just leave 1-2 GB of space unpartitioned at the
end of your disk to make use of this 'feature'.
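
To check an existing disk (device name is a placeholder), something like

  fdisk -l -u /dev/sdX

should show the first partition starting at sector 2048 on a properly
aligned disk; a start at sector 63 is the old misaligned DOS default.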

Otherwise, mdadm supports RAID 1 with more than 2 drives. I haven't seen
this configuration much, but it makes absolute sense on drives where you
expect failure. (I am not speaking of spares, but really RAID 1 with
more than 2 drives.)
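
A sketch, with the devices as placeholders:

  mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sdX1 /dev/sdY1 /dev/sdZ1

That gives a real three-way mirror, not two drives plus a spare.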

I like this setup; with SSD drives it might be the solution to decay.

regs,
klaus