Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++ - Mailing list pgsql-performance

From Aidan Van Dyk
Subject Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++
Date
Msg-id CAC_2qU-Aj8SiayYVWzdmrxF0d=YiFHXEtv2C_Vro2nhA7zu_ew@mail.gmail.com
In response to Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++  (david@lang.hm)
Responses Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++
List pgsql-performance
On Mon, Sep 12, 2011 at 6:57 PM,  <david@lang.hm> wrote:

>> The "barrier" is the linux fs/block way of saying "these writes need
>> to be on persistent media before I can depend on them".  On typical
>> spinning media disks, that means out of the disk cache (which is not
>> persistent) and on platters.  The way it assures that the writes are
>> on "persistent media" is with a "flush cache" type of command.  The
>> "flush cache" is a close approximation to "make sure it's persistent".
>>
>> If your cache is battery backed, it is now persistent, and there is no
>> need to "flush cache", hence the nobarrier option if you believe your
>> cache is persistent.
>>
>> Now, make sure that even though your raid cache is persistent, your
>> disks have their caches in write-through mode, because it would suck for
>> your raid cache to "work", believing the data is safely on disk, only
>> to find out that it was in the disk's (small) cache and your raid is
>> out of sync after an outage because of that...  I believe most raid
>> cards will handle that correctly for you automatically.
>
> if you don't have barriers enabled, the data may not get written out of main
> memory to the battery backed memory on the card as the OS has no reason to
> do the write out of the OS buffers now rather than later.

It's not quite so simple.  The "sync" calls (pick your flavour) are
what tell the OS buffers they have to go out.  The syscall (on a
working FS) won't return until the data has reached the "device"
safely and is considered persistent.

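As a minimal sketch of that "sync" step (Python, not part of the original
message; the function name and file name are made up for illustration),
the application writes and then calls fsync(), which on a working FS should
not return until the device has reported the data as persistent:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than asked, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the OS to push its buffers for this file out to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "commit.bin")  # hypothetical file name
        print(durable_write(target, b"commit record\n"))
```

Whether "persistent" really means "on the platters" after that call is
exactly what the barrier/flush discussion here is about.
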
But in Linux, a barrier is actually a "synchronization" point, not
just a "flush cache"...  It's a "guarantee everything up to now is
persistent, I'm going to start counting on it".  But depending on your
card, drivers and yes, kernel version, that "barrier" is sometimes a
"drain/block I/O queue, issue cache flush, wait, write specific data,
flush, wait, open I/O queue".  The double flush is because it needs to
guarantee everything previous is good before it writes the "critical"
piece, and then needs to guarantee that too.
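
To make the double flush concrete, here is a toy model (Python, purely
illustrative; ToyDisk and barrier_write are invented names, and this is
not how the kernel implements it).  The "disk" has a volatile cache that
only reaches the "platters" on flush, and the barrier follows the slow-path
order described above:

```python
class ToyDisk:
    """Toy model: writes sit in a volatile cache until flush() is called."""

    def __init__(self):
        self.cache = []     # would be lost on power failure
        self.platters = []  # persistent

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        self.platters.extend(self.cache)
        self.cache.clear()


def barrier_write(disk, pending, critical):
    """Emulate: drain queue, flush, write critical block, flush."""
    for block in pending:   # drain everything queued before the barrier
        disk.write(block)
    disk.flush()            # first flush: earlier writes are now persistent
    disk.write(critical)    # the block that depends on those earlier writes
    disk.flush()            # second flush: the critical block is persistent


disk = ToyDisk()
barrier_write(disk, ["data-1", "data-2"], "journal-commit")
print(disk.platters)  # ['data-1', 'data-2', 'journal-commit']
```

Without the first flush, the critical block could reach the platters
before the data it depends on.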

Now, on good raid hardware it's not usually that bad.

And then, just to confuse people more, LVM up until 2.6.29 (so that
includes all those RHEL5/CentOS5 installs out there which default to
using LVM) didn't handle barriers, it just sort of threw them out as
it came across them, meaning that you got the performance of
nobarrier, even if you thought you were using barriers on poor raid
hardware.

> Every raid card I have seen has ignored the 'flush cache' type of command if
> it has a battery and that battery is good, so you leave the barriers enabled
> and the card still gives you great performance.

The XFS FAQ goes over much of it, starting at Q24:
   http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

So, for pure performance, on a battery-backed controller, nobarrier is
the recommended *performance* setting.

But, to throw a wrench into the plan, what happens when, during normal
battery tests, your raid controller decides the battery is failing?
Of course, it's going to start screaming and set off all your monitoring
alarms (you're monitoring that, right?), but have you thought to
make sure that your FS is remounted with barriers at the first sign of
battery trouble?
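
A rough sketch of that remount step (Python, not from the original
message).  Assumptions: the battery state comes from whatever tool your
controller vendor ships, so here it is just a boolean you pass in; the
mount table is in /proc/mounts format; and the option names ("barrier" for
XFS, "barrier=1" for ext3/ext4) are the ones from kernels of this era, so
check your own kernel's documentation.  It only builds the commands and
does not run them:

```python
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/pgsql xfs rw,noatime,nobarrier 0 0
"""


def mounts_without_barriers(mounts_text):
    """Return (mountpoint, fstype) pairs whose options disable barriers."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        opts = options.split(",")
        if "nobarrier" in opts or "barrier=0" in opts:
            found.append((mountpoint, fstype))
    return found


def remount_command(mountpoint, fstype):
    """Build (not run) a remount command that turns barriers back on."""
    # Assumption: option spelling differs per filesystem, see note above.
    opt = "barrier" if fstype == "xfs" else "barrier=1"
    return ["mount", "-o", "remount," + opt, mountpoint]


def on_battery_alarm(battery_ok, mounts_text):
    """Return the remount commands to run when the battery is not healthy."""
    if battery_ok:
        return []
    return [remount_command(m, t) for m, t in mounts_without_barriers(mounts_text)]


for cmd in on_battery_alarm(False, SAMPLE_MOUNTS):
    print(" ".join(cmd))  # mount -o remount,barrier /var/lib/pgsql
```

In a real setup you would feed it the contents of /proc/mounts and hook it
into the same monitoring that raises the battery alarm.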

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
