Re: WAL in RAM - Mailing list pgsql-performance

From Marcus Engene
Subject Re: WAL in RAM
Date
Msg-id 4ED50F50.1030804@engene.se
In response to Re: WAL in RAM  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
On 10/29/11 10:11, Scott Marlowe wrote:
> In over 10 years of using hardware RAID controllers with battery
> backup on many many machines, I have had exactly zero data loss due to
> a failed battery backup.  Of course proper monitoring is important, to
> make sure the batteries aren't old and dead, but every single BBU RAID
> controller I have used automatically switched from write back to write
> through when they detected a bad battery pack.
>
> Proper testing is essential whether it's BBU Caching or using an SSD,
> and failure to do so is inconceivable if your data is at all
> important.  Given the current high failure rate of SSDs due to
> firmware issues (and it's not just the intel drives experiencing such
> failures) I'm much more confident in Areca, 3Ware, and LSI BBU RAID
> controllers right now than I am in SSDs.
>

Rimu got me a setup with 2x 5805 with BBU, configured as two RAID10
arrays of 15k rpm SAS drives, and on top of that 2x Xeon E5645 (the
hex-core). Since I had heard warnings that with non-software RAID the
machine could be unresponsive during boot while doing a rebuild, I took
small 300G drives. Not that 15k SAS drives come in much bigger sizes,
but still.

I chickened out of pg 9.1 because of its low minor version number.

I also set...
wal_buffers = 16MB
...which used to be at the default of 64kB, which could possibly explain
some of the choking during write bursts.
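
For context on that default: in 9.0 wal_buffers is a fixed 64kB unless changed, while from 9.1 onwards a setting of -1 is documented to pick 1/32 of shared_buffers, but not less than 64kB and not more than one WAL segment (typically 16MB). A minimal Python sketch of that documented rule follows; the function name and constants are made up for illustration and this is not the server's actual code.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers(shared_buffers_bytes, wal_segment_bytes=16 * MB):
    """Hypothetical helper mirroring the documented wal_buffers = -1 rule.

    Returns 1/32 of shared_buffers, clamped to the range
    [64kB, one WAL segment]. The 16MB segment size is the usual
    default and is an assumption here.
    """
    candidate = shared_buffers_bytes // 32
    return max(64 * KB, min(candidate, wal_segment_bytes))


if __name__ == "__main__":
    for shared in (1 * MB, 128 * MB, 512 * MB, 8 * 1024 * MB):
        result_kb = auto_wal_buffers(shared) // KB
        print(f"shared_buffers={shared // MB}MB -> wal_buffers={result_kb}kB")
```

Under this rule any shared_buffers of 512MB or more lands on the same 16MB that is set explicitly above.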
>
>> As per others' suggestions, I don't feel encouraged to put WAL on SSD, having
>> found several texts by Greg Smith and others warning about this. I do have
>> 2x OCI Sandforce 1500 drives (with supercap) for some burst load tables.
>>
>> The reason I started to think about putting WAL on a RAM drive to begin with
>> was that performance figures for unlogged tables looked very promising
>> indeed. And the tests were of the sort that's occupying my bandwidth:
>> accumulating statistical writes.
>>
>> The present pg9 computer is a Pg 9.0.4, Debian Squeeze, 2xXeon, 72GB,
>> software 4xRAID6 (sorry) + 2xSSD. It's an OLTP website with 10M products and
>> SOLR for FTS. During peak it's using ~3-4% CPU, and it's 99.9% reads or
>> thereabouts. It's the peaks we want to take down. RAID6 or not, with a
>> spindle as bottleneck there is just a certain max# of writes/s.
>>
> First things first, get off RAID-6.  A 4 drive RAID-6 gives no more
> storage than a 4 drive RAID-10, and is painfully slow by comparison.
> Looking at SSDs for WAL is putting the cart about 1,000 miles ahead of
> the horse at this point.  You'd be much better off migrating to a
> single SSD for everything than running on a 4 disk RAID-6.
>
>
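
The capacity point in the quoted advice is simple arithmetic: RAID-6 spends two drives' worth of space on parity, RAID-10 spends half the drives on mirrors, so with four drives both leave two drives of usable space. A minimal Python sketch, assuming equal-sized drives; the function name is hypothetical.

```python
def usable_drives(level, n_drives):
    """Return usable capacity in units of one drive (hypothetical helper).

    Assumes all drives are the same size. RAID-6 keeps two drives'
    worth of parity; RAID-10 mirrors every drive once.
    """
    if n_drives < 4:
        raise ValueError("both layouts need at least 4 drives")
    if level == "raid6":
        return n_drives - 2
    if level == "raid10":
        if n_drives % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        return n_drives // 2
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, "drives:",
              "RAID-6 =", usable_drives("raid6", n),
              "RAID-10 =", usable_drives("raid10", n))
```

RAID-6 only pulls ahead on capacity from six drives up, which is why it buys nothing on a four-drive array.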

Message received and understood :)

Having read up too much on drive reliability paranoia, in combination
with going from 7k2 -> 15k, I feel a bit uneasy, but this mama is fast.
I suppose a little of that could be credited to the newly restored dump
instead of the little over a year of entropy in the other machine. But I
also did some update/write torture and it was hard to provoke any io wait.
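
For a rough feel of what 7k2 -> 15k means per spindle, the average rotational latency is half a revolution, and a crude uncached random IOPS figure is one over seek time plus that latency. The sketch below is a back-of-the-envelope Python estimate only: the function names are hypothetical, the seek times are illustrative assumptions rather than measurements of these drives, and a BBU write-back cache will absorb bursts well beyond these numbers.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60000.0 / rpm


def rough_random_iops(rpm, avg_seek_ms):
    """Crude per-spindle estimate: 1 / (average seek + rotational latency)."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Seek times below are assumed typical values, not drive specifications.
    for label, rpm, seek_ms in (("7k2", 7200, 8.5), ("15k", 15000, 3.5)):
        print(f"{label}: latency {avg_rotational_latency_ms(rpm):.2f} ms, "
              f"~{rough_random_iops(rpm, seek_ms):.0f} random IOPS")
```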

I put OS & WAL on one array and the general data files on the other.
The data directory that used to be on the SSD drive was also put on the
WAL raid.

Thanks for your advice!
Marcus
