Re: With 4 disks should I go for RAID 5 or RAID 10 - Mailing list pgsql-performance

From Mark Mielke
Subject Re: With 4 disks should I go for RAID 5 or RAID 10
Msg-id 47732BCB.2090302@mark.mielke.cc
In response to Re: With 4 disks should I go for RAID 5 or RAID 10  (Shane Ambler <pgsql@Sheeky.Biz>)
Responses Re: With 4 disks should I go for RAID 5 or RAID 10
List pgsql-performance
Shane Ambler wrote:
> So in theory a modern RAID 1 setup can be configured to get similar
> read speeds as RAID 0 but would still drop to single disk speeds (or
> similar) when writing, but RAID 0 can get the faster write performance.

Unfortunately, it's a bit more complicated than that. RAID 1 has a
sequential read problem, as read-ahead is wasted, and you may as well
read from one disk and ignore the others. RAID 1 does, however, allow
for much greater concurrency. 4 processes on a 4 disk RAID 1 system can,
theoretically, each do whatever they want, without impacting each other.
Database loads involving a single active read user will see greater
performance with RAID 0. Database loads involving multiple concurrent
active read users will see greater performance with RAID 1. All of these
assume writes are not being performed to any great extent. Even
single writes cause all disks in a RAID 1 system to synchronize,
temporarily eliminating the read benefit. RAID 0 allows some degree of
concurrent reads and writes (assuming even distribution of the data
across the devices). Of course, RAID 0 systems have an expected life
that decreases as the number of disks in the system increases.
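
To put rough numbers on that last point, here is a minimal sketch in
Python. It assumes every disk fails independently with the same
probability over some period, which is a simplification (real failures
are often correlated). The function names and the 5% figure are made up
for illustration.

```python
def raid0_survival(p_disk_fail: float, n_disks: int) -> float:
    """Chance a RAID 0 stripe survives: every disk must survive."""
    return (1.0 - p_disk_fail) ** n_disks


def raid1_survival(p_disk_fail: float, n_disks: int) -> float:
    """Chance an n-way RAID 1 mirror survives: at least one disk must survive."""
    return 1.0 - p_disk_fail ** n_disks


if __name__ == "__main__":
    p = 0.05  # hypothetical per-disk failure probability over the period
    for n in (1, 2, 4):
        print(n, round(raid0_survival(p, n), 4), round(raid1_survival(p, n), 4))
```

Under these assumptions the RAID 0 figure shrinks with every disk added,
while the mirror's figure grows.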

So, this is where we get to RAID 1+0. Redundancy, good read performance,
good write performance, relatively simple implementation. For the mere
cost of double the disk storage, you can get around the problems of
RAID 1 and the problems of RAID 0. :-)

> So in a perfect setup (probably 1+0) 4x 300MB/s SATA drives could
> deliver 1200MB/s of data to RAM, which is also assuming that all 4
> channels have their own data path to RAM and aren't sharing.
> (anyone know how segregated the on board controllers such as these are?)
> (do some pci controllers offer better throughput?)
> We all know that doesn't happen in the real world ;-) Let's say we are
> restricted to 80% - 1000MB/s - and some of that (10%) gets used by the
> system - so we end up with 900MB/s delivered off disk to postgres -
> that would still be more than the perfect rate at which 2x 300MB/s
> drives can deliver.

I expect you would need good hardware and a well-tuned system to see
80%+ of theoretical throughput for common workloads. But then, this
isn't unique to RAID. Even in a single disk system, one has trouble
achieving 80%+ of theoretical. :-)
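
The arithmetic in the quoted estimate can be sketched as below. The
function name and parameters are hypothetical, and the 80% and 10%
figures are simply the ones from the quoted message, not measurements.
Worked exactly, the result is 864MB/s rather than the rounded 900MB/s,
which is still above the 600MB/s ceiling of two such drives.

```python
def delivered_throughput(n_disks: int, per_disk_mb_s: float,
                         efficiency: float, system_overhead: float) -> float:
    """Estimate MB/s reaching the database, given idealized per-disk rates.

    efficiency      -- fraction of the theoretical total actually achieved
    system_overhead -- fraction of the achieved rate consumed by the system
    """
    theoretical = n_disks * per_disk_mb_s
    achieved = theoretical * efficiency
    return achieved * (1.0 - system_overhead)


if __name__ == "__main__":
    four_disks = delivered_throughput(4, 300.0, 0.80, 0.10)
    two_disks_perfect = 2 * 300.0
    print(four_disks, two_disks_perfect)
```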

I achieve something closer to +20% to +60% over the theoretical
performance of a single disk with my four disk RAID 1+0 partitions.
There are lots of compromises in my system that I won't get into,
though. I value the redundancy, which allows a single disk to fail and
gives me time to recover easily. For the cost of two more disks, I am
able to counter the performance cost of redundancy, and actually see a
positive performance effect instead.

> So in this situation - if configured correctly with a good controller
> (driver for software RAID etc) a single 4 disk RAID 1+0 could
> outperform two 2 disk RAID 1 setups with data/OS+WAL split between the
> two.
> Is the real world speeds so different that this theory is real fantasy
> or has hardware reached a point performance wise where this is close
> to fact??

I think it depends on the balance. If every second operation requires a
WAL write, having separate arrays might make sense. However, if the
balance is less than even, one would end up with one of the 2 disk
RAID 1 setups being more idle than the other. It's not an exact science
when it comes to the various compromises being made. :-)
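
A toy model of that balance, as a sketch only: assume each 2 disk
RAID 1 pair can do one unit of work per second, that the WAL pair and
the data pair cannot help each other, and that a pooled 4 disk RAID 1+0
behaves ideally like two units. It ignores that WAL writes are mostly
sequential while data access is often random. All names are
hypothetical.

```python
def split_throughput(wal_fraction: float, pair_capacity: float = 1.0) -> float:
    """Max total work/sec with WAL on one RAID 1 pair and data on the other.

    The busier pair saturates first and caps the whole workload.
    """
    if not 0.0 <= wal_fraction <= 1.0:
        raise ValueError("wal_fraction must be between 0 and 1")
    busiest_share = max(wal_fraction, 1.0 - wal_fraction)
    return pair_capacity / busiest_share


def pooled_throughput(pair_capacity: float = 1.0) -> float:
    """Idealized 4 disk RAID 1+0: both pairs share every kind of work."""
    return 2.0 * pair_capacity


if __name__ == "__main__":
    for wal in (0.5, 0.3, 0.1):
        print(wal, round(split_throughput(wal), 3), pooled_throughput())
```

In this model the split layout only matches the pooled one when the
load is an even 50/50. The further from even, the more one pair idles.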

If you can only put 4 disks into the system (either because of cost or
because of the system size), I would suggest RAID 1+0 on all four as a
sensible compromise. If you can put more in, start to consider breaking
it up. :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>
