Thread: Multiple disks: RAID 5 or PG Cluster

Multiple disks: RAID 5 or PG Cluster

From
Yves Vindevogel
Date:
Hi,


We are looking to build a new machine for a big PG database.

We were wondering whether a machine with five SCSI disks would perform
better with a hardware RAID 5 controller or with clustering in PG.

If we cluster in PG, do we get redundancy on the data like in RAID 5?


Our first concern is performance, not redundancy (we can handle that a
different way because all data comes from upload files).


Kind regards,

Yves Vindevogel
Implements

Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.
Mahatma Gandhi.


Re: Multiple disks: RAID 5 or PG Cluster

From
Vivek Khera
Date:

On Jun 17, 2005, at 3:34 PM, Yves Vindevogel wrote:

> We are looking to build a new machine for a big PG database.
>
> We were wondering if a machine with 5 scsi-disks would perform better if we use a hardware raid 5 controller or if we would go for the clustering in PG.
>
> If we cluster in PG, do we have redundancy on the data like in a RAID 5 ?


I'd recommend 4 disks in a hardware RAID10 plus a hot spare, or use the 5th disk as boot + OS if you're feeling lucky.


Vivek Khera, Ph.D.

+1-301-869-4449 x806



Re: Multiple disks: RAID 5 or PG Cluster

From
mudfoot@rawbw.com
Date:
If you truly do not care about data protection -- either from drive loss or from
sudden power failure, or anything else -- and just want to get the fastest
possible performance, then do RAID 0 (striping).  It may be faster to do that
with software RAID on the host than with a special RAID controller.  And turn
off fsyncing the write ahead log in postgresql.conf (fsync = false).

But be prepared to replace your whole database from scratch (or backup or
whatever) if you lose a single hard drive.  And if you have a sudden power loss
or other type of unclean system shutdown (kernel panic or something) then your
data integrity will be at risk as well.

To squeeze out even a little more performance, put your operating system, swap
and PostgreSQL binaries on a cheap IDE or SATA drive, and only your data on the
5 striped SCSI drives.

I do not know what clustering would do for you.  But striping will provide a
high level of assurance that each of your hard drives will process equivalent
amounts of IO operations.
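
The point about striping spreading IO evenly can be illustrated with a small sketch. It assumes a plain round-robin RAID 0 layout and a 128 KiB chunk size; the names `disk_for_offset`, `io_per_disk` and `CHUNK_SIZE` are made up for this example and do not correspond to any real controller or driver API.

```python
from collections import Counter

CHUNK_SIZE = 128 * 1024  # assumed chunk (stripe unit) size in bytes


def disk_for_offset(offset, n_disks, chunk_size=CHUNK_SIZE):
    """Return the index of the disk holding the byte at `offset` in a RAID 0 array."""
    if n_disks < 1 or chunk_size < 1 or offset < 0:
        raise ValueError("n_disks and chunk_size must be >= 1, offset >= 0")
    # Assumption: chunks are dealt out to the disks round-robin.
    return (offset // chunk_size) % n_disks


def io_per_disk(offsets, n_disks, chunk_size=CHUNK_SIZE):
    """Count how many of the given single-chunk requests land on each disk."""
    counts = Counter({disk: 0 for disk in range(n_disks)})
    for offset in offsets:
        counts[disk_for_offset(offset, n_disks, chunk_size)] += 1
    return counts


# 1000 consecutive chunks spread over 5 striped drives.
sequential = [i * CHUNK_SIZE for i in range(1000)]
print(io_per_disk(sequential, n_disks=5))
```

With consecutive chunks every drive receives the same number of requests; with random offsets the counts would only be roughly equal.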




Re: Multiple disks: RAID 5 or PG Cluster

From
PFC
Date:

> I do not know what clustering would do for you.  But striping will provide a
> high level of assurance that each of your hard drives will process equivalent
> amounts of IO operations.

    I don't know what I'm talking about, but wouldn't mirroring be faster
than striping for random reads like you often get on a database (i.e. the
reads can be dispatched to any disk)? Of course, not for writes, but if
you won't use fsync, random writes should be reduced, no?



Re: Multiple disks: RAID 5 or PG Cluster

From
Jacques Caron
Date:
Hi,

At 18:00 18/06/2005, PFC wrote:
> I don't know what I'm talking about, but wouldn't mirroring be faster
> than striping for random reads like you often get on a database (i.e. the
> reads can be dispatched to any disk)? Of course, not for writes, but if
> you won't use fsync, random writes should be reduced, no?

Roughly, for random reads, the performance (in terms of operations/s)
compared to a single disk setup, with N being the number of drives, is:

RAID 0 (striping):
- read = N
- write = N
- capacity = N
- redundancy = 0

RAID 1 (mirroring, N=2):
- read = N
- write = 1
- capacity = 1
- redundancy = 1

RAID 5 (striping + parity, N>=3):
- read = N-1
- write = 1/2
- capacity = N-1
- redundancy = 1

RAID 10 (mirroring + striping, N=2n, N>=4):
- read = N
- write = N/2
- capacity = N/2
- redundancy < N/2

So depending on your app, i.e. your read/write ratio, how much data can be
cached, whether the data is important or not, how much data you have, etc,
one or the other option might be better.
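
As a rough illustration, the figures listed above can be put into a small lookup function to compare layouts for a given number of drives. This is only a sketch of those rule-of-thumb numbers, not a benchmark; the function name `raid_estimates` and the minimum of two drives for RAID 0 are assumptions made for the example.

```python
def raid_estimates(level, n):
    """Rule-of-thumb figures relative to a single disk for `n` drives."""
    if level == "0":
        if n < 2:  # assumption: striping only makes sense with 2+ drives
            raise ValueError("RAID 0 here assumes at least 2 drives")
        return {"read": n, "write": n, "capacity": n, "redundancy": "0"}
    if level == "1":
        if n != 2:
            raise ValueError("RAID 1 figures above are for N=2")
        return {"read": n, "write": 1, "capacity": 1, "redundancy": "1"}
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs N>=3")
        return {"read": n - 1, "write": 0.5, "capacity": n - 1, "redundancy": "1"}
    if level == "10":
        if n < 4 or n % 2 != 0:
            raise ValueError("RAID 10 needs an even N>=4")
        return {"read": n, "write": n / 2, "capacity": n / 2, "redundancy": "< N/2"}
    raise ValueError("unknown RAID level: " + level)


# Options for a 5-disk machine: RAID 10 uses 4 drives, leaving one spare.
for level, n in (("0", 5), ("5", 5), ("10", 4)):
    print("RAID", level, "with", n, "drives:", raid_estimates(level, n))
```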

Jacques.



Re: Multiple disks: RAID 5 or PG Cluster

From
Alex Turner
Date:
Of course, these numbers no longer hold once a read operation exceeds the stripe size, which is often only 128k.  Typically a stripe of mirrors will not read from separate halves of the mirrors either, so in my experience RAID 10 reads are only N/2 at best.  RAID 0+1 is a mirror of stripes and will read from independent halves, but gives worse redundancy.
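
To make the stripe-size point concrete, here is a minimal sketch that counts how many drives a single read occupies. It assumes "stripe size" means the per-disk chunk, a simple round-robin layout, and the 128k figure mentioned above; `disks_touched` is a hypothetical helper, not part of any RAID tool.

```python
def disks_touched(offset, length, n_disks, chunk_size=128 * 1024):
    """Number of distinct disks a single read touches in a striped array."""
    if length <= 0:
        return 0
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # A read spanning more chunks than there are disks wraps around,
    # so it can never touch more than n_disks drives.
    return min(last_chunk - first_chunk + 1, n_disks)


# An aligned 8 KiB read stays on one drive; a 1 MiB read keeps several busy.
print(disks_touched(0, 8 * 1024, n_disks=5))
print(disks_touched(0, 1024 * 1024, n_disks=5))
```

Once one read keeps several drives busy, those drives cannot serve other requests at the same time, which is why the per-drive multipliers stop applying.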

Alex Turner
NetEconomist



