Fwd: Multiple disks: RAID 5 or PG Cluster - Mailing list pgsql-performance

From Yves Vindevogel
Subject Fwd: Multiple disks: RAID 5 or PG Cluster
Date
Msg-id aefbbdac96314b629fa02249d7ed246b@implements.be
List pgsql-performance
BTW, thanks for the opinion...
I forgot to cc the list...


Begin forwarded message:

> From: Yves Vindevogel <yves.vindevogel@implements.be>
> Date: Fri 17 Jun 2005 23:29:32 CEST
> To: mudfoot@rawbw.com
> Subject: Re: [PERFORM] Multiple disks: RAID 5 or PG Cluster
>
> OK, striping is a good option...
>
> I'll tell you why I don't care about data loss:
>
> 1) The database will run for 6 months, no more.
> 2) The database is fed from uploaded files. So if I have a backup each
> day, plus that day's files, I can restore pretty quickly.
> 3) Power failure is out of the question: we have battery backup (UPS).
> The chance of disk failure is minimal: new server, new disks, 6 months...
>
> We have about 500,000 new records each day in that database, which is
> why I want performance.
> Records are uploaded into one major table and then denormalised into
> several others.
>
> But I would like to hear from somebody about the clustering method.
> Isn't it used much?
> Or isn't it used on a single machine?
>
> On 17 Jun 2005, at 22:38, mudfoot@rawbw.com wrote:
>
>> If you truly do not care about data protection -- either from drive
>> loss or from sudden power failure, or anything else -- and just want
>> to get the fastest possible performance, then do RAID 0 (striping). It
>> may be faster to do that with software RAID on the host than with a
>> special RAID controller. And turn off fsyncing the write-ahead log in
>> postgresql.conf (fsync = false).
>>
>> But be prepared to rebuild your whole database from scratch (or from
>> backup or whatever) if you lose a single hard drive. And if you have a
>> sudden power loss or another type of unclean system shutdown (kernel
>> panic or something), then your data integrity will be at risk as well.
>>
>> To squeeze out even a little more performance, put your operating
>> system, swap and PostgreSQL binaries on a cheap IDE or SATA drive, and
>> only your data on the 5 striped SCSI drives.
>>
>> I do not know what clustering would do for you. But striping will
>> provide a high level of assurance that each of your hard drives will
>> process equivalent amounts of I/O operations.
>>
>> Quoting Yves Vindevogel <yves.vindevogel@implements.be>:
>>
>>> Hi,
>>>
>>> We are looking to build a new machine for a big PG database.
>>> We were wondering whether a machine with 5 SCSI disks would perform
>>> better with a hardware RAID 5 controller or with clustering in PG.
>>> If we cluster in PG, do we have redundancy on the data as with
>>> RAID 5?
>>>
>>> Our first concern is performance, not redundancy (we can handle that
>>> in a different way because all data comes from upload files).
>>>
>>> Kind regards,
>>>
>>> Yves Vindevogel
>>> Implements
>>>
>>>
>>
> Kind regards,
>
> Yves Vindevogel
> Implements
>
> Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91
>
> Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
>
> Web: http://www.implements.be
>
> First they ignore you.  Then they laugh at you.  Then they fight you.
> Then you win.
> Mahatma Gandhi.
>
Kind regards,

Yves Vindevogel
Implements
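The point quoted above, that striping spreads I/O evenly across the drives, can be illustrated with a minimal Python sketch of a generic RAID 0 address mapping. It assumes a fixed chunk (stripe unit) size and simple round-robin placement of chunks across disks; the function names, the 16-block chunk size, and the workload are hypothetical, and real software RAID or hardware controllers add details (metadata, configurable chunk sizes) that this ignores.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def raid0_locate(logical_block: int, num_disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a simple RAID 0 layout: the array is cut into fixed-size chunks
    of `chunk_blocks` blocks, and consecutive chunks go round-robin across
    the disks.
    """
    chunk = logical_block // chunk_blocks   # which chunk the block falls in
    within = logical_block % chunk_blocks   # position inside that chunk
    disk = chunk % num_disks                # round-robin choice of disk
    stripe = chunk // num_disks             # which full stripe (row) it is in
    return disk, stripe * chunk_blocks + within


def io_per_disk(blocks: Iterable[int], num_disks: int, chunk_blocks: int) -> List[int]:
    """Count how many of the given logical block accesses land on each disk."""
    counts = Counter(raid0_locate(b, num_disks, chunk_blocks)[0] for b in blocks)
    return [counts.get(d, 0) for d in range(num_disks)]


if __name__ == "__main__":
    # Hypothetical workload: a sequential scan of 10,000 blocks over the
    # 5 striped drives, with an assumed chunk size of 16 blocks.
    print(io_per_disk(range(10_000), num_disks=5, chunk_blocks=16))
    # prints [2000, 2000, 2000, 2000, 2000]
```

For a purely sequential scan the per-disk counts come out identical; for random single-block access they are balanced only on average, so "equivalent amounts" is best read as roughly equal over a large workload.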

