Thread: RAID Stripe size

RAID Stripe size

From
"bm\\mbn"
Date:
Hi Everyone

The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k
disks.

2 disks are in RAID1 and hold the OS, SWAP & pg_xlog
4 disks are in RAID10 and hold the Cluster itself.

The DB will have two major tables, one with 10 million rows and one with
100 million rows.
All activity against these tables will be SELECTs.

Currently the stripe size is 8k. I have read in many places that this is a
poor setting.

Am I right?


Re: RAID Stripe size

From
John A Meinel
Date:
bm\mbn wrote:
> Hi Everyone
>
> The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k
> disks.
>
> 2 disks are in RAID1 and hold the OS, SWAP & pg_xlog
> 4 disks are in RAID10 and hold the Cluster itself.
>
> The DB will have two major tables, one with 10 million rows and one with
> 100 million rows.
> All activity against these tables will be SELECTs.

What type of SELECTs will you be doing? Mostly sequential reads of a
bunch of data, or indexed lookups of random pieces?

>
> Currently the stripe size is 8k. I have read in many places that this is a
> poor setting.

From what I've heard of RAID, if you are doing large sequential
transfers, larger stripe sizes (128k, 256k) generally perform better.
For postgres, though, having the stripe size be around the same size as
your page size (8k) could be advantageous: when postgres reads or writes
a page, it only touches a single stripe unit, and so a single disk. If it
were reading a series of pages, consecutive pages would come from
different disks.

I may be wrong about that, though.
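
To make the page-to-disk mapping above concrete, here is a minimal Python sketch. It assumes plain round-robin striping that starts at offset 0, an 8k page size, and that the 4-disk RAID10 behaves as a stripe over 2 mirrored pairs. The function name stripe_member_for_page is made up for illustration, and a real controller or file system layout may well differ.

```python
PAGE_KB = 8  # PostgreSQL default page size, in KB


def stripe_member_for_page(page_no: int, stripe_kb: int, members: int) -> int:
    """Return the index of the stripe member (disk or mirrored pair) holding a page.

    Assumes simple round-robin striping starting at offset 0 and a stripe
    size that is a multiple of the page size.
    """
    offset_kb = page_no * PAGE_KB
    stripe_unit = offset_kb // stripe_kb  # which stripe unit the page starts in
    return stripe_unit % members


# A 4-disk RAID10 stripes across 2 mirrored pairs (assumption stated above).
for stripe_kb in (8, 128):
    layout = [stripe_member_for_page(p, stripe_kb, members=2) for p in range(20)]
    print(f"{stripe_kb:>3}k stripe:", layout)
```

Under these assumptions, an 8k stripe alternates consecutive pages between the two pairs, while a 128k stripe keeps 16 consecutive pages on the same pair before moving to the next one.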

John
=:->

>
> Am i right ?


Re: RAID Stripe size

From
Michael Ben-Nes
Date:
Hi


John A Meinel wrote:

>bm\mbn wrote:
>
>
>>Hi Everyone
>>
>>The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k
>>disks.
>>
>>2 disks are in RAID1 and hold the OS, SWAP & pg_xlog
>>4 disks are in RAID10 and hold the Cluster itself.
>>
>>The DB will have two major tables, one with 10 million rows and one with
>>100 million rows.
>>All activity against these tables will be SELECTs.
>>
>>
>
>What type of SELECTs will you be doing? Mostly sequential reads of a
>bunch of data, or indexed lookups of random pieces?
>
>
All of them: some R-tree, some B-tree, and some without using indexes.

>
>
>>Currently the stripe size is 8k. I have read in many places that this is a
>>poor setting.
>>
>>
>
>From what I've heard of RAID, if you are doing large sequential
>transfers, larger stripe sizes (128k, 256k) generally perform better.
>For postgres, though, when you are writing, having the stripe size be
>around the same size as your page size (8k) could be advantageous, as
>when postgres reads a page, it only reads a single stripe. So if it were
>reading a series of pages, each one would come from a different disk.
>
>I may be wrong about that, though.
>
>
I must admit I'm a bit amazed that such an important parameter is so
ambiguous. An optimal stripe size can improve the performance of the DB
significantly. I bet that the difference in performance between a poor
stripe setting and an optimal one is more important than how much RAM or
CPU you have.
I hope to run some tests soon, though I have limited time on the
production server to do such tests.

--
--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Cel: 972-52-8555757
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------


Re: RAID Stripe size

From
Michael Stone
Date:
On Tue, Sep 20, 2005 at 10:51:41AM +0300, Michael Ben-Nes wrote:
>I must admit I'm a bit amazed that such an important parameter is so
>ambiguous. An optimal stripe size can improve the performance of the DB
>significantly.

It's configuration dependent. IME, it has an insignificant effect. If
anything, changing it from the vendor default may make performance worse
(maybe the firmware on the array is tuned for a particular size?)

>I bet that the difference in performance between a poor stripe setting
>and an optimal one is more important than how much RAM or CPU you have.

I'll take that bet, because I've benched it. If something so trivial
(and completely free) was the single biggest factor in performance, do
you really think it would be an undiscovered secret?

>I hope to run some tests soon, though I have limited time on the
>production server to do such tests.

Well, benchmarking your data on your hardware is the first thing you
should do, not something you should try to cram in late in the game. You
can't get a valid answer to a "what's the best configuration" question
until you've tested some configurations.

Mike Stone

Re: RAID Stripe size

From
evgeny gridasov
Date:
Hi Everybody!

I've got a spare machine with 2x Xeon 3.2 GHz, 4 GB RAM, and
14x 140 GB SCSI 10k disks (LSI MegaRaid 320U). It is going into production in 3-5 months.
I do have free time to run tests on this machine, and I could test different stripe sizes
if somebody prepares a test script and data for that.

I could also test different RAID modes 0,1,5 and 10 for this script.

I guess the community needs these results.

On 16 Sep 2005 04:51:43 -0700
"bm\\mbn" <miki@canaan.co.il> wrote:

> Hi Everyone
>
> The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k
> disks.
>
> 2 disks are in RAID1 and hold the OS, SWAP & pg_xlog
> 4 disks are in RAID10 and hold the Cluster itself.
>
> The DB will have two major tables, one with 10 million rows and one with
> 100 million rows.
> All activity against these tables will be SELECTs.
>
> Currently the stripe size is 8k. I have read in many places that this is a
> poor setting.
>
> Am I right?

--
Evgeny Gridasov
Software Developer
I-Free, Russia

Re: RAID Stripe size

From
"Jignesh K. Shah"
Date:
Typically your stripe size impacts read and write.

In Solaris, the trick is to match it with your maxcontig parameter. If
you set maxcontig to 128 pages, which is 128 * 8k = 1024k (1M), then your
optimal stripe size is 128 * 8k / (number of spindles in the LUN). Assuming
the number of spindles is 6, you get an odd number (about 171k). In such
cases either your current I/O or the next sequential I/O is going to be a
little bit inefficient depending on what you select (as a rule of thumb,
however, just take the closest stripe size). However, if your number of
spindles is 8, then you get a perfect 128k, and hence it makes sense to
select 128k. (maxcontig is a parameter in Solaris which defines the maximum
contiguous space allocated to a block, which really helps in the case of
sequential I/O operations.)
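
The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as described: the 8k page size comes from the message, while the function names and the list of stripe sizes a controller might offer are assumptions made for the example.

```python
PAGE_KB = 8  # page size assumed in the message above


def ideal_stripe_kb(maxcontig_pages: int, spindles: int) -> float:
    """Stripe size (KB) such that one maxcontig-sized I/O spans every spindle once."""
    return maxcontig_pages * PAGE_KB / spindles


def closest_stripe_kb(ideal_kb: float, available=(8, 16, 32, 64, 128, 256, 512)) -> int:
    """Pick the nearest size from a list of offered sizes (the list is an assumption)."""
    return min(available, key=lambda size: abs(size - ideal_kb))


for spindles in (6, 8):
    ideal = ideal_stripe_kb(128, spindles)
    print(spindles, "spindles:", round(ideal, 1), "KB ideal,",
          closest_stripe_kb(ideal), "KB closest")
```

With 8 spindles the ideal value is exactly 128k; with 6 spindles it falls between the offered sizes, and under the assumed list the closest one is again 128k.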

But as you see, this was maxcontig-dependent in my case. What if your
maxcontig is way off track? This can happen if your I/O pattern is more
and more random. In such cases maxcontig is better at lower numbers to
reduce space wastage, and in effect reducing your stripe size reduces
your response time.

This means it is now workload-dependent: random I/Os or sequential I/Os
(at least where I/Os can be clubbed together).

As you can see, stripe size in Solaris ultimately depends on your
workload. My guess is that on any other platform, too, the stripe size
depends on your workload and how it will access the data. A lower
stripe size helps smaller I/Os perform better but lacks total throughput
efficiency, while a larger stripe size increases throughput efficiency at
the cost of response time for your small I/O requests.

Don't forget that many file systems will buffer your I/Os and can club them
together if they appear sequential from the file system's point of view.
Hence in such cases the effective I/O size is what matters for the stripe
size.

If your effective I/O sizes are big, then go for a larger stripe size.
If your effective I/O sizes are small and response time is critical, go
for a smaller stripe size.

Regards,
Jignesh

--
______________________________

Jignesh K. Shah
MTS Software Engineer,
MDE - Horizontal Technologies
Sun Microsystems, Inc
Phone: (781) 442 3052
Email: J.K.Shah@sun.com
______________________________



Re: RAID Stripe size

From
Alex Turner
Date:
I have benched different stripe sizes with different file systems, and the performance differences can be quite dramatic.

Theoretically a smaller stripe is better for OLTP, as more often than not you can write more small transactions independently to more different disks, but a large stripe size is good for data warehousing, as you are often doing very large sequential reads, and a larger stripe size is going to exploit the on-drive cache as you request larger single chunks from the disk at a time.
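
As a rough illustration of the trade-off just described, the following Python sketch counts how many per-disk chunks one read is split into for a given stripe size. It assumes plain round-robin striping from offset 0 with no parity or mirroring, and the function name per_disk_chunks is invented for the example; it says nothing about how a particular controller actually schedules requests.

```python
def per_disk_chunks(offset_kb: int, length_kb: int, stripe_kb: int, disks: int):
    """Return (number of stripe-unit chunks, number of distinct disks) for one read.

    Assumes round-robin striping from offset 0 and length_kb > 0.
    """
    first_unit = offset_kb // stripe_kb
    last_unit = (offset_kb + length_kb - 1) // stripe_kb
    units = last_unit - first_unit + 1
    # Consecutive stripe units land on consecutive disks, so the number of
    # distinct disks is capped by the number of disks in the array.
    return units, min(units, disks)


# One large sequential read of 1024 KB on a 4-disk stripe.
for stripe_kb in (8, 128):
    print(f"{stripe_kb}k stripe:", per_disk_chunks(0, 1024, stripe_kb, disks=4))

# One small random read of a single 8 KB page.
print("8k read, 128k stripe:", per_disk_chunks(40, 8, 128, disks=4))
```

In this model the large read touches all four disks either way, but the 8k stripe splits it into 128 small chunks while the 128k stripe needs only 8 larger ones; the single-page read stays on one disk.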

It also seems that different controllers are partial to different defaults that can affect their performance, so I would suggest that testing this on two different controller cards may be less than optimal.

I would also recommend looking at the file system.  For us JFS worked significantly faster than reiser for large read loads and large write loads, so we chose JFS over ext3 and reiser.

I found that lower stripe sizes impacted performance badly, as did overly large stripe sizes.

Alex Turner
NetEconomist


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: RAID Stripe size

From
"Welty, Richard"
Date:
Alex Turner  wrote:

> I would also recommend looking at the file system.  For us JFS worked significantly
>  faster than reiser for large read loads and large write loads, so we chose JFS
>  over ext3 and reiser.

Has JFS been reliable for you? There seems to be a lot of conjecture about instability,
but I find JFS a potentially attractive alternative for a number of reasons.

richard

Re: RAID Stripe size

From
Alex Turner
Date:
I have found JFS to be just fine.  We have been running a medium load on this server for 9 months with no unscheduled downtime.  The database is about 30 GB on disk, and we get about 3-4 requests per second that generate result sets in the thousands from about 8am to about 11pm.

I have found that JFS barfs if you put a million files in a directory and try to do an 'ls', but then so did reiser; only Ext3 handled this test successfully.  Fortunately, with a database this is an atypical situation, so JFS has been fine for the DB for us so far.

We have had severe problems with Ext3 when file systems hit 100% usage: they get all kinds of unhappy. We haven't had the same problem with JFS.

Alex Turner
NetEconomist

On 9/20/05, Welty, Richard <richard.welty@bankofamerica.com> wrote:
Alex Turner  wrote:

> I would also recommend looking at the file system.  For us JFS worked significantly
>  faster than reiser for large read loads and large write loads, so we chose JFS
>  over ext3 and reiser.

Has JFS been reliable for you? There seems to be a lot of conjecture about instability,
but I find JFS a potentially attractive alternative for a number of reasons.

richard


Re: RAID Stripe size

From
evgeny gridasov
Date:
We have a production server (8.0.2) running 24x7, with 300k+ transactions per day,
on Linux 2.6.11 with the JFS file system.
No problems. It works faster than ext3.

> Alex Turner  wrote:
>
> > I would also recommend looking at the file system.  For us JFS worked significantly
> >  faster than reiser for large read loads and large write loads, so we chose JFS
> >  over ext3 and reiser.
>
> Has JFS been reliable for you? There seems to be a lot of conjecture about instability,
> but I find JFS a potentially attractive alternative for a number of reasons.
>
> richard


--
Evgeny Gridasov
Software Developer
I-Free, Russia

Planner statistics vs. count(*)

From
evgeny gridasov
Date:
Hi Everybody.

I am going to replace some 'select count(*) from ... where ...' queries
which run on large tables (10M+ rows) with something like
'explain select * from ... where ...' and then parse the planner output
to find its estimate of the number of rows the query is going to retrieve.

Since my users do not need an exact row count for large tables, this will
boost performance for my application. I ran some queries with explain and
explain analyze. If I set the statistics target for the table to about 200-300,
the planner estimate seems to work very well.

My questions are:
1. Is there a way to interact with the PostgreSQL planner other than 'explain ...'? An aggregate query like
'select estimate_count(*) from ...' would really help =))
2. How precise is the planner's row count estimate for a complex query (a select with 3-5 joined
tables, aggregates, subselects, etc.)?
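
The "parse planner output" step described above might look roughly like this in Python. It is a minimal sketch that works on the text of an EXPLAIN result that has already been fetched; how the text is obtained from the database is left out. The function name estimated_rows, the sample plan, and the table and column names in it are made up for illustration.

```python
import re
from typing import Optional

# Matches the cost annotation PostgreSQL prints on each plan node, e.g.
# "(cost=0.00..458.00 rows=10000 width=244)"
_ROWS_RE = re.compile(r"\(cost=[\d.]+\.\.[\d.]+ rows=(\d+) width=\d+\)")


def estimated_rows(explain_output: str) -> Optional[int]:
    """Return the planner's row estimate for the top plan node, or None."""
    for line in explain_output.splitlines():
        match = _ROWS_RE.search(line)
        if match:
            # The first annotated line is the top node, whose estimate
            # applies to the query result as a whole.
            return int(match.group(1))
    return None


# Hypothetical EXPLAIN output, for illustration only.
sample = """\
Seq Scan on big_table  (cost=0.00..183334.00 rows=1234567 width=48)
  Filter: (status = 1)
"""
print(estimated_rows(sample))
```

The result is only as good as the table statistics, so it is an estimate rather than a replacement for an exact count.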


--
Evgeny Gridasov
Software Developer
I-Free, Russia

Re: Planner statistics vs. count(*)

From
Bricklen Anderson
Date:
evgeny gridasov wrote:
> Hi Everybody.
>
> I am going to replace some 'select count(*) from ... where ...' queries
> which run on large tables (10M+ rows) with something like
> 'explain select * from ... where ...' and then parse the planner output
> to find its estimate of the number of rows the query is going to retrieve.
>
> Since my users do not need an exact row count for large tables, this will
> boost performance for my application. I ran some queries with explain and
> explain analyze. If I set the statistics target for the table to about 200-300,
> the planner estimate seems to work very well.
>
> My questions are:
> 1. Is there a way to interact with the PostgreSQL planner other than 'explain ...'? An aggregate query like
> 'select estimate_count(*) from ...' would really help =))
> 2. How precise is the planner's row count estimate for a complex query (a select with 3-5 joined
> tables, aggregates, subselects, etc.)?
>
>
I think that this has been done before. Check the list archives (I believe it
may have been Michael Fuhr?)

Ah, check this:

http://archives.postgresql.org/pgsql-sql/2005-08/msg00046.php
