Thread: Hardware advice

Hardware advice

From: Adam Witney

Hi,

I am in the process of pricing up boxes for our database, and I was
wondering if anyone had any recommendations or comments.

The database itself will have around 100-150 users, mostly accessing it
through a PHP/Apache interface. I don't expect lots of simultaneous
activity; however, users will often be doing multi-table joins (up to
10-15 tables in one query). They will also often be pulling out on the
order of 250,000 rows (5-10 numeric fields per row), processing the data
(I may split this onto a second box) and then writing back ~20,000 rows
of data (2-3 numeric fields per row).

Estimating the total amount of data is quite tricky, but it could grow to
100-250Gb over the next 3 years.

I have priced one box from the Dell web site as follows

Single Intel Xeon 2.8GHz with 512KB L2 cache
2GB RAM

36Gb 10,000rpm Ultra 3 160 SCSI
36Gb 10,000rpm Ultra 3 160 SCSI
146Gb 10,000rpm U320 SCSI
146Gb 10,000rpm U320 SCSI
146Gb 10,000rpm U320 SCSI

PERC 3/DC RAID Controller (128MB Cache)

RAID1 for 2x 36Gb drives
RAID5 for 3x 146Gb drives

Running RedHat Linux 8.0

This configuration would be pretty much the top of our budget (~ £5k).

I was planning on having the RAID1 setup for the OS and then the RAID5 for
the db files.

Would it be better to have a dual 2.4GHz setup rather than a single 2.8GHz
or would it not make much difference?

Does the RAID setup look ok, or would anyone foresee problems in this
context? (This machine can take a maximum of 5 internal drives).

Am I overdoing any particular component at the expense of another?

Any other comments would be most welcome.

Thanks for any help

Adam




Re: Hardware advice

From: Andrew Sullivan

On Fri, May 30, 2003 at 03:23:28PM +0100, Adam Witney wrote:
> RAID5 for 3x 146Gb drives

I find the RAID5 on the PERC to be painfully slow.  It's _really_ bad
if you don't put WAL on its own drive.

Also, you don't mention it, but check to make sure you're getting ECC
memory on these boxes.  Random memory errors which go undetected will
make you very unhappy.  ECC lowers (but doesn't eliminate,
apparently) your chances.
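
One quick way to see what the BIOS claims, assuming dmidecode is installed
(the exact wording depends on the vendor's DMI tables):

    dmidecode | grep -i 'error correction'

It should report something like "Single-bit ECC" rather than "None".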

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Hardware advice

From: "scott.marlowe"

On Fri, 30 May 2003, Adam Witney wrote:

> 250,000 rows (5-10 numeric fields per row), processing the data (I may split
> this to a second box) and then writing back ~20,000 rows of data (2-3
> numeric fields per row).

Make sure to vacuum often and crank up your fsm values so you can
reclaim lost disk space.
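
Something along these lines would do it; the fsm numbers below are only
placeholders, to be sized to your own update volume rather than copied:

    # in postgresql.conf (needs a postmaster restart to take effect):
    #   max_fsm_relations = 1000
    #   max_fsm_pages     = 100000
    # then vacuum regularly, e.g. nightly from the postgres user's cron:
    vacuumdb --all --analyze --quiet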

> 36Gb 10,000rpm Ultra 3 160 SCSI
> 36Gb 10,000rpm Ultra 3 160 SCSI
> 146Gb 10,000rpm U320 SCSI
> 146Gb 10,000rpm U320 SCSI
> 146Gb 10,000rpm U320 SCSI
>
> PERC 3/DC RAID Controller (128MB Cache)

If that box has a built-in U320 controller, or you can bypass the PERC,
give the Linux kernel-level RAID1 and RAID5 drivers a try.  On a dual-CPU
box of that speed they may well outrun many hardware controllers.
Contrary to popular opinion, software RAID is not slow in Linux.
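
For example, building the two sets with mdadm looks roughly like this
(the raidtools /etc/raidtab + mkraid route works too; the device names
below are just placeholders for whatever the controller presents):

    # mirror the two 36 gig drives, RAID5 across the three big ones
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sde1
    cat /proc/mdstat     # watch the initial resync / parity build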

> RAID1 for 2x 36Gb drives
> RAID5 for 3x 146Gb drives

You might wanna do something like go to all 146 gig drives, put a mirror
set on the first 20 or so gigs for the OS, and then use the remainder
(5x120gig or so ) to make your RAID5.  The more drives in a RAID5 the
better, generally, up to about 8 or 12 as the optimal for most setups.

But that setup of a RAID1 and RAID5 set is fine as is.

By running software RAID you may be able to afford to upgrade the 36 gig
drives...

> Would it be better to have a dual 2.4GHz setup rather than a single 2.8GHz
> or would it not make much difference?

Yes, it would be better.  Linux servers running databases are much more
responsive with dual CPUs.

> Am I overdoing any particular component at the expense of another?

Maybe the RAID controller cost versus having more big hard drives.



Re: Hardware advice

From: Adam Witney

Hi scott,

Thanks for the info

> You might wanna do something like go to all 146 gig drives, put a mirror
> set on the first 20 or so gigs for the OS, and then use the remainder
> (5x120gig or so ) to make your RAID5.  The more drives in a RAID5 the
> better, generally, up to about 8 or 12 as the optimal for most setups.

I am not quite sure I understand what you mean here... Do you mean take 20Gb
from each of the 5 drives to setup a 20Gb RAID 1 device? Or just from the
first 2 drives?

Thanks again for your help

adam




Re: Hardware advice

From: "scott.marlowe"

On Fri, 30 May 2003, Adam Witney wrote:

> Hi scott,
>
> Thanks for the info
>
> > You might wanna do something like go to all 146 gig drives, put a mirror
> > set on the first 20 or so gigs for the OS, and then use the remainder
> > (5x120gig or so ) to make your RAID5.  The more drives in a RAID5 the
> > better, generally, up to about 8 or 12 as the optimal for most setups.
>
> I am not quite sure I understand what you mean here... Do you mean take 20Gb
> from each of the 5 drives to setup a 20Gb RAID 1 device? Or just from the
> first 2 drives?

You could do it either way, since the Linux kernel supports more than 2
drives in a mirror.  But this costs on writes, so don't do it for things
like /var or the pg_xlog directory.
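
For what it's worth, a three-way mirror is just a matter of asking for it
(device names here are only placeholders):

    mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

Every write then hits all three drives, which is why it hurts on
write-heavy filesystems.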

There are a few ways you could arrange five 146 gig drives.

One might be to make the first 20 gig on each drive part of a mirror set
where the first two drives are the live mirror and the next three are hot
spares.  Then you could set up your RAID5 to have 4 live drives and 1 hot
spare.

Hot spares are nice to have because they provide for the shortest period
of time during which your machine is running with a degraded RAID array.

Note that in Linux you can set the kernel parameters
dev.raid.speed_limit_max and dev.raid.speed_limit_min to control the
rebuild bandwidth used, so that when a disk dies you can strike a
compromise between fast rebuilds and keeping the demands on the I/O
subsystem down during the rebuild.  The minimum limit defaults to 100k /
second, which is quite slow.  On a machine with Ultra320 gear you could
set that to 10 or 20 megs a second and still not saturate your SCSI bus.
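
For example (the numbers are only illustrations; the units are kilobytes
per second):

    sysctl -w dev.raid.speed_limit_min=10000    # don't let rebuilds crawl
    sysctl -w dev.raid.speed_limit_max=20000    # but cap their I/O impact
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # check them

Add the equivalent settings to /etc/sysctl.conf if you want them to
survive a reboot.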

Now that I think of it, you could probably set it up so that you have a
mirror set for the OS, one for pg_xlog, and then use the rest of the
drives as RAID5.  Then grab space on the fifth drive to make a hot spare
for both the pg_xlog and the OS drive.

Drive 0
[OS RAID1 20 Gig D0][big data drive RAID5 106 Gig D0]
Drive 1
[OS RAID1 20 Gig D1][big data drive RAID5 106 Gig D1]
Drive 2
[pg_xlog RAID1 20 gig D0][big data drive RAID5 106 Gig D2]
Drive 3
[pg_xlog RAID1 20 gig D1][big data drive RAID5 106 Gig D3]
Drive 4
[OS hot spare 20 gig][pg_xlog hot spare 20 gig][big data drive RAID5 106 Gig hot spare]

That would give you ~ 300 gigs storage.
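
As a rough sketch with mdadm, assuming the drives show up as sda-sde and
have been partitioned as above (the small partitions first, then the big
one; all device and partition names are just placeholders):

    # OS mirror on drives 0 and 1, hot spare partition on drive 4
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
          /dev/sda1 /dev/sdb1 /dev/sde1
    # pg_xlog mirror on drives 2 and 3, hot spare partition on drive 4
    mdadm --create /dev/md1 --level=1 --raid-devices=2 --spare-devices=1 \
          /dev/sdc1 /dev/sdd1 /dev/sde2
    # big data RAID5 across the four large partitions, with drive 4's
    # large partition as the hot spare
    mdadm --create /dev/md2 --level=5 --raid-devices=4 --spare-devices=1 \
          /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde3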

Of course, there will likely be slightly less performance than you might
get from dedicated RAID arrays for each RAID1/RAID5 set, but my guess is
that by having 4 (or 5 if you don't want a hot spare) drives in the RAID5
it'll still be faster than a dedicated 3 drive RAID array.


Re: Hardware advice

From
"Roman Fail"
Date:
Based on what you've said, I would guess you are considering the Dell
PowerEdge 2650 since it has 5 drive bays.  If you could afford the
rackspace and just a bit more money, I'd get the tower configuration 2600
with 6 drive bays (and rack rails if needed - Dell even gives you a
special rackmount faceplate if you order a tower with rack rails).  This
would allow you to have this configuration, which I think would be about
ideal for the price range you are looking at:

* Linux kernel RAID
* Dual processors - better than a single faster processor, especially
  with concurrent user load and software RAID on top of that
* 2x36GB in RAID-1 (for OS and WAL)
* 4x146GB in RAID-10 (for data) (alternative: 4-disk RAID-5)

The RAID-10 array gives you the same amount of space you would have with
a 3-disk RAID-5 and improved fault tolerance.  Although I'm pretty sure
your drives won't be hot-swappable with the software RAID - I've never
actually had to do it.
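
If you do go software, one way to get RAID-10 out of the 2.4-era md
driver (which has no native raid10 personality) is to stripe two mirrors;
the device names here are only placeholders:

    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
    mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2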
 
 
I can't say I like Scott's idea much because the WAL and OS are competing
for disk time with the data since they are on the same physical disk.  In
a database that is mainly reads with few writes, this wouldn't be such a
problem though.
 
 
Just my inexpert opinion,
 
Roman
 



Re: Hardware advice

From: Will LaShell

On Fri, 2003-05-30 at 07:44, Andrew Sullivan wrote:
> On Fri, May 30, 2003 at 03:23:28PM +0100, Adam Witney wrote:
> > RAID5 for 3x 146Gb drives
>
> I find the RAID5 on the PERC to be painfully slow.  It's _really_ bad
> if you don't put WAL on its own drive.

This seems to be an issue with the Dell firmware.  The megaraid devel
list has been tracking this issue on and off for some time now.  People
have had good luck with a couple of different fixes.  The PERC cards
-can- be made not to suck, and the LSI cards simply don't have the
problem.  (Since they are effectively the same card, the opinion is that
it's the firmware.)



> Also, you don't mention it, but check to make sure you're getting ECC
> memory on these boxes.  Random memory errors which go undetected will
> make you very unhappy.  ECC lowers (but doesn't eliminate,
> apparently) your chances.

100% agree with this note.


Re: Hardware advice

From: "scott.marlowe"

On Fri, 30 May 2003, Roman Fail wrote:

> Based on what you've said, I would guess you are considering the Dell
> PowerEdge 2650 since it has 5 drive bays.  If you could afford the
> rackspace and just a bit more money, I'd get the tower configuration
> 2600 with 6 drive bays (and rack rails if needed - Dell even gives you a
> special rackmount faceplate if you order a tower with rack rails).  This
> would allow you to have this configuration, which I think would be about
> ideal for the price range you are looking at:
>
> * Linux kernel RAID

Actually, I think he was looking at hardware RAID, but I was recommending
software RAID as at least an option.  I've found that on modern hardware
with late-model kernels, Linux is pretty fast with straight RAID, but not
as good when layering RAID levels, fyi.  I haven't tested since 2.4.9
though, so things may well have changed, hopefully for the better, as far
as running fast in layered RAID goes.

They both would likely work well, but going with a sub-par HW RAID card
will make the system slower than the kernel sw RAID.

> * Dual processors - better than a single faster processor, especially
> with concurrent user load and software RAID on top of that
> * 2x36GB in RAID-1 (for OS and WAL)
> * 4x146GB in RAID-10 (for data) (alternative: 4-disk RAID-5)
>
> The RAID-10 array gives you the same amount of space you would have
> with a 3-disk RAID-5 and improved fault tolerance.  Although I'm pretty
> sure your drives won't be hot-swappable with the software RAID - I've
> never actually had to do it.

I agree that 6 drives makes this a much better option.

Actually, hot swapping can only be accomplished in Linux kernel sw RAID
by using multiple controllers.  It's not really "hot swap" because you
have to basically reset that card and its information about which drives
are on it.  Using two controllers, where one runs one RAID0 set and the
other runs another RAID0 set with a RAID1 on top, you can then use hot
swap shoes and replace failed drives.

The improved fault tolerance of the RAID 1+0 is minimal over the RAID5 if
the RAID5 has a hot spare, but it is there.

I've removed and added drives to running arrays, and the raidhotadd
program used to do it is quite easy to drive.  It all seemed to work
quite well.  The biggest problem you'll notice when a drive fails is that
the kernel / SCSI driver will keep resetting the bus and timing out the
device, so with a failed device Linux kernel RAID can be a bit doggish
until you restart the SCSI driver so it KNOWS the drive's not there and
quits asking for it over and over.
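
For reference, the swap itself looks roughly like this (device names are
assumptions, and mdadm's "mdadm /dev/md0 --add" form does the same job as
raidhotadd):

    raidhotremove /dev/md0 /dev/sdb1   # drop the failed component
    # ...physically replace the disk and partition it to match...
    raidhotadd /dev/md0 /dev/sdb1      # add it back and let the array resync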

> I can't say I like Scott's idea much because the WAL and OS are
> competing for disk time with the data since they are on the same
> physical disk.  In a database that is mainly reads with few writes,
> this wouldn't be such a problem though.

You'd be surprised how often this is a non-issue.  If you're writing
20,000 records every 10 minutes or so, the location of the WAL file is not
that important.  The machine will lug for a few seconds, insert, and be
done.  The speed increase averaged out over time is almost nothing.

Now, transactional systems are a whole nother enchilada.  I got the
feeling from the original post this was more a batch processing kinda
thing.

I knew the solution I was giving was suboptimal on performance (I might
have even alluded to that...).  I was going more for maximizing use of
rack space and getting the most storage.  I think the user said that this
project might well grow to 250 or 300 gig, so size is probably as
important as speed for this system, if not more so.

RAID5 is pretty much the compromise RAID set.  It's not necessarily the
fastest, and it certainly isn't the sexiest, but it provides a lot of
storage for very little redundancy cost, and with a hot spare it's pretty
much 24/7 with a couple of days off a year for scheduled maintenance.
Combine that with having n-1 platters for each read to be spread across,
and it's a nice choice for data warehousing or report serving.

Whatever he does, he should make sure he turns off atime on the data
partition.  Leaving it on can utterly kill a postgresql / linux box,
slowing it down by right around a factor of two for someone doing small
reads.
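
That's just the noatime mount option, e.g. an /etc/fstab line along these
lines (device, mount point and filesystem are only examples):

    /dev/md2   /var/lib/pgsql/data   ext3   defaults,noatime   1 2

or, to flip it on a mounted filesystem without a reboot:

    mount -o remount,noatime /var/lib/pgsql/data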



Re: Hardware advice

From: "scott.marlowe"

On 30 May 2003, Will LaShell wrote:

> On Fri, 2003-05-30 at 07:44, Andrew Sullivan wrote:
> > On Fri, May 30, 2003 at 03:23:28PM +0100, Adam Witney wrote:
> > > RAID5 for 3x 146Gb drives
> >
> > I find the RAID5 on the PERC to be painfully slow.  It's _really_ bad
> > if you don't put WAL on its own drive.
>
> This seems to be an issue with the dell firmware.  The megaraid devel
> list has been tracking this issue on and off for some time now.  People
> have had good luck with a couple of different fixes. The PERC cards
> -can- be made not to suck and the LSI cards simply don't have the
> problem. ( Since they are effectively the same card its the opinion that
> its the firmware )

I've used the LSI/MegaRAID cards in the past.  They're not super fast,
but they're not slow either.  Very solid operation.  Sometimes the
firmware makes you feel like you're wearing handcuffs compared to the
relative freedom in the kernel sw drivers (e.g. you can force the kernel
to take back a failed drive, while the megaraid just won't take it back
until it's been formatted, that kind of thing).

The plain LSI SCSI cards are in general great cards.  I got a UWSCSI card
of theirs with gigabit ethernet thrown in off eBay a couple of years
back, and it's VERY fast and stable.

Also, if you're getting cache memory on the megaraid/perc card, make sure
you get the battery backup module.


Re: Hardware advice

From: Adam Witney

On 30/5/03 6:17 pm, "scott.marlowe" <scott.marlowe@ihs.com> wrote:

> [snip - full quote of the RAID layout suggestions above]

Hi Scott,

Just following up on a post from a few months back... I have now
purchased the hardware.  Do you have a recommended/preferred Linux distro
that is easy to configure for software RAID?

Thanks again

Adam

