Thread: Hardware/OS recommendations for large databases (5TB)

Hardware/OS recommendations for large databases (5TB)

From: Adam Weisberg
Does anyone have recommendations for hardware and/or OS to work with around 5TB datasets?
 
The data is for analysis, so there is virtually no inserting besides one big bulk load. Analysis involves full-database aggregations, mostly basic arithmetic and grouping. In addition, much smaller subsets of the data would be pulled out and stored in separate databases.
 
I have been working with datasets no bigger than around 30GB, and that (I'm afraid to admit) has been in MSSQL.
 
Thanks,
 
Adam

Re: Hardware/OS recommendations for large databases (5TB)

From: Claus Guttesen
> Does anyone have recommendations for hardware and/or OS to work with around
> 5TB datasets?

Hardware-wise I'd say dual-core Opterons. One dual-core Opteron
performs better than two single-core ones at the same clock speed. Tyan makes
some boards that have four sockets, thereby giving you 8 CPUs (if you
need that many). Sun and HP also make nice hardware, although the Tyan
boards are more competitively priced.

OS-wise I would choose the FreeBSD amd64 port, but partitions larger
than 2 TB need some special care (using GPT rather than disklabel,
etc.), and tools like fsck may not be able to completely check partitions
larger than 2 TB. Linux or Solaris with either LVM or Veritas FS
also sound like candidates.
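For context on where the 2 TB figure comes from: a minimal sketch, assuming the classic layout where MBR and BSD disklabel partitions record their size as a 32-bit count of 512-byte sectors, while GPT uses 64-bit sector addresses. The helper names are illustrative only.

```python
import math

# Sketch of the ~2 TB partition ceiling. Assumption: the partition table
# stores sizes as a fixed-width count of 512-byte sectors (32-bit for
# MBR/BSD disklabel, 64-bit for GPT).
SECTOR_BYTES = 512

def max_partition_bytes(address_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Upper bound on partition size: 2**address_bits sectors of sector_bytes each."""
    return (2 ** address_bits) * sector_bytes

def partitions_needed(dataset_bytes: int, address_bits: int = 32) -> int:
    """Minimum number of partitions needed to hold dataset_bytes under that bound."""
    return math.ceil(dataset_bytes / max_partition_bytes(address_bits))

print(max_partition_bytes(32) / 2 ** 40)   # 2.0 (TiB) with 32-bit sector counts
print(max_partition_bytes(64) / 2 ** 70)   # 8.0 (ZiB) with 64-bit GPT addresses
print(partitions_needed(5 * 10 ** 12))     # 3 partitions for ~5 TB under 32-bit limits
```

So a single 5 TB volume only fits if the partition table itself uses wider addresses, which is why GPT (or a volume manager on top of it) comes up here.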

> I have been working with datasets no bigger than around 30GB, and that (I'm
> afraid to admit) has been in MSSQL.

Well, our data are just below 30 GB so I can't help you there :-)

regards
Claus

Re: Hardware/OS recommendations for large databases (5TB)

From: Merlin Moncure
> Hardware-wise I'd say dual-core Opterons. One dual-core Opteron
> performs better than two single-core ones at the same clock speed. Tyan makes
> some boards that have four sockets, thereby giving you 8 CPUs (if you
> need that many). Sun and HP also make nice hardware, although the Tyan
> boards are more competitively priced.

Just FYI: Tyan makes an 8-socket motherboard (up to 16 cores!):
http://www.swt.com/vx50.html

It can be loaded with up to 128 GB of memory if all the sockets are filled
:).

Merlin

Re: Hardware/OS recommendations for large databases (5TB)

From: Vivek Khera
On Nov 15, 2005, at 3:28 AM, Claus Guttesen wrote:

> Hardware-wise I'd say dual core opterons. One dual-core-opteron
> performs better than two single-core at the same speed. Tyan makes

At 5 TB of data, I'd bet the application is disk I/O bound, and the
difference in CPU speed between dual Opterons and dual-core
Opterons is not going to be noticed.
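A rough back-of-the-envelope for why a 5 TB full-database aggregation is I/O bound: the time to read everything once is just size divided by sustained throughput. The throughput figures below are hypothetical placeholders, not measurements from this thread; substitute numbers from your own disk subsystem.

```python
# Back-of-the-envelope: time to read a dataset once, end to end.
# The throughput values are hypothetical placeholders, not measurements.

def full_scan_hours(dataset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read dataset_bytes at a sustained sequential rate."""
    return dataset_bytes / throughput_bytes_per_s / 3600

DATASET_BYTES = 5e12  # ~5 TB, as in the original question (decimal units)

# Hypothetical sustained read rates, in bytes per second.
scenarios = {
    "single disk (assumed 60 MB/s)": 60e6,
    "disk array (assumed 400 MB/s)": 400e6,
}

for name, rate in scenarios.items():
    print(f"{name}: {full_scan_hours(DATASET_BYTES, rate):.1f} h per full pass")
```

Every full-database aggregation pays at least one such pass, so under these assumptions the scan time is set mostly by disk throughput, which is the point above about CPU speed mattering less.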

To maximize disk throughput, try getting a dedicated high-end disk system like
nStor or NetApp file servers hooked up over Fibre Channel, then use a
good high-end Fibre Channel controller, such as one from LSI.

And go with the FreeBSD amd64 port. It is *way* fast, especially the
FreeBSD 6.0 disk subsystem.


Re: Hardware/OS recommendations for large databases (5TB)

From: Claus Guttesen
> At 5 TB of data, I'd bet the application is disk I/O bound, and the
> difference in CPU speed between dual Opterons and dual-core
> Opterons is not going to be noticed.
>
> To maximize disk throughput, try getting a dedicated high-end disk system like
> nStor or NetApp file servers hooked up over Fibre Channel, then use a
> good high-end Fibre Channel controller, such as one from LSI.
>
> And go with the FreeBSD amd64 port. It is *way* fast, especially the
> FreeBSD 6.0 disk subsystem.

I'm (also) FreeBSD-biased, but I'm not sure whether a 5 TB filesystem will
work well if tools like fsck are needed. Gvinum could be one option,
but I don't have any experience in that area.

regards
Claus

Re: Hardware/OS recommendations for large databases (5TB)

From: Vivek Khera
On Nov 16, 2005, at 4:50 PM, Claus Guttesen wrote:

> I'm (also) FreeBSD-biased, but I'm not sure whether a 5 TB filesystem will
> work well if tools like fsck are needed. Gvinum could be one option,
> but I don't have any experience in that area.

Then look into an external filer and mount it via NFS. That way it is not
FreeBSD's responsibility to manage the volume.