On Thu, 2006-01-19 at 12:39, Kevin & Jessica Hermansen wrote:
> I'm looking to set up an informational database which will be accessed
> by the general public on our website (there will be no internal
> users). The database will likely have about 10 million records with a
> total database size of 200-300 GB. All 10 million records will be
> very similar (the same data fields), so I anticipate that the entire
> database will be just 1 table. This is the main application that will
> be running on our server. Users will be querying the database by
> searching either a particular field (or multiple fields) or by doing a
> full text search.
>
> My primary concern is that wait times (between the user doing a search
> query & then being able to view the results) will be long (I consider
> "long" to be more than 4 or 5 seconds). I was hoping someone could
> advise me on what dedicated server hardware to purchase (# of
> processors (physical and/or virtual), type of processors (Xeon, P4,
> 32-bit, 64-bit, HyperThreading, etc.), cache, RAM, etc.). We're on a
> very, very tight budget and don't want to OVER-deploy, but the speed
> of the database is also a major consideration.
That very much depends on how you approach this problem...
> Going along with that, can you also advise (in general terms) on how
> to optimally configure the enormous, 1-table database (and how to make
> it run optimally with the hardware configuration)?
Use full text searching. Look for tsearch2 in the contrib directory of
the tarball distribution, or a contrib package for your OS.
A Pentium III-750 with 512 megs of RAM will likely be faster running
tsearch2 than a Pentium IV 2.8GHz with 2 to 4 gigs of RAM. Seriously,
using tsearch2 is your first step here, assuming that you're doing text
searches.
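Roughly, the tsearch2 setup looks like this (just a sketch; the table
and column names here are made up, and minor details vary between
tsearch2 versions):

    -- load the module into your database first:
    --   psql mydb -f /path/to/contrib/tsearch2.sql
    ALTER TABLE records ADD COLUMN idxfti tsvector;
    UPDATE records SET idxfti = to_tsvector('default', description);
    CREATE INDEX records_fti_idx ON records USING gist (idxfti);
    VACUUM FULL ANALYZE records;

    -- queries then hit the GiST index via the @@ operator:
    SELECT * FROM records
     WHERE idxfti @@ to_tsquery('default', 'foo & bar');

The GiST index is what buys you the speed; without it every text search
is a sequential scan over all 10 million rows.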
If all the searches are on numeric fields, then performance should be
good with almost any system.
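For example, plain btree indexes on whichever fields people actually
search will keep those lookups fast (the column names here are
hypothetical):

    CREATE INDEX records_state_idx ON records (state);
    CREATE INDEX records_year_idx ON records (year);

    -- verify the planner actually uses them:
    EXPLAIN ANALYZE
    SELECT * FROM records WHERE state = 'UT' AND year = 2005;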
I'd do some test runs on smaller boxes you probably already have lying
around to get a baseline of how fast this is gonna be on them, then
you'll know about how much, if any, you need to step things up.
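psql's \timing toggle plus EXPLAIN ANALYZE is enough to get those
baseline numbers; EXPLAIN ANALYZE prints the query plan along with the
actual run time:

    mydb=# \timing
    Timing is on.
    mydb=# EXPLAIN ANALYZE
           SELECT * FROM records WHERE idxfti @@ to_tsquery('default', 'foo');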
Hyperthreading doesn't help, and often hurts PostgreSQL performance,
especially with a lot of users.
AMD 64-bit CPUs are currently the king of 2- to 8-way processor systems
to run PostgreSQL (or Oracle, for that matter) on.
Throw money at a good RAID card and RAID 10 while you're at it. Areca,
LSI, and a few others are pretty good. Avoid Dell: even though they
make some servers with LSI controllers in them (the PERC4C and PERC3C,
and who knows what else), they lobotomize them or something, because
they just aren't as fast as the OEM cousins they were bred from.
If this database will be mostly read, then look at running a large array
(say 8 to 12 drives) of medium / small disks (70 to 150 gigs... yeah,
that's medium to small nowadays...). In fact, if this is a reporting
database (mostly read), you may find better performance from older
drives, just a lot of them. I'm guessing you may have a desk drawer
full of 30 gig drives lying around.