On Sun, 21 Apr 2002, Curt Sampson wrote:
> One of the things they want me to try is partitioning the data
> across multiple machines, and submitting queries in parallel. So
> I'll be writing software that will take a query, figure out what
> tables it needs to apply that query to, apply that query to those
> tables (choosing the servers appropriately as well), and consolidate
> the results.
Interesting.
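For what it's worth, here is a rough sketch of that scatter/gather idea in
Python. Everything in it is an assumption for illustration only: the
PARTITIONS map, the table and server names, and run_on_server(), which is a
stand-in that returns fake rows instead of talking to a real database.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition map: table name -> server holding that table.
PARTITIONS = {
    "log_2002_01": "db1",
    "log_2002_02": "db2",
    "log_2002_03": "db1",
}


def run_on_server(server, table, query_template):
    """Stand-in for a real database call.

    Returns one fake row so the sketch runs without any servers.
    """
    sql = query_template.format(table=table)
    return [(server, sql)]


def scatter_gather(query_template, tables, run=run_on_server):
    """Run the query on each table's server in parallel and merge the rows."""
    rows = []
    # One worker per table; max() guards against an empty table list.
    with ThreadPoolExecutor(max_workers=max(1, len(tables))) as pool:
        futures = [
            pool.submit(run, PARTITIONS[table], table, query_template)
            for table in tables
        ]
        # Collect in submission order so the merged result is deterministic.
        for future in futures:
            rows.extend(future.result())
    return rows


if __name__ == "__main__":
    wanted = ["log_2002_01", "log_2002_02"]
    for row in scatter_gather("SELECT count(*) FROM {table}", wanted):
        print(row)
```

A real version would replace run_on_server() with an actual client call
and would need a smarter consolidation step for aggregates, sorting and
limits; plain concatenation only works for simple selects.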
> For hardware, it seems that a bunch of cheap, basic PCs would do
> the trick. I'm thinking of a system with a 1-2 GHz CPU, 512 MB of
> memory, a 20-40 GB IDE disk for the system, log and temporary space,
> and an 80 GB or larger IDE disk for the data. If reliability is a
> real concern, probably mirroring the disks is the best option.
May I suggest a different approach?
From what I understand, this data may not change often.
How about, instead of getting numerous cheap machines, getting only 2 or 3
good machines, each with two 15K RPM drives, 4 GB of RAM and one IDE disk
for the OS? Or, if you can get even more money... four 15K RPM drives in
RAID 0.