One particular large database application - Mailing list pgsql-general
From | Curt Sampson |
---|---|
Subject | One particular large database application |
Date | |
Msg-id | Pine.NEB.4.43.0204211417200.6249-100000@angelic.cynic.net |
In response to | Re: Backup very large databases (Francisco Reyes <lists@natserv.com>) |
Responses | Re: One particular large database application |
List | pgsql-general |
On Sat, 20 Apr 2002, Francisco Reyes wrote:

> On Sat, 20 Apr 2002, Curt Sampson wrote:
>
> > I anticipate having a 30-50 GB or larger database in the next few months.
>
> I think that with so much data having good hardware is key.
> What type of data is this? How many people concurrently connect?

Well, "I'm not sure." The client is being a bit obscure about what he really wants, and in fact may not quite know exactly what he really wants. Welcome to the joys of being a consultant. :-)

My contract is to do what seems to be a proof of concept. They are giving me a pile of sample data (about half a billion records) which includes a record ID, date, user ID, and a bit of other stuff. I need to do selects by user ID across a range of dates (see the first sketch after the message). I'm obviously going to have to work out more details about what they really want as I go along, which is no problem: that's one of the things I'm hired to do.

Anyway, it appears that this is going to be used as a query-only copy of a database whose master data is stored elsewhere, so that's why I'm not too worried about updates. I probably don't need backups, either, since I can always just re-import the data if I lose it.

One of the things they want me to try is partitioning the data across multiple machines and submitting queries in parallel. So I'll be writing software that will take a query, figure out which tables it needs to apply that query to, apply the query to those tables (choosing the servers appropriately as well), and consolidate the results (the second sketch after the message shows the single-server shape of this).

For hardware, it seems that a bunch of cheap, basic PCs would do the trick. I'm thinking of a system with a 1-2 GHz CPU, 512 MB of memory, a 20-40 GB IDE disk for the system, log, and temporary space, and an 80 GB or larger IDE disk for the data. If reliability is a real concern, mirroring the disks is probably the best option. These systems, in 1U cases, should come in under 200,000 yen each (under US$1500), and we can scale by adding more of them. But even the performance of a single one should be fairly good, since I'm avoiding RAID-5, which tends to be a serious performance killer.

This, BTW, is why I've been going on about import speed recently. It's expensive (both in terms of cost and performance) to avoid disk system failure with RAID or whatever, but if you can live with recovering from failure, rather than avoiding failure, you can use cheaper and faster hardware. So I'm wondering if I can set this up to recover from a failure fast enough that I don't need to worry so much about preventing one (the third sketch after the message shows that reload path).

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
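To make the query pattern concrete, here is a minimal sketch of a schema and index that fit it; the table and column names are hypothetical, inferred only from the fields named in the message:

```sql
-- Hypothetical schema for the sample data described above: a record
-- ID, a date, a user ID, and "a bit of other stuff".
CREATE TABLE entries (
    record_id   bigint   NOT NULL,
    entry_date  date     NOT NULL,
    user_id     integer  NOT NULL,
    payload     text
);

-- Selects are by user ID across a range of dates, so a composite
-- index on (user_id, entry_date) lets each query touch only the
-- matching slice of the half-billion rows.
CREATE INDEX entries_user_date ON entries (user_id, entry_date);

-- The basic query shape:
SELECT *
  FROM entries
 WHERE user_id = 12345
   AND entry_date BETWEEN '2002-01-01' AND '2002-03-31';
```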
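The table-routing half of the query splitter can be illustrated in plain SQL. A sketch, assuming hypothetical per-quarter partitions of the table above; the cross-machine dispatch and the parallel submission would live in the splitter software itself and are not shown:

```sql
-- Each partition holds one date range and may live on a different
-- server; the splitter keeps a catalogue of which table on which
-- server covers which range. (Names and ranges are hypothetical.)
CREATE TABLE entries_2002q1 (
    record_id   bigint   NOT NULL,
    entry_date  date     NOT NULL,
    user_id     integer  NOT NULL,
    payload     text,
    CHECK (entry_date >= '2002-01-01' AND entry_date < '2002-04-01')
);
-- ... and likewise entries_2002q2, entries_2002q3, and so on.

-- A query spanning two partitions becomes one SELECT per table,
-- submitted in parallel, one per server; consolidated on a single
-- server it is just a UNION ALL:
SELECT * FROM entries_2002q1
 WHERE user_id = 12345 AND entry_date >= '2002-03-01'
UNION ALL
SELECT * FROM entries_2002q2
 WHERE user_id = 12345 AND entry_date < '2002-05-01';
```

UNION ALL rather than UNION here, since the partitions are disjoint and a duplicate-eliminating sort would only slow the consolidation down.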
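And a sketch of the fast-reload path that the recover-rather-than-prevent strategy depends on, assuming the master data arrives as a flat file (path hypothetical):

```sql
-- Recovery is a bulk reload, so import speed matters more than
-- backups. COPY is far faster than row-at-a-time INSERTs, and
-- rebuilding the index after the load beats maintaining it during.
DROP INDEX entries_user_date;
TRUNCATE entries;
COPY entries FROM '/data/dump/entries.tsv';
CREATE INDEX entries_user_date ON entries (user_id, entry_date);
```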