Chris,
> I've got an application that needs to chunk through ~2GB of data. The
> data is ~7000 different sets of 300 records each. I put all of the data
> into a Postgres database, but that doesn't look like it's going to work
> because of how the data lives on the disk.
Your problem is curable through 4 steps:
1) Adjust the memory settings in your postgresql.conf to levels appropriate
for your hardware (see the example settings after this list).
2) If those sets of 300 records are read in some contiguous order, CLUSTER the
table to force its physical ordering on disk to match the order you read them
in (see the sketch after this list). This will require you to re-cluster
whenever you change a significant number of records, but from the sound of it
that happens in batches.
3) Get better disks, preferably a RAID array, or just very fast SCSI if the
database is small. If you're budget-constrained, Linux software RAID (or
BSD RAID) on IDE disks is cheap. The kind of RAID depends on what else
you'll be doing with the app; RAID 5 is better for read-only access, RAID 1+0
is better for read-write.
4) Make sure that you aren't dumping the data to the same disk PostgreSQL
lives on! Preferably, make sure that your swap partition is on a different
disk/array from PostgreSQL as well. If the computing app is complex and
requires disk reads aside from Postgres data, make sure that data lives on
yet another disk. Or you can simplify all of this with a good, really large
multi-channel RAID array.
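
For step 1, here's a rough sketch of the kind of postgresql.conf settings I
mean. Parameter names and units vary a bit by PostgreSQL version, and the
values below are placeholders, not recommendations for your box:

  shared_buffers = 256MB       # memory PostgreSQL manages itself; a modest fraction of RAM
  effective_cache_size = 1GB   # roughly how much OS file cache the planner may assume exists
  work_mem = 16MB              # per-sort/hash memory (called sort_mem in older releases)

The point is to let Postgres and the kernel cache a useful chunk of those 2GB
instead of going back to disk for every set.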
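
For step 2, a minimal sketch of the clustering itself. The table and column
names here (records, set_id) are made up since I don't know your schema, and
the exact CLUSTER syntax differs between PostgreSQL versions:

  -- index on the column that defines your read order (hypothetical names)
  CREATE INDEX records_set_id_idx ON records (set_id);

  -- rewrite the table so rows are physically stored in index order
  CLUSTER records USING records_set_id_idx;

  -- after a large batch of changes, re-cluster on the remembered index
  CLUSTER records;

  -- refresh planner statistics after the rewrite
  ANALYZE records;

Once each 300-record set is contiguous on disk, reading a set becomes a few
sequential reads instead of ~300 random seeks.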
--
-Josh Berkus
Aglio Database Solutions
San Francisco