On Thu, 2004-10-21 at 21:14, Joshua Marsh wrote:
> Hello everyone,
>
> I am currently working on a data project that uses PostgreSQL
> extensively to store, manage and maintain the data. We haven't had
> any problems regarding database size until recently. The three major
> tables we use never get bigger than 10 million records. With this
> size, we can do things like storing the indexes or even the tables in
> memory to allow faster access.
>
> Recently, we have found customers who are wanting to use our service
> with data files between 100 million and 300 million records. At that
> size, each of the three major tables will hold between 150 million and
> 700 million records. At this size, I can't expect it to run queries
> in 10-15 seconds (what we can do with 10 million records), but would
> prefer to keep them all under a minute.
>
> We did some initial testing on a server with 8GB of RAM and found
> that we can handle data files of up to 50 million records fairly well,
> but performance drops dramatically after that. Does anyone have any
> suggestions on a good way to improve performance for these extra large
> tables? Things that have come to mind are Replication and Beowulf
> clusters, but from what I have recently studied, these don't do so well
> with singular processes. We will have parallel processes running, but
> it's more important that each individual process be fast than that
> many processes run in parallel at once.

I'd assume that what's happening is that up to a certain data set size
it all fits in memory, and beyond that you're going from CPU/memory
bandwidth limited to I/O limited. If this is the case, then a faster
storage subsystem is the only real answer. If the database is mostly
read, then a large RAID5 or RAID 1+0 array should help quite a bit.
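If you want to confirm that before buying hardware, one quick sanity
check (a rough sketch, assuming the stats collector is enabled; the
ordering and thresholds are up to you) is the buffer hit ratio from
pg_statio_user_tables:

  -- share of heap block reads served from shared buffers vs. disk
  SELECT relname,
         heap_blks_hit,
         heap_blks_read,
         round(heap_blks_hit::numeric /
               nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read DESC;

If the hit ratio on the big tables falls off sharply once they outgrow
RAM, that's a good sign you've become I/O bound rather than CPU bound.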
You might also want to post the EXPLAIN ANALYZE output for the queries
that slow down as the data set grows, along with the schema for those
tables, etc.
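Something along these lines is usually the most useful thing to include
(the table and column names below are just made-up placeholders, not
your schema), run against the full-sized data set rather than a small
test copy:

  -- shows the plan plus actual timings and row counts per step
  EXPLAIN ANALYZE
  SELECT *
  FROM big_table
  WHERE some_indexed_column = 12345;

The estimated-vs-actual row counts, and whether the planner chooses an
index scan or a sequential scan on the large tables, will tell people a
lot more than overall timings alone.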