Re: Large Database Performance suggestions - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Large Database Performance suggestions
Date
Msg-id 1098421413.21035.78.camel@localhost.localdomain
Whole thread Raw
In response to Large Database Performance suggestions  (Joshua Marsh <icub3d@gmail.com>)
List pgsql-performance
On Thu, 2004-10-21 at 21:14, Joshua Marsh wrote:
> Hello everyone,
>
> I am currently working on a data project that uses PostgreSQL
> extensively to store, manage and maintain the data.  We haven't had
> any problems regarding database size until recently.  The three major
> tables we use never get bigger than 10 million records.  With this
> size, we can do things like storing the indexes or even the tables in
> memory to allow faster access.
>
> Recently, we have found customers who are wanting to use our service
> with data files between 100 million and 300 million records.  At that
> size, each of the three major tables will hold between 150 million and
> 700 million records.  At this size, I can't expect it to run queries
> in 10-15 seconds (what we can do with 10 million records), but would
> prefer to keep them all under a minute.
>
> We did some original testing and with a server with 8GB or RAM and
> found we can do operations on data file up to 50 million fairly well,
> but performance drop dramatically after that.  Does anyone have any
> suggestions on a good way to improve performance for these extra large
> tables?  Things that have come to mind are Replication and Beowulf
> clusters, but from what I have recently studied, these don't do so wel
> with singular processes.  We will have parallel process running, but
> it's more important that the speed of each process be faster than
> several parallel processes at once.


I'd assume that what's happening is that up to a certain data set size,
it all fits in memory, and you're going from CPU/memory bandwidth
limited to I/O limited.  If this is the case, then a faster storage
subsystem is the only real answer.  If the database is mostly read, then
a large RAID5 or RAID 1+0 array should help quite a bit.

You might wanna post some explain analyze of the queries that are going
slower at some point in size, along with schema for those tables etc...


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...
Next
From: Dave Cramer
Date:
Subject: Re: Large Database Performance suggestions