Re: Large Database Performance suggestions - Mailing list pgsql-performance
From: Joshua Marsh
Subject: Re: Large Database Performance suggestions
Msg-id: 38242de904102608246d060f7d@mail.gmail.com
In response to: Re: Large Database Performance suggestions (Dave Cramer <pg@fastcrypt.com>)
Responses: Re: Large Database Performance suggestions; Re: Large Database Performance suggestions
List: pgsql-performance
Thanks for all of your help so far. Here is some of the information you guys were asking for:

Test System:
    2x AMD Opteron 244 (1.8GHz)
    8GB RAM
    7x 72GB SCSI HDD (RAID 5)

postgresql.conf information:

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 1000           # min 16, at least max_connections*2, 8KB each
#sort_mem = 1024                # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB
sort_mem = 4096000
vacuum_mem = 1024000

# - Free Space Map -

#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
#wal_buffers = 8                # min 4, 8KB each

# - Checkpoints -

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

Everything else is at its default. I actually think the WAL options are at their defaults as well, but I don't recall exactly :)

As for the queries and tables: the data we store is confidential, but it is essentially an account number with a bunch of boolean fields that specify whether a person meets various criteria. So a query may look something like:

SELECT acctno FROM view_of_data WHERE has_name AND is_active_member AND state = 'OH';

which is explained as something like this:

                              QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on view_of_data  (cost=0.00..25304.26 rows=22054 width=11)
   Filter: (has_name AND is_active_member AND ((state)::text = 'OH'::text))
(2 rows)

Occasionally, because we store data from several sources, we get requests for data from a particular source. We simply intersect the view_of_data table with a sources table that lists which acctnos belong to which source. That query would look something like this:

SELECT acctno FROM view_of_data WHERE has_name AND is_active_member AND state = 'OH'
INTERSECT
SELECT acctno FROM sources_data WHERE source = 175;

which is explained as follows:

                                       QUERY PLAN
-------------------------------------------------------------------------------------------
 SetOp Intersect  (cost=882226.14..885698.20 rows=69441 width=11)
   ->  Sort  (cost=882226.14..883962.17 rows=694411 width=11)
         Sort Key: acctno
         ->  Append  (cost=0.00..814849.42 rows=694411 width=11)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..25524.80 rows=22054 width=11)
                     ->  Seq Scan on view_of_data  (cost=0.00..25304.26 rows=22054 width=11)
                           Filter: (has_name AND is_active_member AND ((state)::text = 'OH'::text))
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..789324.62 rows=672357 width=11)
                     ->  Seq Scan on sources_data  (cost=0.00..782601.05 rows=672357 width=11)
                           Filter: (source = 23)

Again, we see our biggest bottlenecks once we get over about 50 million records; execution time grows exponentially from that point.

Thanks again for all of your help!
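(For concreteness, here is a quick arithmetic check of what the memory settings above translate to, using the units given in the conf comments: 8KB pages for shared_buffers, kilobytes for sort_mem and vacuum_mem. This is plain arithmetic, not a tuning recommendation.)

SELECT 1000 * 8 / 1024.0 AS shared_buffers_mb,  -- 1000 pages * 8KB = ~7.8 MB of shared buffer cache
       4096000 / 1024.0  AS sort_mem_mb,        -- ~4000 MB allowed per sort, per backend
       1024000 / 1024.0  AS vacuum_mem_mb;      -- ~1000 MB available to VACUUM

(And a rough sketch of the shape of the table described above, in case it helps. The column names are taken from the query; the types and widths are assumptions, not details from this post.)

CREATE TABLE view_of_data (
    acctno           varchar(11),  -- account number (type assumed)
    has_name         boolean,      -- criteria flag
    is_active_member boolean,      -- criteria flag
    state            varchar(2)
    -- ...plus the rest of the boolean criteria columns
);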
-Josh

On Fri, 22 Oct 2004 07:38:49 -0400, Dave Cramer <pg@fastcrypt.com> wrote:
> Josh,
>
> Your hardware setup would be useful too. It's surprising how slow some
> big name servers really are.
> If you are seriously considering memory sizes over 4G you may want to
> look at an Opteron.
>
> Dave
>
> Joshua Marsh wrote:
>
> >Hello everyone,
> >
> >I am currently working on a data project that uses PostgreSQL
> >extensively to store, manage and maintain the data. We haven't had
> >any problems regarding database size until recently. The three major
> >tables we use never get bigger than 10 million records. With this
> >size, we can do things like storing the indexes or even the tables in
> >memory to allow faster access.
> >
> >Recently, we have found customers who want to use our service
> >with data files between 100 million and 300 million records. At that
> >size, each of the three major tables will hold between 150 million and
> >700 million records. At this size, I can't expect it to run queries
> >in 10-15 seconds (what we can do with 10 million records), but would
> >prefer to keep them all under a minute.
> >
> >We did some initial testing with a server with 8GB of RAM and
> >found we can do operations on data files up to 50 million records
> >fairly well, but performance drops dramatically after that. Does anyone
> >have any suggestions on a good way to improve performance for these
> >extra large tables? Things that have come to mind are replication and
> >Beowulf clusters, but from what I have recently studied, these don't do
> >so well with a single process. We will have parallel processes running,
> >but it's more important for each individual process to be fast than for
> >several processes to run in parallel at once.
> >
> >Any help would be greatly appreciated!
> >
> >Thanks,
> >
> >Joshua Marsh
> >
> >P.S. Off-topic, I have a few invitations to Gmail. If anyone would
> >like one, let me know.
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
> --
> Dave Cramer
> http://www.postgresintl.com
> 519 939 0336
> ICQ#14675561