Re: [GENERAL] Large databases, performance - Mailing list pgsql-hackers
From | Charles H. Woloszynski |
---|---|
Subject | Re: [GENERAL] Large databases, performance |
Date | |
Msg-id | 3D9C3E05.7070906@clearmetrix.com |
In response to | Large databases, performance ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>) |
Responses | Re: [GENERAL] Large databases, performance |
List | pgsql-hackers |
Can you comment on the tools you are using to do the insertions (Perl, Java?), the distribution of the data (all random, all static), and the transaction scope (all inserts in one transaction, each insert as a single transaction, or some group of inserts as a transaction)?

I'd be curious what happens when you submit more queries than you have processors (you had four concurrent queries and four CPUs), if you care to run any additional tests. Also, I'd report the query time both in absolute terms (as you did) and as 'time / number of concurrent queries'. This will give you a sense of how the system scales as the workload increases. Personally, I am more concerned about this aspect than the load time, since I am going to guess that this is where all the time is spent.

Was the original posting on GENERAL or HACKERS? Is this moving to PERFORMANCE for follow-up? I'd like to follow this discussion and want to know if I should join another group.

Thanks,

Charlie

P.S. Anyone want to comment on their expectations for 'commercial' databases handling this load? I know that we cannot speak about specific performance metrics on some products (licensing restrictions), but I'd be curious whether folks have seen some of the databases out there handle these dataset sizes and respond reasonably.

Shridhar Daithankar wrote:

>Hi,
>
>Today we concluded the database performance test. Attached are the results
>and the schema, for those who missed the earlier discussion on this.
>
>We have (almost) decided that we will partition the data across machines. The
>scheme is: after every short interval, a burst of data will be entered into a
>new table in the database, then indexed and vacuumed. The tables will be
>inherited, so that a query on the base table will fetch results from all the
>children. The application has to consolidate all the data on a per-node
>basis. If the database is not PostgreSQL, the application has to consolidate
>data across partitions as well.
>
>Now we need to investigate whether a select on the base table that includes
>the children would use the indexes created on the children tables.
>
>It's estimated that when the entire data set is gathered, the total number of
>child tables would be around 1K-1.1K across all machines.
>
>This follows from the average rate of data insertion, i.e. 5K records/sec,
>and the total data size, estimated at 9 billion rows max, i.e. an estimated
>database size of 900 GB. Obviously it's impossible to keep the insertion rate
>high on an indexed table as the data grows, so partitioning/inheritance looks
>like the better approach.
>
>PostgreSQL is not the final winner yet. MySQL is in close range. I will keep
>you guys posted about the results.
>
>Let me know about any comments.
>
>Bye
> Shridhar
>
>--
>Price's Advice: It's all a game -- play it to have fun.
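For anyone who wants to experiment with the inheritance scheme described above, here is a minimal sketch (the table and index names are made up, and the schema is trimmed to the three indexed fields); running EXPLAIN on the base-table select is the quickest way to check whether the children's indexes are used:

    -- Base table; hypothetical names, trimmed to the three indexed fields.
    CREATE TABLE records_base (
        esn      char(10)  NOT NULL,
        min      char(10)  NOT NULL,
        datetime timestamp NOT NULL
    );

    -- One child table per burst of loaded data, each indexed after its load.
    CREATE TABLE records_batch1 () INHERITS (records_base);
    CREATE INDEX records_batch1_idx ON records_batch1 (esn, min, datetime);

    -- A select on the base table also scans every child; the plan shows
    -- whether each child is read via its index or a sequential scan.
    EXPLAIN
    SELECT * FROM records_base
     WHERE esn = '0000000000' AND min = '0000000000';

Children are included by default; writing SELECT ... FROM ONLY records_base would skip them.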
>
>------------------------------------------------------------------------
>
>Machine
>Compaq Proliant Server ML 530
>Intel Xeon 2.4 GHz Processor x 4
>4 GB RAM, 5 x 72.8 GB SCSI HDD
>RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0
>Cost - $13,500 ($1,350 for each additional 72 GB HDD)
>
>Performance Parameter                  MySQL 3.23.52      MySQL 3.23.52      PostgreSQL 7.2.2
>                                       without InnoDB     with InnoDB for    with built-in
>                                       for transactional  transactional      support for
>                                       support            support            transactions
>
>Complete Data
>
>Inserts + building a composite index   3738 secs          18720 secs         20628 secs
>(40 GB data, 432,000,000 tuples,
>about 100 bytes each, schema on
>'schema' sheet; composite index on
>3 fields (esn, min, datetime))
>
>Load Speed                             115570 tuples/sec  23076 tuples/sec   20942 tuples/sec
>
>Database Size on Disk                  48 GB              87 GB              111 GB
>
>Average per partition
>
>Inserts + building a composite index   28 secs            130 secs           150 secs
>(300 MB data, 3,000,000 tuples,
>about 100 bytes each, schema on
>'schema' sheet; composite index on
>3 fields (esn, min, datetime))
>
>Select Query                           7 secs             7 secs             6 secs
>(based on equality match of 2 fields
>(esn and min); 4 concurrent queries
>running)
>
>Database Size on Disk                  341 MB             619 MB             788 MB
>
>------------------------------------------------------------------------
>
>Field Name  Field Type  Nullable  Indexed
>type        int         no        no
>esn         char(10)    no        yes
>min         char(10)    no        yes
>datetime    timestamp   no        yes
>opc0        char(3)     no        no
>opc1        char(3)     no        no
>opc2        char(3)     no        no
>dpc0        char(3)     no        no
>dpc1        char(3)     no        no
>dpc2        char(3)     no        no
>npa         char(3)     no        no
>nxx         char(3)     no        no
>rest        char(4)     no        no
>field0      int         yes       no
>field1      char(4)     yes       no
>field2      int         yes       no
>field3      char(4)     yes       no
>field4      int         yes       no
>field5      char(4)     yes       no
>field6      int         yes       no
>field7      char(4)     yes       no
>field8      int         yes       no
>field9      char(4)     yes       no
>
>------------------------------------------------------------------------

--
Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com
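For reference, the 'schema' sheet above translates more or less directly into DDL; a sketch follows (the table name is hypothetical, since the original test scripts were not posted):

    -- Sketch of the posted schema; the table name is made up.
    CREATE TABLE records (
        type     int       NOT NULL,
        esn      char(10)  NOT NULL,
        min      char(10)  NOT NULL,
        datetime timestamp NOT NULL,
        opc0     char(3)   NOT NULL,
        opc1     char(3)   NOT NULL,
        opc2     char(3)   NOT NULL,
        dpc0     char(3)   NOT NULL,
        dpc1     char(3)   NOT NULL,
        dpc2     char(3)   NOT NULL,
        npa      char(3)   NOT NULL,
        nxx      char(3)   NOT NULL,
        rest     char(4)   NOT NULL,
        field0   int,
        field1   char(4),
        field2   int,
        field3   char(4),
        field4   int,
        field5   char(4),
        field6   int,
        field7   char(4),
        field8   int,
        field9   char(4)
    );

    -- The composite index built in the load tests.
    CREATE INDEX records_idx ON records (esn, min, datetime);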