Large databases, performance - Mailing list pgsql-hackers

From Shridhar Daithankar
Subject Large databases, performance
Date
Msg-id 3D9C8712.9513.9C6521D@localhost
Whole thread Raw
Responses Re: Large databases, performance  (Robert Treat <xzilla@users.sourceforge.net>)
Re: [GENERAL] Large databases, performance  (Manfred Koizar <mkoi-pg@aon.at>)
List pgsql-hackers
Hi,

Today we concluded test for database performance. Attached are results and the
schema, for those who have missed earlier discussion on this.

We have (almost) decided that we will partition the data across machines. The
theme is, after every some short interval a burst of data will be entered in
new table in database, indexed and vacuume. The table(s) will be inherited so
that query on base table will fetch results from all the children. The
application has to consolidate all the data per node basis. If the database is
not postgresql, app. has to consolidate data across partitions as well.

Now we need to investigate whether selecting on base table to include children
would use indexes created on children table.

It's estimated that when entire data is gathered, total number of children
tables would be around 1K-1.1K across all machines.

This is in point of average rate of data insertion i.e. 5K records/sec and
total data size, estimated to be 9 billion rows max i.e. estimated database
size is 900GB. Obviously it's impossible to keep insertion rate on an indexed
table high as data grows. So partitioning/inheritance looks better approach.

Postgresql is not the final winner as yet. Mysql is in close range. I will keep
you guys posted about the result.

Let me know about any comments..

Bye
 Shridhar

--
Price's Advice:    It's all a game -- play it to have fun.


Machine
Compaq Proliant Server ML 530
"Intel Xeon 2.4 Ghz Processor x 4, "
"4 GB RAM, 5 x 72.8 GB SCSI HDD "
"RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0"
"Cost - $13,500 ($1,350 for each additional 72GB HDD)"

Performance Parameter                MySQL 3.23.52          MySQL 3.23.52          PostgreSQL 7.2.2
                        WITHOUT InnoDB         WITH InnoDB for     with built-in support
                        for transactional     transactional support    for transactions
                        support
Complete Data

Inserts + building a composite index
"40 GB data, 432,000,000 tuples"        3738 secs        18720 secs        20628 secs
"about 100 bytes each, schema on
'schema' sheet"
"composite index on 3 fields
(esn, min, datetime)"

Load Speed                    115570 tuples/second    23076 tuples/second    20942 tuples/second

Database Size on Disk                48 GB            87 GB            111 GB

Average per partition

Inserts + building a composite index
"300MB data, 3,000,000 tuples,"            28 secs            130 secs        150 secs
"about 100 bytes each, schema on
'schema' sheet"
"composite index on 3 fields
(esn, min, datetime)"

Select Query                      7 secs            7 secs            6 secs
based on equality match of 2 fields
(esn and min) - 4 concurrent queries
running

Database Size on Disk                341 MB            619 MB            788 MB
Field Name    Field Type    Nullable    Indexed
type        int        no        no
esn        char (10)    no        yes
min        char (10)    no        yes
datetime    timestamp    no        yes
opc0        char (3)    no        no
opc1        char (3)    no        no
opc2        char (3)    no        no
dpc0        char (3)    no        no
dpc1        char (3)    no        no
dpc2        char (3)    no        no
npa        char (3)    no        no
nxx        char (3)    no        no
rest        char (4)    no        no
field0        int        yes        no
field1        char (4)    yes        no
field2        int        yes        no
field3        char (4)    yes        no
field4        int        yes        no
field5        char (4)    yes        no
field6        int        yes        no
field7        char (4)    yes        no
field8        int        yes        no
field9        char (4)    yes        no


pgsql-hackers by date:

Previous
From: Justin Clift
Date:
Subject: OT: Looking to Open Source the Flash training material
Next
From: Philip Warner
Date:
Subject: Re: pg_dump and large files - is this a problem?