Re: [GENERAL] Large databases, performance - Mailing list pgsql-hackers

From Charles H. Woloszynski
Subject Re: [GENERAL] Large databases, performance
Date
Msg-id 3D9C3E05.7070906@clearmetrix.com
In response to Large databases, performance  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
Responses Re: [GENERAL] Large databases, performance  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
List pgsql-hackers
Can you comment on the tools you are using to do the insertions (Perl,
Java?), the distribution of the data (all random, all static), and the
transaction scope (all inserts in one transaction, each insert as a
single transaction, or some group of inserts per transaction)?
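
For example, the difference I have in mind (a sketch only; the table
name 'calls' and the three-column load are assumptions, not your actual
schema or load script):

    -- each insert as its own transaction (autocommit)
    INSERT INTO calls (esn, min, datetime)
        VALUES ('4045551212', '7705551212', '2002-10-03 12:00:00');

    -- some group of inserts per transaction
    BEGIN;
    INSERT INTO calls (esn, min, datetime)
        VALUES ('4045551212', '7705551212', '2002-10-03 12:00:01');
    INSERT INTO calls (esn, min, datetime)
        VALUES ('4045551213', '7705551213', '2002-10-03 12:00:02');
    COMMIT;

    -- or one COPY per burst, typically the fastest bulk-load path
    COPY calls FROM '/tmp/burst_0001.dat';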

I'd be curious what happens when you submit more queries than you have
processors (you had four concurrent queries and four CPUs), if you care
to run any additional tests.  Also, I'd report the query time in
absolute terms (as you did) and also as time divided by the number of
concurrent queries (e.g. 7 secs across 4 concurrent queries is 1.75
secs per query).  This will give you a sense of how the system scales
as the workload increases.  Personally, I am more concerned about this
aspect than the load time, since I am going to guess that this is where
all the time is spent.

Was the original posting on GENERAL or HACKERS?  Is this moving to
PERFORMANCE for follow-up?  I'd like to follow this discussion and want
to know whether I should join another list.
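
On the inheritance question: I would check directly with EXPLAIN
whether a select on the base table picks up the children's indexes.
Something along these lines (a sketch only; the table and index names
are made up, and the columns are taken from your schema sheet):

    -- base table that queries will run against
    CREATE TABLE calls (esn      char(10)  NOT NULL,
                        min      char(10)  NOT NULL,
                        datetime timestamp NOT NULL);

    -- one child table per burst, indexed and vacuumed after the load
    CREATE TABLE calls_0001 () INHERITS (calls);
    CREATE INDEX calls_0001_idx ON calls_0001 (esn, min, datetime);
    VACUUM ANALYZE calls_0001;

    -- check whether the plan shows index scans on the children or
    -- sequential scans
    EXPLAIN SELECT * FROM calls
        WHERE esn = '4045551212' AND min = '7705551212';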

Thanks,

Charlie

P.S.  Anyone want to comment on their expectation for 'commercial'
databases handling this load?  I know that we cannot speak about
specific performance metrics on some products (licensing restrictions),
but I'd be curious whether folks have seen some of the databases out
there handle these dataset sizes and respond reasonably.


Shridhar Daithankar wrote:

>Hi,
>
>Today we concluded the test for database performance. Attached are the
>results and the schema, for those who have missed the earlier discussion
>on this.
>
>We have (almost) decided that we will partition the data across machines.
>The theme is: after every short interval, a burst of data will be entered
>into a new table in the database, indexed, and vacuumed. The table(s) will
>be inherited so that a query on the base table will fetch results from all
>the children. The application has to consolidate all the data on a
>per-node basis. If the database is not PostgreSQL, the app has to
>consolidate data across partitions as well.
>
>Now we need to investigate whether selecting on the base table to include
>children would use the indexes created on the child tables.
>
>It's estimated that when the entire data set has been gathered, the total
>number of child tables will be around 1K-1.1K across all machines.
>
>This is with respect to the average rate of data insertion, i.e. 5K
>records/sec, and the total data size, estimated at 9 billion rows max,
>i.e. an estimated database size of 900GB. Obviously it's impossible to
>keep the insertion rate high on an indexed table as the data grows, so
>partitioning/inheritance looks like the better approach.
>
>PostgreSQL is not the final winner as yet. MySQL is in close range. I will
>keep you guys posted about the results.
>
>Let me know about any comments..
>
>Bye
> Shridhar
>
>--
>Price's Advice:    It's all a game -- play it to have fun.
>
>
>
>
>------------------------------------------------------------------------
>
>Machine
>Compaq Proliant Server ML 530
>Intel Xeon 2.4 GHz processor x 4
>4 GB RAM, 5 x 72.8 GB SCSI HDD
>RAID 0 (striping) hardware setup, Mandrake Linux 9.0
>Cost: $13,500 ($1,350 for each additional 72 GB HDD)
>
>Performance parameter                   MySQL 3.23.52       MySQL 3.23.52       PostgreSQL 7.2.2
>                                        without InnoDB      with InnoDB for     with built-in
>                                        (no transactional   transactional       support for
>                                        support)            support             transactions
>
>Complete data
>(40 GB data, 432,000,000 tuples of about 100 bytes each; schema on the
>'schema' sheet; composite index on 3 fields: esn, min, datetime)
>
>Inserts + building composite index      3738 secs           18720 secs          20628 secs
>Load speed                              115570 tuples/sec   23076 tuples/sec    20942 tuples/sec
>Database size on disk                   48 GB               87 GB               111 GB
>
>Average per partition
>(300 MB data, 3,000,000 tuples of about 100 bytes each; same schema and
>composite index)
>
>Inserts + building composite index      28 secs             130 secs            150 secs
>Select query (equality match on esn     7 secs              7 secs              6 secs
>and min, 4 concurrent queries)
>Database size on disk                   341 MB              619 MB              788 MB
>
>
>------------------------------------------------------------------------
>
>Field Name    Field Type    Nullable    Indexed
>type        int        no        no
>esn        char (10)    no        yes
>min        char (10)    no        yes
>datetime    timestamp    no        yes
>opc0        char (3)    no        no
>opc1        char (3)    no        no
>opc2        char (3)    no        no
>dpc0        char (3)    no        no
>dpc1        char (3)    no        no
>dpc2        char (3)    no        no
>npa        char (3)    no        no
>nxx        char (3)    no        no
>rest        char (4)    no        no
>field0        int        yes        no
>field1        char (4)    yes        no
>field2        int        yes        no
>field3        char (4)    yes        no
>field4        int        yes        no
>field5        char (4)    yes        no
>field6        int        yes        no
>field7        char (4)    yes        no
>field8        int        yes        no
>field9        char (4)    yes        no
>
>
>
>------------------------------------------------------------------------
>
>
>

--


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com




