How Postgresql Compares For Query And Load Operations - Mailing list pgsql-general
From | Mark kirkwood |
---|---|
Subject | How Postgresql Compares For Query And Load Operations |
Date | |
Msg-id | 01071322224708.01182@spikey.slithery.org Whole thread Raw |
Responses |
Re: How Postgresql Compares For Query And Load Operations
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: How Postgresql Compares For Query And Load Operations (Ryan Mahoney <ryan@paymentalliance.net>) |
List | pgsql-general |
Dear list, With the advent of Version 7.1.2 I thought it would be interesting to compare how Postgresql does a certain class of queries (Star Queries), and Data Loads with some of the other leading databases ( which were in my humble opinion Mysql 3.23.39, Db2 7.2 ,Oracle 9.0 [ ok - Possibly SQL 2000 too but I dont run Winanyk] ). The results were overall very encouraging : Postgresql can clearly hold its own when compared to the "big guys". The full details (together with a wee rant) are aviailable on : http://homepages.slingshot.co.nz/~markir (if anyone asks I can submit the entire results...but I figured, lets cut to the chase here....) There were two areas where Postgresql was slower, and I thought it would be informative to discuss these briefly : 1 Star query scanning a sigificant portion of a fact table SELECT d0.f1, count(f.val) FROM dim0 d0, fact1 f WHERE d0.d0key = f.d0key AND d0.f1 between '2007-07-01' AND '2018-07-01' GROUP BY d0.f1 This query requires summarizing a significant proportion of the 3000000 row ( 700Mb ) fact1 table. Postgres 7.1.2 executed this query like : Aggregate (cost=2732703.88..2738731.49 rows=120552 width=20) -> Group (cost=2732703.88..2735717.69 rows=1205521 width=20) -> Sort (cost=2732703.88..2732703.88 rows=1205521 width=20) -> Hash Join (cost=1967.52..2550188.93 rows=1205521 width=20) -> Seq Scan on fact1 f (cost=0.00..1256604.00 rows=3000000 width=8) -> Hash (cost=1957.47..1957.47 rows=4018 width=12) -> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..1957.47 rows=4018 width=12) for an elapsed time of 3m50s Wheras Oracle 9.0 used : SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes =60300) SORT (GROUP BY) (Cost=5810 Card=4020 Bytes=60300) HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660) TABLE ACCESS (FULL) OF 'DIM0' (Cost=4 Card=4020 Bytes= 40200) TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089 Bytes=14950445) for an elapsed time of 50s. It would seem that Oracle's execution plan is more optimal. 2 Bulk loading data Buld Load times for a 3000000 row (700Mb ) fact table were Postgresql 9m30s (copy) Db2 2m15s (load) Oracle 5m (sqlldr) Mysql 2m20s (load) (Note that Db2 is faster than Mysql here ! ) While I left "fsync = on" for this test... I still think the difference was worth noting. Any comments on these sort of results would be welcome. regards Mark
pgsql-general by date: