How Postgresql Compares For Some Query Types - Mailing list pgsql-sql

From Mark kirkwood
Subject How Postgresql Compares For Some Query Types
Date
Msg-id 01071223501800.01909@spikey.slithery.org
Whole thread Raw
Responses Re: How Postgresql Compares For Some Query Types
List pgsql-sql
Dear list,

With the advent of Version 7.1.2 I thought it would be interesting to compare 
how Postgresql does a certain class of queries (Star Queries), and Data Loads 
with some of the other leading databases ( which were in my humble opinion 
Mysql 3.23.39, Db2 7.2 ,Oracle 9.0 [ ok - Possibly SQL 2000 too but I dont 
run Winanyk] ). 

The results were overall very encouraging :

Postgresql can clearly hold its own when compared to the "big guys".

The full details (together with a wee rant) are aviailable on :

http://homepages.slingshot.co.nz/~markir

(if anyone asks I can submit the entire results...but I figured, lets cut to 
the chase here....)

There were two areas where Postgresql was slower, and I thought it would be 
informative to discuss these briefly :


1  Star query scanning a sigificant portion of a fact table 

SELECT       d0.f1,      count(f.val)
FROM dim0 d0,    fact1 f
WHERE d0.d0key = f.d0key
AND   d0.f1 between '2007-07-01' AND '2018-07-01'
GROUP BY d0.f1

This query requires summarizing a significant proportion of the 3000000 row ( 
700Mb ) fact1 table.

Postgres 7.1.2 executed this query like :
Aggregate  (cost=2732703.88..2738731.49 rows=120552 width=20) -> Group  (cost=2732703.88..2735717.69 rows=1205521
width=20)    -> Sort  (cost=2732703.88..2732703.88 rows=1205521 width=20)        -> Hash Join
(cost=1967.52..2550188.93rows=1205521 width=20)          -> Seq Scan on fact1 f  (cost=0.00..1256604.00 rows=3000000 
 
width=8)          -> Hash  (cost=1957.47..1957.47 rows=4018 width=12)            -> Index Scan using dim0_q1 on dim0 d0
(cost=0.00..1957.47 
 
rows=4018 width=12) 

for an elapsed time of 3m50s

Wheras Oracle 9.0 used :
SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes =60300)    SORT (GROUP BY) (Cost=5810 Card=4020
Bytes=60300)     HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660)        TABLE ACCESS (FULL) OF 'DIM0' (Cost=4
Card=4020Bytes= 40200)        TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089 
 
Bytes=14950445)

for an elapsed time of 50s.

It would seem that Oracle's execution plan is more optimal.


2    Bulk loading data

Buld Load times for a 3000000 row (700Mb ) fact table were 

Postgresql    9m30s    (copy)
Db2        2m15s    (load)
Oracle        5m    (sqlldr)
Mysql        2m20s    (load)


(Note that Db2 is faster than Mysql here ! )

While I left "fsync = on" for this test... I still think the difference was 
worth noting.

Any comments on these sort of results would be welcome.

regards

Mark






pgsql-sql by date:

Previous
From: Carlos
Date:
Subject: Select distinct and order by.
Next
From: "Thalis A. Kalfigopoulos"
Date:
Subject: Re: [GENERAL] trigger on DELETE