Thread: How Postgresql Compares For Some Query Types

From: Mark Kirkwood
Date:
Dear list,

With the advent of version 7.1.2 I thought it would be interesting to compare 
how Postgresql handles a certain class of queries (star queries) and data loads 
against some of the other leading databases (which were, in my humble opinion, 
Mysql 3.23.39, Db2 7.2, and Oracle 9.0 [ok - possibly SQL 2000 too, but I don't 
run Windows]).

The results were overall very encouraging:

Postgresql can clearly hold its own when compared to the "big guys".

The full details (together with a wee rant) are available at:

http://homepages.slingshot.co.nz/~markir

(if anyone asks I can submit the entire results...but I figured, lets cut to 
the chase here....)

There were two areas where Postgresql was slower, and I thought it would be 
informative to discuss these briefly:


1  Star query scanning a significant portion of a fact table

SELECT   d0.f1, count(f.val)
FROM     dim0 d0, fact1 f
WHERE    d0.d0key = f.d0key
AND      d0.f1 BETWEEN '2007-07-01' AND '2018-07-01'
GROUP BY d0.f1

This query requires summarizing a significant proportion of the 3000000-row 
(700Mb) fact1 table.
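
For anyone wanting to try this at home, the tables behind the query presumably look something like the following (column names are taken from the query and the plans; the types and index definition are my guesses):

```sql
-- Sketch of the schema implied by the query; types are assumptions.
CREATE TABLE dim0 (
    d0key   integer PRIMARY KEY,   -- surrogate key joined to the fact table
    f1      date                   -- attribute filtered and grouped on
);

CREATE TABLE fact1 (
    d0key   integer REFERENCES dim0 (d0key),  -- dimension foreign key
    val     integer                           -- measure being counted
    -- plus any other dimension keys and measures
);

-- The index the Postgres plan uses for the dimension scan.
CREATE INDEX dim0_q1 ON dim0 (f1);
```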

Postgres 7.1.2 executed this query like:

Aggregate  (cost=2732703.88..2738731.49 rows=120552 width=20)
  ->  Group  (cost=2732703.88..2735717.69 rows=1205521 width=20)
        ->  Sort  (cost=2732703.88..2732703.88 rows=1205521 width=20)
              ->  Hash Join  (cost=1967.52..2550188.93 rows=1205521 width=20)
                    ->  Seq Scan on fact1 f  (cost=0.00..1256604.00 rows=3000000 width=8)
                    ->  Hash  (cost=1957.47..1957.47 rows=4018 width=12)
                          ->  Index Scan using dim0_q1 on dim0 d0  (cost=0.00..1957.47 rows=4018 width=12)

for an elapsed time of 3m50s

Whereas Oracle 9.0 used:

SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes=60300)
  SORT (GROUP BY) (Cost=5810 Card=4020 Bytes=60300)
    HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660)
      TABLE ACCESS (FULL) OF 'DIM0' (Cost=4 Card=4020 Bytes=40200)
      TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089 Bytes=14950445)

for an elapsed time of 50s.

It would seem that Oracle's execution plan is the better one here: it hash-joins 
the two full scans and then does a single sort/group over the result, whereas 
Postgres sorts all ~1.2 million joined rows before grouping.
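
(For reference, the Postgres plan above is just the output of EXPLAIN, so anyone wanting to repeat the comparison can run the following; the Oracle plan was presumably obtained via EXPLAIN PLAN or SET AUTOTRACE.)

```sql
-- Show the planner's chosen execution plan without running the query.
EXPLAIN
SELECT   d0.f1, count(f.val)
FROM     dim0 d0, fact1 f
WHERE    d0.d0key = f.d0key
AND      d0.f1 BETWEEN '2007-07-01' AND '2018-07-01'
GROUP BY d0.f1;
```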


2    Bulk loading data

Bulk load times for a 3000000-row (700Mb) fact table were:

Postgresql    9m30s    (copy)
Db2           2m15s    (load)
Oracle        5m00s    (sqlldr)
Mysql         2m20s    (load)


(Note that Db2 is faster than Mysql here ! )

While I left "fsync = on" for this test... I still think the difference was 
worth noting.
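
For completeness, the Postgres load was a plain COPY from a flat file, along these lines (the path and delimiter here are illustrative, not the ones I actually used):

```sql
-- Bulk load the fact table from a server-side flat file (7.1-era syntax).
-- File path and delimiter are examples only.
COPY fact1 FROM '/data/fact1.dat' USING DELIMITERS ',';
```

(Setting fsync = false in postgresql.conf would presumably speed the load up further, at the cost of crash safety, but as noted above I left it on.)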

Any comments on these sorts of results would be welcome.

regards

Mark






Re: How Postgresql Compares For Some Query Types

From: "Josh Berkus"
Date:
Mark, (and Tom below)

    Thanks for doing the tests.  These'll be useful to me if any client
asks me again if PostgreSQL is a "real" database.  I'm particularly
intrigued that MySQL performed so poorly.  In the field, it's been quite
speedy but maybe that's because applications using it are restricted to
simple queries.

    You may wish to post your link to Slashdot and pgsql-announce as well.
I'm sure the Great Bridge folks will be interested.

    I'm a little unclear on what a "fact table" is.  Can you explain?

    Finally, Tom, I figured out what was up with the 30-45 second response
time on the large star query I couldn't get to perform.  Turns out that
the client had changed two of the VARCHAR fields to TEXT fields and
added large blocks of text (> 16k for some) to them.  When I removed
these fields from the query, it gives me the <7s response time I'd
expected.


                            -Josh


______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco


Re: How Postgresql Compares For Some Query Types

From:
Date:
> 
>     I'm a little unclear on what a "fact table" is.  Can you explain?

Good question... The idea comes from data warehousing, where a typical design 
involves two "types" of table:

dimensions : time, location, ethnicity ... (essentially denormalized lookups)
facts      : observations, accidents, sales ...

Facts have a composite primary key, each component of which is a foreign key 
to one of the dimensions.

If drawn with the fact table(s) in the center and the dimensions round about, 
the ERD looks like a "star" - hence the names "star schema" and "star query".
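
A minimal sketch of such a schema (the table and column names here are made up purely for illustration):

```sql
CREATE TABLE time_dim (
    time_key  integer PRIMARY KEY,
    cal_date  date
);

CREATE TABLE store_dim (
    store_key integer PRIMARY KEY,
    region    varchar(32)
);

-- The fact table's primary key is composite, each component being a
-- foreign key to one of the dimension tables.
CREATE TABLE sales_fact (
    time_key  integer REFERENCES time_dim (time_key),
    store_key integer REFERENCES store_dim (store_key),
    amount    numeric(10,2),
    PRIMARY KEY (time_key, store_key)
);
```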

A quick search located a vaguely helpful page (apologies about the source):

http://www.oradoc.com/ora816/server.816/a76994/schemas.htm

A much better source is Kimball's book "The Data Warehouse Toolkit".

regards

Mark