Re: PostgreSQL runs a query much slower than BDE and MySQL - Mailing list pgsql-performance

From Tom Lane
Subject Re: PostgreSQL runs a query much slower than BDE and MySQL
Date
Msg-id 29246.1155768717@sss.pgh.pa.us
In response to PostgreSQL runs a query much slower than BDE and MySQL  ("Peter Hardman" <peter@ssbg.zetnet.co.uk>)
Responses Re: PostgreSQL runs a query much slower than BDE and MySQL  ("Peter Hardman" <peter@ssbg.zetnet.co.uk>)
Re: PostgreSQL runs a query much slower than BDE and MySQL  (Scott Lamb <slamb@slamb.org>)
List pgsql-performance
"Peter Hardman" <peter@ssbg.zetnet.co.uk> writes:
> I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user
> Paradox to a web based interface to either MySQL or PostgreSQL.
> The query I run is:

> /* Select all sheep whose most recent transfer was into the subject flock */
> SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
> FROM SHEEP_FLOCK f1 JOIN
>     /* The last transfer date for each sheep */
>     (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
>     FROM  SHEEP_FLOCK f
>     GROUP BY f.regn_no) f2
> ON f1.regn_no = f2.regn_no
> WHERE f1.flock_no = '1359'
> AND f1.transfer_date = f2.last_xfer_date

This seems pretty closely related to this recent thread:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00220.php
in which the OP is doing a very similar kind of query in almost exactly
the same way.

I can't help thinking that there's probably a better way to phrase this
type of query in SQL, though it's not jumping out at me what that is.

What I find interesting though is that it sounds like both MSSQL and
Paradox know something we don't about how to optimize it.  PG doesn't
have any idea how to do the above query without forming the full output
of the sub-select, but I suspect that the commercial DBs know a
shortcut; perhaps they are able to automatically derive a restriction
in the subquery similar to what you did by hand.  Does Paradox have
anything comparable to EXPLAIN that would give a hint about the query
plan they are using?
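
As an illustration of the kind of restriction meant here, the sketch below limits the sub-select to sheep that have ever been in the subject flock, so the aggregate no longer has to cover the whole table. The inner WHERE clause is an assumption made for this example, not the restriction from the original post, which is not shown in this message.

```sql
-- Sketch only: the inner WHERE clause is an assumed hand-written restriction.
SELECT DISTINCT f1.regn_no, f1.transfer_date AS date_in
FROM sheep_flock f1
JOIN (
    SELECT f.regn_no, MAX(f.transfer_date) AS last_xfer_date
    FROM sheep_flock f
    -- Aggregate only sheep that have at some point been in flock 1359.
    WHERE f.regn_no IN (SELECT regn_no
                        FROM sheep_flock
                        WHERE flock_no = '1359')
    GROUP BY f.regn_no
) f2 ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
  AND f1.transfer_date = f2.last_xfer_date;
```

Note that the restriction has to be on regn_no rather than a bare flock_no = '1359' inside the sub-select: MAX(transfer_date) must still be taken over every transfer of each sheep, including transfers into other flocks, or the query would return sheep that have since left the flock.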

Also, just as in the other thread, I'm thinking that a seqscan+hash
aggregate would be a better idea than this bit:

>                    ->  GroupAggregate  (cost=0.00..3924.91 rows=33676 width=13) (actual time=0.324..473.131 rows=38815 loops=1)
>                          ->  Index Scan using sheep_flock_pkey on sheep_flock f  (cost=0.00..3094.95 rows=81802 width=13) (actual time=0.295..232.156)

Possibly you need to raise work_mem to get it to consider the hash
aggregation method.
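
One way to try this for a single session is sketched below. The value 32768 is an arbitrary figure chosen for the example, not a recommendation; work_mem is given in kilobytes here, so it corresponds to 32MB.

```sql
-- Raise work_mem for this session only (value in kB; 32768 is an assumed figure).
SET work_mem = 32768;

-- Re-run just the aggregate to see which plan the planner now picks.
EXPLAIN ANALYZE
SELECT f.regn_no, MAX(f.transfer_date) AS last_xfer_date
FROM sheep_flock f
GROUP BY f.regn_no;

-- Return to the server default afterwards.
RESET work_mem;
```

If the setting has the hoped-for effect, the plan should show a HashAggregate over a Seq Scan on sheep_flock in place of the GroupAggregate over the index scan.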

BTW, are you *sure* you are testing PG 8.1?  The "Subquery Scan f2" plan
node looks unnecessary to me, and I'd have expected 8.1 to drop it out.
8.0 and before would have left it in the plan though.  This doesn't make
all that much difference performance-wise in itself, but it does make me
wonder what you are testing.
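
A quick way to confirm which server the session is actually connected to, as opposed to which client tools are installed:

```sql
-- Reports the full version string of the running server.
SELECT version();
```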

            regards, tom lane
