Re: Huge difference in query performance between 8.3 and 8.4 (possibly) - Mailing list pgsql-performance

From Tom Lane
Subject Re: Huge difference in query performance between 8.3 and 8.4 (possibly)
Date
Msg-id 2712.1247159351@sss.pgh.pa.us
Whole thread Raw
In response to Huge difference in query performance between 8.3 and 8.4 (possibly)  (Robin Houston <robin.houston@gmail.com>)
List pgsql-performance
Robin Houston <robin.houston@gmail.com> writes:
> We have a query that runs very slowly on our 8.3 database. (I can't
> tell you exactly how slowly, because it has never successfully run to
> completion even when we left it running overnight.) On the 8.4
> database on my laptop, it runs in about 90 seconds. Of course there
> are several differences between the two instances, but I wonder
> whether query planning improvements in 8.4 could essentially account
> for it.

Well, it's hard to be sure with only EXPLAIN and not EXPLAIN ANALYZE
output to look at; but I think the significant difference in these plans
is that 8.4 has chosen a hash instead of nestloop join for a couple of
the intermediate join levels.  Which is evidently because of a change
in the estimated size of the next join down:

                     ->  Nested Loop  (cost=44050.86..77114.32 rows=1 width=50)
                           Join Filter: ((second.feature_cvterm_id > first.feature_cvterm_id) AND (second.feature_id =
first.feature_id)AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text =
(first_withfrom_dbxref.accession)::text))
                           ->  Nested Loop  (cost=30794.26..42915.70 rows=1 width=69)
                                 ->  Hash Join (cost=30794.26..42906.88 rows=1 width=65)
                                       Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

versus

                           ->  Hash Join  (cost=63949.73..77732.49 rows=1 width=59)
                                 Hash Cond: ((second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id)
AND((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text)) 
                                 Join Filter: (second.feature_cvterm_id > first.feature_cvterm_id)
                                 ->  Hash Join (cost=30236.57..41303.13 rows=4607 width=66)
                                       Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

If the 8.4 rowcount estimate is accurate then it's not surprising that
the nestloop plan sucks --- it'd be re-executing the other arm of the
join 4600 or so times.

This could reflect improvements in the join size estimation code, or
maybe it's just a consequence of 8.4 using larger statistics targets
by default.  It's hard to be sure with so little information to go on.

            regards, tom lane

pgsql-performance by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Huge difference in query performance between 8.3 and 8.4 (possibly)
Next
From: "Haszlakiewicz, Eric"
Date:
Subject: embedded sql regression from 8.2.4 to 8.3.7