Re: Why is 8.4 and 9.0 so much slower on some queries? - Mailing list pgsql-general

From Tom Lane
Subject Re: Why is 8.4 and 9.0 so much slower on some queries?
Date
Msg-id 16745.1302584242@sss.pgh.pa.us
Whole thread Raw
In response to Why is 8.4 and 9.0 so much slower on some queries?  (Uwe Schroeder <uwe@oss4u.com>)
Responses Re: Why is 8.4 and 9.0 so much slower on some queries?
List pgsql-general
Uwe Schroeder <uwe@oss4u.com> writes:
> I have a 8.3 database and decided for various reasons to upgrade to 8.4. I also tried 9.0 - same results.
> On the exactly same hardware with the exactly same configuration, some queries perform a factor of 100000 slower on
8.4and 9.0 than on 8.3 

Hmm ... the problem here seems to come from the much poorer size
estimate from the t/ugt antijoin:

>                ->  Hash Left Join  (cost=71.37..2445.93 rows=52 width=9) (actual time=0.791..2.017 rows=104 loops=1)
>                      Hash Cond: ((subplan) = ugt.user_id)
>                      Filter: (ugt.group_id IS NULL)

>                ->  Hash Anti Join  (cost=62.53..2432.89 rows=1 width=9) (actual time=0.639..2.976 rows=105 loops=1)
>                      Hash Cond: ((SubPlan 2) = ugt.user_id)

Now, 8.3 has no idea what it's doing here, and it's just luck that it
produces an estimate that's only a factor of 2 off from reality.  8.4
actually understands that it's dealing with an antijoin, and with that
greater knowledge it produces ... a much worse estimate :-(.  And that
sends it into the weeds about how to perform the higher-level join.

While I don't have your specific example to try, I did some
experimenting with queries of this form, and I noticed that 8.4's
heuristic in eqjoinsel_semi() was going completely nuts and estimating
that all rows in the lefthand side have join partners (thus, no rows out
of the antijoin).  This is because it has stats for one side of the
comparison operator but not the other side (the one with the
sub-select).  But it's taking the totally-made-up ndistinct estimate for
the sub-select at face value.  It needs to be a bit warier I think.

In general you might want to think about whether you can recast this
query to avoid the sub-selects in the join conditions.  The planner
isn't terribly bright about sub-selects in the first place, and you're
putting them in places where quality estimates are really critical to
getting a good plan.

            regards, tom lane

pgsql-general by date:

Previous
From: Uwe Schroeder
Date:
Subject: Why is 8.4 and 9.0 so much slower on some queries?
Next
From: Arash pajoohande
Date:
Subject: Re: what data type to store fixed size integer?