Re: Why is 8.4 and 9.0 so much slower on some queries? - Mailing list pgsql-general

From Uwe Schroeder
Subject Re: Why is 8.4 and 9.0 so much slower on some queries?
Date
Msg-id 201104112258.01949.uwe@oss4u.com
Whole thread Raw
In response to Re: Why is 8.4 and 9.0 so much slower on some queries?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why is 8.4 and 9.0 so much slower on some queries?
List pgsql-general

> Uwe Schroeder <uwe@oss4u.com> writes:
> > I have a 8.3 database and decided for various reasons to upgrade to 8.4.
> > I also tried 9.0 - same results. On the exactly same hardware with the
> > exactly same configuration, some queries perform a factor of 100000
> > slower on 8.4 and 9.0 than on 8.3
>
> Hmm ... the problem here seems to come from the much poorer size
>
> estimate from the t/ugt antijoin:
> >                ->  Hash Left Join  (cost=71.37..2445.93 rows=52 width=9)
> >                (actual time=0.791..2.017 rows=104 loops=1)
> >
> >                      Hash Cond: ((subplan) = ugt.user_id)
> >                      Filter: (ugt.group_id IS NULL)
> >
> >                ->  Hash Anti Join  (cost=62.53..2432.89 rows=1 width=9)
> >                (actual time=0.639..2.976 rows=105 loops=1)
> >
> >                      Hash Cond: ((SubPlan 2) = ugt.user_id)
>
> Now, 8.3 has no idea what it's doing here, and it's just luck that it
> produces an estimate that's only a factor of 2 off from reality.  8.4
> actually understands that it's dealing with an antijoin, and with that
> greater knowledge it produces ... a much worse estimate :-(.  And that
> sends it into the weeds about how to perform the higher-level join.
>
> While I don't have your specific example to try, I did some
> experimenting with queries of this form, and I noticed that 8.4's
> heuristic in eqjoinsel_semi() was going completely nuts and estimating
> that all rows in the lefthand side have join partners (thus, no rows out
> of the antijoin).  This is because it has stats for one side of the
> comparison operator but not the other side (the one with the
> sub-select).  But it's taking the totally-made-up ndistinct estimate for
> the sub-select at face value.  It needs to be a bit warier I think.
>
> In general you might want to think about whether you can recast this
> query to avoid the sub-selects in the join conditions.  The planner
> isn't terribly bright about sub-selects in the first place, and you're
> putting them in places where quality estimates are really critical to
> getting a good plan.


Ahhh, that actually makes sense. Since this example isn't the only spot where
I have this kind of query (I tweaked those for weeks to run fast, but always
on a 8.3 server), I'll just downgrade to a 8.3 for production.

If you're interested, I can provide database and data for you to toy with as
there's nothing particularly secretive in that data except email addresses
which can be erased as they're not needed for anything.

By the end of the day I'll set up another machine with a 9.0 and start
changing the queries - simply because 9.0 is a heck of a lot faster with most
things - except these kind of left outer joins and subselects.

Thanks for your response.

Uwe




pgsql-general by date:

Previous
From: Arash pajoohande
Date:
Subject: Re: what data type to store fixed size integer?
Next
From: tomas@tuxteam.de
Date:
Subject: Different cast behavior of TEXT and VARCHAR