Re: Why is 8.4 and 9.0 so much slower on some queries? - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Why is 8.4 and 9.0 so much slower on some queries?
Date
Msg-id BANLkTikDYnA9hZEqQViQ6Cp9Mpa_pDt1=A@mail.gmail.com
Whole thread Raw
In response to Re: Why is 8.4 and 9.0 so much slower on some queries?  (Uwe Schroeder <uwe@oss4u.com>)
List pgsql-general
On Tue, Apr 12, 2011 at 12:58 AM, Uwe Schroeder <uwe@oss4u.com> wrote:
>
>
>> Uwe Schroeder <uwe@oss4u.com> writes:
>> > I have a 8.3 database and decided for various reasons to upgrade to 8.4.
>> > I also tried 9.0 - same results. On the exactly same hardware with the
>> > exactly same configuration, some queries perform a factor of 100000
>> > slower on 8.4 and 9.0 than on 8.3
>>
>> Hmm ... the problem here seems to come from the much poorer size
>>
>> estimate from the t/ugt antijoin:
>> >                ->  Hash Left Join  (cost=71.37..2445.93 rows=52 width=9)
>> >                (actual time=0.791..2.017 rows=104 loops=1)
>> >
>> >                      Hash Cond: ((subplan) = ugt.user_id)
>> >                      Filter: (ugt.group_id IS NULL)
>> >
>> >                ->  Hash Anti Join  (cost=62.53..2432.89 rows=1 width=9)
>> >                (actual time=0.639..2.976 rows=105 loops=1)
>> >
>> >                      Hash Cond: ((SubPlan 2) = ugt.user_id)
>>
>> Now, 8.3 has no idea what it's doing here, and it's just luck that it
>> produces an estimate that's only a factor of 2 off from reality.  8.4
>> actually understands that it's dealing with an antijoin, and with that
>> greater knowledge it produces ... a much worse estimate :-(.  And that
>> sends it into the weeds about how to perform the higher-level join.
>>
>> While I don't have your specific example to try, I did some
>> experimenting with queries of this form, and I noticed that 8.4's
>> heuristic in eqjoinsel_semi() was going completely nuts and estimating
>> that all rows in the lefthand side have join partners (thus, no rows out
>> of the antijoin).  This is because it has stats for one side of the
>> comparison operator but not the other side (the one with the
>> sub-select).  But it's taking the totally-made-up ndistinct estimate for
>> the sub-select at face value.  It needs to be a bit warier I think.
>>
>> In general you might want to think about whether you can recast this
>> query to avoid the sub-selects in the join conditions.  The planner
>> isn't terribly bright about sub-selects in the first place, and you're
>> putting them in places where quality estimates are really critical to
>> getting a good plan.
>
>
> Ahhh, that actually makes sense. Since this example isn't the only spot where
> I have this kind of query (I tweaked those for weeks to run fast, but always
> on a 8.3 server), I'll just downgrade to a 8.3 for production.
>
> If you're interested, I can provide database and data for you to toy with as
> there's nothing particularly secretive in that data except email addresses
> which can be erased as they're not needed for anything.

Tom already checked in a fix -- if you are able to build from source
you should be able to test it yourself.  9.0.4 and 8.4.8 will contain
the fix.  In the short term, you can get it from git (follow
instructions here): http://wiki.postgresql.org/wiki/Working_with_Git.
Build/install instructions here:
http://www.postgresql.org/docs/9.0/interactive/installation.html.

you will want the branch REL8_4_STABLE or REL9_0_STABLE.

merlin

pgsql-general by date:

Previous
From: Armin Nesiren
Date:
Subject: full text search for words with a few specific chars
Next
From: Merlin Moncure
Date:
Subject: Re: PostgreSQL backend process high memory usage issue