Re: SQL statement over 500% slower with 9.2 compared with 9.1 - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: SQL statement over 500% slower with 9.2 compared with 9.1
Date
Msg-id 521D19BD.70208@fuzzy.cz
Whole thread Raw
In response to Re: SQL statement over 500% slower with 9.2 compared with 9.1  (Rafael Martinez <r.m.guerrero@usit.uio.no>)
Responses Re: SQL statement over 500% slower with 9.2 compared with 9.1  (Rafael Martinez <r.m.guerrero@usit.uio.no>)
List pgsql-performance
On 27.8.2013 11:19, Rafael Martinez wrote:
> On 08/26/2013 02:33 PM, Rafael Martinez wrote:
> [............]
>> The SQL statement is:
>
>> SELECT  firstname || ' ' || lastname AS Name FROM    Person R WHERE
>> R.gender like 'F' AND  19 < (SELECT COUNT(DISTINCT filmId) FROM
>> FilmParticipation F WHERE  F.partType = 'director' AND F.personId =
>> R.personId    ) AND NOT EXISTS ( SELECT  * FROM
>> FilmParticipation D WHERE   D.partType = 'director' AND D.personId
>> = R.personId AND NOT EXISTS ( SELECT  * FROM    FilmParticipation
>> C WHERE   C.partType = 'cast' AND C.filmId = D.filmId AND
>> C.personId = D.personId ) ) ;
>
>
> [.............]
>
>> We can see that the query plan is very different between versions
>> and that 9.2 is really wrong with the number of rows involved. Why
>> is 9.2 taking so wrong about the number of rows involved in some
>> parts of the plan?
>
>
> Hei
>
> More information:
>
> If we turn off enable_indexscan the runtime gets more similar to the
> one we get with 9.1, we are down to 4200ms.
>
> The query plan with this configuration is here:
> http://explain.depesz.com/s/jVR
>
> The question remains the same, why is 9.2 using such a different and
> bad plan compared to 9.1, when the data and the configuration are the
> same?

Hi,

seems the problem is mostly about the inner-most query, i.e. this:

    SELECT  *
    FROM    FilmParticipation C
    WHERE   C.partType = 'cast'
            AND C.filmId = D.filmId
            AND C.personId = D.personId
        )

In 9.2 it's estimated to return 1 row, but it returns 595612 of them (or
97780 after materialization). I believe this is the culprit that causes
cost estimates that are way off, and that in turn leads to choice of
"cheaper" plan that actually takes much longer to evaluate.

Because the slow plan is estimated to "cost" 122367017.97 while the fast
one 335084834.95 (i.e. 3x more).

I don't immediately see where's the problem - maybe some other hacker on
this list can. Can you prepare a testcase for this? I.e. a structure of
the tables + data so that we can reproduce it?

regards
Tomas



pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Cpu usage 100% on slave. s_lock problem.
Next
From: Tomas Vondra
Date:
Subject: Re: Poor performance on simple queries compared to sql server express