Thread: SQL statement over 500% slower with 9.2 compared with 9.1

SQL statement over 500% slower with 9.2 compared with 9.1

From
Rafael Martinez
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello

We have a SQL statement that with 9.1 takes ca 4000ms to finnish and
with 9.2 over 22000ms.

The explain analyze information is here:

With 9.1.:
http://explain.depesz.com/s/5ou

With 9.2
http://explain.depesz.com/s/d4vU

The SQL statement is:

SELECT  firstname || ' ' || lastname AS Name
FROM    Person R
WHERE  R.gender like 'F'
AND  19 < (SELECT COUNT(DISTINCT filmId)
              FROM   FilmParticipation F
              WHERE  F.partType = 'director' AND
                     F.personId = R.personId    )
        AND NOT EXISTS (
                SELECT  *
                FROM    FilmParticipation D
                WHERE   D.partType = 'director'
                        AND D.personId = R.personId
                        AND NOT EXISTS (
                                SELECT  *
                                FROM    FilmParticipation C
                                WHERE   C.partType = 'cast'
                                        AND C.filmId = D.filmId
                                        AND C.personId = D.personId
                                       )
                       )
;


The tables information:

# SELECT count(*) from filmparticipation;
  count
- ----------
 10835351
(1 row)

# SELECT pg_size_pretty(pg_table_size('filmparticipation'));
 pg_size_pretty
- ----------------
 540 MB
(1 row)

# SELECT count(*) from person;
  count
- ---------
 1709384
(1 row)

# SELECT pg_size_pretty(pg_table_size('person'));
 pg_size_pretty
- ----------------
 85 MB
(1 row)


We can see that the query plan is very different between versions and
that 9.2 is really wrong with the number of rows involved. Why is 9.2
taking so wrong about the number of rows involved in some parts of the
plan?

Some additional information:

* VACUUM ANALYZE has been run in both databases.
* Both databases are running on servers running RHEL6.3.
* The relevant parameters changed from the default configuration are:

9.1:
- ----

 checkpoint_segments         | 128
 client_encoding             | UTF8
 effective_cache_size        | 28892MB
 maintenance_work_mem        | 256MB
 max_connections             | 400
 max_stack_depth             | 4MB
 random_page_cost            | 2
 server_encoding             | UTF8
 shared_buffers              | 8026MB
 ssl                         | on
 ssl_renegotiation_limit     | 0
 wal_buffers                 | 16MB
 wal_level                   | archive
 wal_sync_method             | fdatasync
 work_mem                    | 16MB


9.2:
- ----

 checkpoint_segments         | 128
 client_encoding             | UTF8
 effective_cache_size        | 28892MB
 maintenance_work_mem        | 256MB
 max_connections             | 400
 max_stack_depth             | 4MB
 random_page_cost            | 2
 server_encoding             | UTF8
 shared_buffers              | 8026MB
 ssl                         | on
 ssl_renegotiation_limit     | 0
 wal_buffers                 | 16MB
 wal_level                   | archive
 wal_sync_method             | fdatasync
 work_mem                    | 16MB


Any ideas on why this is happening and how to fix it?

Thanks in advance for your time.
regards,
- --
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIbSyoACgkQBhuKQurGihTOYwCfWC/ptAuMQ1pxFcplq9bHfBi3
uekAnj+nll/Z2Lr8kFgPAB6Fx0Kop4/0
=3TPA
-----END PGP SIGNATURE-----


Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Rafael Martinez
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/26/2013 02:33 PM, Rafael Martinez wrote:
[............]
> The SQL statement is:
>
> SELECT  firstname || ' ' || lastname AS Name FROM    Person R WHERE
> R.gender like 'F' AND  19 < (SELECT COUNT(DISTINCT filmId) FROM
> FilmParticipation F WHERE  F.partType = 'director' AND F.personId =
> R.personId    ) AND NOT EXISTS ( SELECT  * FROM
> FilmParticipation D WHERE   D.partType = 'director' AND D.personId
> = R.personId AND NOT EXISTS ( SELECT  * FROM    FilmParticipation
> C WHERE   C.partType = 'cast' AND C.filmId = D.filmId AND
> C.personId = D.personId ) ) ;
>
>
[.............]
>
> We can see that the query plan is very different between versions
> and that 9.2 is really wrong with the number of rows involved. Why
> is 9.2 taking so wrong about the number of rows involved in some
> parts of the plan?
>

Hei

More information:

If we turn off enable_indexscan the runtime gets more similar to the
one we get with 9.1, we are down to 4200ms.

The query plan with this configuration is here:
http://explain.depesz.com/s/jVR

The question remains the same, why is 9.2 using such a different and
bad plan compared to 9.1, when the data and the configuration are the
same?

regards,
- --
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIcbx8ACgkQBhuKQurGihReJgCcCiEfGQ0rZHcazlN3Ihb2PeCn
jOsAnjkh1M0j4r1DQJ4Xb1djZ+y4mji3
=Td8b
-----END PGP SIGNATURE-----


Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Tomas Vondra
Date:
On 27.8.2013 11:19, Rafael Martinez wrote:
> On 08/26/2013 02:33 PM, Rafael Martinez wrote:
> [............]
>> The SQL statement is:
>
>> SELECT  firstname || ' ' || lastname AS Name FROM    Person R WHERE
>> R.gender like 'F' AND  19 < (SELECT COUNT(DISTINCT filmId) FROM
>> FilmParticipation F WHERE  F.partType = 'director' AND F.personId =
>> R.personId    ) AND NOT EXISTS ( SELECT  * FROM
>> FilmParticipation D WHERE   D.partType = 'director' AND D.personId
>> = R.personId AND NOT EXISTS ( SELECT  * FROM    FilmParticipation
>> C WHERE   C.partType = 'cast' AND C.filmId = D.filmId AND
>> C.personId = D.personId ) ) ;
>
>
> [.............]
>
>> We can see that the query plan is very different between versions
>> and that 9.2 is really wrong with the number of rows involved. Why
>> is 9.2 taking so wrong about the number of rows involved in some
>> parts of the plan?
>
>
> Hei
>
> More information:
>
> If we turn off enable_indexscan the runtime gets more similar to the
> one we get with 9.1, we are down to 4200ms.
>
> The query plan with this configuration is here:
> http://explain.depesz.com/s/jVR
>
> The question remains the same, why is 9.2 using such a different and
> bad plan compared to 9.1, when the data and the configuration are the
> same?

Hi,

seems the problem is mostly about the inner-most query, i.e. this:

    SELECT  *
    FROM    FilmParticipation C
    WHERE   C.partType = 'cast'
            AND C.filmId = D.filmId
            AND C.personId = D.personId
        )

In 9.2 it's estimated to return 1 row, but it returns 595612 of them (or
97780 after materialization). I believe this is the culprit that causes
cost estimates that are way off, and that in turn leads to choice of
"cheaper" plan that actually takes much longer to evaluate.

Because the slow plan is estimated to "cost" 122367017.97 while the fast
one 335084834.95 (i.e. 3x more).

I don't immediately see where's the problem - maybe some other hacker on
this list can. Can you prepare a testcase for this? I.e. a structure of
the tables + data so that we can reproduce it?

regards
Tomas



Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Jeff Janes
Date:
On Monday, August 26, 2013, Rafael Martinez wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello

We have a SQL statement that with 9.1 takes ca 4000ms to finnish and
with 9.2 over 22000ms.

The explain analyze information is here:

Could you do explain (analyze, buffers) of these?  
 

With 9.1.:
http://explain.depesz.com/s/5ou

With 9.2
http://explain.depesz.com/s/d4vU

The SQL statement is:

SELECT  firstname || ' ' || lastname AS Name
FROM    Person R
WHERE  R.gender like 'F'
AND  19 < (SELECT COUNT(DISTINCT filmId)
              FROM   FilmParticipation F
              WHERE  F.partType = 'director' AND
                     F.personId = R.personId    )

What happens if you excise the "19 < (select ...)" clause?

That would greatly simplify the analysis, assuming the problem remains.

How many distinct filmId are there?




We can see that the query plan is very different between versions and
that 9.2 is really wrong with the number of rows involved. Why is 9.2
taking so wrong about the number of rows involved in some parts of the
plan?

Most directors are not also actors, so there is a strong negative correlation that PostgreSQL is not aware of. However, I think if you could get 9.1 to report the same path, it would be just as wrong on that estimate.  But since it doesn't report the same path, you don't see how wrong it is.

Try running:

explain (analyze, buffers)
 SELECT  D.personId
                FROM    FilmParticipation D
                WHERE   D.partType = 'director'
                        --AND D.personId = R.personId
                        AND NOT EXISTS (
                                SELECT  *
                                FROM    FilmParticipation C
                                WHERE   C.partType = 'cast'
                                        AND C.filmId = D.filmId
                                        AND C.personId = D.personId
                                       );

On both 9.1 and 9.2.

Cheers,

Jeff

Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Rafael Martinez
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/27/2013 11:27 PM, Tomas Vondra wrote:
[............]

> I don't immediately see where's the problem - maybe some other
> hacker on this list can. Can you prepare a testcase for this? I.e.
> a structure of the tables + data so that we can reproduce it?
>

Hello

Of course, you can download a SQL dump of the tables involved, here:
http://folk.uio.no/rafael/filmdatabase_testcase.sql.gz

This file is 357M gunzipped and 101M gzipped. When restored in a
database it will use 1473MB.

# \d+
                          List of relations
 Schema |       Name        | Type  |  Owner   |  Size  | Description
- --------+-------------------+-------+----------+--------+-------------
 public | filmitem          | table | postgres | 41 MB  |
 public | filmparticipation | table | postgres | 540 MB |
 public | person            | table | postgres | 85 MB  |
(3 rows)

[dbpg-hotel-utv:5432/postgres@fdb_testcase][]# \di+
                                           List of relations
 Schema |              Name              | Type  |  Owner   |
Table       |  Size  | Description
-
--------+--------------------------------+-------+----------+-------------------+--------+-------------
 public | filmitempkey                   | index | postgres | filmitem
         | 26 MB  |
 public | filmparticipationfilmidindex   | index | postgres |
filmparticipation | 232 MB |
 public | filmparticipationpersonidindex | index | postgres |
filmparticipation | 232 MB |
 public | filmparticipationpkey          | index | postgres |
filmparticipation | 232 MB |
 public | personlastnameindex            | index | postgres | person
          | 41 MB  |
 public | personpkey                     | index | postgres | person
          | 37 MB  |

regards,
- --
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIdvbkACgkQBhuKQurGihTZ0ACgk5ZpAvBFdhJs7A3xm3h80VhR
AX4AoIp+tSeeQtmmQh7ShP5WFI3hS+gp
=wK/M
-----END PGP SIGNATURE-----


Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Rafael Martinez
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/28/2013 06:10 AM, Jeff Janes wrote:
> On Monday, August 26, 2013, Rafael Martinez wrote:

Hei

>
> Could you do explain (analyze, buffers) of these?
>

With 9.1:
http://explain.depesz.com/s/FMe

with 9.2:
http://explain.depesz.com/s/Z1j


>
> What happens if you excise the "19 < (select ...)" clause? That
> would greatly simplify the analysis, assuming the problem remains.
>

With 9.1:
http://explain.depesz.com/s/DhuV

With 9.2:
I do not get a result in a reasonable time, after several minuttes I
cancel the query.


> How many distinct filmId are there?
>

 count
- --------
 934752


>
> Most directors are not also actors, so there is a strong negative
> correlation that PostgreSQL is not aware of. However, I think if
> you could get 9.1 to report the same path, it would be just as
> wrong on that estimate.  But since it doesn't report the same
> path, you don't see how wrong it is.
>
> Try running:
>
> explain (analyze, buffers) SELECT  D.personId FROM
> FilmParticipation D WHERE   D.partType = 'director' --AND
> D.personId = R.personId AND NOT EXISTS ( SELECT  * FROM
> FilmParticipation C WHERE   C.partType = 'cast' AND C.filmId =
> D.filmId AND C.personId = D.personId );
>
> On both 9.1 and 9.2.
>

Same result with both:

with 9.1:
http://explain.depesz.com/s/fdO

With 9.2
http://explain.depesz.com/s/gHz

regards,
- --
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIdzb4ACgkQBhuKQurGihSGEgCeP6frW7l65IphXFUjw80VMZun
qO0An1++ZB7IGQ0MwR4wphWmlcYGXFDD
=9fg4
-----END PGP SIGNATURE-----


Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Tom Lane
Date:
Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
> We have a SQL statement that with 9.1 takes ca 4000ms to finnish and
> with 9.2 over 22000ms.
> ...
> We can see that the query plan is very different between versions and
> that 9.2 is really wrong with the number of rows involved. Why is 9.2
> taking so wrong about the number of rows involved in some parts of the
> plan?

9.1's no better.  The reason you don't get a similar plan out of 9.1
is that it doesn't flatten the nested EXISTS sub-selects, so that a
parameterized nestloop plan is the best it can do no matter what.
9.2 is able to consider more types of plan for this query, and it's
finding one that it thinks is cheaper.  Unfortunately, parameterized
nestloop really is the best thing in this specific case.

I think the rowcount estimation error that's actually serious is the one
for the "19 < (Subplan 1)" condition, where it's expecting 161252 rows but
reality is only 179.  If that were even just one order of magnitude closer
to reality, the other plan style would look cheaper.

Unfortunately, I can't offhand think of anything you can do to improve
the estimation of that condition as-is.  Maybe there's some other way to
phrase the query?  The current coding of the query looks rather like it's
been tuned for the one case that pre-9.2 releases can manage to do well.

If you don't want to do any major rewriting, you could probably stick an
OFFSET 0 into the outer EXISTS sub-select (and/or the inner one) to get
something similar to the 9.1 plan.

For some context see commit 0816fad6eebddb8f1f0e21635e46625815d690b9 and
the previous commits it references.

            regards, tom lane


Re: SQL statement over 500% slower with 9.2 compared with 9.1

From
Rafael Martinez
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/28/2013 09:08 PM, Tom Lane wrote:
[..........]

>
> If you don't want to do any major rewriting, you could probably
> stick an OFFSET 0 into the outer EXISTS sub-select (and/or the
> inner one) to get something similar to the 9.1 plan.
>

Thank you for your help.

Using OFFSET 0 in

SELECT  *
FROM    FilmParticipation C
WHERE   C.partType = 'cast'
AND C.filmId = D.filmId
AND C.personId = D.personId
OFFSET 0

give us a result similar to 9.1.

This SQL is used as an example in one of the database courses at the
University. I will send them this information and they can decide if
they want to rewrite the statement or use the OFFSET trick.

regards,
- --
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIm1/8ACgkQBhuKQurGihRAogCePl6G51w8dfYMruj+qSm4Vsjl
coMAn2sjyv6PcfsKhASC7ct0WI4YKRtJ
=FdeD
-----END PGP SIGNATURE-----