bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers

From Pavel Stehule
Subject bad estimation together with large work_mem generates terrible slow hash joins
Date
Msg-id CAFj8pRA4BEtna-WEV6NUwN8_0bJoquv3677zvms7+=0No7c6ew@mail.gmail.com
Whole thread Raw
Responses Re: bad estimation together with largework_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
Hello all,

today I had to work with one slow query - when I checked different scenarios I found a dependency on work_mem size - but it is atypical - bigger work_mem increased query execution 31 minutes (600MB work mem) and 1 minute (1MB).

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '600MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT "f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON ( "f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" ) ) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON ( "f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" = "dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 = "dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;
                                                                                  QUERY PLAN                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2145957.37..2145957.96 rows=59 width=12) (actual time=1864088.145..1864088.155 rows=31 loops=1)
   ->  Hash Join  (cost=882573.27..2142753.08 rows=213619 width=12) (actual time=9083.326..1859069.171 rows=7070141 loops=1)
         Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = f_users_aax8qg0e23asx5h.id)
         ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..854559.95 rows=32278695 width=12) (actual time=0.006..14271.239 rows=32211565 loops=1)
         ->  Hash  (cost=881801.58..881801.58 rows=61735 width=8) (actual time=9076.153..9076.153 rows=3310768 loops=1)
               Buckets: 8192  Batches: 1  Memory Usage: 129327kB
               ->  Hash Join  (cost=782.15..881801.58 rows=61735 width=8) (actual time=1.423..8086.228 rows=3310768 loops=1)
                     Hash Cond: (f_users_aax8qg0e23asx5h.dt_registrationdatetime_id = dwh_dm_abnblk9j5oagorw.id)
                     ->  Seq Scan on f_users_aax8qg0e23asx5h  (cost=0.00..845420.42 rows=9328442 width=12) (actual time=0.015..4098.915 rows=9322096 loops=1)
                     ->  Hash  (cost=777.59..777.59 rows=365 width=4) (actual time=1.400..1.400 rows=365 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 13kB
                           ->  Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw  (cost=11.08..777.59 rows=365 width=4) (actual time=0.111..1.218 rows=365 loops=1)
                                 Recheck Cond: (2014 = id_year)
                                 ->  Bitmap Index Scan on dwh_dm_abnblk9j5oagorw_id_year_idx  (cost=0.00..10.99 rows=365 width=0) (actual time=0.068..0.068 rows=365 loops=1)
                                       Index Cond: (2014 = id_year)
 Total runtime: 1864107.373 ms
(16 rows)


db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '1MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT "f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON ( "f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" ) ) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON ( "f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" = "dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 = "dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;
                                                                                  QUERY PLAN                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2275675.45..2275675.95 rows=50 width=12) (actual time=48438.407..48438.416 rows=31 loops=1)
   ->  Hash Join  (cost=882772.88..2272526.68 rows=209918 width=12) (actual time=9384.610..45211.963 rows=6988804 loops=1)
         Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = f_users_aax8qg0e23asx5h.id)
         ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..839754.64 rows=31719464 width=12) (actual time=0.034..14299.338 rows=31653392 loops=1)
         ->  Hash  (cost=881759.44..881759.44 rows=61715 width=8) (actual time=9368.371..9368.371 rows=3310768 loops=1)
               Buckets: 4096  Batches: 256 (originally 4)  Memory Usage: 1025kB
               ->  Hash Join  (cost=782.15..881759.44 rows=61715 width=8) (actual time=3.839..8223.097 rows=3310768 loops=1)
                     Hash Cond: (f_users_aax8qg0e23asx5h.dt_registrationdatetime_id = dwh_dm_abnblk9j5oagorw.id)
                     ->  Seq Scan on f_users_aax8qg0e23asx5h  (cost=0.00..845389.92 rows=9325392 width=12) (actual time=0.017..4118.598 rows=9322096 loops=1)
                     ->  Hash  (cost=777.59..777.59 rows=365 width=4) (actual time=3.804..3.804 rows=365 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 13kB
                           ->  Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw  (cost=11.08..777.59 rows=365 width=4) (actual time=0.188..3.641 rows=365 loops=1)
                                 Recheck Cond: (2014 = id_year)
                                 ->  Bitmap Index Scan on dwh_dm_abnblk9j5oagorw_id_year_idx  (cost=0.00..10.99 rows=365 width=0) (actual time=0.115..0.115 rows=365 loops=1)
                                       Index Cond: (2014 = id_year)
 Total runtime: 48439.958 ms
(16 rows)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '10MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT "f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON ( "f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" ) ) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON ( "f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" = "dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 = "dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;
                                                                                  QUERY PLAN                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2124026.77..2124027.27 rows=50 width=12) (actual time=100739.984..100739.998 rows=31 loops=1)
   ->  Hash Join  (cost=882530.88..2120878.00 rows=209918 width=12) (actual time=9467.702..97238.959 rows=6988804 loops=1)
         Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = f_users_aax8qg0e23asx5h.id)
         ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..839754.64 rows=31719464 width=12) (actual time=0.015..9185.978 rows=31653392 loops=1)
         ->  Hash  (cost=881759.44..881759.44 rows=61715 width=8) (actual time=9466.440..9466.440 rows=3310768 loops=1)
               Buckets: 8192  Batches: 16 (originally 1)  Memory Usage: 10241kB
               ->  Hash Join  (cost=782.15..881759.44 rows=61715 width=8) (actual time=3.681..8183.423 rows=3310768 loops=1)
                     Hash Cond: (f_users_aax8qg0e23asx5h.dt_registrationdatetime_id = dwh_dm_abnblk9j5oagorw.id)
                     ->  Seq Scan on f_users_aax8qg0e23asx5h  (cost=0.00..845389.92 rows=9325392 width=12) (actual time=0.012..4064.044 rows=9322096 loops=1)
                     ->  Hash  (cost=777.59..777.59 rows=365 width=4) (actual time=3.654..3.654 rows=365 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 13kB
                           ->  Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw  (cost=11.08..777.59 rows=365 width=4) (actual time=0.129..3.449 rows=365 loops=1)
                                 Recheck Cond: (2014 = id_year)
                                 ->  Bitmap Index Scan on dwh_dm_abnblk9j5oagorw_id_year_idx  (cost=0.00..10.99 rows=365 width=0) (actual time=0.077..0.077 rows=365 loops=1)
                                       Index Cond: (2014 = id_year)
 Total runtime: 100740.552 ms
(16 rows)

The basic problem is wrong estimation - but I am not able to fix it - I changed a statistics to 10000 without any change. I cannot to change SQL in application, but for debugging I can test it with fixed estimation:

db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT "stehule"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN stehule on  "f_emails_aaekn159p5aw3t8"."userid_id" =  stehule."id" ) )  GROUP BY 1 ;
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2073456.55..2073457.23 rows=68 width=12) (actual time=58724.094..58724.106 rows=31 loops=1)
   ->  Hash Join  (cost=89142.28..1589276.13 rows=32278695 width=12) (actual time=2191.019..55499.331 rows=7070141 loops=1)
         Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = stehule.id)
         ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..854559.95 rows=32278695 width=12) (actual time=0.018..8364.554 rows=32211565 loops=1)
         ->  Hash  (cost=47757.68..47757.68 rows=3310768 width=8) (actual time=2188.858..2188.858 rows=3310768 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 129327kB
               ->  Seq Scan on stehule  (cost=0.00..47757.68 rows=3310768 width=8) (actual time=0.026..927.235 rows=3310768 loops=1)
 Total runtime: 58741.224 ms
(8 rows)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '1MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT "stehule"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN stehule on  "f_emails_aaekn159p5aw3t8"."userid_id" =  stehule."id" ) )  GROUP BY 1 ;
                                                                      QUERY PLAN                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2293499.45..2293500.13 rows=68 width=12) (actual time=37357.967..37357.976 rows=31 loops=1)
   ->  Hash Join  (cost=102075.28..1809319.02 rows=32278695 width=12) (actual time=1814.232..34290.818 rows=7070141 loops=1)
         Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = stehule.id)
         ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..854559.95 rows=32278695 width=12) (actual time=0.007..14066.754 rows=32211565 loops=1)
         ->  Hash  (cost=47757.68..47757.68 rows=3310768 width=8) (actual time=1806.453..1806.453 rows=3310768 loops=1)
               Buckets: 4096  Batches: 256  Memory Usage: 518kB
               ->  Seq Scan on stehule  (cost=0.00..47757.68 rows=3310768 width=8) (actual time=0.007..781.672 rows=3310768 loops=1)
 Total runtime: 37359.264 ms
(8 rows)

Still less work_mem performs better - but with our default 600MB it performs relatively well

db_kost07e2d9cdmg20b1takpqntobo6ghj=# select version();
                                                             version                                                            
---------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.8 on x86_64-gdc-linux-gnu, with GoodData patches, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ stehule
                     List of relations
 Schema |  Name   | Type  |  Owner   |  Size  | Description
--------+---------+-------+----------+--------+-------------
 public | stehule | table | postgres | 114 MB |
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ f_users_aax8qg0e23asx5h
                            List of relations
 Schema |          Name           | Type  | Owner |  Size   | Description
--------+-------------------------+-------+-------+---------+-------------
 public | f_users_aax8qg0e23asx5h | table | beard | 5878 MB |
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ f_emails_aaekn159p5aw3t8
                             List of relations
 Schema |           Name           | Type  | Owner |  Size   | Description
--------+--------------------------+-------+-------+---------+-------------
 public | f_emails_aaekn159p5aw3t8 | table | beard | 4156 MB |
(1 row)

all data are in FS cache. There are 60GB RAM.

Is interesting so 31M nested loops is significantly faster, than bad hash join

db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze  SELECT "f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN "f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS "c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS "c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS "def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( ( "f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON ( "f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" ) ) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON ( "f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" = "dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 = "dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;
                                                                                      QUERY PLAN                                                                                      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=40076389.70..40076448.70 rows=59 width=12) (actual time=31124.643..31124.658 rows=31 loops=1)
   ->  Nested Loop  (cost=2773.92..40073185.42 rows=213619 width=12) (actual time=37.945..27670.473 rows=7070141 loops=1)
         ->  Hash Join  (cost=2773.92..10180068.58 rows=61735 width=8) (actual time=0.951..8934.170 rows=3310768 loops=1)
               Hash Cond: (f_users_aax8qg0e23asx5h.dt_registrationdatetime_id = dwh_dm_abnblk9j5oagorw.id)
               ->  Seq Scan on f_users_aax8qg0e23asx5h  (cost=0.00..10080578.00 rows=9328442 width=12) (actual time=0.004..4297.579 rows=9322096 loops=1)
               ->  Hash  (cost=2408.01..2408.01 rows=365 width=4) (actual time=0.919..0.919 rows=365 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 13kB
                     ->  Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw  (cost=386.27..2408.01 rows=365 width=4) (actual time=0.104..0.733 rows=365 loops=1)
                           Recheck Cond: (2014 = id_year)
                           ->  Bitmap Index Scan on dwh_dm_abnblk9j5oagorw_id_year_idx  (cost=0.00..386.18 rows=365 width=0) (actual time=0.059..0.059 rows=365 loops=1)
                                 Index Cond: (2014 = id_year)
         ->  Index Scan using f_emails_aaekn159p5aw3t8_userid_id_idx on f_emails_aaekn159p5aw3t8  (cost=0.00..396.22 rows=88 width=12) (actual time=0.002..0.004 rows=2 loops=3310768)
               Index Cond: (userid_id = f_users_aax8qg0e23asx5h.id)
 Total runtime: 31125.975 ms

Regards

Pavel


pgsql-hackers by date:

Previous
From: Michael Glaesemann
Date:
Subject: Re: sorting a union over inheritance vs pathkeys
Next
From: Rushabh Lathia
Date:
Subject: Re: "RETURNING PRIMARY KEY" syntax extension