Huge overestimation in rows expected results in bad plan - Mailing list pgsql-performance

From bricklen
Subject Huge overestimation in rows expected results in bad plan
Date
Msg-id AANLkTim=JS-tu5G1CvrPnvfSTj7=YSxu=kwNUP+hLVfG@mail.gmail.com
Whole thread Raw
Responses Re: Huge overestimation in rows expected results in bad plan  (Andy Colson <andy@squeakycode.net>)
Re: Huge overestimation in rows expected results in bad plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi,

I have a query that is getting a pretty bad plan due to a massively
incorrect count of expected rows. All tables in the query were vacuum
analyzed right before the query was tested. Disabling nested loops
gives a significantly faster result (4s vs 292s).
Any thoughts on what I can change to make the planner generate a better plan?


32GB ram
effective_cache_size = 16GB
shared_buffers = 4GB
random_page_cost = 1.5
default_statistics_target = 100
Note: for the tables in question, I tested default_statistics_target
at 100, then also at 5000 to see if there was an improvement (none
noted).



select version();
                                                     version
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit


explain analyze
select c.id, c.transactionid, c.clickgenerated, c.confirmed,
c.rejected, cr.rejectedreason
from conversion c
inner join conversionrejected cr on cr.idconversion = c.id
where date = '2010-11-06'
and idaction = 12906
and idaffiliate = 198338
order by transactionid;


     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2318120.52..2345652.23 rows=11012683 width=78) (actual
time=292668.896..292668.903 rows=70 loops=1)
   Sort Key: c.transactionid
   Sort Method:  quicksort  Memory: 43kB
   ->  Nested Loop  (cost=1234.69..715468.13 rows=11012683 width=78)
(actual time=8687.314..292668.159 rows=70 loops=1)
         Join Filter: ((cr.idconversion = c.id) OR (c.id = 38441828354::bigint))
         ->  Append  (cost=1234.69..1244.03 rows=2 width=56) (actual
time=15.292..15.888 rows=72 loops=1)
               ->  Bitmap Heap Scan on conversion c
(cost=1234.69..1240.76 rows=1 width=31) (actual time=15.291..15.840
rows=72 loops=1)
                     Recheck Cond: ((idaffiliate = 198338) AND (date =
'2010-11-06'::date))
                     Filter: (idaction = 12906)
                     ->  BitmapAnd  (cost=1234.69..1234.69 rows=4
width=0) (actual time=15.152..15.152 rows=0 loops=1)
                           ->  Bitmap Index Scan on
conversion_idaffiliate_idx  (cost=0.00..49.16 rows=3492 width=0)
(actual time=4.071..4.071 rows=28844 loops=1)
                                 Index Cond: (idaffiliate = 198338)
                           ->  Bitmap Index Scan on
conversion_date_idx  (cost=0.00..1185.28 rows=79282 width=0) (actual
time=10.343..10.343 rows=82400 loops=1)
                                 Index Cond: (date = '2010-11-06'::date)
               ->  Index Scan using conversionlate_date_idx on
conversionlate c  (cost=0.00..3.27 rows=1 width=80) (actual
time=0.005..0.005 rows=0 loops=1)
                     Index Cond: (date = '2010-11-06'::date)
                     Filter: ((idaction = 12906) AND (idaffiliate = 198338))
         ->  Seq Scan on conversionrejected cr  (cost=0.00..191921.82
rows=11012682 width=31) (actual time=0.003..1515.816 rows=11012682
loops=72)
 Total runtime: 292668.992 ms


select count(*) from conversionrejected ;
  count
----------
 11013488

Time: 3649.647 ms

select count(*) from conversion where date = '2010-11-06';
 count
-------
 82400

Time: 507.985 ms


select count(*) from conversion;
  count
----------
 73419376(1 row)

Time: 7100.619 ms



-- with enable_nestloop to off;
-- much faster!

        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=234463.54..234463.54 rows=2 width=78) (actual
time=4035.340..4035.347 rows=70 loops=1)
   Sort Key: c.transactionid
   Sort Method:  quicksort  Memory: 43kB
   ->  Hash Join  (cost=1244.13..234463.53 rows=2 width=78) (actual
time=4024.816..4034.715 rows=70 loops=1)
         Hash Cond: (cr.idconversion = c.id)
         ->  Seq Scan on conversionrejected cr  (cost=0.00..191921.82
rows=11012682 width=31) (actual time=0.003..1949.597 rows=11013576
loops=1)
         ->  Hash  (cost=1244.11..1244.11 rows=2 width=56) (actual
time=19.312..19.312 rows=72 loops=1)
               ->  Append  (cost=1234.77..1244.11 rows=2 width=56)
(actual time=18.539..19.261 rows=72 loops=1)
                     ->  Bitmap Heap Scan on conversion c
(cost=1234.77..1240.83 rows=1 width=31) (actual time=18.538..19.235
rows=72 loops=1)
                           Recheck Cond: ((idaffiliate = 198338) AND
(date = '2010-11-06'::date))
                           Filter: (idaction = 12906)
                           ->  BitmapAnd  (cost=1234.77..1234.77
rows=4 width=0) (actual time=18.237..18.237 rows=0 loops=1)
                                 ->  Bitmap Index Scan on
conversion_idaffiliate_idx  (cost=0.00..49.16 rows=3492 width=0)
(actual time=4.932..4.932 rows=28844 loops=1)
                                       Index Cond: (idaffiliate = 198338)
                                 ->  Bitmap Index Scan on
conversion_date_idx  (cost=0.00..1185.36 rows=79292 width=0) (actual
time=12.473..12.473 rows=82400 loops=1)
                                       Index Cond: (date = '2010-11-06'::date)
                     ->  Index Scan using conversionlate_date_idx on
conversionlate c  (cost=0.00..3.27 rows=1 width=80) (actual
time=0.006..0.006 rows=0 loops=1)
                           Index Cond: (date = '2010-11-06'::date)
                           Filter: ((idaction = 12906) AND
(idaffiliate = 198338))
 Total runtime: 4035.439 ms






-- for completeness,
-- same query, on 9.0.0, underpowered server, 2 disks mirrored.
Approximately the same table sizes/counts.

select version();
                                                      version
-------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit



       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=120727.25..120727.25 rows=2 width=78) (actual
time=3214.827..3214.867 rows=70 loops=1)
   Sort Key: c.transactionid
   Sort Method:  quicksort  Memory: 43kB
   ->  Nested Loop  (cost=697.95..120727.24 rows=2 width=78) (actual
time=2955.321..3214.208 rows=70 loops=1)
         ->  Append  (cost=697.95..120712.87 rows=2 width=56) (actual
time=2931.584..3173.402 rows=72 loops=1)
               ->  Bitmap Heap Scan on conversion c
(cost=697.95..120706.59 rows=1 width=31) (actual
time=2931.582..3150.231 rows=72 loops=1)
                     Recheck Cond: (date = '2010-11-06'::date)
                     Filter: ((idaction = 12906) AND (idaffiliate = 198338))
                     ->  Bitmap Index Scan on conversion_date_idx
(cost=0.00..697.95 rows=44365 width=0) (actual time=51.692..51.692
rows=82400 loops=1)
                           Index Cond: (date = '2010-11-06'::date)
               ->  Index Scan using conversionlate_idaffiliate_idx on
conversionlate c  (cost=0.00..6.27 rows=1 width=80) (actual
time=23.091..23.091 rows=0 loops=1)
                     Index Cond: (idaffiliate = 198338)
                     Filter: ((date = '2010-11-06'::date) AND
(idaction = 12906))
         ->  Index Scan using conversionrejected_pk on
conversionrejected cr  (cost=0.00..7.17 rows=1 width=31) (actual
time=0.563..0.564 rows=1 loops=72)
               Index Cond: (cr.idconversion = c.id)
 Total runtime: 3214.972 ms



Thanks,

Bricklen

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: anti-join chosen even when slower than old plan
Next
From: Andy Colson
Date:
Subject: Re: Huge overestimation in rows expected results in bad plan