Query running long - cost estimation question... - Mailing list pgsql-performance
From | Akos Gabriel |
---|---|
Subject | Query running long - cost estimation question... |
Date | |
Msg-id | 20090119190710.46db608d@gabriel-desktop Whole thread Raw |
List | pgsql-performance |
(now that my test went through, here is the question :) ) Dear Community, We are using PostgreSQL proudly for quite a long time, but now we are facing an interesting problem. Query plan seems to depend on how long the IN() clause is. explain analyze select p.product_id from product p left join infx.infx_product i on p.external_id = i.external_id where p.product_id in (7905, 7915, 7919, 7817, 8200, 7907, 7909, 9379, 9375, 9368, 9384, 9930, 9928, 9927, 9929, 9925, 9931, 9922, 7885, 9705, 8201, 7921); ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=40.52..108603.02 rows=11287 width=4) (actual time=1.187..33.484 rows=4882 loops=1) -> Seq Scan on product p (cost=0.00..577.41 rows=22 width=18) (actual time=1.012..5.144 rows=22 loops=1) Filter: (product_id = ANY ('{7905,7915,7919,7817,8200,7907,7909,9379,9375,9368,9384,9930,9928,9927,9929,9925,9931,9922,7885,9705,8201,7921}'::integer[])) -> Bitmap Heap Scan on infx_product i (cost=40.52..4890.70 rows=1564 width=15) (actual time=0.107..0.563 rows=222 loops=22) Recheck Cond: ((p.external_id)::text = (i.external_id)::text) -> Bitmap Index Scan on infx_product_full_external_id_key (cost=0.00..40.13 rows=1564 width=0) (actual time=0.091..0.091 rows=222 loops=22) Index Cond: ((p.external_id)::text = (i.external_id)::text) Total runtime: 41.470 ms (8 rows) Nice. But: explain analyze select p.product_id from product p left join infx.infx_product i on p.external_id = i.external_id where p.product_id in (7905, 7915, 7919, 7817, 8200, 7907, 7909, 9379, 9375, 9368, 9384, 9930, 9928, 9927, 9929, 9925, 9931, 9922, 7885, 9705, 8201, 7921,1,1,1,1,1,1); ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Left Join (cost=127157.21..128199.91 rows=14365 width=4) (actual time=11508.623..11518.198 rows=4882 loops=1) Hash Cond: ((p.external_id)::text = (i.external_id)::text) -> Bitmap Heap Scan on product p (cost=103.25..187.88 rows=28 width=18) (actual time=0.099..0.161 rows=22 loops=1) Recheck Cond: (product_id = ANY ('{7905,7915,7919,7817,8200,7907,7909,9379,9375,9368,9384,9930,9928,9927,9929,9925,9931,9922,7885,9705,8201,7921,1,1,1,1,1,1}'::integer[])) -> Bitmap Index Scan on pk_product (cost=0.00..103.24 rows=28 width=0) (actual time=0.086..0.086 rows=22 loops=1) Index Cond: (product_id = ANY ('{7905,7915,7919,7817,8200,7907,7909,9379,9375,9368,9384,9930,9928,9927,9929,9925,9931,9922,7885,9705,8201,7921,1,1,1,1,1,1}'::integer[])) -> Hash (cost=100040.65..100040.65 rows=2161065 width=15) (actual time=11505.578..11505.578 rows=2161065 loops=1) -> Seq Scan on infx_product i (cost=0.00..100040.65 rows=2161065 width=15) (actual time=0.074..6921.127 rows=2161065 loops=1) Total runtime: 11543.758 ms (9 rows) Another try: necosgi=# set enable_hashjoin=off; SET necosgi=# explain analyze select p.product_id from product p left join infx.infx_product i on p.external_id = i.external_id where p.product_id in (7905, 7915, 7919, 7817, 8200, 7907, 7909, 9379, 9375, 9368, 9384, 9930, 9928, 9927, 9929, 9925, 9931, 9922, 7885, 9705, 8201, 7921,1,1,1,1,1,1); ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=40.42..136685.07 rows=14365 width=4) (actual time=1.308..33.897 rows=4882 loops=1) -> Seq Scan on product p (cost=0.00..634.50 rows=28 width=18) (actual time=1.135..5.747 rows=22 loops=1) Filter: (product_id = ANY ('{7905,7915,7919,7817,8200,7907,7909,9379,9375,9368,9384,9930,9928,9927,9929,9925,9931,9922,7885,9705,8201,7921,1,1,1,1,1,1}'::integer[])) -> Bitmap Heap Scan on infx_product i (cost=40.42..4839.40 rows=1564 width=15) (actual time=0.096..0.553 rows=222 loops=22) Recheck Cond: ((p.external_id)::text = (i.external_id)::text) -> Bitmap Index Scan on infx_product_full_external_id_key (cost=0.00..40.03 rows=1564 width=0) (actual time=0.080..0.080 rows=222 loops=22) Index Cond: ((p.external_id)::text = (i.external_id)::text) Total runtime: 41.961 ms (8 rows) It looks like the maximum cost of the nested loop is higher than the maximum cost of hashjoin, that's why the planner chooses hashjoin. Table was analyzed, index was reindexed. This behaviour is always reproducible. Questions: - how could I avoid full table scan - how can I advise to use the index? - how could I advise planner to better estimate the (much lower) cost of the nested loop? Another info: if I put more 1's in the query, the planner will use mergejoins... bahh :) Any help/question/suggestion is welcome! Best regards, Akos
pgsql-performance by date: