Optimizer choosing the wrong plan - Mailing list pgsql-performance

From Viswanath
Subject Optimizer choosing the wrong plan
Date
Msg-id 1543227078813-0.post@n3.nabble.com
Whole thread Raw
Responses Re: Optimizer choosing the wrong plan
List pgsql-performance
*Postgres server version -  9.5.10*
*RAM - 128 GB*
*WorkMem 64 MB*

*Problematic query with explain :*
*Query 1 (original):*
explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID=myTable2.ID WHERE  ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((*myTable2.bigint_val = 1*) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1000000000000) AND
(myTable1.ID <= 1999999999999)) ))  ORDER BY 1 DESC , 1 NULLS FIRST  LIMIT
11;
                                                                                   
QUERY PLAN                                                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.00..8077.43 rows=11 width=8) (actual time=6440.245..6440.245
rows=0 loops=1)
   ->  Nested Loop  (cost=1.00..1268000.55 *rows=1727* width=8) (actual
time=6440.242..6440.242 rows=0 loops=1)
         ->  Index Scan Backward using myTable2_fk1_idx on myTable2 
(cost=0.43..1259961.54 rows=1756 width=8) (actual time=6440.241..6440.241
rows=0 loops=1)
               Filter: (bigint_val = 1)
               Rows Removed by Filter: 12701925
         ->  Index Scan using myTable1_fk2_idx on myTable1  (cost=0.56..4.57
rows=1 width=8) (never executed)
               Index Cond: ((id = myTable2.id) AND (id >=
'1000000000000'::bigint) AND (id <= '1999999999999'::bigint))
               Filter: (bool_val AND bool_val AND (small_intval = ANY
('{1,2,3}'::integer[])))
 Planning time: 0.654 ms
 Execution time: 6440.353 ms
(10 rows)

*The columns myTable1.ID and myTable2.bigint_val = 1 both are indexed*

The table myTable2 contains *12701952* entries. Out of which only *86227* is
not null and *146* entries are distinct.

The above query returns 0 rows since 'myTable2.bigint_val = 1' criteria
satisfies nothing. It takes 6 seconds for execution as the planner chooses*
myTable1.ID column's index*. 


If I use nulls last on the order by clause of the query then the planner
chooses this plan since it doesn't use index for *DESC NULLS LAST*. And the
query executes in milliseconds.

*Query 2 (order by modified to avoid index):*
explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID=myTable2.ID WHERE  ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1000000000000) AND
(myTable1.ID <= 1999999999999)) ))  ORDER BY 1 DESC *NULLS LAST*, 1 NULLS
FIRST  LIMIT 11;
                                                                                    
QUERY PLAN                                                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11625.07..11625.10 rows=11 width=8) (actual time=0.028..0.028
rows=0 loops=1)
   ->  Sort  (cost=11625.07..11629.39 rows=1727 width=8) (actual
time=0.028..0.028 rows=0 loops=1)
         Sort Key: myTable1.id DESC NULLS LAST
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.85..11586.56 *rows=1727 *width=8) (actual
time=0.024..0.024 rows=0 loops=1)
               ->  Index Scan using bigint_val_idx_px on myTable2 
(cost=0.29..3547.55 rows=1756 width=8) (actual time=0.024..0.024 rows=0
loops=1)
                     Index Cond: (bigint_val = 1)
               ->  Index Scan using myTable1_fk2_idx on myTable1 
(cost=0.56..4.57 rows=1 width=8) (never executed)
                     Index Cond: ((id = myTable2.id) AND (id >=
'1000000000000'::bigint) AND (id <= '1999999999999'::bigint))
                     Filter: (bool_val AND bool_val AND (small_intval = ANY
('{1,2,3}'::integer[])))
 Planning time: 0.547 ms
 Execution time: 0.110 ms

The reason why postgres chooses the 1st plan over the 2nd was due to it's
cost. *plan 1 - 8077.43 and plan 2 - 11625.10* . But obviously plan 2 is
correct. 

I tried running *vacuum analyse* table many times, tried changing the
*statistics target of the column to 250 (since there are only 149 distinct
values)*. But none worked out. The planner thinks that there are *1727* rows
that matches the condition *myTable2.bigint_val = 1* but there are none.

Also I tried changing the limit of the 1st query, increasing the limit
increases the cost of the 1st plan so if I use 16 as limit for the same 1st
query the planner chooses the 2nd plan.

*Query 3 (same as 1st but limit increased to 16):*

explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID=myTable2.ID WHERE  ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1000000000000) AND
(myTable1.ID <= 1999999999999)) ))  ORDER BY 1 DESC , 1 NULLS FIRST  *LIMIT
16*;
                                                                                    
QUERY PLAN                                                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11629.74..11629.78 rows=16 width=8) (actual time=0.043..0.043
rows=0 loops=1)
   ->  Sort  (cost=11629.74..11634.05 rows=1727 width=8) (actual
time=0.042..0.042 rows=0 loops=1)
         Sort Key: myTable1.id DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.85..11586.56 rows=1727 width=8) (actual
time=0.036..0.036 rows=0 loops=1)
               ->  Index Scan using bigint_val_idx_px on myTable2 
(cost=0.29..3547.55 rows=1756 width=8) (actual time=0.036..0.036 rows=0
loops=1)
                     Index Cond: (bigint_val = 1)
               ->  Index Scan using myTable1_fk2_idx on myTable1 
(cost=0.56..4.57 rows=1 width=8) (never executed)
                     Index Cond: ((id = myTable2.id) AND (id >=
'1000000000000'::bigint) AND (id <= '1999999999999'::bigint))
                     Filter: (bool_val AND bool_val AND (small_intval = ANY
('{1,2,3}'::integer[])))
 Planning time: 0.601 ms
 Execution time: 0.170 ms
(12 rows)

Is there any way to make postgres use the myTable2.bigint_val's index by
changing/optimizing parameters?
I tried changing cost parameters too but since bot plan uses index scan it
doesn't affect much. Is there any 
way to set *how the cost works based on  limit*?



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: explain analyze faster then query
Next
From: Mariel Cherkassky
Date:
Subject: autovacuum run but last_autovacuum is empty