Optimizer wrongly picks Nested Loop Left Join - Mailing list pgsql-performance

From Edin Kadribasic
Subject Optimizer wrongly picks Nested Loop Left Join
Date
Msg-id 001f01c55604$494d2e60$1200000a@intra.emini.dk
Whole thread Raw
Responses Re: Optimizer wrongly picks Nested Loop Left Join
Re: Optimizer wrongly picks Nested Loop Left Join
List pgsql-performance
Hi,


I have a query that is giving the optimizer (and me) great headache. When
its in the good mood the optimizer chooses Hash Left Join and the query
executes in 13ms or so, but sometimes (more and more often) it chooses
Nested Loop Left Join and the execution time goes up to 2-30sec.

The query:
SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a LEFT OUTER JOIN (
SELECT ol.tid, ds.orid FROM axp_dayschedule ds JOIN axp_order_line ol ON
ol.olid = ds.olid JOIN axp_order o ON ds.orid = o.orid WHERE o.status >= 100
AND ds.day between '2005-05-12' and '2005-05-12' AND ds.used = '1' ) b ON
(a.tid = b.tid) WHERE b.tid IS NULL AND a.sid = 16072;

Good plan:
=========
Aggregate  (cost=221.93..221.93 rows=1 width=4) (actual time=34.262..34.266
rows=1 loops=1)
   ->  Hash Left Join  (cost=9.07..220.86 rows=426 width=4) (actual
time=34.237..34.237 rows=0 loops=1)
         Hash Cond: ("outer".tid = "inner".tid)
         Filter: ("inner".tid IS NULL)
         ->  Index Scan using axp_temp_order_match_idx1 on
axp_temp_order_match a  (cost=0.00..209.65 rows=426 width=4) (actual
time=0.277..0.512 rows=6 loops=1)
               Index Cond: (sid = 16072)
         ->  Hash  (cost=9.07..9.07 rows=1 width=4) (actual
time=32.777..32.777 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual
time=0.208..31.563 rows=284 loops=1)
                     ->  Nested Loop  (cost=0.00..6.05 rows=1 width=4)
(actual time=0.178..20.684 rows=552 loops=1)
                           ->  Index Scan using axp_dayschedule_day_idx on
axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual
time=0.036..3.973 rows=610 loops=1)
                                 Index Cond: (("day" >= '2005-05-12'::date)
AND ("day" <= '2005-05-12'::date))
                                 Filter: (used = B'1'::"bit")
                           ->  Index Scan using axp_order_orid_key on
axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013
rows=1 loops=610)
                                 Index Cond: ("outer".orid = o.orid)
                                 Filter: (status >= 100)
                     ->  Index Scan using axp_order_line_pk on
axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual
time=0.006..0.008 rows=1 loops=552)
                           Index Cond: (ol.olid = "outer".olid)
 Total runtime: 34.581 ms

Bad plan (same query different values):
=======================================
 Aggregate  (cost=11.54..11.54 rows=1 width=4) (actual
time=11969.281..11969.285 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..11.53 rows=1 width=4) (actual
time=25.730..11967.180 rows=338 loops=1)
         Join Filter: ("outer".tid = "inner".tid)
         Filter: ("inner".tid IS NULL)
         ->  Index Scan using axp_temp_order_match_idx1 on
axp_temp_order_match a  (cost=0.00..2.45 rows=1 width=4) (actual
time=0.027..2.980 rows=471 loops=1)
               Index Cond: (sid = 16092)
         ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual
time=0.088..24.350 rows=285 loops=471)
               ->  Nested Loop  (cost=0.00..6.04 rows=1 width=8) (actual
time=0.067..15.649 rows=317 loops=471)
                     ->  Index Scan using axp_dayschedule_day_idx on
axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual
time=0.015..3.557 rows=606 loops=471)
                           Index Cond: (("day" >= '2005-05-13'::date) AND
("day" <= '2005-05-13'::date))
                           Filter: (used = B'1'::"bit")
                     ->  Index Scan using axp_order_line_pk on
axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual
time=0.006..0.008 rows=1 loops=285426)
                           Index Cond: (ol.olid = "outer".olid)
               ->  Index Scan using axp_order_orid_key on axp_order o
(cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1
loops=149307)
                     Index Cond: ("outer".orid = o.orid)
                     Filter: (status >= 100)
 Total runtime: 11969.443 ms

Please note that sometimes when I get "bad plan" in the logfile, I just
re-run the query and the optimizer chooses the more efficient one. Sometime
it does not.

Any ideas?

postgresql-8.0.2 on 2x3.2 GHz Xeon with 2GB ram Linux 2.6
shared_buffers = 15000
work_mem = 128000
effective_cache_size = 200000
random_page_cost = (tried 1.0 - 4, seemingly without effect on this
particular issue).

Edin




pgsql-performance by date:

Previous
From: "David Roussel"
Date:
Subject: Re: Partitioning / Clustering
Next
From: Alex Stapleton
Date:
Subject: Re: Partitioning / Clustering