Thread: Optimizer wrongly picks Nested Loop Left Join
Hi,

I have a query that is giving the optimizer (and me) a great headache. When
it's in a good mood the optimizer chooses Hash Left Join and the query
executes in 13 ms or so, but sometimes (more and more often) it chooses
Nested Loop Left Join and the execution time goes up to 2-30 sec.

The query:

SELECT COUNT(DISTINCT a.tid)
FROM axp_temp_order_match a
LEFT OUTER JOIN (
    SELECT ol.tid, ds.orid
    FROM axp_dayschedule ds
    JOIN axp_order_line ol ON ol.olid = ds.olid
    JOIN axp_order o ON ds.orid = o.orid
    WHERE o.status >= 100
      AND ds.day BETWEEN '2005-05-12' AND '2005-05-12'
      AND ds.used = '1'
) b ON (a.tid = b.tid)
WHERE b.tid IS NULL AND a.sid = 16072;

Good plan:
=========
Aggregate  (cost=221.93..221.93 rows=1 width=4) (actual time=34.262..34.266 rows=1 loops=1)
  ->  Hash Left Join  (cost=9.07..220.86 rows=426 width=4) (actual time=34.237..34.237 rows=0 loops=1)
        Hash Cond: ("outer".tid = "inner".tid)
        Filter: ("inner".tid IS NULL)
        ->  Index Scan using axp_temp_order_match_idx1 on axp_temp_order_match a  (cost=0.00..209.65 rows=426 width=4) (actual time=0.277..0.512 rows=6 loops=1)
              Index Cond: (sid = 16072)
        ->  Hash  (cost=9.07..9.07 rows=1 width=4) (actual time=32.777..32.777 rows=0 loops=1)
              ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual time=0.208..31.563 rows=284 loops=1)
                    ->  Nested Loop  (cost=0.00..6.05 rows=1 width=4) (actual time=0.178..20.684 rows=552 loops=1)
                          ->  Index Scan using axp_dayschedule_day_idx on axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual time=0.036..3.973 rows=610 loops=1)
                                Index Cond: (("day" >= '2005-05-12'::date) AND ("day" <= '2005-05-12'::date))
                                Filter: (used = B'1'::"bit")
                          ->  Index Scan using axp_order_orid_key on axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=610)
                                Index Cond: ("outer".orid = o.orid)
                                Filter: (status >= 100)
                    ->  Index Scan using axp_order_line_pk on axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=552)
                          Index Cond: (ol.olid = "outer".olid)
Total runtime: 34.581 ms

Bad plan (same query, different values):
=======================================
Aggregate  (cost=11.54..11.54 rows=1 width=4) (actual time=11969.281..11969.285 rows=1 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..11.53 rows=1 width=4) (actual time=25.730..11967.180 rows=338 loops=1)
        Join Filter: ("outer".tid = "inner".tid)
        Filter: ("inner".tid IS NULL)
        ->  Index Scan using axp_temp_order_match_idx1 on axp_temp_order_match a  (cost=0.00..2.45 rows=1 width=4) (actual time=0.027..2.980 rows=471 loops=1)
              Index Cond: (sid = 16092)
        ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual time=0.088..24.350 rows=285 loops=471)
              ->  Nested Loop  (cost=0.00..6.04 rows=1 width=8) (actual time=0.067..15.649 rows=317 loops=471)
                    ->  Index Scan using axp_dayschedule_day_idx on axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual time=0.015..3.557 rows=606 loops=471)
                          Index Cond: (("day" >= '2005-05-13'::date) AND ("day" <= '2005-05-13'::date))
                          Filter: (used = B'1'::"bit")
                    ->  Index Scan using axp_order_line_pk on axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=285426)
                          Index Cond: (ol.olid = "outer".olid)
              ->  Index Scan using axp_order_orid_key on axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=149307)
                    Index Cond: ("outer".orid = o.orid)
                    Filter: (status >= 100)
Total runtime: 11969.443 ms

Please note that sometimes when I get the "bad plan" in the logfile, I just
re-run the query and the optimizer chooses the more efficient one. Sometimes
it does not.

Any ideas?

postgresql-8.0.2 on 2x3.2 GHz Xeon with 2GB RAM, Linux 2.6
shared_buffers = 15000
work_mem = 128000
effective_cache_size = 200000
random_page_cost = (tried 1.0 - 4, seemingly without effect on this particular issue)

Edin
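One way to confirm that the plan choice alone explains the difference (rather than caching effects) is to temporarily steer the planner away from nested loops for a single transaction. This is a diagnostic sketch only, not a recommended permanent setting; the abbreviated query stands in for the full query above:

```sql
-- Diagnostic only: forbid nested-loop joins for this transaction,
-- then compare the resulting plan and timing against the bad plan.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT a.tid)
FROM axp_temp_order_match a
LEFT OUTER JOIN ( /* subselect as in the original query */
    SELECT ol.tid, ds.orid
    FROM axp_dayschedule ds
    JOIN axp_order_line ol ON ol.olid = ds.olid
    JOIN axp_order o ON ds.orid = o.orid
    WHERE o.status >= 100
      AND ds.day BETWEEN '2005-05-12' AND '2005-05-12'
      AND ds.used = '1'
) b ON (a.tid = b.tid)
WHERE b.tid IS NULL AND a.sid = 16072;
ROLLBACK;  -- SET LOCAL reverts with the transaction
```

If the hash plan comes back fast every time with the setting off, the problem is purely the planner's cost estimates, not the hardware or cache state.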
Edin Kadribasic wrote:
> Hi,
>
> I have a query that is giving the optimizer (and me) a great headache. When
> it's in a good mood the optimizer chooses Hash Left Join and the query
> executes in 13 ms or so, but sometimes (more and more often) it chooses
> Nested Loop Left Join and the execution time goes up to 2-30 sec.
>
> The query:
> SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a LEFT OUTER JOIN (
> SELECT ol.tid, ds.orid FROM axp_dayschedule ds JOIN axp_order_line ol ON
> ol.olid = ds.olid JOIN axp_order o ON ds.orid = o.orid WHERE o.status >= 100
> AND ds.day between '2005-05-12' and '2005-05-12' AND ds.used = '1' ) b ON
> (a.tid = b.tid) WHERE b.tid IS NULL AND a.sid = 16072;

Unfortunately, because Hash Join doesn't report the number of rows (rows=0
always), it's hard to tell how good the estimator is. But I *can* say that
the NestLoop estimation is way off.

> Good plan:
> =========
> Aggregate  (cost=221.93..221.93 rows=1 width=4) (actual time=34.262..34.266 rows=1 loops=1)
>   ->  Hash Left Join  (cost=9.07..220.86 rows=426 width=4) (actual time=34.237..34.237 rows=0 loops=1)
>         Hash Cond: ("outer".tid = "inner".tid)
>         Filter: ("inner".tid IS NULL)
>         ->  Index Scan using axp_temp_order_match_idx1 on axp_temp_order_match a  (cost=0.00..209.65 rows=426 width=4) (actual time=0.277..0.512 rows=6 loops=1)
>               Index Cond: (sid = 16072)
>         ->  Hash  (cost=9.07..9.07 rows=1 width=4) (actual time=32.777..32.777 rows=0 loops=1)
>               ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual time=0.208..31.563 rows=284 loops=1)
>                     ->  Nested Loop  (cost=0.00..6.05 rows=1 width=4) (actual time=0.178..20.684 rows=552 loops=1)
>                           ->  Index Scan using axp_dayschedule_day_idx on axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual time=0.036..3.973 rows=610 loops=1)
>                                 Index Cond: (("day" >= '2005-05-12'::date) AND ("day" <= '2005-05-12'::date))
>                                 Filter: (used = B'1'::"bit")
>                           ->  Index Scan using axp_order_orid_key on axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=610)
>                                 Index Cond: ("outer".orid = o.orid)
>                                 Filter: (status >= 100)
>                     ->  Index Scan using axp_order_line_pk on axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=552)
>                           Index Cond: (ol.olid = "outer".olid)
> Total runtime: 34.581 ms
>
> Bad plan (same query different values):
> =======================================
> Aggregate  (cost=11.54..11.54 rows=1 width=4) (actual time=11969.281..11969.285 rows=1 loops=1)
>   ->  Nested Loop Left Join  (cost=0.00..11.53 rows=1 width=4) (actual time=25.730..11967.180 rows=338 loops=1)

See here, it thinks it will only have to do 1 nestloop, which would be quite
fast, but it has to do 338.

>         Join Filter: ("outer".tid = "inner".tid)
>         Filter: ("inner".tid IS NULL)
>         ->  Index Scan using axp_temp_order_match_idx1 on axp_temp_order_match a  (cost=0.00..2.45 rows=1 width=4) (actual time=0.027..2.980 rows=471 loops=1)
>               Index Cond: (sid = 16092)
>         ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual time=0.088..24.350 rows=285 loops=471)

Same thing here.

>               ->  Nested Loop  (cost=0.00..6.04 rows=1 width=8) (actual time=0.067..15.649 rows=317 loops=471)

And here.

>                     ->  Index Scan using axp_dayschedule_day_idx on axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual time=0.015..3.557 rows=606 loops=471)

This estimate is way off too, but it is off in both plans.

>                           Index Cond: (("day" >= '2005-05-13'::date) AND ("day" <= '2005-05-13'::date))
>                           Filter: (used = B'1'::"bit")
>                     ->  Index Scan using axp_order_line_pk on axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=285426)

This is probably what is killing you. It is doing a single lookup 285k times.
The above plan only does it 552 times.
>                           Index Cond: (ol.olid = "outer".olid)
>               ->  Index Scan using axp_order_orid_key on axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=149307)
>                     Index Cond: ("outer".orid = o.orid)
>                     Filter: (status >= 100)
> Total runtime: 11969.443 ms
>
> Please note that sometimes when I get "bad plan" in the logfile, I just
> re-run the query and the optimizer chooses the more efficient one. Sometimes
> it does not.

Your work_mem is quite high relative to your total RAM; hopefully you don't
have many allowed concurrent connections. But that is a side point.

I assume the tables are freshly VACUUM ANALYZEd. Have you tried altering the
statistics targets for the columns? One to look at is axp_dayschedule(day);
that estimate seems to be consistently incorrect in both plans. Perhaps a
BETWEEN whose two bounds are the same date gets estimated as much more
restrictive than it really is? I don't really know.

Also, looking at the original query, maybe I am missing something, but note
what the

    LEFT OUTER JOIN (...) b ON (a.tid = b.tid) ... WHERE b.tid IS NULL

pattern means: it is the classic anti-join idiom. It keeps exactly those rows
of a that have *no* match in b, so the query counts the distinct a.tid values
(for sid = 16072) that do not appear among the tids produced by the subselect.
I could be wrong, but it may be worth writing the same thing as a NOT EXISTS
subquery and seeing whether the planner handles that form more consistently.

Now maybe this is a generated query, and under other circumstances it
produces a different shape, but when the anti-join form keeps tripping up the
planner, an equivalent rewrite is the first thing I would try.

John =:->

> Any ideas?
>
> postgresql-8.0.2 on 2x3.2 GHz Xeon with 2GB ram Linux 2.6
> shared_buffers = 15000
> work_mem = 128000
> effective_cache_size = 200000
> random_page_cost = (tried 1.0 - 4, seemingly without effect on this
> particular issue).
>
> Edin

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
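For reference, a sketch of that NOT EXISTS rewrite, assuming the same tables and predicates as the original query (untested against Edin's schema, and note that 8.0's planner treats NOT EXISTS as a correlated subplan rather than a true anti-join, so whether it helps is something to measure):

```sql
-- Same anti-join semantics as the LEFT JOIN ... WHERE b.tid IS NULL form:
-- count distinct tids in the search that have no qualifying order line.
SELECT COUNT(DISTINCT a.tid)
FROM axp_temp_order_match a
WHERE a.sid = 16072
  AND NOT EXISTS (
      SELECT 1
      FROM axp_dayschedule ds
      JOIN axp_order_line ol ON ol.olid = ds.olid
      JOIN axp_order o ON ds.orid = o.orid
      WHERE o.status >= 100
        AND ds.day BETWEEN '2005-05-12' AND '2005-05-12'
        AND ds.used = '1'
        AND ol.tid = a.tid   -- correlation replaces the join condition
  );
```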
"Edin Kadribasic" <edink@emini.dk> writes: > I have a query that is giving the optimizer (and me) great headache. The main problem seems to be that the rowcount estimates for axp_temp_order_match and axp_dayschedule are way off: > -> Index Scan using axp_temp_order_match_idx1 on > axp_temp_order_match a (cost=0.00..209.65 rows=426 width=4) (actual > time=0.277..0.512 rows=6 loops=1) > Index Cond: (sid = 16072) > -> Index Scan using axp_dayschedule_day_idx on > axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual > time=0.036..3.973 rows=610 loops=1) > Index Cond: (("day" >= '2005-05-12'::date) > AND ("day" <= '2005-05-12'::date)) > Filter: (used = B'1'::"bit") > -> Index Scan using axp_temp_order_match_idx1 on > axp_temp_order_match a (cost=0.00..2.45 rows=1 width=4) (actual > time=0.027..2.980 rows=471 loops=1) > Index Cond: (sid = 16092) > -> Index Scan using axp_dayschedule_day_idx on > axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual > time=0.015..3.557 rows=606 loops=471) > Index Cond: (("day" >= '2005-05-13'::date) AND > ("day" <= '2005-05-13'::date)) > Filter: (used = B'1'::"bit") Do you ANALYZE these tables on a regular basis? If so, it may be necessary to increase the statistics target to the point where you get better estimates. > Please note that sometimes when I get "bad plan" in the logfile, I just > re-run the query and the optimizer chooses the more efficient one. That's fairly hard to believe, unless you've got autovacuum running in the background. regards, tom lane
John A Meinel <john@arbash-meinel.com> writes:
> Unfortunately, because Hash Join doesn't report the number of rows
> (rows=0 always), it's hard to tell how good the estimator is.

This is only a cosmetic problem, because you can just look at the number of
rows actually emitted by the Hash node's child; that's always exactly the
number loaded into the hashtable. (But having said that, it is fixed in CVS
tip.)

You may be confused, though --- the Hash node is not the Hash Join node.
A zero report from Hash Join does actually mean that it emitted zero rows.

			regards, tom lane
From: "Tom Lane" <tgl@sss.pgh.pa.us> > "Edin Kadribasic" <edink@emini.dk> writes: > > I have a query that is giving the optimizer (and me) great headache. > > The main problem seems to be that the rowcount estimates for > axp_temp_order_match and axp_dayschedule are way off: > > > -> Index Scan using axp_temp_order_match_idx1 on > > axp_temp_order_match a (cost=0.00..209.65 rows=426 width=4) (actual > > time=0.277..0.512 rows=6 loops=1) > > Index Cond: (sid = 16072) > > > -> Index Scan using axp_dayschedule_day_idx on > > axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual > > time=0.036..3.973 rows=610 loops=1) > > Index Cond: (("day" >= '2005-05-12'::date) > > AND ("day" <= '2005-05-12'::date)) > > Filter: (used = B'1'::"bit") > > > -> Index Scan using axp_temp_order_match_idx1 on > > axp_temp_order_match a (cost=0.00..2.45 rows=1 width=4) (actual > > time=0.027..2.980 rows=471 loops=1) > > Index Cond: (sid = 16092) > > > -> Index Scan using axp_dayschedule_day_idx on > > axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual > > time=0.015..3.557 rows=606 loops=471) > > Index Cond: (("day" >= '2005-05-13'::date) AND > > ("day" <= '2005-05-13'::date)) > > Filter: (used = B'1'::"bit") > > Do you ANALYZE these tables on a regular basis? If so, it may be > necessary to increase the statistics target to the point where you > get better estimates. Increasing statistics didn't seem to help, but both of you gave me an idea of what might be wrong. axp_temp_order match contains temporary matches for a search. Just before execution of that query the new matches are inserted into the table under a new search id (sid column). Since the ANALYZE was that before it it grossly underestimates the number of matches for that sid. As this table is relatively small inserting ANALYZE axp_temp_order_match just before running the query does not introduce a great perforance penalty (50ms) and it reduces the query execution time from up to 50s down to ~20ms. 
> > Please note that sometimes when I get "bad plan" in the logfile, I just > > re-run the query and the optimizer chooses the more efficient one. > > That's fairly hard to believe, unless you've got autovacuum running > in the background. The application had ANALYZE axp_temp_order_match placed in the "slightly" wrong location, before the large insert was done (1000 rows with a new sid). So when the app run the next search, previous search got correctly analyzed and the query execution time dropped dramatically as I was trying to EXPLAIN ANALYZE query recorded in the log file. Thanks for your help, Edin
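The fix Edin describes amounts to moving the ANALYZE so it runs after the insert rather than before it. A sketch of the corrected ordering (the INSERT source here is a hypothetical stand-in, since the real statement is application-specific):

```sql
-- 1. Insert the new search's matches first.
INSERT INTO axp_temp_order_match (sid, tid)
SELECT 16093, m.tid
FROM matches_for_current_search m;   -- hypothetical source of new matches

-- 2. Then ANALYZE, so the planner sees the new sid's row count
--    before the COUNT(DISTINCT ...) query runs.
ANALYZE axp_temp_order_match;

-- 3. Now the query against sid = 16093 gets a realistic estimate
--    instead of rows=1, and the hash plan is chosen reliably.
```

With the ANALYZE placed before the insert, the statistics always describe the *previous* search, which is exactly the flip-flopping behaviour reported at the start of the thread.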