Re: Optimizer wrongly picks Nested Loop Left Join - Mailing list pgsql-performance

From John A Meinel
Subject Re: Optimizer wrongly picks Nested Loop Left Join
Msg-id 428210FF.4090302@arbash-meinel.com
In response to Optimizer wrongly picks Nested Loop Left Join  ("Edin Kadribasic" <edink@emini.dk>)
Responses Re: Optimizer wrongly picks Nested Loop Left Join
List pgsql-performance
Edin Kadribasic wrote:
> Hi,
>
>
> I have a query that is giving the optimizer (and me) great headache. When
> its in the good mood the optimizer chooses Hash Left Join and the query
> executes in 13ms or so, but sometimes (more and more often) it chooses
> Nested Loop Left Join and the execution time goes up to 2-30sec.
>
> The query:
> SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a LEFT OUTER JOIN (
> SELECT ol.tid, ds.orid FROM axp_dayschedule ds JOIN axp_order_line ol ON
> ol.olid = ds.olid JOIN axp_order o ON ds.orid = o.orid WHERE o.status >= 100
> AND ds.day between '2005-05-12' and '2005-05-12' AND ds.used = '1' ) b ON
> (a.tid = b.tid) WHERE b.tid IS NULL AND a.sid = 16072;
>

Unfortunately, the Hash Left Join here reports actual rows=0 (every row
was removed by the IS NULL filter), so it's hard to tell how good the
estimate for that node is. But I *can* say that the nested loop
estimates are way off.

>
> Good plan:
> =========
> Aggregate  (cost=221.93..221.93 rows=1 width=4) (actual time=34.262..34.266
> rows=1 loops=1)
>    ->  Hash Left Join  (cost=9.07..220.86 rows=426 width=4) (actual
> time=34.237..34.237 rows=0 loops=1)
>          Hash Cond: ("outer".tid = "inner".tid)
>          Filter: ("inner".tid IS NULL)
>          ->  Index Scan using axp_temp_order_match_idx1 on
> axp_temp_order_match a  (cost=0.00..209.65 rows=426 width=4) (actual
> time=0.277..0.512 rows=6 loops=1)
>                Index Cond: (sid = 16072)
>          ->  Hash  (cost=9.07..9.07 rows=1 width=4) (actual
> time=32.777..32.777 rows=0 loops=1)
>                ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual
> time=0.208..31.563 rows=284 loops=1)
>                      ->  Nested Loop  (cost=0.00..6.05 rows=1 width=4)
> (actual time=0.178..20.684 rows=552 loops=1)
>                            ->  Index Scan using axp_dayschedule_day_idx on
> axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual
> time=0.036..3.973 rows=610 loops=1)
>                                  Index Cond: (("day" >= '2005-05-12'::date)
> AND ("day" <= '2005-05-12'::date))
>                                  Filter: (used = B'1'::"bit")
>                            ->  Index Scan using axp_order_orid_key on
> axp_order o  (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013
> rows=1 loops=610)
>                                  Index Cond: ("outer".orid = o.orid)
>                                  Filter: (status >= 100)
>                      ->  Index Scan using axp_order_line_pk on
> axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual
> time=0.006..0.008 rows=1 loops=552)
>                            Index Cond: (ol.olid = "outer".olid)
>  Total runtime: 34.581 ms
>
> Bad plan (same query different values):
> =======================================
>  Aggregate  (cost=11.54..11.54 rows=1 width=4) (actual
> time=11969.281..11969.285 rows=1 loops=1)
>    ->  Nested Loop Left Join  (cost=0.00..11.53 rows=1 width=4) (actual
> time=25.730..11967.180 rows=338 loops=1)

See here: it expects this nested loop to produce only 1 row, which would
be quite fast, but it actually produces 338.

>          Join Filter: ("outer".tid = "inner".tid)
>          Filter: ("inner".tid IS NULL)
>          ->  Index Scan using axp_temp_order_match_idx1 on
> axp_temp_order_match a  (cost=0.00..2.45 rows=1 width=4) (actual
> time=0.027..2.980 rows=471 loops=1)
>                Index Cond: (sid = 16092)
>          ->  Nested Loop  (cost=0.00..9.07 rows=1 width=4) (actual
> time=0.088..24.350 rows=285 loops=471)

Same thing here.

>                ->  Nested Loop  (cost=0.00..6.04 rows=1 width=8) (actual
> time=0.067..15.649 rows=317 loops=471)

And here.

>                      ->  Index Scan using axp_dayschedule_day_idx on
> axp_dayschedule ds  (cost=0.00..3.02 rows=1 width=8) (actual
> time=0.015..3.557 rows=606 loops=471)

This estimate is way off too, but it is off in both plans.

>                            Index Cond: (("day" >= '2005-05-13'::date) AND
> ("day" <= '2005-05-13'::date))
>                            Filter: (used = B'1'::"bit")
>                      ->  Index Scan using axp_order_line_pk on
> axp_order_line ol  (cost=0.00..3.01 rows=1 width=8) (actual
> time=0.006..0.008 rows=1 loops=285426)

This is probably what is killing you: it is doing a single-row index
lookup 285,426 times, where the good plan above does it only 552 times.
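
Back-of-envelope arithmetic from the per-loop actual times makes the
difference concrete (numbers taken from the two plans above; the real
runtimes include the other nodes as well):

```python
# Each axp_order_line_pk lookup takes roughly 0.008 ms (the upper bound
# of that node's "actual time" per loop in both plans).
per_lookup_ms = 0.008

bad_plan_loops = 285_426   # loops in the bad plan
good_plan_loops = 552      # loops in the good plan

# Time spent on this one node in each plan:
print(round(per_lookup_ms * bad_plan_loops))      # ~2283 ms
print(round(per_lookup_ms * good_plan_loops, 1))  # ~4.4 ms
```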
>                            Index Cond: (ol.olid = "outer".olid)
>                ->  Index Scan using axp_order_orid_key on axp_order o
> (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1
> loops=149307)
>                      Index Cond: ("outer".orid = o.orid)
>                      Filter: (status >= 100)
>  Total runtime: 11969.443 ms
>
> Please note that sometimes when I get "bad plan" in the logfile, I just
> re-run the query and the optimizer chooses the more efficient one. Sometime
> it does not.

Your work_mem is quite high relative to your total RAM (128000 KB is
about 125 MB per sort or hash); hopefully you don't allow many
concurrent connections. But that is a side point.

I assume the tables are freshly VACUUM ANALYZEd. Have you tried raising
the statistics target on the relevant columns? The first one to look at
is axp_dayschedule(day): the planner consistently estimates rows=1 there
while the scan actually returns ~600 rows. Perhaps a BETWEEN with
identical endpoints gets estimated as overly selective in postgres? I
don't really know.
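
One way to try that is sketched below -- the target value of 100 is just
a guess (8.0's default statistics target is 10):

```sql
-- Collect more detailed statistics for axp_dayschedule.day, then
-- re-ANALYZE so the planner sees the new histogram.
ALTER TABLE axp_dayschedule ALTER COLUMN "day" SET STATISTICS 100;
ANALYZE axp_dayschedule;
```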


Also, looking at the original query: you are doing a COUNT(DISTINCT) on
the left side of a LEFT OUTER JOIN. A LEFT JOIN by itself can never
remove rows from a, so without the WHERE b.tid IS NULL filter this would
be the same as

SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a
  WHERE a.sid = 16072;

I also have to wonder about:

SELECT ... a.tid FROM a LEFT JOIN (...) b ON (a.tid = b.tid)
  WHERE b.tid IS NULL

That WHERE b.tid IS NULL is what makes the join matter: it is the
classic anti-join idiom. After a LEFT JOIN, b.tid IS NULL exactly for
the rows of a that found no match in b, so the query counts the distinct
a.tid values (for the given sid) that do not appear in the subselect. My
first instinct was that a.tid = b.tid combined with b.tid IS NULL forces
a.tid IS NULL, which would reduce the whole thing to SELECT 0 -- but
that only holds for matched rows; unmatched rows keep their non-NULL
a.tid.

An equivalent formulation, which may give the planner a different set of
options to consider, is NOT EXISTS:

SELECT COUNT(DISTINCT a.tid)
  FROM axp_temp_order_match a
 WHERE a.sid = 16072
   AND NOT EXISTS (
         SELECT 1
           FROM axp_dayschedule ds
           JOIN axp_order_line ol ON ol.olid = ds.olid
           JOIN axp_order o ON ds.orid = o.orid
          WHERE o.status >= 100
            AND ds.day BETWEEN '2005-05-12' AND '2005-05-12'
            AND ds.used = '1'
            AND ol.tid = a.tid);

Now maybe this is a generated query, and under other circumstances it
produces something different. But as written, the right side of the join
is doing real work -- filtering out the matched tids -- so it cannot
simply be dropped.
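
What the LEFT JOIN ... IS NULL pattern actually returns is easy to check
with a toy script (sqlite3 here purely as a convenient stand-in; the
table names and data are invented):

```python
# Toy check of the anti-join pattern: which a.tid values survive
# "LEFT JOIN b ... WHERE b.tid IS NULL"?
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (tid INTEGER, sid INTEGER);
    CREATE TABLE b (tid INTEGER);
    INSERT INTO a VALUES (1, 16072), (2, 16072), (3, 16072), (3, 16072);
    INSERT INTO b VALUES (2);
""")

# tid 2 matches a row in b and is filtered out; tids 1 and 3 do not.
(n,) = con.execute("""
    SELECT COUNT(DISTINCT a.tid)
      FROM a LEFT JOIN b ON a.tid = b.tid
     WHERE b.tid IS NULL AND a.sid = 16072
""").fetchone()
print(n)  # 2 -- unmatched rows keep their non-NULL a.tid
```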

John
=:->

>
> Any ideas?
>
> postgresql-8.0.2 on 2x3.2 GHz Xeon with 2GB ram Linux 2.6
> shared_buffers = 15000
> work_mem = 128000
> effective_cache_size = 200000
> random_page_cost = (tried 1.0 - 4, seemingly without effect on this
> particular issue).
>
> Edin
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match


