Bad mis-costing of Merge Left Join in 8.0.1 - Mailing list pgsql-bugs

From Andrew - Supernews
Subject Bad mis-costing of Merge Left Join in 8.0.1
Date
Msg-id slrnd50l1s.2ilg.andrew+nonews@trinity.supernews.net
Whole thread Raw
Responses Re: Bad mis-costing of Merge Left Join in 8.0.1
List pgsql-bugs
This came up from a user in the IRC channel; while examining his EXPLAIN
ANALYZE output, we found some rather radical discrepancies in the costs
for a merge join, which was resulting in very suboptimal plans. I was
able to reduce his data to a test case as follows:

create table mjtest1 (id integer primary key, value1 integer,
                      padding float8[]);
create table mjtest2 (id integer primary key, value2 integer);
create index mjtest1_idx on mjtest1(value1);
insert into mjtest1
  select i,i%100,
         ARRAY[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
  from generate_series(1,80000) as s(i);
insert into mjtest2 select i,i from generate_series(1,10) as s(i);
analyze;
set enable_mergejoin=true;
explain analyze select * from mjtest1 left join mjtest2 using (id)
  where mjtest1.value1 = 5;
set enable_mergejoin=false;
explain analyze select * from mjtest1 left join mjtest2 using (id)
  where mjtest1.value1 = 5;

On my 8.0.1 test setup, I get the following plans:

                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=1.27..6.98 rows=779 width=276) (actual time=0.318..114.686 rows=800 loops=1)
   Merge Cond: ("outer".id = "inner".id)
   ->  Index Scan using mjtest1_pkey on mjtest1  (cost=0.00..4402.00 rows=779 width=272) (actual time=0.103..103.150
rows=800loops=1) 
         Filter: (value1 = 5)
   ->  Sort  (cost=1.27..1.29 rows=10 width=8) (actual time=0.144..0.185 rows=10 loops=1)
         Sort Key: mjtest2.id
         ->  Seq Scan on mjtest2  (cost=0.00..1.10 rows=10 width=8) (actual time=0.019..0.068 rows=10 loops=1)
 Total runtime: 118.181 ms
(8 rows)

                                                             QUERY PLAN
            

------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.12..2785.16 rows=779 width=276) (actual time=0.314..15.002 rows=800 loops=1)
   Hash Cond: ("outer".id = "inner".id)
   ->  Index Scan using mjtest1_idx on mjtest1  (cost=0.00..2780.13 rows=779 width=272) (actual time=0.027..6.032
rows=800loops=1) 
         Index Cond: (value1 = 5)
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.122..0.122 rows=0 loops=1)
         ->  Seq Scan on mjtest2  (cost=0.00..1.10 rows=10 width=8) (actual time=0.018..0.065 rows=10 loops=1)
 Total runtime: 18.415 ms
(7 rows)


The cost for the Merge Left Join is clearly preposterous, since the join
cost can't be lower than the cost of the left branch, as it is an outer
join and therefore that branch must be run to completion. I do not fully
understand the cost estimation code for the merge join, but it appears to
be reducing its total cost estimate below that of the child nodes on the
assumption that the join can be aborted early, which is clearly not the
case for outer joins.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

pgsql-bugs by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: windows installation
Next
From: Tom Lane
Date:
Subject: Re: Bad mis-costing of Merge Left Join in 8.0.1