Thread: Plan for outer joins

Plan for outer joins

From
Alban Hertroys
Date:
Hi,

I was tuning a join on a few tables for a SELECT COUNT(*) query, when I
realized that the content of the second table didn't actually matter to
the count. So, I figured a LEFT OUTER JOIN would be faster, but...
apparently it's not.

Shouldn't the planner notice that the right part of SELECT COUNT(*) FROM
x LEFT OUTER JOIN y USING (id) isn't relevant for the result and skip
the join?

Attached are the explain plans for both versions and a few variations
(TB wraps it if I paste).

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //
explain analyze select count(*) from mm_medical_care_container_table JOIN mm_object USING (number);
                                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73149.73..73149.73 rows=1 width=0) (actual time=2780.516..2780.517 rows=1 loops=1)
   ->  Hash Join  (cost=2025.33..72927.66 rows=88826 width=0) (actual time=406.166..2774.618 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.015..1074.074 rows=892597
loops=1)
         ->  Hash  (cost=1369.26..1369.26 rows=88826 width=4) (actual time=22.724..22.724 rows=0 loops=1)
               ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=4) (actual
time=0.028..13.889rows=5936 loops=1) 
 Total runtime: 2780.599 ms

explain analyze select count(*) from mm_medical_care_container_table LEFT OUTER JOIN mm_object USING (number);
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=95099.40..95099.40 rows=1 width=0) (actual time=2919.999..2920.000 rows=1 loops=1)
   ->  Hash Left Join  (cost=40863.54..94877.34 rows=88826 width=0) (actual time=2283.399..2913.955 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=4) (actual
time=0.021..12.450rows=5936 loops=1) 
         ->  Hash  (cost=28666.63..28666.63 rows=1651963 width=4) (actual time=2263.407..2263.407 rows=0 loops=1)
               ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.022..1067.016
rows=892597loops=1) 
 Total runtime: 2920.260 ms

I figured maybe the '*' was the culprit, so I tried this as well:

explain analyze select count(mm_medical_care_container_table.*) from mm_medical_care_container_table LEFT OUTER JOIN
mm_objectUSING (number); 
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=95099.40..95099.40 rows=1 width=4) (actual time=2984.603..2984.603 rows=1 loops=1)
   ->  Hash Left Join  (cost=40863.54..94877.34 rows=88826 width=4) (actual time=2318.856..2978.470 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=8) (actual
time=0.024..16.904rows=5936 loops=1) 
         ->  Hash  (cost=28666.63..28666.63 rows=1651963 width=4) (actual time=2295.350..2295.350 rows=0 loops=1)
               ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.018..1074.118
rows=892597loops=1) 
 Total runtime: 2985.101 ms

And here's what I expected to happen:

explain analyze select count(*) from mm_medical_care_container_table;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1591.33..1591.33 rows=1 width=0) (actual time=17.427..17.428 rows=1 loops=1)
   ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=0) (actual time=0.021..12.186
rows=5936loops=1) 
 Total runtime: 17.482 ms

Re: Plan for outer joins

From
Martijn van Oosterhout
Date:
On Mon, Sep 11, 2006 at 11:58:56AM +0200, Alban Hertroys wrote:
> Hi,
>
> I was tuning a join on a few tables for a SELECT COUNT(*) query, when I
> realized that the content of the second table didn't actually matter to
> the count. So, I figured a LEFT OUTER JOIN would be faster, but...
> apparently it's not.

Well, it would matter if the outer side of the join could return more
than one row. You may know that won't happen, but the database may not.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Plan for outer joins

From
Alban Hertroys
Date:
Martijn van Oosterhout wrote:
> On Mon, Sep 11, 2006 at 11:58:56AM +0200, Alban Hertroys wrote:
>> Hi,
>>
>> I was tuning a join on a few tables for a SELECT COUNT(*) query, when I
>> realized that the content of the second table didn't actually matter to
>> the count. So, I figured a LEFT OUTER JOIN would be faster, but...
>> apparently it's not.
>
> Well, it would matter if the outer side of the join could return more
> than one row. You may know that won't happen, but the database may not.

Oh, of course. Thanks.

--
Alban Hertroys