Plan for outer joins - Mailing list pgsql-general

From Alban Hertroys
Subject Plan for outer joins
Date
Msg-id 45053360.6020906@magproductions.nl
Whole thread Raw
Responses Re: Plan for outer joins
List pgsql-general
Hi,

I was tuning a join on a few tables for a SELECT COUNT(*) query, when I
realized that the content of the second table didn't actually matter to
the count. So, I figured a LEFT OUTER JOIN would be faster, but...
apparently it's not.

Shouldn't the planner notice that the right part of SELECT COUNT(*) FROM
x LEFT OUTER JOIN y USING (id) isn't relevant for the result and skip
the join?

Attached are the explain plans for both versions and a few variations
(TB wraps it if I paste).

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //
explain analyze select count(*) from mm_medical_care_container_table JOIN mm_object USING (number);
                                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73149.73..73149.73 rows=1 width=0) (actual time=2780.516..2780.517 rows=1 loops=1)
   ->  Hash Join  (cost=2025.33..72927.66 rows=88826 width=0) (actual time=406.166..2774.618 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.015..1074.074 rows=892597
loops=1)
         ->  Hash  (cost=1369.26..1369.26 rows=88826 width=4) (actual time=22.724..22.724 rows=0 loops=1)
               ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=4) (actual
time=0.028..13.889rows=5936 loops=1) 
 Total runtime: 2780.599 ms

explain analyze select count(*) from mm_medical_care_container_table LEFT OUTER JOIN mm_object USING (number);
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=95099.40..95099.40 rows=1 width=0) (actual time=2919.999..2920.000 rows=1 loops=1)
   ->  Hash Left Join  (cost=40863.54..94877.34 rows=88826 width=0) (actual time=2283.399..2913.955 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=4) (actual
time=0.021..12.450rows=5936 loops=1) 
         ->  Hash  (cost=28666.63..28666.63 rows=1651963 width=4) (actual time=2263.407..2263.407 rows=0 loops=1)
               ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.022..1067.016
rows=892597loops=1) 
 Total runtime: 2920.260 ms

I figured maybe the '*' was the culprit, so I tried this as well:

explain analyze select count(mm_medical_care_container_table.*) from mm_medical_care_container_table LEFT OUTER JOIN
mm_objectUSING (number); 
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=95099.40..95099.40 rows=1 width=4) (actual time=2984.603..2984.603 rows=1 loops=1)
   ->  Hash Left Join  (cost=40863.54..94877.34 rows=88826 width=4) (actual time=2318.856..2978.470 rows=5936 loops=1)
         Hash Cond: ("outer".number = "inner".number)
         ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=8) (actual
time=0.024..16.904rows=5936 loops=1) 
         ->  Hash  (cost=28666.63..28666.63 rows=1651963 width=4) (actual time=2295.350..2295.350 rows=0 loops=1)
               ->  Seq Scan on mm_object  (cost=0.00..28666.63 rows=1651963 width=4) (actual time=0.018..1074.118
rows=892597loops=1) 
 Total runtime: 2985.101 ms

And here's what I expected to happen:

explain analyze select count(*) from mm_medical_care_container_table;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1591.33..1591.33 rows=1 width=0) (actual time=17.427..17.428 rows=1 loops=1)
   ->  Seq Scan on mm_medical_care_container_table  (cost=0.00..1369.26 rows=88826 width=0) (actual time=0.021..12.186
rows=5936loops=1) 
 Total runtime: 17.482 ms

pgsql-general by date:

Previous
From: "Purusothaman A"
Date:
Subject: Re: [ADMIN] Problem with lo_export() and lo_import() from remote machine.
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Child program using parent program's transaction?