Re: ORDER BY does not work as expected with multiple joins - Mailing list pgsql-sql

From Michael Glaesemann
Subject Re: ORDER BY does not work as expected with multiple joins
Date
Msg-id 3D62826C-0BCA-4A08-B745-86CE1C9CC210@myrealbox.com
Whole thread Raw
In response to Re: ORDER BY does not work as expected with multiple joins  (Michael Glaesemann <grzm@myrealbox.com>)
List pgsql-sql
On Jan 14, 2006, at 0:45 , Michael Glaesemann wrote:

>
> On Jan 14, 2006, at 0:22 , Adam Rosi-Kessel wrote:
>
>> id1 will always have a value but id2 can be NULL. So should I do a  
>> left JOIN
>> on id2 but a plain JOIN on id1? Is there a disadvantage to using a  
>> left JOIN
>> where it is not necessary?
>
> In that case, yes, I'd JOIN on id1 and LEFT JOIN on id2. I'm not  
> sure if there's a penalty or not in query planning, though there  
> might be.


> You can always use EXPLAIN ANALYZE to compare query plans. It can  
> be very useful to see how your query is executed by the planner.

Of course, this sample is probably much smaller than your actual  
dataset, but here it does appear that JOIN is a (tiny) bit more  
efficient than LEFT [OUTER] JOIN. I'm a novice at reading these, but  
the things I notice are:

* The topmost hash join in the first query is faster than the topmost  
merge left join in the second.
* The second query needs to do an extra sort and sequential scan that  
is unnecessary in the first.

In *this* case, I think the difference may be just noise. You'll get  
slightly different times each run, so you'll have to see what runs  
better on average if they're close. However, it does look like the  
left join forces at least an extra 2 steps, which should take more  
time, on average, so I think the LEFT JOIN will be slower than the JOIN.

All those more knowledgeable, feel free to jump in and correct me :)

Michael Glaesemann
grzm myrealbox com

test=# explain analyze
select some_date    , x.some_name as name1    , y.some_name as name2
from table1
join table2 as x on id1 = x.id
join table2 as y on id2 = y.id
where (some_date is not null and (id1 = 1 or id2 = 1))
order by some_date;                                                          QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------
Sort  (cost=3.20..3.21 rows=2 width=18) (actual time=0.270..0.272  
rows=3 loops=1)   Sort Key: table1.some_date   ->  Hash Join  (cost=2.12..3.19 rows=2 width=18) (actual  
time=0.219..0.227 rows=3 loops=1)         Hash Cond: ("outer".id = "inner".id1)         ->  Seq Scan on table2 x
(cost=0.00..1.03rows=3 width=11)  
 
(actual time=0.016..0.019 rows=3 loops=1)         ->  Hash  (cost=2.12..2.12 rows=2 width=15) (actual  
time=0.141..0.141 rows=3 loops=1)               ->  Hash Join  (cost=1.05..2.12 rows=2 width=15)  
(actual time=0.109..0.118 rows=3 loops=1)                     Hash Cond: ("outer".id = "inner".id2)
-> Seq Scan on table2 y  (cost=0.00..1.03  
 
rows=3 width=11) (actual time=0.004..0.008 rows=3 loops=1)                     ->  Hash  (cost=1.04..1.04 rows=2
width=12) 
 
(actual time=0.053..0.053 rows=3 loops=1)                           ->  Seq Scan on table1  (cost=0.00..1.04  
rows=2 width=12) (actual time=0.017..0.027 rows=3 loops=1)                                 Filter: ((some_date IS NOT
NULL) 
 
AND ((id1 = 1) OR (id2 = 1)))
Total runtime: 0.594 ms
(13 rows)

test=# explain analyze
select some_date    , x.some_name as name1    , y.some_name as name2
from table1
join table2 as x on id1 = x.id
left join table2 as y on id2 = y.id
where (some_date is not null and (id1 = 1 or id2 = 1))
order by some_date;                                                          QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------
Sort  (cost=3.24..3.25 rows=2 width=18) (actual time=0.281..0.282  
rows=3 loops=1)   Sort Key: table1.some_date   ->  Merge Left Join  (cost=3.19..3.23 rows=2 width=18) (actual  
time=0.242..0.251 rows=3 loops=1)         Merge Cond: ("outer".id2 = "inner".id)         ->  Sort  (cost=2.13..2.14
rows=2width=15) (actual  
 
time=0.165..0.165 rows=3 loops=1)               Sort Key: table1.id2               ->  Hash Join  (cost=1.05..2.12
rows=2width=15)  
 
(actual time=0.128..0.137 rows=3 loops=1)                     Hash Cond: ("outer".id = "inner".id1)
-> Seq Scan on table2 x  (cost=0.00..1.03  
 
rows=3 width=11) (actual time=0.016..0.020 rows=3 loops=1)                     ->  Hash  (cost=1.04..1.04 rows=2
width=12) 
 
(actual time=0.052..0.052 rows=3 loops=1)                           ->  Seq Scan on table1  (cost=0.00..1.04  
rows=2 width=12) (actual time=0.017..0.027 rows=3 loops=1)                                 Filter: ((some_date IS NOT
NULL) 
 
AND ((id1 = 1) OR (id2 = 1)))         ->  Sort  (cost=1.05..1.06 rows=3 width=11) (actual  
time=0.067..0.068 rows=3 loops=1)               Sort Key: y.id               ->  Seq Scan on table2 y  (cost=0.00..1.03
rows=3 
 
width=11) (actual time=0.005..0.010 rows=3 loops=1)
Total runtime: 0.600 ms
(16 rows)



pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: ORDER BY does not work as expected with multiple joins
Next
From: Jaime Casanova
Date:
Subject: Re: ORDER BY does not work as expected with multiple joins