Thread: ORDER BY does not work as expected with multiple joins

ORDER BY does not work as expected with multiple joins

From
Adam Rosi-Kessel
Date:
I posted this message once through the Google Groups interface to pgsql.sql,
but I've received no replies and the message doesn't appear at all in the
MHonArc archives at postgresql.org, so I'm not sure it actually was
distributed to this list. Apologies if this is a duplicate:

I have a multijoin SELECT statement where the ORDER BY clause doesn't
seem to be working. I can't figure out how pgsql is sorting the
results, but it is definitely different from the same list without the
multiple joins.

To simplify: I have two tables. The first table ("table1") has three
fields: date, id1, and id2. The data look like this:

Date     ID1   ID2
1/5/06   2     1
1/1/06   1     2
1/3/06   1     3

The second table ("table2") has two fields: id and name. The data look
like this:

ID   NAME
1    John
2    Jane
3    Sam

What I am trying to do is join the two tables and sort by the date.
ID1 and ID2 in table1 *both* reference ID in table2.  The desired
result of the SELECT statement would look like this:

Date     name1  name2
1/1/06   John     Jane
1/3/06   John     Sam
1/5/06   Jane     John

For some reason, the result is not properly sorting by date--the data
are being reordered, but not in a way I can understand.

Here is the SELECT statement:

SELECT date,x.name as name1,y.name AS name2 FROM table1 LEFT JOIN
table2 AS x ON id1=x.id RIGHT JOIN table2 AS y ON id2=y.id WHERE (date
IS NOT NULL AND (id1 = ? OR id2 = ?)) ORDER BY date;

(where ? is replaced by the desired ID -- I want the results to list
all instances in table1 where id1 or id2 is a given ID number).

Did I mess up the multiple JOINs?  It seems like the SELECT is giving
the correct results other than not ordering by DATE.


Re: ORDER BY does not work as expected with multiple joins

From
Tom Lane
Date:
Adam Rosi-Kessel <adam@rosi-kessel.org> writes:
> I have a multijoin SELECT statement where the ORDER BY clause doesn't
> seem to be working.

You haven't mentioned your Postgres version, nor provided anywhere near
enough information to let other people reproduce the problem.  You might
find it helpful to read this:
http://www.postgresql.org/docs/8.1/static/bug-reporting.html

One comment is that your joining seems wrong: I'd have thought the
RIGHT JOIN should be LEFT JOIN for what you're trying to do.  But that
shouldn't make any difference to whether the sort order is correct.
        regards, tom lane


Re: ORDER BY does not work as expected with multiple joins

From
Adam Rosi-Kessel
Date:
Tom Lane wrote:
> Adam Rosi-Kessel <adam@rosi-kessel.org> writes:
>> I have a multijoin SELECT statement where the ORDER BY clause doesn't
>> seem to be working.
> You haven't mentioned your Postgres version, nor provided anywhere near
> enough information to let other people reproduce the problem.  You might
> find it helpful to read this:
> http://www.postgresql.org/docs/8.1/static/bug-reporting.html

Thanks. I didn't think this was a bug--I thought I had an error in my SELECT
statement somehow. I certainly would have included more information with a
bug report. (For the record, it is PostgreSQL 7.4 from Debian Sarge).

> One comment is that your joining seems wrong: I'd have thought the
> RIGHT JOIN should be LEFT JOIN for what you're trying to do.  But that
> shouldn't make any difference to whether the sort order is correct.

Actually, it looks like I might have just been confused about DATESTYLE.
Adding in some dates where the day was greater than 12 clarified how the
sorting was being done. Due to a bug in my program, the DATESTYLE was set
differently for this multiple JOIN SELECT. Sorry about the misguided request!

But I am curious why you think a LEFT JOIN should be used rather than RIGHT
JOIN? I think I believe I am getting the desired set of records. When I
substitute LEFT JOIN for RIGHT JOIN I actually get the same result. Can you
help me understand under what conditions, in this case, I would obtain a
different result?

Adam


Re: ORDER BY does not work as expected with multiple joins

From
Michael Glaesemann
Date:
On Jan 13, 2006, at 23:32 , Adam Rosi-Kessel wrote:

> I have a multijoin SELECT statement where the ORDER BY clause doesn't
> seem to be working. I can't figure out how pgsql is sorting the
> results, but it is definitely different from the same list without the
> multiple joins.

<snip />

> What I am trying to do is join the two tables and sort by the date.
> ID1 and ID2 in table1 *both* reference ID in table2.  The desired
> result of the SELECT statement would look like this:
>
> Date     name1  name2
> 1/1/06   John     Jane
> 1/3/06   John     Sam
> 1/5/06   Jane     John

I don't seem to have a problem with the ordering given your data. I'm  
not sure why you're doing right and left joins, though. You might  
want to use left joins if you have NULL id1 or id2, but otherwise  
just plain JOIN should work fine. I've included both your SELECT  
(slightly modified to avoid using key words) and how I'd probably  
right the statement.

test=# select * from table2;
id | some_name
----+-----------  1 | John  2 | Jane  3 | Sam
(3 rows)

test=# select * from table1;
id1 | id2 | some_date
-----+-----+------------   2 |   1 | 2006-01-05   1 |   2 | 2006-01-01   1 |   3 | 2006-01-03
(3 rows)

test=# SELECT some_date, x.some_name as name1, y.some_name AS name2
FROM table1
LEFT JOIN table2 AS x ON id1 = x.id
RIGHT JOIN table2 AS y ON id2 = y.id
WHERE (some_date IS NOT NULL AND (id1 = 1 OR id2 = 1))
ORDER BY some_date;
some_date  | name1 | name2
------------+-------+-------
2006-01-01 | John  | Jane
2006-01-03 | John  | Sam
2006-01-05 | Jane  | John
(3 rows)

test=# select some_date, x.some_name as name1, y.some_name as name2
from table1
join table2 as x on id1 = x.id
join table2 as y on id2 = y.id
where (some_date is not null and (id1 = 1 or id2 = 1))
order by some_date;
some_date  | name1 | name2
------------+-------+-------
2006-01-01 | John  | Jane
2006-01-03 | John  | Sam
2006-01-05 | Jane  | John
(3 rows)

Hope this helps.

Michael Glaesemann
grzm myrealbox com


-- DDL
create table table1 (    id1 integer not null    , id2 integer not null    , some_date date not null
);

copy table1 (some_date, id1, id2) from stdin;
2006-01-05    2    1
2006-01-01    1    2
2006-01-03    1    3
\.

create table table2 (id integer not null, some_name text not null
);

copy table2 (id, some_name) from stdin;
1    John
2    Jane
3    Sam
\.




Re: ORDER BY does not work as expected with multiple joins

From
Adam Rosi-Kessel
Date:
Thanks for your helpful response.

Michael Glaesemann wrote:
> I don't seem to have a problem with the ordering given your data. I'm
> not sure why you're doing right and left joins, though. You might want
> to use left joins if you have NULL id1 or id2, but otherwise just plain
> JOIN should work fine. I've included both your SELECT (slightly modified
> to avoid using key words) and how I'd probably right the statement.

id1 will always have a value but id2 can be NULL. So should I do a left JOIN
on id2 but a plain JOIN on id1? Is there a disadvantage to using a left JOIN
where it is not necessary?

Adam


Re: ORDER BY does not work as expected with multiple joins

From
Michael Glaesemann
Date:
On Jan 14, 2006, at 0:22 , Adam Rosi-Kessel wrote:

> id1 will always have a value but id2 can be NULL. So should I do a  
> left JOIN
> on id2 but a plain JOIN on id1? Is there a disadvantage to using a  
> left JOIN
> where it is not necessary?

In that case, yes, I'd JOIN on id1 and LEFT JOIN on id2. I'm not sure  
if there's a penalty or not in query planning, though there might be.  
For me, I use JOIN unless I need to use a LEFT JOIN. (I can't think  
of a time I've used a RIGHT JOIN.) In relational theory, JOIN is a  
relational operator, while LEFT [OUTER] JOIN (or any OUTER JOIN) is  
not. It's just a goal of mine to keep my queries as close as possible  
to constructions that are based on relational theory, and I deviate  
from that only when I have to, either for performance reasons, or  
when SQL doesn't provide an appropriate equivalent to a relational  
construct.

You can always use EXPLAIN ANALYZE to compare query plans. It can be  
very useful to see how your query is executed by the planner.

Michael Glaesemann
grzm myrealbox com





Re: ORDER BY does not work as expected with multiple joins

From
Tom Lane
Date:
Adam Rosi-Kessel <adam@rosi-kessel.org> writes:
> But I am curious why you think a LEFT JOIN should be used rather than RIGHT
> JOIN? I think I believe I am getting the desired set of records. When I
> substitute LEFT JOIN for RIGHT JOIN I actually get the same result. Can you
> help me understand under what conditions, in this case, I would obtain a
> different result?

Try some cases where there are IDs in the ID/name table that do not
appear in the linking table, or the other way around.  If you still
like your results in those situations, then it's the right way for you.
        regards, tom lane


Re: ORDER BY does not work as expected with multiple joins

From
Michael Glaesemann
Date:
On Jan 14, 2006, at 0:45 , Michael Glaesemann wrote:

>
> On Jan 14, 2006, at 0:22 , Adam Rosi-Kessel wrote:
>
>> id1 will always have a value but id2 can be NULL. So should I do a  
>> left JOIN
>> on id2 but a plain JOIN on id1? Is there a disadvantage to using a  
>> left JOIN
>> where it is not necessary?
>
> In that case, yes, I'd JOIN on id1 and LEFT JOIN on id2. I'm not  
> sure if there's a penalty or not in query planning, though there  
> might be.


> You can always use EXPLAIN ANALYZE to compare query plans. It can  
> be very useful to see how your query is executed by the planner.

Of course, this sample is probably much smaller than your actual  
dataset, but here it does appear that JOIN is a (tiny) bit more  
efficient than LEFT [OUTER] JOIN. I'm a novice at reading these, but  
the things I notice are:

* The topmost hash join in the first query is faster than the topmost  
merge left join in the second.
* The second query needs to do an extra sort and sequential scan that  
is unnecessary in the first.

In *this* case, I think the difference may be just noise. You'll get  
slightly different times each run, so you'll have to see what runs  
better on average if they're close. However, it does look like the  
left join forces at least an extra 2 steps, which should take more  
time, on average, so I think the LEFT JOIN will be slower than the JOIN.

All those more knowledgeable, feel free to jump in and correct me :)

Michael Glaesemann
grzm myrealbox com

test=# explain analyze
select some_date    , x.some_name as name1    , y.some_name as name2
from table1
join table2 as x on id1 = x.id
join table2 as y on id2 = y.id
where (some_date is not null and (id1 = 1 or id2 = 1))
order by some_date;                                                          QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------
Sort  (cost=3.20..3.21 rows=2 width=18) (actual time=0.270..0.272  
rows=3 loops=1)   Sort Key: table1.some_date   ->  Hash Join  (cost=2.12..3.19 rows=2 width=18) (actual  
time=0.219..0.227 rows=3 loops=1)         Hash Cond: ("outer".id = "inner".id1)         ->  Seq Scan on table2 x
(cost=0.00..1.03rows=3 width=11)  
 
(actual time=0.016..0.019 rows=3 loops=1)         ->  Hash  (cost=2.12..2.12 rows=2 width=15) (actual  
time=0.141..0.141 rows=3 loops=1)               ->  Hash Join  (cost=1.05..2.12 rows=2 width=15)  
(actual time=0.109..0.118 rows=3 loops=1)                     Hash Cond: ("outer".id = "inner".id2)
-> Seq Scan on table2 y  (cost=0.00..1.03  
 
rows=3 width=11) (actual time=0.004..0.008 rows=3 loops=1)                     ->  Hash  (cost=1.04..1.04 rows=2
width=12) 
 
(actual time=0.053..0.053 rows=3 loops=1)                           ->  Seq Scan on table1  (cost=0.00..1.04  
rows=2 width=12) (actual time=0.017..0.027 rows=3 loops=1)                                 Filter: ((some_date IS NOT
NULL) 
 
AND ((id1 = 1) OR (id2 = 1)))
Total runtime: 0.594 ms
(13 rows)

test=# explain analyze
select some_date    , x.some_name as name1    , y.some_name as name2
from table1
join table2 as x on id1 = x.id
left join table2 as y on id2 = y.id
where (some_date is not null and (id1 = 1 or id2 = 1))
order by some_date;                                                          QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------
Sort  (cost=3.24..3.25 rows=2 width=18) (actual time=0.281..0.282  
rows=3 loops=1)   Sort Key: table1.some_date   ->  Merge Left Join  (cost=3.19..3.23 rows=2 width=18) (actual  
time=0.242..0.251 rows=3 loops=1)         Merge Cond: ("outer".id2 = "inner".id)         ->  Sort  (cost=2.13..2.14
rows=2width=15) (actual  
 
time=0.165..0.165 rows=3 loops=1)               Sort Key: table1.id2               ->  Hash Join  (cost=1.05..2.12
rows=2width=15)  
 
(actual time=0.128..0.137 rows=3 loops=1)                     Hash Cond: ("outer".id = "inner".id1)
-> Seq Scan on table2 x  (cost=0.00..1.03  
 
rows=3 width=11) (actual time=0.016..0.020 rows=3 loops=1)                     ->  Hash  (cost=1.04..1.04 rows=2
width=12) 
 
(actual time=0.052..0.052 rows=3 loops=1)                           ->  Seq Scan on table1  (cost=0.00..1.04  
rows=2 width=12) (actual time=0.017..0.027 rows=3 loops=1)                                 Filter: ((some_date IS NOT
NULL) 
 
AND ((id1 = 1) OR (id2 = 1)))         ->  Sort  (cost=1.05..1.06 rows=3 width=11) (actual  
time=0.067..0.068 rows=3 loops=1)               Sort Key: y.id               ->  Seq Scan on table2 y  (cost=0.00..1.03
rows=3 
 
width=11) (actual time=0.005..0.010 rows=3 loops=1)
Total runtime: 0.600 ms
(16 rows)



Re: ORDER BY does not work as expected with multiple joins

From
Jaime Casanova
Date:
On 1/13/06, Adam Rosi-Kessel <adam@rosi-kessel.org> wrote:
> Tom Lane wrote:
> > Adam Rosi-Kessel <adam@rosi-kessel.org> writes:
> >> I have a multijoin SELECT statement where the ORDER BY clause doesn't
> >> seem to be working.
> > You haven't mentioned your Postgres version, nor provided anywhere near
> > enough information to let other people reproduce the problem.  You might
> > find it helpful to read this:
> > http://www.postgresql.org/docs/8.1/static/bug-reporting.html
>
> Thanks. I didn't think this was a bug--I thought I had an error in my SELECT
> statement somehow. I certainly would have included more information with a
> bug report. (For the record, it is PostgreSQL 7.4 from Debian Sarge).
>

Tom's comment about this was, AFAICT, because you didn't show in the
first post OS version nor postgres version, you didn't show your own
results either (and that is a good start to try to see the reason of
your problem...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)