Thread: subquery join order by

subquery join order by

From
Mage
Date:
             Hello,

(I googled and read docs before sending this e-mail).

Is it necessary to use order by twice (inside and outside) to get the
proper order if I have an ordered subqery in a join?

select * from (select distinct on (b_id) * from a order by b_id, id) sub
left join b on b.id = sub.b_id;

or

select * from (select distinct on (b_id) * from a order by b_id, id) sub
left join b on b.id = sub.b_id order by b_id;


It seems to me that it's enough to use 'order by' only inside wheter 'by
desc' or 'by asc' (b_id), however I'd like to be sure.

Thank you.

             Mage


Re: subquery join order by

From
Thom Brown
Date:
On 19 November 2010 01:36, Mage <mage@mage.hu> wrote:
>            Hello,
>
> (I googled and read docs before sending this e-mail).
>
> Is it necessary to use order by twice (inside and outside) to get the proper
> order if I have an ordered subqery in a join?
>
> select * from (select distinct on (b_id) * from a order by b_id, id) sub
> left join b on b.id = sub.b_id;
>
> or
>
> select * from (select distinct on (b_id) * from a order by b_id, id) sub
> left join b on b.id = sub.b_id order by b_id;
>
>
> It seems to me that it's enough to use 'order by' only inside wheter 'by
> desc' or 'by asc' (b_id), however I'd like to be sure.
>
> Thank you.
>
>            Mage

You should always use ORDER BY on the outer-most part of the query
since that's what will be finally returning your data.  Don't bother
with ordering sub-selects.

So in your case, just use:

SELECT *
FROM (SELECT DISTINCT ON (b_id) * FROM a) sub
LEFT JOIN b ON b.id = sub.b_id
ORDER BY sub.b_id, sub.id;

But why bother with a sub-select anyway?  You can write it as:

SELECT DISTINCT ON (a.b_id) *
FROM a
LEFT JOIN b ON b.id = a.b_id
ORDER BY a.b_id, a.id;

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: subquery join order by

From
Mage
Date:
On 11/19/2010 03:21 AM, Thom Brown wrote:
>
> You should always use ORDER BY on the outer-most part of the query
> since that's what will be finally returning your data.  Don't bother
> with ordering sub-selects.
I definiatelly have to use the "order by" inside for two reasons.

When "distinct on (x)" is used then x must be in the first column in the
order by part.

The second column in the order by decides which records will I include
in the join so it is very important to use it for ordering.


> So in your case, just use:
>
> SELECT *
> FROM (SELECT DISTINCT ON (b_id) * FROM a) sub
> LEFT JOIN b ON b.id = sub.b_id
> ORDER BY sub.b_id, sub.id;
select distinct on (id) * from b order by name;
ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY
expressions

> But why bother with a sub-select anyway?  You can write it as:
>
> SELECT DISTINCT ON (a.b_id) *
> FROM a
> LEFT JOIN b ON b.id = a.b_id
> ORDER BY a.b_id, a.id;
I considered this, however the subquery is generated by an ORM. I wanted
to separate it.

Also the whole join affects many rows. I thought it's cheaper to
preselect them inside the subquery then do the join. I am not sure.
Explain analyze is my good friend but in this case I prefer to ask.

         Mage


Re: subquery join order by

From
Mage
Date:
> I considered this, however the subquery is generated by an ORM. I
> wanted to separate it.
>
> Also the whole join affects many rows. I thought it's cheaper to
> preselect them inside the subquery then do the join. I am not sure.
> Explain analyze is my good friend but in this case I prefer to ask.
# EXPLAIN ANALYZE select * from (select distinct on (b_id) * from a
order by b_id, id) sub left join b on b.id = sub.b_id;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------

  Hash Left Join  (cost=187.45..243.70 rows=1230 width=44) (actual
time=0.000..0.000 rows=3 loops=1)
[...]
(11 rows)

# EXPLAIN ANALYZE SELECT DISTINCT ON (a.b_id) * FROM a LEFT JOIN b ON
b.id = a.b_id ORDER BY a.b_id, a.id;
                                                        QUERY PLAN

------------------------------------------------------------------------------------------------------------------------


  Unique  (cost=1339.24..1405.05 rows=200 width=44) (actual
time=0.000..0.000 rows=3 loops=1)
[...]
(15 rows)

mage=# EXPLAIN ANALYZE select * from (select distinct on (b_id) * from a
order by b_id, id) sub left join b on b.id = sub.b_id order by b.id;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Sort  (cost=306.83..309.90 rows=1230 width=44) (actual
time=0.000..0.000 rows=3 loops=1)


The subquery seems to be better choice even with double ordering. But is
the second order required?

         Mage