Thread: DISTINCT ... ORDER BY

DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Hi all,

I have built a query that mainly orders all columns.
It shouldn't matter how it orders but FYI:
I need to order by 1 given column and then by all others from 'left to right'.

No problem so far.

Then I want to return only 1 column (e.g. the 3rd) with duplicates removed
_AND_ still in the same order like above.

e.g.:

myCol
-----
a
a
c
b
b
c
b

should return:

myDistinctOrderedCol
--------------------
a
c
b

The problem is that DISTINCT complains about ORDER BY entries not being in the target list.
Any idea ?

Example:

initial query:
SELECT * FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah' DESC;

Then something like:
SELECT DISTINCT col3 FROM {initial query};

TIA
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Bruno Wolff III
Date:
On Wed, Nov 05, 2003 at 01:48:20 +0100,
  Nabil Sayegh <postgresql@e-trolley.de> wrote:
> Hi all,
>
> I have built a query that mainly orders all columns.
> It shouldn't matter how it orders but FYI:
> I need to order by 1 given column and then by all others from 'left to
> right'.
>
> No problem so far.
>
> Then I want to return only 1 column (e.g. the 3rd) with duplicates removed
> _AND_ still in the same order like above.
>
> e.g.:
>
> myCol
> -----
> a
> a
> c
> b
> b
> c
> b
>
> should return:
>
> myDistinctOrderedCol
> --------------------
> a
> c
> b
>
> The problem is that DISTINCT complains about ORDER BY entries not being in
> the target list.
> Any idea ?
>
> Example:
>
> initial query:
> SELECT * FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah'
> DESC;
>
> Then something like:
> SELECT DISTINCT col3 FROM {initial query};

You can use a distinct on with order by clause as a subselect to eliminate
the records you don't want. Then you can extract col3 from this subselect
and order by the original sort order (instead of col3 first) to get the
desired records in the desired order.

It would look something like:
select col3 from
  (select distinct on (col3) * from tablename order by col1, col2, col3, col4)
  order by col1, col2, col3, col4;

Re: DISTINCT ... ORDER BY

From
Stephan Szabo
Date:
On Wed, 5 Nov 2003, Nabil Sayegh wrote:

> I have built a query that mainly orders all columns.
> It shouldn't matter how it orders but FYI:
> I need to order by 1 given column and then by all others from 'left to right'.
>
> No problem so far.
>
> Then I want to return only 1 column (e.g. the 3rd) with duplicates removed
> _AND_ still in the same order like above.
>
> e.g.:
>
> myCol
> -----
> a
> a
> c
> b
> b
> c
> b
>
> should return:
>
> myDistinctOrderedCol
> --------------------
> a
> c
> b
>
> The problem is that DISTINCT complains about ORDER BY entries not being in the target list.

Right, because given this data:
col1 | col2
 a   |  b
 b   |  c
 b   |  a

what ordering should
 select distinct col1 from tab order by col2
give you?

Does it put b first because there's a col2 with a, or b second because
there's a col2 with b or is it indeterminate?


Maybe something of this general form would work if you want to order by
the minimums:
select col3 from tablename group by col3 order by min(col1), min(col2),
col3, min(col4);

Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Bruno Wolff III wrote:

> You can use a distinct on with order by clause as a subselect to eliminate
> the records you don't want. Then you can extract col3 from this subselect
> and order by the original sort order (instead of col3 first) to get the
> desired records in the desired order.
>
> It would look something like:
> select col3 from
>   (select distinct on (col3) * from tablename order by col1, col2, col3, col4)
>   order by col1, col2, col3, col4;

Unfortunately I need to ORDER BY expression, not plain columns.

SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah' DESC;

fails with:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

:(

--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Stephan Szabo wrote:

>>should return:
>>
>>myDistinctOrderedCol
>>--------------------
>>a
>>c
>>b
>>
>>The problem is that DISTINCT complains about ORDER BY entries not being in the target list.
>
>
> Right, because given this data:
> col1 | col2
>  a   |  b
>  b   |  c
>  b   |  a
>
> what ordering should
>  select distinct col1 from tab order by col2
> give you?

I want it to just return 1 column ;)

> Does it put b first because there's a col2 with a, or b second because
> there's a col2 with b or is it indeterminate?

It shall not mention col2 at all.
The idea is to
1. ORDER BY expressions
2. Pick only 1 column
3. make this column distinct without losing the sort order

> Maybe something of this general form would work if you want to order by
> the minimums:
> select col3 from tablename group by col3 order by min(col1), min(col2),
> col3, min(col4);

Good idea.
But it would require me to introduce row numbers.
What now ?

TFYH
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Bruno Wolff III
Date:
On Wed, Nov 05, 2003 at 17:52:03 +0100,
  Nabil Sayegh <postgresql@e-trolley.de> wrote:
> Bruno Wolff III wrote:
>
> >You can use a distinct on with order by clause as a subselect to eliminate
> >the records you don't want. Then you can extract col3 from this subselect
> >and order by the original sort order (instead of col3 first) to get the
> >desired records in the desired order.
> >
> >It would look something like:
> >select col3 from
> >  (select distinct on (col3) * from tablename order by col1, col2, col3,
> >  col4)
> >  order by col1, col2, col3, col4;
>
> Unfortunately I need to ORDER BY expression, not plain columns.
>
> SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_2='foo' DESC,
> col_1='bar' DESC, col_3='blah' DESC;
>
> fails with:
>
> ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY
> expressions

I made a mistake in my example. In the distinct on subselect you have to
order by the column(s) in the distinct on argument first. So in your example
above the subselect should be:
SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC,
col_1='bar' DESC, col_3='blah' DESC;

In the outer select you use just the order by items that you want. Also
note that the subselect will need to list all of the columns used in the
order by in the outer select.

Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Bruno Wolff III wrote:

> I made a mistake in my example. In the distinct on subselect you have to
> order by the column(s) in the distinct on argument first. So in your example
> above the subselect should be:
> SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC,
> col_1='bar' DESC, col_3='blah' DESC;

Hm, are you sure that this 'non-intended' ordering is reversible?

> In the outer select you use just the order by items that you want. Also
> note that the subselect will need to list all of the columns used in the
> order by in the outer select.



--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Bruno Wolff III
Date:
On Wed, Nov 05, 2003 at 18:37:26 +0100,
  Nabil Sayegh <postgresql@e-trolley.de> wrote:
> Bruno Wolff III wrote:
>
> >I made a mistake in my example. In the distinct on subselect you have to
> >order by the column(s) in the distinct on argument first. So in your
> >example
> >above the subselect should be:
> >SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC,
> >col_1='bar' DESC, col_3='blah' DESC;
>
> Hm, are you sure that this 'non-intended' ordering is reversible?

I don't think you are asking the right question. However, this ordering
will result in the correct record with a particular value for col_1
being used for the next step. This is all that is needed, since in the
next step the data will be sorted according to the original criteria.

Re: DISTINCT ... ORDER BY

From
Stephan Szabo
Date:
On Wed, 5 Nov 2003, Nabil Sayegh wrote:

> Stephan Szabo wrote:
>
> >>should return:
> >>
> >>myDistinctOrderedCol
> >>--------------------
> >>a
> >>c
> >>b
> >>
> >>The problem is that DISTINCT complains about ORDER BY entries not being in the target list.
> >
> >
> > Right, because given this data:
> > col1 | col2
> >  a   |  b
> >  b   |  c
> >  b   |  a
> >
> > what ordering should
> >  select distinct col1 from tab order by col2
> > give you?
>
> I want it to just return 1 column ;)
>
> > Does it put b first because there's a col2 with a, or b second because
> > there's a col2 with b or is it indeterminate?
>
> It shall not mention col2 at all.
> The idea is to
> 1. ORDER BY expressions
> 2. Pick only 1 column
> 3. make this column distinct without losing the sort order

You can either use something like the distinct on extension with
subselects or some variation on group by.  Both of these involve
understanding however, which row you want out when you're going to
be dropping rows.

Your step 3 above isn't well defined in general.  There's no single place
in the sort order for a value of a column being distincted when the value
occurs multiple times.  In the data above, col1='b' occurs twice in the
sort order and you need to give an indication of which place in the sort
order you want to use.

Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Stephan Szabo wrote:

> Your step 3 above isn't well defined in general.  There's no single place
> in the sort order for a value of a column being distincted when the value
> occurs multiple times.  In the data above, col1='b' occurs twice in the
> sort order and you need to give an indication of which place in the sort
> order you want to use.

Ok, let's look for alternatives.

Is it possible to add a serial column to my ordered result ?

I hope there is a way to do it without temporary tables as I want to keep
it all in 1 query.

TFYH
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Bruno Wolff III
Date:
On Wed, Nov 05, 2003 at 21:06:30 +0100,
  Nabil Sayegh <postgresql@e-trolley.de> wrote:
>
> Ok, let's look for alternatives.

Did you try my corrected suggestion?

Re: DISTINCT ... ORDER BY

From
Stephan Szabo
Date:
On Wed, 5 Nov 2003, Nabil Sayegh wrote:

> Stephan Szabo wrote:
>
> > Your step 3 above isn't well defined in general.  There's no single place
> > in the sort order for a value of a column being distincted when the value
> > occurs multiple times.  In the data above, col1='b' occurs twice in the
> > sort order and you need to give an indication of which place in the sort
> > order you want to use.
>
> Ok, let's look for alternatives.

Well, what did you get when you tried something like Bruno's updated
example or my group by one, and lets work from there.

Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Bruno Wolff III wrote:
> On Wed, Nov 05, 2003 at 21:06:30 +0100,
>   Nabil Sayegh <postgresql@e-trolley.de> wrote:
>
>>Ok, let's look for alternatives.
>
>
> Did you try my corrected suggestion?

Yes, and it worked, thanks.
But then I was too afraid of it being non-deterministic :(

Is this query deterministic?

TFYH
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Stephan Szabo wrote:

> Well, what did you get when you tried something like Bruno's updated

It worked.
But I thought you were trying to tell me that it could be non-deterministic.

> example or my group by one, and lets work from there.

I tried the group by method but as my order by expressions are booleans I couldn't use min()
and didn't find an applicable aggregate function.
so I thought to myself if we could just get a serial column after sorting, then we could use min()
on that column.

But if (both of) you are sure that the 'distinct method' is deterministic (in the end) I'd be glad.

TFYH
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de


Re: DISTINCT ... ORDER BY

From
Stephan Szabo
Date:
On Wed, 5 Nov 2003, Nabil Sayegh wrote:

> Stephan Szabo wrote:
>
> > Well, what did you get when you tried something like Bruno's updated
>
> It worked.
> But I thought you were trying to tell me that it could be non-deterministic.

DISTINCT ON (blah) is different from DISTINCT and is a PostgreSQL
extension.

IIRC, it'll take the first row that matches the distincted on columns
based on the ordering rules from the order by. So, that's the part that
determines the "which of the matching places in the sort order" you want
to use (whichever is first in the ordering) which is something that
DISTINCT doesn't provide.

The difference here is between the question:
"Get the distinct values of a column ordered descendingly by the following
boolean expressions: expr1, expr2, ..."

And
"Get the distinct values of a column ordered descendingly by the following
boolean expressions: expr1, expr2, ... for the row in each group of rows
having a particular distinct value of the column having the highest
value of expr1, and in the case of ties, the highest value of expr2, ..."

The difference is small, but very important.

> > example or my group by one, and lets work from there.
>
> I tried the group by method but as my order by expressions are booleans I couldn't use min()
> and didn't find an applicable aggregate function.

You'd have to build a min(boolean) (which I'm sortof surprised isn't
there) or use a case to convert it into an integer.  Or given that it
looks like you were doing DESC sorts, you'd probably want max().
DISTINCT ON is a better choice for postgresql, it'll almost certainly be
faster, but it's not very standard.

Re: DISTINCT ... ORDER BY

From
Bruno Wolff III
Date:
On Wed, Nov 05, 2003 at 22:27:40 +0100,
  Nabil Sayegh <postgresql@e-trolley.de> wrote:
> Bruno Wolff III wrote:
> >On Wed, Nov 05, 2003 at 21:06:30 +0100,
> >  Nabil Sayegh <postgresql@e-trolley.de> wrote:
> >
> >>Ok, let's look for alternatives.
> >
> >
> >Did you try my corrected suggestion?
>
> Yes, and it worked, thanks.
> But then I was too afraid of it being non-deterministic :(
>
> Is this query deterministic?

You will always get the same output order for a table with a given set of
rows in it.

Re: DISTINCT ... ORDER BY

From
Nabil Sayegh
Date:
Stephan Szabo wrote:

> You'd have to build a min(boolean) (which I'm sortof surprised isn't
> there) or use a case to convert it into an integer.  Or given that it
> looks like you were doing DESC sorts, you'd probably want max().
> DISTINCT ON is a better choice for postgresql, it'll almost certainly be
> faster, but it's not very standard.

Thanks
I've choosen this way and it works like a charm.

cu
--
  e-Trolley Sayegh & John, Nabil Sayegh
  Tel.: 0700 etrolley /// 0700 38765539
  Fax.: +49 69 8299381-8
  PGP : http://www.e-trolley.de