Thread: DISTINCT ... ORDER BY
Hi all, I have built a query that mainly orders all columns. It shouldn't matter how it orders but FYI: I need to order by 1 given column and then by all others from 'left to right'. No problem so far. Then I want to return only 1 column (e.g. the 3rd) with duplicates removed _AND_ still in the same order like above. e.g.: myCol ----- a a c b b c b should return: myDistinctOrderedCol -------------------- a c b The problem is that DISTINCT complains about ORDER BY entries not being in the target list. Any idea ? Example: initial query: SELECT * FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah' DESC; Then something like: SELECT DISTINCT col3 FROM {initial query}; TIA -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
On Wed, Nov 05, 2003 at 01:48:20 +0100, Nabil Sayegh <postgresql@e-trolley.de> wrote: > Hi all, > > I have built a query that mainly orders all columns. > It shouldn't matter how it orders but FYI: > I need to order by 1 given column and then by all others from 'left to > right'. > > No problem so far. > > Then I want to return only 1 column (e.g. the 3rd) with duplicates removed > _AND_ still in the same order like above. > > e.g.: > > myCol > ----- > a > a > c > b > b > c > b > > should return: > > myDistinctOrderedCol > -------------------- > a > c > b > > The problem is that DISTINCT complains about ORDER BY entries not being in > the target list. > Any idea ? > > Example: > > initial query: > SELECT * FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah' > DESC; > > Then something like: > SELECT DISTINCT col3 FROM {initial query}; You can use a distinct on with order by clause as a subselect to eliminate the records you don't want. Then you can extract col3 from this subselect and order by the original sort order (instead of col3 first) to get the desired records in the desired order. It would look something like: select col3 from (select distinct on (col3) * from tablename order by col1, col2, col3, col4) order by col1, col2, col3, col4;
On Wed, 5 Nov 2003, Nabil Sayegh wrote: > I have built a query that mainly orders all columns. > It shouldn't matter how it orders but FYI: > I need to order by 1 given column and then by all others from 'left to right'. > > No problem so far. > > Then I want to return only 1 column (e.g. the 3rd) with duplicates removed > _AND_ still in the same order like above. > > e.g.: > > myCol > ----- > a > a > c > b > b > c > b > > should return: > > myDistinctOrderedCol > -------------------- > a > c > b > > The problem is that DISTINCT complains about ORDER BY entries not being in the target list. Right, because given this data: col1 | col2 a | b b | c b | a what ordering should select distinct col1 from tab order by col2 give you? Does it put b first because there's a col2 with a, or b second because there's a col2 with b or is it indeterminate? Maybe something of this general form would work if you want to order by the minimums: select col3 from tablename group by col3 order by min(col1), min(col2), col3, min(col4);
Bruno Wolff III wrote: > You can use a distinct on with order by clause as a subselect to eliminate > the records you don't want. Then you can extract col3 from this subselect > and order by the original sort order (instead of col3 first) to get the > desired records in the desired order. > > It would look something like: > select col3 from > (select distinct on (col3) * from tablename order by col1, col2, col3, col4) > order by col1, col2, col3, col4; Unfortunately I need to ORDER BY expression, not plain columns. SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_2='foo' DESC, col_1='bar' DESC, col_3='blah' DESC; fails with: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions :( -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
Stephan Szabo wrote: >>should return: >> >>myDistinctOrderedCol >>-------------------- >>a >>c >>b >> >>The problem is that DISTINCT complains about ORDER BY entries not being in the target list. > > > Right, because given this data: > col1 | col2 > a | b > b | c > b | a > > what ordering should > select distinct col1 from tab order by col2 > give you? I want it to just return 1 column ;) > Does it put b first because there's a col2 with a, or b second because > there's a col2 with b or is it indeterminate? It shall not mention col2 at all. The idea is to 1. ORDER BY expressions 2. Pick only 1 column 3. make this column distinct without losing the sort order > Maybe something of this general form would work if you want to order by > the minimums: > select col3 from tablename group by col3 order by min(col1), min(col2), > col3, min(col4); Good idea. But it would require me to introduce row numbers. What now ? TFYH -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
On Wed, Nov 05, 2003 at 17:52:03 +0100, Nabil Sayegh <postgresql@e-trolley.de> wrote: > Bruno Wolff III wrote: > > >You can use a distinct on with order by clause as a subselect to eliminate > >the records you don't want. Then you can extract col3 from this subselect > >and order by the original sort order (instead of col3 first) to get the > >desired records in the desired order. > > > >It would look something like: > >select col3 from > > (select distinct on (col3) * from tablename order by col1, col2, col3, > > col4) > > order by col1, col2, col3, col4; > > Unfortunately I need to ORDER BY expression, not plain columns. > > SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_2='foo' DESC, > col_1='bar' DESC, col_3='blah' DESC; > > fails with: > > ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY > expressions I made a mistake in my example. In the distinct on subselect you have to order by the column(s) in the distinct on argument first. So in your example above the subselect should be: SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC, col_1='bar' DESC, col_3='blah' DESC; In the outer select you use just the order by items that you want. Also note that the subselect will need to list all of the columns used in the order by in the outer select.
Bruno Wolff III wrote: > I made a mistake in my example. In the distinct on subselect you have to > order by the column(s) in the distinct on argument first. So in your example > above the subselect should be: > SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC, > col_1='bar' DESC, col_3='blah' DESC; Hm, are you sure that this 'non-intended' ordering is reversible? > In the outer select you use just the order by items that you want. Also > note that the subselect will need to list all of the columns used in the > order by in the outer select. -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
On Wed, Nov 05, 2003 at 18:37:26 +0100, Nabil Sayegh <postgresql@e-trolley.de> wrote: > Bruno Wolff III wrote: > > >I made a mistake in my example. In the distinct on subselect you have to > >order by the column(s) in the distinct on argument first. So in your > >example > >above the subselect should be: > >SELECT DISTINCT ON (col_1) col_1 FROM tab ORDER BY col_1, col_2='foo' DESC, > >col_1='bar' DESC, col_3='blah' DESC; > > Hm, are you sure that this 'non-intended' ordering is reversible? I don't think you are asking the right question. However, this ordering will result in the correct record with a particular value for col_1 being used for the next step. This is all that is needed, since in the next step the data will be sorted according to the original criteria.
On Wed, 5 Nov 2003, Nabil Sayegh wrote: > Stephan Szabo wrote: > > >>should return: > >> > >>myDistinctOrderedCol > >>-------------------- > >>a > >>c > >>b > >> > >>The problem is that DISTINCT complains about ORDER BY entries not being in the target list. > > > > > > Right, because given this data: > > col1 | col2 > > a | b > > b | c > > b | a > > > > what ordering should > > select distinct col1 from tab order by col2 > > give you? > > I want it to just return 1 column ;) > > > Does it put b first because there's a col2 with a, or b second because > > there's a col2 with b or is it indeterminate? > > It shall not mention col2 at all. > The idea is to > 1. ORDER BY expressions > 2. Pick only 1 column > 3. make this column distinct without losing the sort order You can either use something like the distinct on extension with subselects or some variation on group by. Both of these involve understanding however, which row you want out when you're going to be dropping rows. Your step 3 above isn't well defined in general. There's no single place in the sort order for a value of a column being distincted when the value occurs multiple times. In the data above, col1='b' occurs twice in the sort order and you need to give an indication of which place in the sort order you want to use.
Stephan Szabo wrote: > Your step 3 above isn't well defined in general. There's no single place > in the sort order for a value of a column being distincted when the value > occurs multiple times. In the data above, col1='b' occurs twice in the > sort order and you need to give an indication of which place in the sort > order you want to use. Ok, let's look for alternatives. Is it possible to add a serial column to my ordered result ? I hope there is a way to do it without temporary tables as I want to keep it all in 1 query. TFYH -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
On Wed, Nov 05, 2003 at 21:06:30 +0100, Nabil Sayegh <postgresql@e-trolley.de> wrote: > > Ok, let's look for alternatives. Did you try my corrected suggestion?
On Wed, 5 Nov 2003, Nabil Sayegh wrote: > Stephan Szabo wrote: > > > Your step 3 above isn't well defined in general. There's no single place > > in the sort order for a value of a column being distincted when the value > > occurs multiple times. In the data above, col1='b' occurs twice in the > > sort order and you need to give an indication of which place in the sort > > order you want to use. > > Ok, let's look for alternatives. Well, what did you get when you tried something like Bruno's updated example or my group by one, and lets work from there.
Bruno Wolff III wrote: > On Wed, Nov 05, 2003 at 21:06:30 +0100, > Nabil Sayegh <postgresql@e-trolley.de> wrote: > >>Ok, let's look for alternatives. > > > Did you try my corrected suggestion? Yes, and it worked, thanks. But then I was too afraid of it being non-deterministic :( Is this query deterministic? TFYH -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
Stephan Szabo wrote: > Well, what did you get when you tried something like Bruno's updated It worked. But I thought you were trying to tell me that it could be non-deterministic. > example or my group by one, and lets work from there. I tried the group by method but as my order by expressions are booleans I couldn't use min() and didn't find an applicable aggregate function. so I thought to myself if we could just get a serial column after sorting, then we could use min() on that column. But if (both of) you are sure that the 'distinct method' is deterministic (in the end) I'd be glad. TFYH -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de
On Wed, 5 Nov 2003, Nabil Sayegh wrote: > Stephan Szabo wrote: > > > Well, what did you get when you tried something like Bruno's updated > > It worked. > But I thought you were trying to tell me that it could be non-deterministic. DISTINCT ON (blah) is different from DISTINCT and is a PostgreSQL extension. IIRC, it'll take the first row that matches the distincted on columns based on the ordering rules from the order by. So, that's the part that determines the "which of the matching places in the sort order" you want to use (whichever is first in the ordering) which is something that DISTINCT doesn't provide. The difference here is between the question: "Get the distinct values of a column ordered descendingly by the following boolean expressions: expr1, expr2, ..." And "Get the distinct values of a column ordered descendingly by the following boolean expressions: expr1, expr2, ... for the row in each group of rows having a particular distinct value of the column having the highest value of expr1, and in the case of ties, the highest value of expr2, ..." The difference is small, but very important. > > example or my group by one, and lets work from there. > > I tried the group by method but as my order by expressions are booleans I couldn't use min() > and didn't find an applicable aggregate function. You'd have to build a min(boolean) (which I'm sortof surprised isn't there) or use a case to convert it into an integer. Or given that it looks like you were doing DESC sorts, you'd probably want max(). DISTINCT ON is a better choice for postgresql, it'll almost certainly be faster, but it's not very standard.
On Wed, Nov 05, 2003 at 22:27:40 +0100, Nabil Sayegh <postgresql@e-trolley.de> wrote: > Bruno Wolff III wrote: > >On Wed, Nov 05, 2003 at 21:06:30 +0100, > > Nabil Sayegh <postgresql@e-trolley.de> wrote: > > > >>Ok, let's look for alternatives. > > > > > >Did you try my corrected suggestion? > > Yes, and it worked, thanks. > But then I was too afraid of it being non-deterministic :( > > Is this query deterministic? You will always get the same output order for a table with a given set of rows in it.
Stephan Szabo wrote: > You'd have to build a min(boolean) (which I'm sortof surprised isn't > there) or use a case to convert it into an integer. Or given that it > looks like you were doing DESC sorts, you'd probably want max(). > DISTINCT ON is a better choice for postgresql, it'll almost certainly be > faster, but it's not very standard. Thanks I've choosen this way and it works like a charm. cu -- e-Trolley Sayegh & John, Nabil Sayegh Tel.: 0700 etrolley /// 0700 38765539 Fax.: +49 69 8299381-8 PGP : http://www.e-trolley.de