Thread: SELECT DISTINCT ... ORDER BY problem
Hi all, I've got a table that I am trying to SELECT DISTINCT on one column and ORDER BY on a second column, but am getting the error: SELECT DISTINCT ON expressions must match initial ORDER BY expressions I can't add the second column to the DISTINCT clause because every row is unique. Likewise, I can't add the first column to my ORDER BY as it'd not sort the way I need it to. Here is a simplified version of my query: \d table Table "table" Column | Type | Modifiers -----------------+---------+------------------------------------------------ tbl_id | integer | not null default nextval('tbl_seq'::regclass) foo | text | bar | text | SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; I understand from: http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php That this is not really possible because the any given 'foo' column could match multiple 'bar' columns, so what do you search by? However, it's made some sort of decision as a value is shown in 'bar' for each 'foo'. So my question is two-fold: 1. Can I not say, somehow, "sort all results by 'bar', and return the first/last 'bar' for each distinct 'foo'? 2. Can I somehow say "Order the results using the value of 'bar' you return, regardless of where it came from"? Thanks all! Madi
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Madison Kelly > Sent: 08 December 2008 22:19 > To: pgsql-general@postgresql.org > Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > Hi all, > > I've got a table that I am trying to SELECT DISTINCT on one column > and ORDER BY on a second column, but am getting the error: > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > I can't add the second column to the DISTINCT clause because every > row is unique. Likewise, I can't add the first column to my ORDER BY as > it'd not sort the way I need it to. > > Here is a simplified version of my query: > > \d table > Table "table" > Column | Type | Modifiers > > -----------------+---------+---------------------------------------------- > -- > tbl_id | integer | not null default > nextval('tbl_seq'::regclass) > foo | text | > bar | text | > > SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > To make the query valid you would have to ORDER BY foo,bar DISTINCT ON in this case is only going to show the first bar value for each foo. Is tbl_id not your PK and only giving 1 row anyway? > > I understand from: > > http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php > > That this is not really possible because the any given 'foo' column > could match multiple 'bar' columns, so what do you search by? However, > it's made some sort of decision as a value is shown in 'bar' for each > 'foo'. > > So my question is two-fold: > > 1. Can I not say, somehow, "sort all results by 'bar', and return the > first/last 'bar' for each distinct 'foo'? > > 2. Can I somehow say "Order the results using the value of 'bar' you > return, regardless of where it came from"? You can nest queries: SELECT foo,bar FROM (SELECT DISTINCT ON (foo) foo, Bar FROM table WHERE bar < '2008-12-07 16:32:46' AND tbl_id=153 ORDER BY foo,bar ) AS t ORDER BY bar; Notice that I'm only applying the final order by in the outer query. David.
David Rowley wrote: >> -----Original Message----- >> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- >> owner@postgresql.org] On Behalf Of Madison Kelly >> Sent: 08 December 2008 22:19 >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem >> >> Hi all, >> >> I've got a table that I am trying to SELECT DISTINCT on one column >> and ORDER BY on a second column, but am getting the error: >> >> SELECT DISTINCT ON expressions must match initial ORDER BY expressions >> >> I can't add the second column to the DISTINCT clause because every >> row is unique. Likewise, I can't add the first column to my ORDER BY as >> it'd not sort the way I need it to. >> >> Here is a simplified version of my query: >> >> \d table >> Table "table" >> Column | Type | Modifiers >> >> -----------------+---------+---------------------------------------------- >> -- >> tbl_id | integer | not null default >> nextval('tbl_seq'::regclass) >> foo | text | >> bar | text | >> >> SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 >> 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; >> > > To make the query valid you would have to ORDER BY foo,bar > DISTINCT ON in this case is only going to show the first bar value for each > foo. > > Is tbl_id not your PK and only giving 1 row anyway? > >> I understand from: >> >> http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php >> >> That this is not really possible because the any given 'foo' column >> could match multiple 'bar' columns, so what do you search by? However, >> it's made some sort of decision as a value is shown in 'bar' for each >> 'foo'. >> >> So my question is two-fold: >> >> 1. Can I not say, somehow, "sort all results by 'bar', and return the >> first/last 'bar' for each distinct 'foo'? >> >> 2. Can I somehow say "Order the results using the value of 'bar' you >> return, regardless of where it came from"? > > You can nest queries: > > SELECT foo,bar > FROM (SELECT DISTINCT ON (foo) foo, > Bar > FROM table > WHERE bar < '2008-12-07 16:32:46' > AND tbl_id=153 ORDER BY foo,bar > ) AS t ORDER BY bar; > > Notice that I'm only applying the final order by in the outer query. > > David. haha, darn... I've even done embedded SELECTs before, I should have thought of that! Thanks! Madi
On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote: > > -----Original Message----- > > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > > owner@postgresql.org] On Behalf Of Madison Kelly > > Sent: 08 December 2008 22:19 > > To: pgsql-general@postgresql.org > > Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > > > Hi all, > > > > I've got a table that I am trying to SELECT DISTINCT on one column > > and ORDER BY on a second column, but am getting the error: > > > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > > > I can't add the second column to the DISTINCT clause because every > > row is unique. Likewise, I can't add the first column to my ORDER BY as > > it'd not sort the way I need it to. > > > > Here is a simplified version of my query: > > > > \d table > > Table "table" > > Column | Type | Modifiers > > > > -----------------+---------+---------------------------------------------- > > -- > > tbl_id | integer | not null default > > nextval('tbl_seq'::regclass) > > foo | text | > > bar | text | > > > > SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 > > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > > > To make the query valid you would have to ORDER BY foo,bar > DISTINCT ON in this case is only going to show the first bar value for each > foo. > > Is tbl_id not your PK and only giving 1 row anyway? > > > > > I understand from: > > > > http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php > > > > That this is not really possible because the any given 'foo' column > > could match multiple 'bar' columns, so what do you search by? However, > > it's made some sort of decision as a value is shown in 'bar' for each > > 'foo'. > > > > So my question is two-fold: > > > > 1. Can I not say, somehow, "sort all results by 'bar', and return the > > first/last 'bar' for each distinct 'foo'? > > > > 2. Can I somehow say "Order the results using the value of 'bar' you > > return, regardless of where it came from"? > > You can nest queries: > > SELECT foo,bar > FROM (SELECT DISTINCT ON (foo) foo, > Bar > FROM table > WHERE bar < '2008-12-07 16:32:46' > AND tbl_id=153 ORDER BY foo,bar > ) AS t ORDER BY bar; > > Notice that I'm only applying the final order by in the outer query. When we get windowing functions, a lot of this pain will go away :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter wrote: > On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote: >>> -----Original Message----- >>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- >>> owner@postgresql.org] On Behalf Of Madison Kelly >>> Sent: 08 December 2008 22:19 >>> To: pgsql-general@postgresql.org >>> Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem >>> >>> Hi all, >>> >>> I've got a table that I am trying to SELECT DISTINCT on one column >>> and ORDER BY on a second column, but am getting the error: >>> >>> SELECT DISTINCT ON expressions must match initial ORDER BY expressions >>> >>> I can't add the second column to the DISTINCT clause because every >>> row is unique. Likewise, I can't add the first column to my ORDER BY as >>> it'd not sort the way I need it to. >>> >>> Here is a simplified version of my query: >>> >>> \d table >>> Table "table" >>> Column | Type | Modifiers >>> >>> -----------------+---------+---------------------------------------------- >>> -- >>> tbl_id | integer | not null default >>> nextval('tbl_seq'::regclass) >>> foo | text | >>> bar | text | >>> >>> SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 >>> 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; >>> >> To make the query valid you would have to ORDER BY foo,bar >> DISTINCT ON in this case is only going to show the first bar value for each >> foo. >> >> Is tbl_id not your PK and only giving 1 row anyway? >> >>> I understand from: >>> >>> http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php >>> >>> That this is not really possible because the any given 'foo' column >>> could match multiple 'bar' columns, so what do you search by? However, >>> it's made some sort of decision as a value is shown in 'bar' for each >>> 'foo'. >>> >>> So my question is two-fold: >>> >>> 1. Can I not say, somehow, "sort all results by 'bar', and return the >>> first/last 'bar' for each distinct 'foo'? >>> >>> 2. Can I somehow say "Order the results using the value of 'bar' you >>> return, regardless of where it came from"? >> You can nest queries: >> >> SELECT foo,bar >> FROM (SELECT DISTINCT ON (foo) foo, >> Bar >> FROM table >> WHERE bar < '2008-12-07 16:32:46' >> AND tbl_id=153 ORDER BY foo,bar >> ) AS t ORDER BY bar; >> >> Notice that I'm only applying the final order by in the outer query. > > When we get windowing functions, a lot of this pain will go away :) > > Cheers, > David. Oh? I can't say I've been keeping up with what is in the pipes. What is windowing? Madi
> -----Original Message----- > From: David Fetter [mailto:david@fetter.org] > Sent: 09 December 2008 00:55 > To: David Rowley > Cc: 'Madison Kelly'; pgsql-general@postgresql.org > Subject: Re: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote: > > > -----Original Message----- > > > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > > > owner@postgresql.org] On Behalf Of Madison Kelly > > > Sent: 08 December 2008 22:19 > > > To: pgsql-general@postgresql.org > > > Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > > > > > Hi all, > > > > > > I've got a table that I am trying to SELECT DISTINCT on one column > > > and ORDER BY on a second column, but am getting the error: > > > > > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > > > > > I can't add the second column to the DISTINCT clause because every > > > row is unique. Likewise, I can't add the first column to my ORDER BY > as > > > it'd not sort the way I need it to. > > > > > > Here is a simplified version of my query: > > > > > > \d table > > > Table "table" > > > Column | Type | Modifiers > > > > > > -----------------+---------+------------------------------------------ > ---- > > > -- > > > tbl_id | integer | not null default > > > nextval('tbl_seq'::regclass) > > > foo | text | > > > bar | text | > > > > > > SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 > > > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > > > > > > To make the query valid you would have to ORDER BY foo,bar > > DISTINCT ON in this case is only going to show the first bar value for > each > > foo. > > > > Is tbl_id not your PK and only giving 1 row anyway? > > > > > > > > I understand from: > > > > > > http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php > > > > > > That this is not really possible because the any given 'foo' column > > > could match multiple 'bar' columns, so what do you search by? However, > > > it's made some sort of decision as a value is shown in 'bar' for each > > > 'foo'. > > > > > > So my question is two-fold: > > > > > > 1. Can I not say, somehow, "sort all results by 'bar', and return the > > > first/last 'bar' for each distinct 'foo'? > > > > > > 2. Can I somehow say "Order the results using the value of 'bar' you > > > return, regardless of where it came from"? > > > > You can nest queries: > > > > SELECT foo,bar > > FROM (SELECT DISTINCT ON (foo) foo, > > Bar > > FROM table > > WHERE bar < '2008-12-07 16:32:46' > > AND tbl_id=153 ORDER BY foo,bar > > ) AS t ORDER BY bar; > > > > Notice that I'm only applying the final order by in the outer query. > > When we get windowing functions, a lot of this pain will go away :) > Yes! Hope it won't be too long now. The patch seems to behave like it should now :) Hopefully we'll see it commited for 8.4. Though this does not look too much cleaner at least it's standard SQL: A preview for Madi: SELECT foo,bar FROM (SELECT foo,bar, ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar) AS pos FROM table ) AS t WHERE pos = 1 ORDER BY bar; Probably easier to understand what's going on in this one. David.
Madison Kelly Wrote: > David Fetter wrote: > > On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote: > >>> -----Original Message----- > >>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > >>> owner@postgresql.org] On Behalf Of Madison Kelly > >>> Sent: 08 December 2008 22:19 > >>> To: pgsql-general@postgresql.org > >>> Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem > >>> > >>> Hi all, > >>> > >>> I've got a table that I am trying to SELECT DISTINCT on one column > >>> and ORDER BY on a second column, but am getting the error: > >>> > >>> SELECT DISTINCT ON expressions must match initial ORDER BY expressions > >>> > >>> I can't add the second column to the DISTINCT clause because every > >>> row is unique. Likewise, I can't add the first column to my ORDER BY > as > >>> it'd not sort the way I need it to. > >>> > >>> Here is a simplified version of my query: > >>> > >>> \d table > >>> Table "table" > >>> Column | Type | Modifiers > >>> > >>> -----------------+---------+------------------------------------------ > ---- > >>> -- > >>> tbl_id | integer | not null default > >>> nextval('tbl_seq'::regclass) > >>> foo | text | > >>> bar | text | > >>> > >>> SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 > >>> 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > >>> > >> To make the query valid you would have to ORDER BY foo,bar > >> DISTINCT ON in this case is only going to show the first bar value for > each > >> foo. > >> > >> Is tbl_id not your PK and only giving 1 row anyway? > >> > >>> I understand from: > >>> > >>> http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php > >>> > >>> That this is not really possible because the any given 'foo' column > >>> could match multiple 'bar' columns, so what do you search by? However, > >>> it's made some sort of decision as a value is shown in 'bar' for each > >>> 'foo'. > >>> > >>> So my question is two-fold: > >>> > >>> 1. Can I not say, somehow, "sort all results by 'bar', and return the > >>> first/last 'bar' for each distinct 'foo'? > >>> > >>> 2. Can I somehow say "Order the results using the value of 'bar' you > >>> return, regardless of where it came from"? > >> You can nest queries: > >> > >> SELECT foo,bar > >> FROM (SELECT DISTINCT ON (foo) foo, > >> Bar > >> FROM table > >> WHERE bar < '2008-12-07 16:32:46' > >> AND tbl_id=153 ORDER BY foo,bar > >> ) AS t ORDER BY bar; > >> > >> Notice that I'm only applying the final order by in the outer query. > > > > When we get windowing functions, a lot of this pain will go away :) > > > > Cheers, > > David. > > Oh? > > I can't say I've been keeping up with what is in the pipes. What is > windowing? > These are also known as analytical functions in some other database systems, though by the standard they are known as window functions. http://en.wikipedia.org/wiki/Select_(SQL) It's worth a read. Hopefully we'll see this in 8.4. David.
> > > > When we get windowing functions, a lot of this pain will go away :) > > > > Yes! Hope it won't be too long now. The patch seems to behave like it > should > now :) > Hopefully we'll see it commited for 8.4. > > Though this does not look too much cleaner at least it's standard SQL: > > A preview for Madi: > > SELECT foo,bar > FROM (SELECT foo,bar, > ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar) AS pos > FROM table > ) AS t > WHERE pos = 1 > ORDER BY bar; > > Probably easier to understand what's going on in this one. > > David. > Is Oracle's FIRST_VALUE function not a SQL standard? The way I would do this in Oracle looks like: SELECT foo, FIRST_VALUE(bar) OVER (PARTITION BY foo ORDER BY bar) as bar FROM table http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions059. htm
Adam Rich Wrote: > > > > > > When we get windowing functions, a lot of this pain will go away :) > > > > > > > Yes! Hope it won't be too long now. The patch seems to behave like it > > should > > now :) > > Hopefully we'll see it commited for 8.4. > > > > Though this does not look too much cleaner at least it's standard SQL: > > > > A preview for Madi: > > > > SELECT foo,bar > > FROM (SELECT foo,bar, > > ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar) AS pos > > FROM table > > ) AS t > > WHERE pos = 1 > > ORDER BY bar; > > > > Probably easier to understand what's going on in this one. > > > > David. > > > > Is Oracle's FIRST_VALUE function not a SQL standard? The way I would > do this in Oracle looks like: > > SELECT foo, FIRST_VALUE(bar) OVER (PARTITION BY foo ORDER BY bar) as bar > FROM table > > http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions05 > 9. > htm > FIRST_VALUE is standard in SQL:2008. But with that query you're not reducing the rows. You're selecting the first bar for each foo. There may be many foo's. You could get Foo | bar A | a A | a C | a C | a Where with the other query we'd see only two rows, one with foo as 'a' and one as 'c'. Of course then you could nest it then do DISTINCT but then it's about as ugly as it was previously. David.
On Mon, Dec 8, 2008 at 10:19 PM, Madison Kelly <linux@alteeve.com> wrote: > Hi all, > > I've got a table that I am trying to SELECT DISTINCT on one column and > ORDER BY on a second column, but am getting the error: > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions try SELECT distinct, array_accum(bar) FROM table WHERE bar < '2008-12-07 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; or even, when you change bar to proper type - that is, timestamp SELECT distinct foo, min(bar) as minbar, max(bar) as maxbar FROM table WHERE bar < '2008-12-07 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; etc. Trick, is to use aggregate on other value(s). HTH -- GJ
On Tue, Dec 9, 2008 at 9:02 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote: > or even, when you change bar to proper type - that is, timestamp > > SELECT distinct foo, min(bar) as minbar, max(bar) as maxbar FROM > table WHERE bar < '2008-12-07 > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; but than (after sec of thinking), you could just do SELECT foo, min(bar) AS minbar, max(bar) AS maxbar FROM table WHERE ..blabla.. GROUP BY foo ORDER BY maxbar LIMIT 1; -- GJ
Grzegorz Jaśkiewicz wrote: > On Mon, Dec 8, 2008 at 10:19 PM, Madison Kelly <linux@alteeve.com> wrote: >> Hi all, >> >> I've got a table that I am trying to SELECT DISTINCT on one column and >> ORDER BY on a second column, but am getting the error: >> >> SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > try > SELECT distinct, array_accum(bar) FROM table WHERE bar < '2008-12-07 > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > or even, when you change bar to proper type - that is, timestamp > > SELECT distinct foo, min(bar) as minbar, max(bar) as maxbar FROM > table WHERE bar < '2008-12-07 > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > etc. > > Trick, is to use aggregate on other value(s). > > HTH Well shoot, I didn't realize I had the 'text' where I should have used 'timestamp'. >_< I updated the column, but it took me some fiddling (on a test box!) to sort out the proper command. In case it helps someone else, here was the error I was getting when I tried ALTER without USING: ALTER TABLE table ALTER foo TYPE TIMESTAMP WITHOUT TIME ZONE; ERROR: column "foo" cannot be cast to type "pg_catalog.timestamp" The syntax I needed was: ALTER TABLE table ALTER foo TYPE TIMESTAMP WITHOUT TIME ZONE USING CAST (foo AS TIMESTAMP); I know it's a little off-topic, but maybe it'll help someone searching someday. :) When I try to use: SELECT distinct foo, min(bar) as minbar, max(bar) as maxbar FROM table WHERE bar < '2008-12-07 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; I get the error: ERROR: column "table.foo" must appear in the GROUP BY clause or be used in an aggregate function Already a very big help though, thanks! Madi
Grzegorz Jaśkiewicz wrote: > On Tue, Dec 9, 2008 at 9:02 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote: >> or even, when you change bar to proper type - that is, timestamp >> >> SELECT distinct foo, min(bar) as minbar, max(bar) as maxbar FROM >> table WHERE bar < '2008-12-07 >> 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > but than (after sec of thinking), you could just do > > SELECT foo, min(bar) AS minbar, max(bar) AS maxbar FROM table > WHERE ..blabla.. GROUP BY foo ORDER BY maxbar LIMIT 1; Woops, didn't see this. This actually solves a second problem I'd not asked about, too. Thanks!! Madi