Re: SELECT DISTINCT ... ORDER BY problem - Mailing list pgsql-general
From | David Rowley |
---|---|
Subject | Re: SELECT DISTINCT ... ORDER BY problem |
Date | |
Msg-id | 983A86461DFC4AF39F91ECB8CA361AA9@amd64 Whole thread Raw |
In response to | Re: SELECT DISTINCT ... ORDER BY problem (David Fetter <david@fetter.org>) |
Responses |
Re: SELECT DISTINCT ... ORDER BY problem
|
List | pgsql-general |
> -----Original Message----- > From: David Fetter [mailto:david@fetter.org] > Sent: 09 December 2008 00:55 > To: David Rowley > Cc: 'Madison Kelly'; pgsql-general@postgresql.org > Subject: Re: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote: > > > -----Original Message----- > > > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > > > owner@postgresql.org] On Behalf Of Madison Kelly > > > Sent: 08 December 2008 22:19 > > > To: pgsql-general@postgresql.org > > > Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem > > > > > > Hi all, > > > > > > I've got a table that I am trying to SELECT DISTINCT on one column > > > and ORDER BY on a second column, but am getting the error: > > > > > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > > > > > I can't add the second column to the DISTINCT clause because every > > > row is unique. Likewise, I can't add the first column to my ORDER BY > as > > > it'd not sort the way I need it to. > > > > > > Here is a simplified version of my query: > > > > > > \d table > > > Table "table" > > > Column | Type | Modifiers > > > > > > -----------------+---------+------------------------------------------ > ---- > > > -- > > > tbl_id | integer | not null default > > > nextval('tbl_seq'::regclass) > > > foo | text | > > > bar | text | > > > > > > SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07 > > > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1; > > > > > > > To make the query valid you would have to ORDER BY foo,bar > > DISTINCT ON in this case is only going to show the first bar value for > each > > foo. > > > > Is tbl_id not your PK and only giving 1 row anyway? > > > > > > > > I understand from: > > > > > > http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php > > > > > > That this is not really possible because the any given 'foo' column > > > could match multiple 'bar' columns, so what do you search by? However, > > > it's made some sort of decision as a value is shown in 'bar' for each > > > 'foo'. > > > > > > So my question is two-fold: > > > > > > 1. Can I not say, somehow, "sort all results by 'bar', and return the > > > first/last 'bar' for each distinct 'foo'? > > > > > > 2. Can I somehow say "Order the results using the value of 'bar' you > > > return, regardless of where it came from"? > > > > You can nest queries: > > > > SELECT foo,bar > > FROM (SELECT DISTINCT ON (foo) foo, > > Bar > > FROM table > > WHERE bar < '2008-12-07 16:32:46' > > AND tbl_id=153 ORDER BY foo,bar > > ) AS t ORDER BY bar; > > > > Notice that I'm only applying the final order by in the outer query. > > When we get windowing functions, a lot of this pain will go away :) > Yes! Hope it won't be too long now. The patch seems to behave like it should now :) Hopefully we'll see it commited for 8.4. Though this does not look too much cleaner at least it's standard SQL: A preview for Madi: SELECT foo,bar FROM (SELECT foo,bar, ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar) AS pos FROM table ) AS t WHERE pos = 1 ORDER BY bar; Probably easier to understand what's going on in this one. David.
pgsql-general by date: