Re: SELECT DISTINCT ... ORDER BY problem - Mailing list pgsql-general

From David Rowley
Subject Re: SELECT DISTINCT ... ORDER BY problem
Date
Msg-id 983A86461DFC4AF39F91ECB8CA361AA9@amd64
Whole thread Raw
In response to Re: SELECT DISTINCT ... ORDER BY problem  (David Fetter <david@fetter.org>)
Responses Re: SELECT DISTINCT ... ORDER BY problem
List pgsql-general
> -----Original Message-----
> From: David Fetter [mailto:david@fetter.org]
> Sent: 09 December 2008 00:55
> To: David Rowley
> Cc: 'Madison Kelly'; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] SELECT DISTINCT ... ORDER BY problem
>
> On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote:
> > > -----Original Message-----
> > > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> > > owner@postgresql.org] On Behalf Of Madison Kelly
> > > Sent: 08 December 2008 22:19
> > > To: pgsql-general@postgresql.org
> > > Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem
> > >
> > > Hi all,
> > >
> > >    I've got a table that I am trying to SELECT DISTINCT on one column
> > > and ORDER BY on a second column, but am getting the error:
> > >
> > > SELECT DISTINCT ON expressions must match initial ORDER BY expressions
> > >
> > >    I can't add the second column to the DISTINCT clause because every
> > > row is unique. Likewise, I can't add the first column to my ORDER BY
> as
> > > it'd not sort the way I need it to.
> > >
> > >    Here is a simplified version of my query:
> > >
> > > \d table
> > >                      Table "table"
> > >       Column      |  Type   |                   Modifiers
> > >
> > > -----------------+---------+------------------------------------------
> ----
> > > --
> > >   tbl_id          | integer | not null default
> > > nextval('tbl_seq'::regclass)
> > >   foo             | text    |
> > >   bar             | text    |
> > >
> > > SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07
> > > 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1;
> > >
> >
> > To make the query valid you would have to ORDER BY foo,bar
> > DISTINCT ON in this case is only going to show the first bar value for
> each
> > foo.
> >
> > Is tbl_id not your PK and only giving 1 row anyway?
> >
> > >
> > >    I understand from:
> > >
> > > http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php
> > >
> > >    That this is not really possible because the any given 'foo' column
> > > could match multiple 'bar' columns, so what do you search by? However,
> > > it's made some sort of decision as a value is shown in 'bar' for each
> > > 'foo'.
> > >
> > >    So my question is two-fold:
> > >
> > > 1. Can I not say, somehow, "sort all results by 'bar', and return the
> > > first/last 'bar' for each distinct 'foo'?
> > >
> > > 2. Can I somehow say "Order the results using the value of 'bar' you
> > > return, regardless of where it came from"?
> >
> > You can nest queries:
> >
> > SELECT foo,bar
> > FROM (SELECT DISTINCT ON (foo) foo,
> >                                Bar
> >       FROM table
> >       WHERE bar < '2008-12-07 16:32:46'
> >         AND tbl_id=153 ORDER BY foo,bar
> > ) AS t ORDER BY bar;
> >
> > Notice that I'm only applying the final order by in the outer query.
>
> When we get windowing functions, a lot of this pain will go away :)
>

Yes! Hope it won't be too long now. The patch seems to behave like it should
now :)
Hopefully we'll see it commited for 8.4.

Though this does not look too much cleaner at least it's standard SQL:

A preview for Madi:

SELECT foo,bar
FROM (SELECT foo,bar,
             ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar) AS pos
      FROM table
) AS t
WHERE pos = 1
ORDER BY bar;

Probably easier to understand what's going on in this one.

David.


pgsql-general by date:

Previous
From: Madison Kelly
Date:
Subject: Re: SELECT DISTINCT ... ORDER BY problem
Next
From: "David Rowley"
Date:
Subject: Re: SELECT DISTINCT ... ORDER BY problem