Thread: DISTINCT ON
Hi, I have a table: observation ( id int4 NOT NULL [PRIMARY KEY], vector_id NOT NULL [FORGIEN KEY], obs_type VARCHAR(4) NOT NULL, date TIMESTAMPNULL ) I need to select the newest observation id, classify by type, for each vector (there can be multiple observation ids on each vector). I have read the postgresql manual and see that there is a "DISTINCT ON" statement which can do this. i.e. SELECT DISTINCT ON (vector_id, obs_type) id FROM observation ORDER BY vector_id, obs_type, date DESC; However the documentation also states that "DISTINCT ON" is not part of the SQL standard and should be avoided when possible, stating that aggregations and sub-queries should be used instead... How would this be done in this scenario? Thanks for you help Jeremy
On Nov 19, 2005, at 7:49 , Jeremy Palmer wrote: > SELECT DISTINCT ON (vector_id, obs_type) > id > FROM observation > ORDER BY vector_id, > obs_type, > date DESC; > > However the documentation also states that "DISTINCT ON" is not > part of the > SQL standard and should be avoided when possible, stating that > aggregations > and sub-queries should be used instead... > > How would this be done in this scenario? Something like: select max(date), id from observation group by vector_id, obs_type; Do test to see which is better in your situation. Michael Glaesemann grzm myrealbox com
Thanks for the reply. Unfortunately that does not work as "id" column needs to be included in the group by statement or be used in an aggregate function. If I did this it definitely would note return the correct answer, as the "id" column is the primary key for the table. Any further suggestions? -----Original Message----- From: Michael Glaesemann [mailto:grzm@myrealbox.com] Sent: Saturday, 19 November 2005 12:28 p.m. To: Jeremy Palmer Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] DISTINCT ON On Nov 19, 2005, at 7:49 , Jeremy Palmer wrote: > SELECT DISTINCT ON (vector_id, obs_type) > id > FROM observation > ORDER BY vector_id, > obs_type, > date DESC; > > However the documentation also states that "DISTINCT ON" is not > part of the > SQL standard and should be avoided when possible, stating that > aggregations > and sub-queries should be used instead... > > How would this be done in this scenario? Something like: select max(date), id from observation group by vector_id, obs_type; Do test to see which is better in your situation. Michael Glaesemann grzm myrealbox com
> -----Original Message----- > From: Michael Glaesemann [mailto:grzm@myrealbox.com] > Sent: Saturday, 19 November 2005 12:28 p.m. > On Nov 19, 2005, at 7:49 , Jeremy Palmer wrote: > >> SELECT DISTINCT ON (vector_id, obs_type) >> id >> FROM observation >> ORDER BY vector_id, >> obs_type, >> date DESC; >> >> However the documentation also states that "DISTINCT ON" is not >> part of the >> SQL standard and should be avoided when possible, stating that >> aggregations >> and sub-queries should be used instead... > Something like: > > select max(date), id > from observation > group by vector_id, obs_type; On Nov 19, 2005, at 11:50 , Jeremy Palmer wrote: > Unfortunately that does not work as "id" column needs to be > included in the > group by statement or be used in an aggregate function. If I did > this it > definitely would note return the correct answer, as the "id" column > is the > primary key for the table. [Please don't top post. It makes the post harder to read. I've reordered the post for readability.] Try something like this: select id from (select max(date) as date vector_id, obs_typefrom observationgroup by vector_id, obs_type) latest_observations join observation using (date, vector_id, obs_type) Michael Glaesemann grzm myrealbox com
> -----Original Message----- > From: Michael Glaesemann [mailto:grzm@myrealbox.com] > Sent: Saturday, 19 November 2005 4:07 p.m. > To: Jeremy Palmer > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] DISTINCT ON > > [Please don't top post. It makes the post harder to read. I've > reordered the post for readability.] > > Try something like this: > > select id > from ( > select max(date) as date > vector_id, obs_type > from observation > group by vector_id, obs_type > ) latest_observations > join observation using (date, vector_id, obs_type) > > Michael Glaesemann > grzm myrealbox com Thanks again for the help. This query looks close to what I need. I think the only change is to use an explicit join criteria, as the max date alias can't be used in the "using" clause - only table columns can. Interesting enough, on my server the "distinct on" clause that I originally ran takes 10% of execution time that the query you provided does. Thanks for your help. Jeremy Palmer
On Sat, Nov 19, 2005 at 17:06:27 +1300, Jeremy Palmer <palmerj@xtra.co.nz> wrote: > > Interesting enough, on my server the "distinct on" clause that I originally > ran takes 10% of execution time that the query you provided does. If DISTINCT ON wasn't useful, I doubt it would have been added.