Thread: Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.
Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.
From
Reid Thompson
Date:
On 9/12/2011 9:54 PM, Reid Thompson wrote: > Ack -- i flubbed the subject and sample. > The sample data should be > > val val2 date > 1 1 2011-01-01 > 2 2 2011-01-02 > 3 3 2011-01-03 > 4 1 2011-01-04 > 5 2 2011-01-05 > 5 3 2011-01-01 > 4 1 2011-01-02 > 6 2 2011-01-03 > 4 3 2011-01-04 > 3 1 2011-01-05 > 2 2 2011-01-06 > 4 3 2011-01-07 > 6 1 2011-01-08 > 4 2 2011-01-09 > 5 3 2011-01-01 > 2 1 2011-01-02 > 4 2 2011-01-03 > 2 3 2011-01-04 > 1 1 2011-01-01 > 2 2 2011-01-02 > 3 3 2011-01-03 > 4 1 2011-01-04 > 3 2 2011-01-05 > 1 3 2011-01-01 > 2 1 2011-01-02 > 3 2 2011-01-03 > 4 3 2011-01-04 > 5 4 2012-01-01 > > resultset: > > 1 3 2011-01-01 > 2 2 2011-01-06 > 3 1 2011-01-05 > 4 2 2011-01-09 > 5 2 2011-01-05 > 6 1 2011-01-08 > > where any one of these 3 > > 1 1 2011-01-01 > 1 1 2011-01-01 > 1 3 2011-01-01 > > or any one of these 2 > 3 1 2011-01-05 > 3 2 2011-01-05 > > are suitable for val = 1, val = 3 respectively. > > sigh -- looks like I fat fingered one of my resultset values above. But, I think this gives me what I want: test=# select distinct on (val1) val1, val2, val3 from (SELECT max(val3) OVER (PARTITION BY val1), * FROM sampledata) as sq where val3 = max order by val1; val1 | val2 | val3 ------+------+------------ 1 | 3 | 2011-01-01 2 | 2 | 2011-01-06 3 | 2 | 2011-01-05 4 | 2 | 2011-01-09 5 | 4 | 2012-01-01 6 | 1 | 2011-01-08 (6 rows) val1 | val2 | val3 ------+------+------------ 1 | 3 | 2011-01-01 1 | 1 | 2011-01-01 1 | 1 | 2011-01-01 2 | 2 | 2011-01-02 2 | 1 | 2011-01-02 2 | 3 | 2011-01-04 2 | 1 | 2011-01-02 2 | 2 | 2011-01-06 2 | 2 | 2011-01-02 3 | 3 | 2011-01-03 3 | 3 | 2011-01-03 3 | 2 | 2011-01-05 3 | 1 | 2011-01-05 3 | 2 | 2011-01-03 4 | 1 | 2011-01-04 4 | 1 | 2011-01-02 4 | 3 | 2011-01-04 4 | 3 | 2011-01-07 4 | 3 | 2011-01-04 4 | 2 | 2011-01-09 4 | 1 | 2011-01-04 4 | 2 | 2011-01-03 5 | 4 | 2012-01-01 5 | 2 | 2011-01-05 5 | 3 | 2011-01-01 5 | 3 | 2011-01-01 6 | 1 | 2011-01-08 6 | 2 | 2011-01-03 (28 rows)
Re: Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.
From
Vincent de Phily
Date:
On Monday 12 September 2011 22:51:54 Reid Thompson wrote: > test=# select distinct on (val1) val1, val2, val3 from (SELECT > max(val3) OVER (PARTITION BY val1), * FROM sampledata) as sq where val3 > = max order by val1; Other things I've tried (was limited to PG8.3 so no "OVER (PARTITION...)" support) : SELECT DISTINCT ON (val1), val2, val3 FROM table ORDER BY val1, val3; SELECT val1, val2, val3 FROM table WHERE id IN ( SELECT sq.i FROM ( SELECT val1, max(val3) FROM table GROUP by 1 ) AS sq (v,i)) My case was a bit different since I wanted the record for distinct(A,B) instead of just DISTINC(A), and since I had a primary key available on the table. But let it be food for thought. However, none of those queries are either efficient or beautiful, so I ended up populating a "last_values" table via a trigger, which is way more efficient if it fits your needs : CREATE TABLE last_values (PRIMARY KEY foo, bar integer, baz integer); CREATE OR REPLACE FUNCTION insert_last_value() RETURNS TRIGGER AS $$ BEGIN UPDATE last_values SET val1=NEW.val1... WHERE ...; IF NOT found THEN BEGIN INSERT INTO last_values (...) VALUES (NEW....); EXCEPTION WHEN UNIQUE_VIOLATION THEN UPDATE last_values SET ... WHERE ...; END; END IF; RETURN NULL; END $$ LANGUAGE plpgsql; CREATE TRIGGER last_values_trigger AFTER INSERT ON values FOR EACH ROW EXECUTE PROCEDURE insert_last_values(); -- Vincent de Phily