Thread: Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

Ack -- i flubbed the subject and sample.
The sample data should be

val  val2    date
1    1       2011-01-01
2    2       2011-01-02
3    3       2011-01-03
4    1       2011-01-04
5    2       2011-01-05
5    3       2011-01-01
4    1       2011-01-02
6    2       2011-01-03
4    3       2011-01-04
3    1       2011-01-05
2    2       2011-01-06
4    3       2011-01-07
6    1       2011-01-08
4    2       2011-01-09
5    3       2011-01-01
2    1       2011-01-02
4    2       2011-01-03
2    3       2011-01-04
1    1       2011-01-01
2    2       2011-01-02
3    3       2011-01-03
4    1       2011-01-04
3    2       2011-01-05
1    3       2011-01-01
2    1       2011-01-02
3    2       2011-01-03
4    3       2011-01-04
5    4       2012-01-01

resultset:

1    3       2011-01-01
2    2       2011-01-06
3    1       2011-01-05
4    2       2011-01-09
5    2       2011-01-05
6    1       2011-01-08

where any one of these 3

1    1       2011-01-01
1    1       2011-01-01
1    3       2011-01-01

or any one of these 2
3    1       2011-01-05
3    2       2011-01-05

are suitable for val = 1, val = 3 respectively.





On 9/12/2011 8:54 PM, Reid Thompson wrote:
> Could someone point me in the right direction..
> Thanks - reid
>
> Given the example data,
> how do I write a query that will give me the resultset:
>
> 1    2011-01-01
> 2    2011-01-06
> 3    2011-01-05
> 4    2011-01-09
> 5    2011-01-05
> 6    2011-01-08
>
> I.E. for each distinct val, return the record with the most recent date.
>
> ex data
> val  date
> 1    2011-01-01
> 2    2011-01-02
> 3    2011-01-03
> 4    2011-01-04
> 5    2011-01-05
> 5    2011-01-01
> 4    2011-01-02
> 6    2011-01-03
> 4    2011-01-04
> 3    2011-01-05
> 2    2011-01-06
> 4    2011-01-07
> 6    2011-01-08
> 4    2011-01-09
> 5    2011-01-01
> 2    2011-01-02
> 4    2011-01-03
> 2    2011-01-04
> 1    2011-01-01
> 2    2011-01-02
> 3    2011-01-03
> 4    2011-01-04
> 3    2011-01-05
> 1    2011-01-01
> 2    2011-01-02
> 3    2011-01-03
> 4    2011-01-04
> 5    2011-01-01
> -----------------------------------
>
> $ cat sampledata|sort -k1,2
> 1    2011-01-01
> 1    2011-01-01
> 1    2011-01-01
> 2    2011-01-02
> 2    2011-01-02
> 2    2011-01-02
> 2    2011-01-02
> 2    2011-01-04
> 2    2011-01-06
> 3    2011-01-03
> 3    2011-01-03
> 3    2011-01-03
> 3    2011-01-05
> 3    2011-01-05
> 4    2011-01-02
> 4    2011-01-03
> 4    2011-01-04
> 4    2011-01-04
> 4    2011-01-04
> 4    2011-01-04
> 4    2011-01-07
> 4    2011-01-09
> 5    2011-01-01
> 5    2011-01-01
> 5    2011-01-01
> 5    2011-01-05
> 6    2011-01-03
> 6    2011-01-08
>
>


Reid,


> where any one of these 3
>
> 1    1       2011-01-01
> 1    1       2011-01-01
> 1    3       2011-01-01
>
> or any one of these 2
> 3    1       2011-01-05
> 3    2       2011-01-05
>
> are suitable for val = 1, val = 3 respectively.


Can you please describe in words what you are trying to accomplish? When
I look at your data and expected output, I'd say you want this:

   For each distinct value of "val1", return any value of "val2" and
   the lowest value of "date".


This is actually quite simple - you could also use max(), avg(), ...
instead of min for val2:

SELECT val1, min(val2), min(date)
FROM data
GROUP BY val1


Best regards

    -hannes

On Tue, 2011-09-13 at 16:39 +0200, Hannes Erven wrote:
> Reid,
>
>
> > where any one of these 3
> >
> > 1    1       2011-01-01
> > 1    1       2011-01-01
> > 1    3       2011-01-01
> >
> > or any one of these 2
> > 3    1       2011-01-05
> > 3    2       2011-01-05
> >
> > are suitable for val = 1, val = 3 respectively.
>
>
> Can you please describe in words what you are trying to accomplish? When
> I look at your data and expected output, I'd say you want this:
>
>    For each distinct value of "val1", return any value of "val2" and
>    the lowest value of "date".

for each distinct value of "val1', return the highest value(most recent)
of "date" and the value of "val2" associated with that date




I'm sorry Reid, driving back home I realized that the previous query I suggested didn't do what I was expecting, cause it would compute all of val2 for each val1, even if they belonged to another group (not for a particular val1/date pair), or in other words, to another date.

I've considered this fact in the previous post but the resulting query appeared too complex for such a simple task, and then disregarded it, but well, after all it seems it was necessary to do this little extra work, because you wanted the exact val2 associated to the max(date) for val1.

I've come up with this alternative, basically a "key extractor" followed by diving in the original table looking for val2 for that key.
As I've said before, the data you provided showed that there could be multiple rows for each key made of val1 and max(date) for that specific val1.

So, you will still have to define a criteria on which val2 to keep.
Since I don't know the reason for this query, I've suggested an array_agg so that it is more clear to you.

select
   a.val1,
   a.date,
   array_agg(mytable.val2) -- given there can be multiple rows for any val1/date pair, this is where you are allowed to define which one you want (or all of them)
from (
   select val1, max(date) as date from mytable group by val1) a -- this gives you only rows satisfying val1/max(date)
   inner join mytable on a.val1 = mytable.val1 and a.date = mytable.date -- this join allows the retrieval of val2 for that "key" formed by val1/max(date), but be prepared for many rows
group by
   a.val1,
   a.date
order by
   val1;

test case:
========

select val1, val2, date from mytable;

1;16;"2011-09-13";1
1;15;"2011-09-13";2
1;14;"2011-09-13";3

1;23;"2011-09-12";4
1;22;"2011-09-12";5
1;21;"2011-09-12";6

2;6;"2011-09-13";7
2;5;"2011-09-13";8
2;4;"2011-09-13";9

2;3;"2011-09-14";10
2;2;"2011-09-14";11
2;1;"2011-09-14";12

This is the result coming from the new query, you see, only val2's of 14, 15 and 16 are computed (you will still eventually have to select one from it, using a more specific aggregate, like max, min, avg, etc.)
1;"2011-09-13";"{14,15,16}"
2;"2011-09-14";"{1,2,3}"


This is the result coming from the original, simpler (but flawed) query, which as it clearly shows computes val2's of 14, 15, 16, 23, 22 and 21, beging those last 3 (23, 22 and 21) from another date associated with val1 (which is not the max date, so they shouldn't have been considered at all).

select val1, max(date), array_agg(val2) from mytable group by val1

1;"2011-09-13";"{14,15,16,23,22,21}"
2;"2011-09-14";"{1,2,3,4,5,6}"


hope it helps.
regards,
eduardo


On Tue, Sep 13, 2011 at 1:13 PM, Reid Thompson <Reid.Thompson@ateb.com> wrote:
On Tue, 2011-09-13 at 16:39 +0200, Hannes Erven wrote:
> Reid,
>
>
> > where any one of these 3
> >
> > 1    1       2011-01-01
> > 1    1       2011-01-01
> > 1    3       2011-01-01
> >
> > or any one of these 2
> > 3    1       2011-01-05
> > 3    2       2011-01-05
> >
> > are suitable for val = 1, val = 3 respectively.
>
>
> Can you please describe in words what you are trying to accomplish? When
> I look at your data and expected output, I'd say you want this:
>
>    For each distinct value of "val1", return any value of "val2" and
>    the lowest value of "date".

for each distinct value of "val1', return the highest value(most recent)
of "date" and the value of "val2" associated with that date




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general