Thread: Finding "most recent" using daterange
Coming back to PostgreSQL after a (decades-)long absence... If I have something like: CREATE TABLE example ( id integer NOT NULL, value text NOT NULL, dates daterange NOT NULL ); INSERT INTO example VALUES (1, 'a', '[2010-01-01,2020-01-01)'), (1, 'b', '[2010-01-01,)'), (1, 'c', '[,2021-01-01)'), (2, 'd', '[2010-01-01,2021-01-01)'), (2, 'e', '[2015-01-01,2020-01-01)'), (3, 'f', '[2014-01-01,2016-01-01)'), (3, 'g', '[2013-01-01,)'), (3, 'h', '[2012-01-01,)'), (3, 'i', '[2013-01-01,2017-01-01)'), (4, 'j', '[2010-01-01,2015-01-01)'); and I want to find the "most recent" value out of each group, meaning that having the greatest upper bound followed by the greatest lower bound, what I've managed to come up with thus far is: WITH intermediate AS ( SELECT e.id, e.value, e.dates FROM example AS e JOIN ( SELECT id, max(coalesce(upper(dates), 'infinity')) AS max_date FROM example GROUP BY id ) AS max_upper ON e.id = max_upper.id AND coalesce(upper(dates), 'infinity') = max_upper.max_date ) SELECT i.id, i.value, i.dates FROM intermediate AS i JOIN ( SELECT id, max(coalesce(lower(dates), '-infinity')) AS max_date FROM intermediate GROUP BY id ) AS max_lower ON i.id = max_lower.id AND coalesce(lower(dates), '-infinity') = max_lower.max_date; which produces the desired result for this minimal example: id | value | dates ----+-------+------------------------- 1 | b | [2010-01-01,) 2 | d | [2010-01-01,2021-01-01) 3 | g | [2013-01-01,) 4 | j | [2010-01-01,2015-01-01) I pretty quickly discovered that there's no max(daterange) -- although it isn't obvious what that would do, anyway -- and the "intermediate" CTE is what followed. Is there a better way? (Note that this doesn't try to handle duplicate ranges -- I haven't decided whether that'll be necessary in the real case. Assume it'll have something beyond daterange NOT NULL and/or some _agg() magic, eventually.) -Rob
This is a good candidate for a window function. Also note that nulls already get sorted correctly by the DESC so no need to get 'infinity' involved, although you could write 'DESC NULLS FIRST' to be explicit about it.
with x as (select *, row_number() over (partition by id order by upper(dates) desc, lower(dates) desc) from example)
select id,value,dates from x where row_number = 1;
id | value | dates
----+-------+-------------------------
1 | b | [2010-01-01,)
2 | d | [2010-01-01,2021-01-01)
3 | g | [2013-01-01,)
4 | j | [2010-01-01,2015-01-01)
(4 rows)
----+-------+-------------------------
1 | b | [2010-01-01,)
2 | d | [2010-01-01,2021-01-01)
3 | g | [2013-01-01,)
4 | j | [2010-01-01,2015-01-01)
(4 rows)
Cheers,
Greg
On Wed, 22 May 2024 at 10:15, Greg Sabino Mullane <htamfids@gmail.com> wrote:
This is a good candidate for a window function. Also note that nulls already get sorted correctly by the DESC so no need to get 'infinity' involved, although you could write 'DESC NULLS FIRST' to be explicit about it.with x as (select *, row_number() over (partition by id order by upper(dates) desc, lower(dates) desc) from example)select id,value,dates from x where row_number = 1;
Don’t you need NULLS LAST for the lower bounds? There NULL means something closer to -infinity and should appear after the non-NULL values in a descending sort.
Actually it strikes me that this sorting issue could be a reason to avoid NULL bounds on ranges and prefer the use of +/-infinity if the underlying data type supports it.
Oh, you are right - NULLS LAST, got my -infinity crossed with my infinity. :)
Cheers,
Greg
On Wed, 22 May 2024 at 11:36, Greg Sabino Mullane <htamfids@gmail.com> wrote:
Oh, you are right - NULLS LAST, got my -infinity crossed with my infinity. :)
NULLS LAST for lower bound, NULLS FIRST for upper bound.
The other way around if you were doing an ascending sort.
> On 22 May 2024, at 09:58, Rob Foehl <rwf@loonybin.net> wrote: > > Coming back to PostgreSQL after a (decades-)long absence... If I have > something like: > > CREATE TABLE example ( > id integer NOT NULL, > value text NOT NULL, > dates daterange NOT NULL > ); > > INSERT INTO example VALUES > (1, 'a', '[2010-01-01,2020-01-01)'), > (1, 'b', '[2010-01-01,)'), > (1, 'c', '[,2021-01-01)'), > (2, 'd', '[2010-01-01,2021-01-01)'), > (2, 'e', '[2015-01-01,2020-01-01)'), > (3, 'f', '[2014-01-01,2016-01-01)'), > (3, 'g', '[2013-01-01,)'), > (3, 'h', '[2012-01-01,)'), > (3, 'i', '[2013-01-01,2017-01-01)'), > (4, 'j', '[2010-01-01,2015-01-01)'); > > and I want to find the "most recent" value out of each group, meaning > that having the greatest upper bound followed by the greatest lower > bound, what I've managed to come up with thus far is: Sounds like a good candidate for using EXISTS to prove that no more recent value exists for a given id: SELECT e.id, e.value, e.dates FROM example AS e WHERE NOT EXISTS ( SELECT 1 FROM example AS i WHERE i.id = e.id AND (coalesce(upper(i.dates), 'infinity') > coalesce(upper(e.dates), 'infinity') OR (coalesce(upper(i.dates), 'infinity') = coalesce(upper(e.dates), 'infinity') AND coalesce(lower(i.dates), '-infinity') > coalesce(lower(e.dates), '-infinity')) ) ); id | value | dates ----+-------+------------------------- 1 | b | [2010-01-01,) 2 | d | [2010-01-01,2021-01-01) 3 | g | [2013-01-01,) 4 | j | [2010-01-01,2015-01-01) (4 rows) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On Wed, May 22, 2024 at 11:07 AM Alban Hertroys <haramrae@gmail.com> wrote:
Sounds like a good candidate for using EXISTS to prove that no more recent value exists for a given id:
SELECT e.id, e.value, e.dates
FROM example AS e
WHERE NOT EXISTS (
SELECT 1
FROM example AS i
WHERE i.id = e.id
AND (coalesce(upper(i.dates), 'infinity') > coalesce(upper(e.dates), 'infinity')
OR (coalesce(upper(i.dates), 'infinity') = coalesce(upper(e.dates), 'infinity')
AND coalesce(lower(i.dates), '-infinity') > coalesce(lower(e.dates), '-infinity'))
)
);
Not sure if I'm missing something, but what about just using DISTINCT?
SELECT DISTINCT ON (id) id,value,dates FROM example ORDER BY id,upper(dates) desc, lower(dates) desc;
id | value | dates
----+-------+-------------------------
1 | b | [2010-01-01,)
2 | d | [2010-01-01,2021-01-01)
3 | g | [2013-01-01,)
4 | j | [2010-01-01,2015-01-01)
(4 rows)
Cheers,
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.