Re: Given a set of daterange, finding the continuous range thatincludes a particular date (aggregates) - Mailing list pgsql-general

From Ken Tanzer
Subject Re: Given a set of daterange, finding the continuous range thatincludes a particular date (aggregates)
Date
Msg-id CAD3a31X_swvaODUt=yShCvN1fBvnEq3RPsto-p1-GB6q6UqKjA@mail.gmail.com
Whole thread Raw
Responses Re: Given a set of daterange, finding the continuous range thatincludes a particular date (aggregates)
List pgsql-general
On Fri, Feb 23, 2018 at 6:10 AM, mariusz <marius@mtvk.pl> wrote: 

i guess, you can easily get max continuous range for each row with
something like this:

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

SELECT dr,
       lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
       upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
  FROM ...

above example is simplified to selecting only daterange column "dr" for
readability, which in your case should be something like

daterange(staff_assign_date,staff_assign_date_end,'[)')

please note that daterange would be normalized to [) format so upper()
above returns exactly your max "staff_assign_date_end" for each
continuous range when dateranges are created with '[)' format.

the key point is ... ELSE d2 in append_daterange() which starts with new
value each time that new value is discontinuous with agg's state value
and order in which rows are processed (ASC for lower of daterange, DESC
for upper of daterange).

unfortunately this involves reading all rows for "client_id" and
additional sorting for each window.
i recall reading that you already pay the price of reading all rows for
client_id anyway, so the only question is the cost of two additional
sorts (maybe extracting dateranges to subset on which to do windows and
rejoining result of continuous ranges to original set would help to
lower the cost).


Thank you, and I wanted to follow up on this.  I couldn't quite get your example working as described, but I also ended up trying something very similar that got me very close but not quite there.  Basically, I can see that it is correctly calculating the ranges (in the notices), but it is only returning the last range for each client.  (Because I said PARTITION BY client_id).

So I'm not sure if I should be calling this differently, or if the function needs to work differently, or if this just isn't possible.  Do I need to partition by something else, and if so what?  I don't see what I could specify that would indicate a new episode.

Also, it's not clear to me how an aggregate might define/return different values within a partition.  Although this must be possible, since functions like rank() and row_number() seem to do it.

Hoping there is something easy that can be tweaked here.  See below for copy/pastable test stuff.  It includes output from both functions.  Both look to be returning the same results, which makes me wonder if my passing in a start date was a waste of time, though it seems to me it would be necessary.

Cheers,
Ken

BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- Just a label, for clarity
);

INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',1),
(1,'1991-01-01','1991-12-31',1),

(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),

(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',1),
(2,'2015-02-01','2015-12-31',1),
(2,'2017-09-30','2018-01-31',2),
(2,'2018-02-01','2018-02-14',2)
;

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

CREATE OR REPLACE FUNCTION range_continuous_merge( daterange, daterange, date ) RETURNS daterange AS $$

DECLARE
    res daterange;

BEGIN
res:=  CASE
    WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
    WHEN $1 IS NULL AND $2 @> $3 THEN $2
    WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
    WHEN NOT $1 @> $3 THEN $2
    ELSE $1
END;
RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,$3::text,res;
RETURN res;
END;
$$ LANGUAGE plpgsql STABLE;

CREATE AGGREGATE range_continuous( daterange, date ) (

    sfunc =  range_continuous_merge,
    stype = daterange
--  initcond = '{0,0,0}'

);

SELECT client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date) OVER (PARTITION by client_id) FROM sample_data ;

SELECT client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]')) OVER (PARTITION by client_id) FROM sample_data ;



-- RANGE_CONTINUOUS_MERGE

NOTICE:  Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01.  Returning [1990-01-01,1991-01-01)
NOTICE:  Inputs: [1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01.  Returning [1990-01-01,1992-01-01)
NOTICE:  Inputs: [1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01.  Returning [2000-01-01,2001-01-01)
NOTICE:  Inputs: [2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01.  Returning [2000-01-01,2002-01-01)
NOTICE:  Inputs: [2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01.  Returning [2000-01-01,2003-01-01)
NOTICE:  Inputs: [2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01.  Returning [2000-01-01,2004-01-01)
NOTICE:  Inputs: [2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01.  Returning [2000-01-01,2005-01-01)
NOTICE:  Inputs: [2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01.  Returning [2000-01-01,2006-01-01)
NOTICE:  Inputs: [2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01.  Returning [2000-01-01,2007-01-01)
NOTICE:  Inputs: [2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01.  Returning [2014-01-01,2015-01-01)
NOTICE:  Inputs: [2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01.  Returning [2014-01-01,2016-01-01)
NOTICE:  Inputs: [2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30.  Returning [2017-06-30,2018-01-01)
NOTICE:  Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.  Returning [2017-06-30,)
NOTICE:  Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01.  Returning [2014-02-01,2015-02-01)
NOTICE:  Inputs: [2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01.  Returning [2014-02-01,2016-01-01)
NOTICE:  Inputs: [2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30.  Returning [2017-09-30,2018-02-01)
NOTICE:  Inputs: [2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01.  Returning [2017-09-30,2018-02-15)

 client_id | episode | start_date |  end_date  |    range_continuous
-----------+---------+------------+------------+-------------------------
         1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
         1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
         1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
         1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
         1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
         1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
         1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
         1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
         1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
         1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
         1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
         1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
         1 |       4 | 2018-01-01 |            | [2017-06-30,)
         2 |       1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
         2 |       1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)

(17 rows)


-- AGG_DATERANGE

 client_id | episode | start_date |  end_date  |      agg_daterange      
-----------+---------+------------+------------+-------------------------
         1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
         1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
         1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
         1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
         1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
         1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
         1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
         1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
         1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
         1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
         1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
         1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
         1 |       4 | 2018-01-01 |            | [2017-06-30,)
         2 |       1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
         2 |       1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)
(17 rows)



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

pgsql-general by date:

Previous
From: Ken Tanzer
Date:
Subject: Re: Given a set of daterange, finding the continuous range thatincludes a particular date
Next
From: Tom Kazimiers
Date:
Subject: Unexpected behavior with transition tables in update statementtrigger