Thread: Given a set of daterange, finding the continuous range that includesa particular date

Hi, hoping to get some help with this.  I'm needing to take a specific date, a series of dateranges  and, given a specific date, return a single conitinuous daterange that includes that date.

To elaborate a bit, I've got lots of tables that include start and end dates.  For example:

CREATE TABLE tbl_staff_assign (
    staff_assign_id         SERIAL PRIMARY KEY,
    client_id               INTEGER NOT NULL REFERENCES tbl_client (client_id),
    staff_id                INTEGER REFERENCES tbl_staff(staff_id),
    staff_assign_type_code      VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
    staff_assign_date           DATE NOT NULL,
    staff_assign_date_end       DATE,
...

So a client might leave a progrma and then return later, or they might simply switch to another staff_id.  (In which case one record will have and end date, and the next record will start on the next day.)  In this case I need to know "what period were they continuously in the program that includes X date?"  So I'd like to be able to do something like:

"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa

I've done this before with procedures specific to a particular table, and working with the start and end dates.  I'm now wanting to try to do this once generically that will work for all my cases.  So I'm hoping to do this in a way that performance isn't horrible.  And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.

Any advice or suggestions or ways to go about this appreciated.  Thanks!

Ken

p.s.,  Another small wrinkle is these records aren't always perfect, and ideally I'd allow for an optional fudge factor that would allow small gaps to be ignored.  I could just add that in every query (start_date+2,end_date-2), but it might be nice to have the function do it, if it didn't badly hurt performance.


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> Hi, hoping to get some help with this.  I'm needing to take a specific 
> date, a series of dateranges  and, given a specific date, return a 
> single conitinuous daterange that includes that date.
> 
> To elaborate a bit, I've got lots of tables that include start and end 
> dates.  For example:
> 
> CREATE TABLE tbl_staff_assign (
>      staff_assign_id         SERIAL PRIMARY KEY,
>      client_id               INTEGER NOT NULL REFERENCES tbl_client 
> (client_id),
>      staff_id                INTEGER REFERENCES tbl_staff(staff_id),
>      staff_assign_type_code      VARCHAR(10) NOT NULL REFERENCES 
> tbl_l_staff_assign_type (staff_assign_type_code),
>      staff_assign_date           DATE NOT NULL,
>      staff_assign_date_end       DATE,
> ...
> 
> So a client might leave a progrma and then return later, or they might 
> simply switch to another staff_id.  (In which case one record will have 
> and end date, and the next record will start on the next day.)  In this 
> case I need to know "what period were they continuously in the program 
> that includes X date?"  So I'd like to be able to do something like:
> 
> "SELECT staff_assign_date,continuous_daterange( staff_assign_date, 
> (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
> ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM 
> staff_assign sa
> 
> I've done this before with procedures specific to a particular table, 
> and working with the start and end dates.  I'm now wanting to try to do 
> this once generically that will work for all my cases.  So I'm hoping to 
> do this in a way that performance isn't horrible.  And it's a little 
> unclear to me how much and how I might be able to use the daterange 
> operators to accomplish this efficiently.

The operator I use to solve similar problems:

https://www.postgresql.org/docs/10/static/functions-range.html

@>     contains element     '[2011-01-01,2011-03-01)'::tsrange @> 
'2011-01-10'::timestamp     t


> 
> Any advice or suggestions or ways to go about this appreciated.  Thanks!
> 
> Ken
> 
> p.s.,  Another small wrinkle is these records aren't always perfect, and 
> ideally I'd allow for an optional fudge factor that would allow small 
> gaps to be ignored.  I could just add that in every query 
> (start_date+2,end_date-2), but it might be nice to have the function do 
> it, if it didn't badly hurt performance.
> 
> 
> -- 
> AGENCY Software
> A Free Software data system
> By and for non-profits
> /http://agency-software.org//
> /https://demo.agency-software.org/client/
> ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org>
> (253) 245-3801
> 
> Subscribe to the mailing list 
> <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to
> learn more about AGENCY or
> follow the discussion.


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 02/22/2018 04:44 PM, Ken Tanzer wrote:
Hi, hoping to get some help with this.  I'm needing to take a specific date, a series of dateranges  and, given a specific date, return a single conitinuous daterange that includes that date.

To elaborate a bit, I've got lots of tables that include start and end dates.  For example:

CREATE TABLE tbl_staff_assign (
     staff_assign_id         SERIAL PRIMARY KEY,
     client_id               INTEGER NOT NULL REFERENCES tbl_client (client_id),
     staff_id                INTEGER REFERENCES tbl_staff(staff_id),
     staff_assign_type_code      VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
     staff_assign_date           DATE NOT NULL,
     staff_assign_date_end       DATE,
...

So a client might leave a progrma and then return later, or they might simply switch to another staff_id.  (In which case one record will have and end date, and the next record will start on the next day.)  In this case I need to know "what period were they continuously in the program that includes X date?"  So I'd like to be able to do something like:

"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa

I've done this before with procedures specific to a particular table, and working with the start and end dates.  I'm now wanting to try to do this once generically that will work for all my cases.  So I'm hoping to do this in a way that performance isn't horrible.  And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.

The operator I use to solve similar problems:

https://www.postgresql.org/docs/10/static/functions-range.html

@>      contains element        '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp   t


Thanks Adrian.  But how would you apply that to this situation, where I have a series of (quite possibly discontinuous) dateranges? 


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On 02/22/2018 04:58 PM, Ken Tanzer wrote:
> 
> 
> On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> 
>         Hi, hoping to get some help with this.  I'm needing to take a
>         specific date, a series of dateranges  and, given a specific
>         date, return a single conitinuous daterange that includes that date.
> 
>         To elaborate a bit, I've got lots of tables that include start
>         and end dates.  For example:
> 
>         CREATE TABLE tbl_staff_assign (
>               staff_assign_id         SERIAL PRIMARY KEY,
>               client_id               INTEGER NOT NULL REFERENCES
>         tbl_client (client_id),
>               staff_id                INTEGER REFERENCES
>         tbl_staff(staff_id),
>               staff_assign_type_code      VARCHAR(10) NOT NULL
>         REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
>               staff_assign_date           DATE NOT NULL,
>               staff_assign_date_end       DATE,
>         ...
> 
>         So a client might leave a progrma and then return later, or they
>         might simply switch to another staff_id.  (In which case one
>         record will have and end date, and the next record will start on
>         the next day.)  In this case I need to know "what period were
>         they continuously in the program that includes X date?"  So I'd
>         like to be able to do something like:
> 
>         "SELECT staff_assign_date,continuous_daterange(
>         staff_assign_date, (SELECT
>         array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>         ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM
>         staff_assign sa
> 
>         I've done this before with procedures specific to a particular
>         table, and working with the start and end dates.  I'm now
>         wanting to try to do this once generically that will work for
>         all my cases.  So I'm hoping to do this in a way that
>         performance isn't horrible.  And it's a little unclear to me how
>         much and how I might be able to use the daterange operators to
>         accomplish this efficiently.
> 
> 
>     The operator I use to solve similar problems:
> 
>     https://www.postgresql.org/docs/10/static/functions-range.html
>     <https://www.postgresql.org/docs/10/static/functions-range.html>
> 
>     @>      contains element        '[2011-01-01,2011-03-01)'::tsrange
>     @> '2011-01-10'::timestamp   t
> 
> 
> Thanks Adrian.  But how would you apply that to this situation, where I 
> have a series of (quite possibly discontinuous) dateranges?

This is going to depend on a more formal definition of the problem with 
some sample data. Right now I am trying to reconcile "what period were 
they continuously in the program that includes X date?" with "where I 
have a series of (quite possibly discontinuous) dateranges? " Maybe its 
just me,  I can't see how discontinuous can also be continuously.

> 
> 
> -- 
> AGENCY Software
> A Free Software data system
> By and for non-profits
> /http://agency-software.org//
> /https://demo.agency-software.org/client/
> ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org>
> (253) 245-3801
> 
> Subscribe to the mailing list 
> <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to
> learn more about AGENCY or
> follow the discussion.


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 02/22/2018 04:58 PM, Ken Tanzer wrote:



On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

    On 02/22/2018 04:44 PM, Ken Tanzer wrote:

        Hi, hoping to get some help with this.  I'm needing to take a
        specific date, a series of dateranges  and, given a specific
        date, return a single conitinuous daterange that includes that date.

        To elaborate a bit, I've got lots of tables that include start
        and end dates.  For example:

        CREATE TABLE tbl_staff_assign (
              staff_assign_id         SERIAL PRIMARY KEY,
              client_id               INTEGER NOT NULL REFERENCES
        tbl_client (client_id),
              staff_id                INTEGER REFERENCES
        tbl_staff(staff_id),
              staff_assign_type_code      VARCHAR(10) NOT NULL
        REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
              staff_assign_date           DATE NOT NULL,
              staff_assign_date_end       DATE,
        ...

        So a client might leave a progrma and then return later, or they
        might simply switch to another staff_id.  (In which case one
        record will have and end date, and the next record will start on
        the next day.)  In this case I need to know "what period were
        they continuously in the program that includes X date?"  So I'd
        like to be able to do something like:

        "SELECT staff_assign_date,continuous_daterange(
        staff_assign_date, (SELECT
        array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
        ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM
        staff_assign sa

        I've done this before with procedures specific to a particular
        table, and working with the start and end dates.  I'm now
        wanting to try to do this once generically that will work for
        all my cases.  So I'm hoping to do this in a way that
        performance isn't horrible.  And it's a little unclear to me how
        much and how I might be able to use the daterange operators to
        accomplish this efficiently.


    The operator I use to solve similar problems:

    https://www.postgresql.org/docs/10/static/functions-range.html
    <https://www.postgresql.org/docs/10/static/functions-range.html>

    @>      contains element        '[2011-01-01,2011-03-01)'::tsrange
    @> '2011-01-10'::timestamp   t


Thanks Adrian.  But how would you apply that to this situation, where I have a series of (quite possibly discontinuous) dateranges?

This is going to depend on a more formal definition of the problem with some sample data. Right now I am trying to reconcile "what period were they continuously in the program that includes X date?" with "where I have a series of (quite possibly discontinuous) dateranges? " Maybe its just me,  I can't see how discontinuous can also be continuously.



Start   End
-- Episode 1
1/1/16  3/30/16
4/1/16  4/30/16
-- Episode 2
1/1/18  1/31/18
2/1/18  NULL

Given 1/15/18, should return 1/1/18-->NULL
Given 1/15/16, should return 1/1/16-->4/30/16

Thinking about this a bit more, I'm wondering if a window function could be used for this?  I've used them a bit, but definitely never wrote one.   Something like 

continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') OVER (PARTION BY client_id)

But then a window function can only see the rows included in the query, yes?  In which case this would only work if you queried for all the rows for a particular client_id?  I guess in the example I gave I was doing that anyway, so maybe this would be no worse.

Ken
--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On 02/22/2018 05:23 PM, Ken Tanzer wrote:
> 
> 
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 02/22/2018 04:58 PM, Ken Tanzer wrote:
> 
> 
> 
>         On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.com
>         <mailto:adrian.klaver@aklaver.com>>> wrote:
> 
>              On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> 
>                  Hi, hoping to get some help with this.  I'm needing to
>         take a
>                  specific date, a series of dateranges  and, given a
>         specific
>                  date, return a single conitinuous daterange that
>         includes that date.
> 
>                  To elaborate a bit, I've got lots of tables that
>         include start
>                  and end dates.  For example:
> 
>                  CREATE TABLE tbl_staff_assign (
>                        staff_assign_id         SERIAL PRIMARY KEY,
>                        client_id               INTEGER NOT NULL REFERENCES
>                  tbl_client (client_id),
>                        staff_id                INTEGER REFERENCES
>                  tbl_staff(staff_id),
>                        staff_assign_type_code      VARCHAR(10) NOT NULL
>                  REFERENCES tbl_l_staff_assign_type
>         (staff_assign_type_code),
>                        staff_assign_date           DATE NOT NULL,
>                        staff_assign_date_end       DATE,
>                  ...
> 
>                  So a client might leave a progrma and then return
>         later, or they
>                  might simply switch to another staff_id.  (In which
>         case one
>                  record will have and end date, and the next record will
>         start on
>                  the next day.)  In this case I need to know "what
>         period were
>                  they continuously in the program that includes X date?"
>           So I'd
>                  like to be able to do something like:
> 
>                  "SELECT staff_assign_date,continuous_daterange(
>                  staff_assign_date, (SELECT
>                 
>         array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>                  ) FROM staff_assign sa2 WHERE
>         sa2.client_id=sa.client_id) FROM
>                  staff_assign sa
> 
>                  I've done this before with procedures specific to a
>         particular
>                  table, and working with the start and end dates.  I'm now
>                  wanting to try to do this once generically that will
>         work for
>                  all my cases.  So I'm hoping to do this in a way that
>                  performance isn't horrible.  And it's a little unclear
>         to me how
>                  much and how I might be able to use the daterange
>         operators to
>                  accomplish this efficiently.
> 
> 
>              The operator I use to solve similar problems:
> 
>         https://www.postgresql.org/docs/10/static/functions-range.html
>         <https://www.postgresql.org/docs/10/static/functions-range.html>
>             
>         <https://www.postgresql.org/docs/10/static/functions-range.html
>         <https://www.postgresql.org/docs/10/static/functions-range.html>>
> 
>              @>      contains element       
>         '[2011-01-01,2011-03-01)'::tsrange
>              @> '2011-01-10'::timestamp   t
> 
> 
>         Thanks Adrian.  But how would you apply that to this situation,
>         where I have a series of (quite possibly discontinuous) dateranges?
> 
> 
>     This is going to depend on a more formal definition of the problem
>     with some sample data. Right now I am trying to reconcile "what
>     period were they continuously in the program that includes X date?"
>     with "where I have a series of (quite possibly discontinuous)
>     dateranges? " Maybe its just me,  I can't see how discontinuous can
>     also be continuously.
> 
> 
> 
> *Start   End*
> -- Episode 1
> 1/1/16  3/30/16
> 4/1/16  4/30/16
> -- Episode 2
> 1/1/18  1/31/18
> 2/1/18  NULL
> 
> Given 1/15/18, should return 1/1/18-->NULL

I have similar case. There I use the coalesce to substitute the date the 
report is run for the NULL value, mainly because I have not way of 
predicting the future:)

Is there actually an episode value you can sort by or is that a made up 
value for this example?


> Given 1/15/16, should return 1/1/16-->4/30/16
> 
> Thinking about this a bit more, I'm wondering if a window function could 
> be used for this?  I've used them a bit, but definitely never wrote one. 
>    Something like
> 
> continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') 
> OVER (PARTION BY client_id)
> 
> But then a window function can only see the rows included in the query, 
> yes?  In which case this would only work if you queried for all the rows 
> for a particular client_id?  I guess in the example I gave I was doing 
> that anyway, so maybe this would be no worse.
> 
> Ken
> -- 
> AGENCY Software
> A Free Software data system
> By and for non-profits
> /http://agency-software.org//
> /https://demo.agency-software.org/client/
> ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org>
> (253) 245-3801
> 
> Subscribe to the mailing list 
> <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to
> learn more about AGENCY or
> follow the discussion.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


On 02/22/2018 05:23 PM, Ken Tanzer wrote:
> 
> 
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 02/22/2018 04:58 PM, Ken Tanzer wrote:
> 
> 
> 
>         On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.com
>         <mailto:adrian.klaver@aklaver.com>>> wrote:
> 
>              On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> 
>                  Hi, hoping to get some help with this.  I'm needing to
>         take a
>                  specific date, a series of dateranges  and, given a
>         specific
>                  date, return a single conitinuous daterange that
>         includes that date.
> 
>                  To elaborate a bit, I've got lots of tables that
>         include start
>                  and end dates.  For example:
> 
>                  CREATE TABLE tbl_staff_assign (
>                        staff_assign_id         SERIAL PRIMARY KEY,
>                        client_id               INTEGER NOT NULL REFERENCES
>                  tbl_client (client_id),
>                        staff_id                INTEGER REFERENCES
>                  tbl_staff(staff_id),
>                        staff_assign_type_code      VARCHAR(10) NOT NULL
>                  REFERENCES tbl_l_staff_assign_type
>         (staff_assign_type_code),
>                        staff_assign_date           DATE NOT NULL,
>                        staff_assign_date_end       DATE,
>                  ...
> 
>                  So a client might leave a progrma and then return
>         later, or they
>                  might simply switch to another staff_id.  (In which
>         case one
>                  record will have and end date, and the next record will
>         start on
>                  the next day.)  In this case I need to know "what
>         period were
>                  they continuously in the program that includes X date?"
>           So I'd
>                  like to be able to do something like:
> 
>                  "SELECT staff_assign_date,continuous_daterange(
>                  staff_assign_date, (SELECT
>                 
>         array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>                  ) FROM staff_assign sa2 WHERE
>         sa2.client_id=sa.client_id) FROM
>                  staff_assign sa
> 
>                  I've done this before with procedures specific to a
>         particular
>                  table, and working with the start and end dates.  I'm now
>                  wanting to try to do this once generically that will
>         work for
>                  all my cases.  So I'm hoping to do this in a way that
>                  performance isn't horrible.  And it's a little unclear
>         to me how
>                  much and how I might be able to use the daterange
>         operators to
>                  accomplish this efficiently.
> 
> 
>              The operator I use to solve similar problems:
> 
>         https://www.postgresql.org/docs/10/static/functions-range.html
>         <https://www.postgresql.org/docs/10/static/functions-range.html>
>             
>         <https://www.postgresql.org/docs/10/static/functions-range.html
>         <https://www.postgresql.org/docs/10/static/functions-range.html>>
> 
>              @>      contains element       
>         '[2011-01-01,2011-03-01)'::tsrange
>              @> '2011-01-10'::timestamp   t
> 
> 
>         Thanks Adrian.  But how would you apply that to this situation,
>         where I have a series of (quite possibly discontinuous) dateranges?
> 
> 
>     This is going to depend on a more formal definition of the problem
>     with some sample data. Right now I am trying to reconcile "what
>     period were they continuously in the program that includes X date?"
>     with "where I have a series of (quite possibly discontinuous)
>     dateranges? " Maybe its just me,  I can't see how discontinuous can
>     also be continuously.
> 
> 
> 
> *Start   End*
> -- Episode 1
> 1/1/16  3/30/16
> 4/1/16  4/30/16
> -- Episode 2
> 1/1/18  1/31/18
> 2/1/18  NULL
> 
> Given 1/15/18, should return 1/1/18-->NULL
> Given 1/15/16, should return 1/1/16-->4/30/16

Just thinking out loud here, in a function:

1) For a client_id you can find the min(staff_assign_date).

2) You can create a max(staff_assign_date_end) by using 
COALESCE(staff_assign_date_end, current_date)

3) You now have the outer range for the episodes.

4) In that range of dates for each staff_assign_date you find the 
staff_assign_date_end that greater then the staff_assign_date but less 
then the next staff_assign_date. That will give you your episodes.

5) You can then test to see if the X date is in the calculated episodes.

6) As a shortcut you can find min(staff_assign_date) that is less then 
the X date and see if the staff_assign_date_end is greater then the X date.

> 
> Thinking about this a bit more, I'm wondering if a window function could 
> be used for this?  I've used them a bit, but definitely never wrote one. 
>    Something like
> 
> continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') 
> OVER (PARTION BY client_id)
> 
> But then a window function can only see the rows included in the query, 
> yes?  In which case this would only work if you queried for all the rows 
> for a particular client_id?  I guess in the example I gave I was doing 
> that anyway, so maybe this would be no worse.
> 
> Ken
> -- 
> AGENCY Software
> A Free Software data system
> By and for non-profits
> /http://agency-software.org//
> /https://demo.agency-software.org/client/
> ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org>
> (253) 245-3801
> 
> Subscribe to the mailing list 
> <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to
> learn more about AGENCY or
> follow the discussion.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


On 02/22/2018 09:05 PM, Adrian Klaver wrote:
> On 02/22/2018 05:23 PM, Ken Tanzer wrote:
>>
>>
>> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver 
>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>>
>>     On 02/22/2018 04:58 PM, Ken Tanzer wrote:
>>
>>
>>
>>         On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
>>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>>         <mailto:adrian.klaver@aklaver.com
>>         <mailto:adrian.klaver@aklaver.com>>> wrote:
>>
>>              On 02/22/2018 04:44 PM, Ken Tanzer wrote:
>>
>>                  Hi, hoping to get some help with this.  I'm needing to
>>         take a
>>                  specific date, a series of dateranges  and, given a
>>         specific
>>                  date, return a single conitinuous daterange that
>>         includes that date.
>>
>>                  To elaborate a bit, I've got lots of tables that
>>         include start
>>                  and end dates.  For example:
>>
>>                  CREATE TABLE tbl_staff_assign (
>>                        staff_assign_id         SERIAL PRIMARY KEY,
>>                        client_id               INTEGER NOT NULL 
>> REFERENCES
>>                  tbl_client (client_id),
>>                        staff_id                INTEGER REFERENCES
>>                  tbl_staff(staff_id),
>>                        staff_assign_type_code      VARCHAR(10) NOT NULL
>>                  REFERENCES tbl_l_staff_assign_type
>>         (staff_assign_type_code),
>>                        staff_assign_date           DATE NOT NULL,
>>                        staff_assign_date_end       DATE,
>>                  ...
>>
>>                  So a client might leave a progrma and then return
>>         later, or they
>>                  might simply switch to another staff_id.  (In which
>>         case one
>>                  record will have and end date, and the next record will
>>         start on
>>                  the next day.)  In this case I need to know "what
>>         period were
>>                  they continuously in the program that includes X date?"
>>           So I'd
>>                  like to be able to do something like:
>>
>>                  "SELECT staff_assign_date,continuous_daterange(
>>                  staff_assign_date, (SELECT
>>         array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>>                  ) FROM staff_assign sa2 WHERE
>>         sa2.client_id=sa.client_id) FROM
>>                  staff_assign sa
>>
>>                  I've done this before with procedures specific to a
>>         particular
>>                  table, and working with the start and end dates.  I'm 
>> now
>>                  wanting to try to do this once generically that will
>>         work for
>>                  all my cases.  So I'm hoping to do this in a way that
>>                  performance isn't horrible.  And it's a little unclear
>>         to me how
>>                  much and how I might be able to use the daterange
>>         operators to
>>                  accomplish this efficiently.
>>
>>
>>              The operator I use to solve similar problems:
>>
>>         https://www.postgresql.org/docs/10/static/functions-range.html
>>         <https://www.postgresql.org/docs/10/static/functions-range.html>
>>         <https://www.postgresql.org/docs/10/static/functions-range.html
>>         <https://www.postgresql.org/docs/10/static/functions-range.html>>
>>
>>              @>      contains element         
>> '[2011-01-01,2011-03-01)'::tsrange
>>              @> '2011-01-10'::timestamp   t
>>
>>
>>         Thanks Adrian.  But how would you apply that to this situation,
>>         where I have a series of (quite possibly discontinuous) 
>> dateranges?
>>
>>
>>     This is going to depend on a more formal definition of the problem
>>     with some sample data. Right now I am trying to reconcile "what
>>     period were they continuously in the program that includes X date?"
>>     with "where I have a series of (quite possibly discontinuous)
>>     dateranges? " Maybe its just me,  I can't see how discontinuous can
>>     also be continuously.
>>
>>
>>
>> *Start   End*
>> -- Episode 1
>> 1/1/16  3/30/16
>> 4/1/16  4/30/16
>> -- Episode 2
>> 1/1/18  1/31/18
>> 2/1/18  NULL
>>
>> Given 1/15/18, should return 1/1/18-->NULL
>> Given 1/15/16, should return 1/1/16-->4/30/16
> 
> Just thinking out loud here, in a function:
> 
> 1) For a client_id you can find the min(staff_assign_date).
> 
> 2) You can create a max(staff_assign_date_end) by using 
> COALESCE(staff_assign_date_end, current_date)
> 
> 3) You now have the outer range for the episodes.
> 
> 4) In that range of dates for each staff_assign_date you find the 
> staff_assign_date_end that greater then the staff_assign_date but less 
> then the next staff_assign_date. That will give you your episodes.
> 
> 5) You can then test to see if the X date is in the calculated episodes.
> 
> 6) As a shortcut you can find min(staff_assign_date) that is less then 
> the X date and see if the staff_assign_date_end is greater then the X date.
> 

Correction to 6)

6) As a shortcut you can find max(staff_assign_date) that is less then 
the X date and see if the staff_assign_date_end is greater then the X date.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> Hi, hoping to get some help with this.  I'm needing to take a specific 
> date, a series of dateranges  and, given a specific date, return a 
> single conitinuous daterange that includes that date.

The part about joining multiple touching dateranges to give a single 
continuous daterange is what Richard Snodgrass calls "coalescing" in 
*Developing Time-Oriented Database Applications in SQL*, pages 159 - 
169, available printed or as a free PDF at 
http://www2.cs.arizona.edu/~rts/publications.html (His approach also 
supports overlapping ranges, but it sounds like you don't need that.) If 
you had a coalesced view (or maybe a set-returning function), you could 
do this:

SELECT  term
FROM    coalesced_staff_assign
WHERE   client_id = 5
AND     term @> '2018-15-01'
;

I can't think of any way to avoid scanning all of a given client's 
records, but hopefully client_id alone would be selective enough to 
still give you good performance.

Oh also: in reading Snodgrass's SQL, note that he assumes closed-open 
ranges (i.e. '[)'), so you'll need to adjust some things to fit with 
your closed-closed ranges (or always use `staff_assign_date_end - 
INTERVAL '1 day'` if every assignment is at least 1 day long). On the 
other hand with built-in range types you might be able to simplify his 
pure-SQL solutions.

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com


On Thu, 2018-02-22 at 17:23 -0800, Ken Tanzer wrote:
> 
> 
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>         On 02/22/2018 04:58 PM, Ken Tanzer wrote:
>         
>                 
>                 
>                 On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
>                 <adrian.klaver@aklaver.com
>                 <mailto:adrian.klaver@aklaver.com>> wrote:
>                 
>                     On 02/22/2018 04:44 PM, Ken Tanzer wrote:
>                 
>                         Hi, hoping to get some help with this.  I'm
>                 needing to take a
>                         specific date, a series of dateranges  and,
>                 given a specific
>                         date, return a single conitinuous daterange
>                 that includes that date.
>                 
>                         To elaborate a bit, I've got lots of tables
>                 that include start
>                         and end dates.  For example:
>                 
>                         CREATE TABLE tbl_staff_assign (
>                               staff_assign_id         SERIAL PRIMARY
>                 KEY,
>                               client_id               INTEGER NOT NULL
>                 REFERENCES
>                         tbl_client (client_id),
>                               staff_id                INTEGER
>                 REFERENCES
>                         tbl_staff(staff_id),
>                               staff_assign_type_code      VARCHAR(10)
>                 NOT NULL
>                         REFERENCES tbl_l_staff_assign_type
>                 (staff_assign_type_code),
>                               staff_assign_date           DATE NOT
>                 NULL,
>                               staff_assign_date_end       DATE,
>                         ...
>                 
>                         So a client might leave a progrma and then
>                 return later, or they
>                         might simply switch to another staff_id.  (In
>                 which case one
>                         record will have and end date, and the next
>                 record will start on
>                         the next day.)  In this case I need to know
>                 "what period were
>                         they continuously in the program that includes
>                 X date?"  So I'd
>                         like to be able to do something like:
>                 
>                         "SELECT
>                 staff_assign_date,continuous_daterange(
>                         staff_assign_date, (SELECT
>                 
>                 array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>                         ) FROM staff_assign sa2 WHERE
>                 sa2.client_id=sa.client_id) FROM
>                         staff_assign sa
>                 
>                         I've done this before with procedures specific
>                 to a particular
>                         table, and working with the start and end
>                 dates.  I'm now
>                         wanting to try to do this once generically
>                 that will work for
>                         all my cases.  So I'm hoping to do this in a
>                 way that
>                         performance isn't horrible.  And it's a little
>                 unclear to me how
>                         much and how I might be able to use the
>                 daterange operators to
>                         accomplish this efficiently.
>                 
>                 
>                     The operator I use to solve similar problems:
>                 
>                 
>                 https://www.postgresql.org/docs/10/static/functions-range.html
>                 
>                 <https://www.postgresql.org/docs/10/static/functions-range.html>
>                 
>                     @>      contains element
>                 '[2011-01-01,2011-03-01)'::tsrange
>                     @> '2011-01-10'::timestamp   t
>                 
>                 
>                 Thanks Adrian.  But how would you apply that to this
>                 situation, where I have a series of (quite possibly
>                 discontinuous) dateranges?
>         
>         
>         This is going to depend on a more formal definition of the
>         problem with some sample data. Right now I am trying to
>         reconcile "what period were they continuously in the program
>         that includes X date?" with "where I have a series of (quite
>         possibly discontinuous) dateranges? " Maybe its just me,  I
>         can't see how discontinuous can also be continuously.
>         
>                 
> 
> 
> Start   End
> -- Episode 1
> 1/1/16  3/30/16
> 4/1/16  4/30/16
> -- Episode 2
> 
> 1/1/18  1/31/18
> 2/1/18  NULL
> 
> 
> Given 1/15/18, should return 1/1/18-->NULL
> Given 1/15/16, should return 1/1/16-->4/30/16
> 


i guess, you can easily get max continuous range for each row with
something like this:

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

SELECT dr,
       lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
       upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
  FROM ...

above example is simplified to selecting only daterange column "dr" for
readability, which in your case should be something like

daterange(staff_assign_date,staff_assign_date_end,'[)')

please note that daterange would be normalized to [) format so upper()
above returns exactly your max "staff_assign_date_end" for each
continuous range when dateranges are created with '[)' format.

the key point is ... ELSE d2 in append_daterange() which starts with new
value each time that new value is discontinuous with agg's state value
and order in which rows are processed (ASC for lower of daterange, DESC
for upper of daterange).

unfortunately this involves reading all rows for "client_id" and
additional sorting for each window.
i recall reading that you already pay the price of reading all rows for
client_id anyway, so the only question is the cost of two additional
sorts (maybe extracting dateranges to subset on which to do windows and
rejoining result of continuous ranges to original set would help to
lower the cost).

another way would be recursive cte, easy to append continuous dateranges
one by one from a set of dateranges staring with some date of choice,
but i doubt it would be cheaper, not to mention additional
postprocessing to get what you really want. and definitely would be less
readable.


i guess, from that point where for each row you have start and end date
of max continuous range including given row, you can easily format or
filter whatever you want


> Thinking about this a bit more, I'm wondering if a window function
> could be used for this?  I've used them a bit, but definitely never
> wrote one.   Something like 
> 
> 
> continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') OVER (PARTION BY
client_id)
> 
> 
> But then a window function can only see the rows included in the
> query, yes?  In which case this would only work if you queried for all
> the rows for a particular client_id?  I guess in the example I gave I
> was doing that anyway, so maybe this would be no worse.
> 
> 
> Ken
> -- 
> 
> AGENCY Software  
> A Free Software data system
> By and for non-profits
> http://agency-software.org/
> https://demo.agency-software.org/client
> 
> ken.tanzer@agency-software.org
> (253) 245-3801
> 
> 
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.





Hi.

This answer is perhaps useful if I understand your problem correctly. But I might have interpreted it wrongly. :-)


I would probably start with merging intervals so that overlapping and adjacent intervals become single continuous intervals, then select from those merged intervals.

We have an application with a lot of interval handling in PostgreSQL, and we use many variants of algorithms based on one by Itzik Ben Gan which he calls “packing intervals”. The post we started with was an old reader’s challenge from SQL Magazine. It has since been updated when MS-SQL started supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).

Basically, it is a few CTE:s which convert the intervals into “start” (+1) and “stop” (-1) events, then keeps a running sum of these, and finally creates new intervals by noting that the merged (or “packed”) intervals starts with events that had sum=0 before them and stops with events which have sum=0 after them.

It involves both CTE:s and window functions and is quite a beautiful example of SQL, IMHO.

I think it’s best to google it, he does a good job of explaining how it works.

Hope that helps a bit at least.

/Viktor

On 23 Feb 2018, at 01:44, Ken Tanzer <ken.tanzer@gmail.com> wrote:

Hi, hoping to get some help with this.  I'm needing to take a specific date, a series of dateranges  and, given a specific date, return a single conitinuous daterange that includes that date.

To elaborate a bit, I've got lots of tables that include start and end dates.  For example:

CREATE TABLE tbl_staff_assign (
    staff_assign_id         SERIAL PRIMARY KEY,
    client_id               INTEGER NOT NULL REFERENCES tbl_client (client_id),
    staff_id                INTEGER REFERENCES tbl_staff(staff_id),
    staff_assign_type_code      VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
    staff_assign_date           DATE NOT NULL,
    staff_assign_date_end       DATE,
...

So a client might leave a progrma and then return later, or they might simply switch to another staff_id.  (In which case one record will have and end date, and the next record will start on the next day.)  In this case I need to know "what period were they continuously in the program that includes X date?"  So I'd like to be able to do something like:

"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa

I've done this before with procedures specific to a particular table, and working with the start and end dates.  I'm now wanting to try to do this once generically that will work for all my cases.  So I'm hoping to do this in a way that performance isn't horrible.  And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.

Any advice or suggestions or ways to go about this appreciated.  Thanks!

Ken

p.s.,  Another small wrinkle is these records aren't always perfect, and ideally I'd allow for an optional fudge factor that would allow small gaps to be ignored.  I could just add that in every query (start_date+2,end_date-2), but it might be nice to have the function do it, if it didn't badly hurt performance.


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.



On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt <viktor@chalmers.se> wrote:

Hi.

This answer is perhaps useful if I understand your problem correctly. But I might have interpreted it wrongly. :-)


I would probably start with merging intervals so that overlapping and adjacent intervals become single continuous intervals, then select from those merged intervals.

We have an application with a lot of interval handling in PostgreSQL, and we use many variants of algorithms based on one by Itzik Ben Gan which he calls “packing intervals”. The post we started with was an old reader’s challenge from SQL Magazine. It has since been updated when MS-SQL started supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).

Basically, it is a few CTE:s which convert the intervals into “start” (+1) and “stop” (-1) events, then keeps a running sum of these, and finally creates new intervals by noting that the merged (or “packed”) intervals starts with events that had sum=0 before them and stops with events which have sum=0 after them.

It involves both CTE:s and window functions and is quite a beautiful example of SQL, IMHO.

I think it’s best to google it, he does a good job of explaining how it works.

Hope that helps a bit at least.

/Viktor

Hi.  Thanks for the many helpful responses!  Although I didn't end up with exactly what I was looking for, I think I got to something that works good enough for now.  I did it with CTEs, in a way I think similar to what you were suggesting (my printer is out of ink, so I didn't actually get to look at that book yet!)  I ended up having to do 4 passes:

1) Identify the starts and ends of continuous ranges
2) Eliminate the middle-point records (non stops/starts)
3) Merge the stop date in with the starts
4) Eliminate the stops

I couldn't see how to do it in less steps, but if there's a way...

I posted the query along with some sample data below.

Thanks again!

Ken

BEGIN;

CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);

INSERT INTO sample_data VALUES


(1,'1990-01-01','1990-12-31',0),
(1,'1991-01-01','1991-12-31',0),
(1,'1995-01-01','1995-06-30',1),
(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),
(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',0),
(2,'2015-02-01','2015-12-31',0),
(2,'2017-09-30','2018-01-31',1),
(2,'2018-02-01','2018-02-14',1)
;

WITH peek3 AS (
WITH peek2 AS (
WITH peek AS (
SELECT
  client_id,
  episode,
  daterange(start_date,end_date,'[]') AS range,
  COALESCE(daterange(start_date,end_date+1,'[]') &&
  lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS continues,
  COALESCE(daterange(start_date,end_date,'[]') &&
  lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS is_continued
FROM
  sample_data
)
SELECT
  *
FROM peek
  WHERE NOT (is_continued AND continues)
)
SELECT client_id,episode,range,
daterange(lower(range),
  CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range) THEN
  lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range)
  ELSE upper(range)
  END) AS full_range
,continues,is_continued
FROM peek2
)
SELECT * FROM peek3
WHERE NOT is_continued
;



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.