Thread: Given a set of daterange, finding the continuous range that includesa particular date
Given a set of daterange, finding the continuous range that includesa particular date
From
Ken Tanzer
Date:
Hi, hoping to get some help with this. I'm needing to take a specific date, a series of dateranges and, given a specific date, return a single conitinuous daterange that includes that date.
--
To elaborate a bit, I've got lots of tables that include start and end dates. For example:
CREATE TABLE tbl_staff_assign (
staff_assign_id SERIAL PRIMARY KEY,
client_id INTEGER NOT NULL REFERENCES tbl_client (client_id),
staff_id INTEGER REFERENCES tbl_staff(staff_id),
staff_assign_type_code VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
staff_assign_date DATE NOT NULL,
staff_assign_date_end DATE,
...
So a client might leave a progrma and then return later, or they might simply switch to another staff_id. (In which case one record will have and end date, and the next record will start on the next day.) In this case I need to know "what period were they continuously in the program that includes X date?" So I'd like to be able to do something like:
"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa
I've done this before with procedures specific to a particular table, and working with the start and end dates. I'm now wanting to try to do this once generically that will work for all my cases. So I'm hoping to do this in a way that performance isn't horrible. And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.
Any advice or suggestions or ways to go about this appreciated. Thanks!
Ken
p.s., Another small wrinkle is these records aren't always perfect, and ideally I'd allow for an optional fudge factor that would allow small gaps to be ignored. I could just add that in every query (start_date+2,end_date-2), but it might be nice to have the function do it, if it didn't badly hurt performance.
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Adrian Klaver
Date:
On 02/22/2018 04:44 PM, Ken Tanzer wrote: > Hi, hoping to get some help with this. I'm needing to take a specific > date, a series of dateranges and, given a specific date, return a > single conitinuous daterange that includes that date. > > To elaborate a bit, I've got lots of tables that include start and end > dates. For example: > > CREATE TABLE tbl_staff_assign ( > staff_assign_id SERIAL PRIMARY KEY, > client_id INTEGER NOT NULL REFERENCES tbl_client > (client_id), > staff_id INTEGER REFERENCES tbl_staff(staff_id), > staff_assign_type_code VARCHAR(10) NOT NULL REFERENCES > tbl_l_staff_assign_type (staff_assign_type_code), > staff_assign_date DATE NOT NULL, > staff_assign_date_end DATE, > ... > > So a client might leave a progrma and then return later, or they might > simply switch to another staff_id. (In which case one record will have > and end date, and the next record will start on the next day.) In this > case I need to know "what period were they continuously in the program > that includes X date?" So I'd like to be able to do something like: > > "SELECT staff_assign_date,continuous_daterange( staff_assign_date, > (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') > ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM > staff_assign sa > > I've done this before with procedures specific to a particular table, > and working with the start and end dates. I'm now wanting to try to do > this once generically that will work for all my cases. So I'm hoping to > do this in a way that performance isn't horrible. And it's a little > unclear to me how much and how I might be able to use the daterange > operators to accomplish this efficiently. The operator I use to solve similar problems: https://www.postgresql.org/docs/10/static/functions-range.html @> contains element '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp t > > Any advice or suggestions or ways to go about this appreciated. Thanks! > > Ken > > p.s., Another small wrinkle is these records aren't always perfect, and > ideally I'd allow for an optional fudge factor that would allow small > gaps to be ignored. I could just add that in every query > (start_date+2,end_date-2), but it might be nice to have the function do > it, if it didn't badly hurt performance. > > > -- > AGENCY Software > A Free Software data system > By and for non-profits > /http://agency-software.org// > /https://demo.agency-software.org/client/ > ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org> > (253) 245-3801 > > Subscribe to the mailing list > <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to > learn more about AGENCY or > follow the discussion. -- Adrian Klaver adrian.klaver@aklaver.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Ken Tanzer
Date:
On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 02/22/2018 04:44 PM, Ken Tanzer wrote:Hi, hoping to get some help with this. I'm needing to take a specific date, a series of dateranges and, given a specific date, return a single conitinuous daterange that includes that date.
To elaborate a bit, I've got lots of tables that include start and end dates. For example:
CREATE TABLE tbl_staff_assign (
staff_assign_id SERIAL PRIMARY KEY,
client_id INTEGER NOT NULL REFERENCES tbl_client (client_id),
staff_id INTEGER REFERENCES tbl_staff(staff_id),
staff_assign_type_code VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
staff_assign_date DATE NOT NULL,
staff_assign_date_end DATE,
...
So a client might leave a progrma and then return later, or they might simply switch to another staff_id. (In which case one record will have and end date, and the next record will start on the next day.) In this case I need to know "what period were they continuously in the program that includes X date?" So I'd like to be able to do something like:
"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assi gn_date,staff_assign_date_end, '[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa
I've done this before with procedures specific to a particular table, and working with the start and end dates. I'm now wanting to try to do this once generically that will work for all my cases. So I'm hoping to do this in a way that performance isn't horrible. And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.
The operator I use to solve similar problems:
https://www.postgresql.org/docs/10/static/functions-range. html
@> contains element '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp t
Thanks Adrian. But how would you apply that to this situation, where I have a series of (quite possibly discontinuous) dateranges?
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Adrian Klaver
Date:
On 02/22/2018 04:58 PM, Ken Tanzer wrote: > > > On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 02/22/2018 04:44 PM, Ken Tanzer wrote: > > Hi, hoping to get some help with this. I'm needing to take a > specific date, a series of dateranges and, given a specific > date, return a single conitinuous daterange that includes that date. > > To elaborate a bit, I've got lots of tables that include start > and end dates. For example: > > CREATE TABLE tbl_staff_assign ( > staff_assign_id SERIAL PRIMARY KEY, > client_id INTEGER NOT NULL REFERENCES > tbl_client (client_id), > staff_id INTEGER REFERENCES > tbl_staff(staff_id), > staff_assign_type_code VARCHAR(10) NOT NULL > REFERENCES tbl_l_staff_assign_type (staff_assign_type_code), > staff_assign_date DATE NOT NULL, > staff_assign_date_end DATE, > ... > > So a client might leave a progrma and then return later, or they > might simply switch to another staff_id. (In which case one > record will have and end date, and the next record will start on > the next day.) In this case I need to know "what period were > they continuously in the program that includes X date?" So I'd > like to be able to do something like: > > "SELECT staff_assign_date,continuous_daterange( > staff_assign_date, (SELECT > array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') > ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM > staff_assign sa > > I've done this before with procedures specific to a particular > table, and working with the start and end dates. I'm now > wanting to try to do this once generically that will work for > all my cases. So I'm hoping to do this in a way that > performance isn't horrible. And it's a little unclear to me how > much and how I might be able to use the daterange operators to > accomplish this efficiently. > > > The operator I use to solve similar problems: > > https://www.postgresql.org/docs/10/static/functions-range.html > <https://www.postgresql.org/docs/10/static/functions-range.html> > > @> contains element '[2011-01-01,2011-03-01)'::tsrange > @> '2011-01-10'::timestamp t > > > Thanks Adrian. But how would you apply that to this situation, where I > have a series of (quite possibly discontinuous) dateranges? This is going to depend on a more formal definition of the problem with some sample data. Right now I am trying to reconcile "what period were they continuously in the program that includes X date?" with "where I have a series of (quite possibly discontinuous) dateranges? " Maybe its just me, I can't see how discontinuous can also be continuously. > > > -- > AGENCY Software > A Free Software data system > By and for non-profits > /http://agency-software.org// > /https://demo.agency-software.org/client/ > ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org> > (253) 245-3801 > > Subscribe to the mailing list > <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to > learn more about AGENCY or > follow the discussion. -- Adrian Klaver adrian.klaver@aklaver.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Ken Tanzer
Date:
On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
-- On 02/22/2018 04:58 PM, Ken Tanzer wrote:This is going to depend on a more formal definition of the problem with some sample data. Right now I am trying to reconcile "what period were they continuously in the program that includes X date?" with "where I have a series of (quite possibly discontinuous) dateranges? " Maybe its just me, I can't see how discontinuous can also be continuously.
On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
On 02/22/2018 04:44 PM, Ken Tanzer wrote:
Hi, hoping to get some help with this. I'm needing to take a
specific date, a series of dateranges and, given a specific
date, return a single conitinuous daterange that includes that date.
To elaborate a bit, I've got lots of tables that include start
and end dates. For example:
CREATE TABLE tbl_staff_assign (
staff_assign_id SERIAL PRIMARY KEY,
client_id INTEGER NOT NULL REFERENCES
tbl_client (client_id),
staff_id INTEGER REFERENCES
tbl_staff(staff_id),
staff_assign_type_code VARCHAR(10) NOT NULL
REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
staff_assign_date DATE NOT NULL,
staff_assign_date_end DATE,
...
So a client might leave a progrma and then return later, or they
might simply switch to another staff_id. (In which case one
record will have and end date, and the next record will start on
the next day.) In this case I need to know "what period were
they continuously in the program that includes X date?" So I'd
like to be able to do something like:
"SELECT staff_assign_date,continuous_daterange(
staff_assign_date, (SELECT
array_agg(daterange(staff_assign_date,staff_assign_date_end, '[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM
staff_assign sa
I've done this before with procedures specific to a particular
table, and working with the start and end dates. I'm now
wanting to try to do this once generically that will work for
all my cases. So I'm hoping to do this in a way that
performance isn't horrible. And it's a little unclear to me how
much and how I might be able to use the daterange operators to
accomplish this efficiently.
The operator I use to solve similar problems:
https://www.postgresql.org/docs/10/static/functions-range. html
<https://www.postgresql.org/docs/10/static/functions-range. html>
@> contains element '[2011-01-01,2011-03-01)'::tsrange
@> '2011-01-10'::timestamp t
Thanks Adrian. But how would you apply that to this situation, where I have a series of (quite possibly discontinuous) dateranges?
Start End
-- Episode 1
1/1/16 3/30/16
4/1/16 4/30/16
-- Episode 2
1/1/18 1/31/18
2/1/18 NULL
Given 1/15/18, should return 1/1/18-->NULL
Given 1/15/16, should return 1/1/16-->4/30/16
Thinking about this a bit more, I'm wondering if a window function could be used for this? I've used them a bit, but definitely never wrote one. Something like
continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') OVER (PARTION BY client_id)
But then a window function can only see the rows included in the query, yes? In which case this would only work if you queried for all the rows for a particular client_id? I guess in the example I gave I was doing that anyway, so maybe this would be no worse.
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Adrian Klaver
Date:
On 02/22/2018 05:23 PM, Ken Tanzer wrote: > > > On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 02/22/2018 04:58 PM, Ken Tanzer wrote: > > > > On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > <mailto:adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>>> wrote: > > On 02/22/2018 04:44 PM, Ken Tanzer wrote: > > Hi, hoping to get some help with this. I'm needing to > take a > specific date, a series of dateranges and, given a > specific > date, return a single conitinuous daterange that > includes that date. > > To elaborate a bit, I've got lots of tables that > include start > and end dates. For example: > > CREATE TABLE tbl_staff_assign ( > staff_assign_id SERIAL PRIMARY KEY, > client_id INTEGER NOT NULL REFERENCES > tbl_client (client_id), > staff_id INTEGER REFERENCES > tbl_staff(staff_id), > staff_assign_type_code VARCHAR(10) NOT NULL > REFERENCES tbl_l_staff_assign_type > (staff_assign_type_code), > staff_assign_date DATE NOT NULL, > staff_assign_date_end DATE, > ... > > So a client might leave a progrma and then return > later, or they > might simply switch to another staff_id. (In which > case one > record will have and end date, and the next record will > start on > the next day.) In this case I need to know "what > period were > they continuously in the program that includes X date?" > So I'd > like to be able to do something like: > > "SELECT staff_assign_date,continuous_daterange( > staff_assign_date, (SELECT > > array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') > ) FROM staff_assign sa2 WHERE > sa2.client_id=sa.client_id) FROM > staff_assign sa > > I've done this before with procedures specific to a > particular > table, and working with the start and end dates. I'm now > wanting to try to do this once generically that will > work for > all my cases. So I'm hoping to do this in a way that > performance isn't horrible. And it's a little unclear > to me how > much and how I might be able to use the daterange > operators to > accomplish this efficiently. > > > The operator I use to solve similar problems: > > https://www.postgresql.org/docs/10/static/functions-range.html > <https://www.postgresql.org/docs/10/static/functions-range.html> > > <https://www.postgresql.org/docs/10/static/functions-range.html > <https://www.postgresql.org/docs/10/static/functions-range.html>> > > @> contains element > '[2011-01-01,2011-03-01)'::tsrange > @> '2011-01-10'::timestamp t > > > Thanks Adrian. But how would you apply that to this situation, > where I have a series of (quite possibly discontinuous) dateranges? > > > This is going to depend on a more formal definition of the problem > with some sample data. Right now I am trying to reconcile "what > period were they continuously in the program that includes X date?" > with "where I have a series of (quite possibly discontinuous) > dateranges? " Maybe its just me, I can't see how discontinuous can > also be continuously. > > > > *Start End* > -- Episode 1 > 1/1/16 3/30/16 > 4/1/16 4/30/16 > -- Episode 2 > 1/1/18 1/31/18 > 2/1/18 NULL > > Given 1/15/18, should return 1/1/18-->NULL I have similar case. There I use the coalesce to substitute the date the report is run for the NULL value, mainly because I have not way of predicting the future:) Is there actually an episode value you can sort by or is that a made up value for this example? > Given 1/15/16, should return 1/1/16-->4/30/16 > > Thinking about this a bit more, I'm wondering if a window function could > be used for this? I've used them a bit, but definitely never wrote one. > Something like > > continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') > OVER (PARTION BY client_id) > > But then a window function can only see the rows included in the query, > yes? In which case this would only work if you queried for all the rows > for a particular client_id? I guess in the example I gave I was doing > that anyway, so maybe this would be no worse. > > Ken > -- > AGENCY Software > A Free Software data system > By and for non-profits > /http://agency-software.org// > /https://demo.agency-software.org/client/ > ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org> > (253) 245-3801 > > Subscribe to the mailing list > <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to > learn more about AGENCY or > follow the discussion. -- Adrian Klaver adrian.klaver@aklaver.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Adrian Klaver
Date:
On 02/22/2018 05:23 PM, Ken Tanzer wrote: > > > On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 02/22/2018 04:58 PM, Ken Tanzer wrote: > > > > On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > <mailto:adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>>> wrote: > > On 02/22/2018 04:44 PM, Ken Tanzer wrote: > > Hi, hoping to get some help with this. I'm needing to > take a > specific date, a series of dateranges and, given a > specific > date, return a single conitinuous daterange that > includes that date. > > To elaborate a bit, I've got lots of tables that > include start > and end dates. For example: > > CREATE TABLE tbl_staff_assign ( > staff_assign_id SERIAL PRIMARY KEY, > client_id INTEGER NOT NULL REFERENCES > tbl_client (client_id), > staff_id INTEGER REFERENCES > tbl_staff(staff_id), > staff_assign_type_code VARCHAR(10) NOT NULL > REFERENCES tbl_l_staff_assign_type > (staff_assign_type_code), > staff_assign_date DATE NOT NULL, > staff_assign_date_end DATE, > ... > > So a client might leave a progrma and then return > later, or they > might simply switch to another staff_id. (In which > case one > record will have and end date, and the next record will > start on > the next day.) In this case I need to know "what > period were > they continuously in the program that includes X date?" > So I'd > like to be able to do something like: > > "SELECT staff_assign_date,continuous_daterange( > staff_assign_date, (SELECT > > array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') > ) FROM staff_assign sa2 WHERE > sa2.client_id=sa.client_id) FROM > staff_assign sa > > I've done this before with procedures specific to a > particular > table, and working with the start and end dates. I'm now > wanting to try to do this once generically that will > work for > all my cases. So I'm hoping to do this in a way that > performance isn't horrible. And it's a little unclear > to me how > much and how I might be able to use the daterange > operators to > accomplish this efficiently. > > > The operator I use to solve similar problems: > > https://www.postgresql.org/docs/10/static/functions-range.html > <https://www.postgresql.org/docs/10/static/functions-range.html> > > <https://www.postgresql.org/docs/10/static/functions-range.html > <https://www.postgresql.org/docs/10/static/functions-range.html>> > > @> contains element > '[2011-01-01,2011-03-01)'::tsrange > @> '2011-01-10'::timestamp t > > > Thanks Adrian. But how would you apply that to this situation, > where I have a series of (quite possibly discontinuous) dateranges? > > > This is going to depend on a more formal definition of the problem > with some sample data. Right now I am trying to reconcile "what > period were they continuously in the program that includes X date?" > with "where I have a series of (quite possibly discontinuous) > dateranges? " Maybe its just me, I can't see how discontinuous can > also be continuously. > > > > *Start End* > -- Episode 1 > 1/1/16 3/30/16 > 4/1/16 4/30/16 > -- Episode 2 > 1/1/18 1/31/18 > 2/1/18 NULL > > Given 1/15/18, should return 1/1/18-->NULL > Given 1/15/16, should return 1/1/16-->4/30/16 Just thinking out loud here, in a function: 1) For a client_id you can find the min(staff_assign_date). 2) You can create a max(staff_assign_date_end) by using COALESCE(staff_assign_date_end, current_date) 3) You now have the outer range for the episodes. 4) In that range of dates for each staff_assign_date you find the staff_assign_date_end that greater then the staff_assign_date but less then the next staff_assign_date. That will give you your episodes. 5) You can then test to see if the X date is in the calculated episodes. 6) As a shortcut you can find min(staff_assign_date) that is less then the X date and see if the staff_assign_date_end is greater then the X date. > > Thinking about this a bit more, I'm wondering if a window function could > be used for this? I've used them a bit, but definitely never wrote one. > Something like > > continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') > OVER (PARTION BY client_id) > > But then a window function can only see the rows included in the query, > yes? In which case this would only work if you queried for all the rows > for a particular client_id? I guess in the example I gave I was doing > that anyway, so maybe this would be no worse. > > Ken > -- > AGENCY Software > A Free Software data system > By and for non-profits > /http://agency-software.org// > /https://demo.agency-software.org/client/ > ken.tanzer@agency-software.org <mailto:ken.tanzer@agency-software.org> > (253) 245-3801 > > Subscribe to the mailing list > <mailto:agency-general-request@lists.sourceforge.net?body=subscribe> to > learn more about AGENCY or > follow the discussion. -- Adrian Klaver adrian.klaver@aklaver.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Adrian Klaver
Date:
On 02/22/2018 09:05 PM, Adrian Klaver wrote: > On 02/22/2018 05:23 PM, Ken Tanzer wrote: >> >> >> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver >> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: >> >> On 02/22/2018 04:58 PM, Ken Tanzer wrote: >> >> >> >> On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver >> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> >> <mailto:adrian.klaver@aklaver.com >> <mailto:adrian.klaver@aklaver.com>>> wrote: >> >> On 02/22/2018 04:44 PM, Ken Tanzer wrote: >> >> Hi, hoping to get some help with this. I'm needing to >> take a >> specific date, a series of dateranges and, given a >> specific >> date, return a single conitinuous daterange that >> includes that date. >> >> To elaborate a bit, I've got lots of tables that >> include start >> and end dates. For example: >> >> CREATE TABLE tbl_staff_assign ( >> staff_assign_id SERIAL PRIMARY KEY, >> client_id INTEGER NOT NULL >> REFERENCES >> tbl_client (client_id), >> staff_id INTEGER REFERENCES >> tbl_staff(staff_id), >> staff_assign_type_code VARCHAR(10) NOT NULL >> REFERENCES tbl_l_staff_assign_type >> (staff_assign_type_code), >> staff_assign_date DATE NOT NULL, >> staff_assign_date_end DATE, >> ... >> >> So a client might leave a progrma and then return >> later, or they >> might simply switch to another staff_id. (In which >> case one >> record will have and end date, and the next record will >> start on >> the next day.) In this case I need to know "what >> period were >> they continuously in the program that includes X date?" >> So I'd >> like to be able to do something like: >> >> "SELECT staff_assign_date,continuous_daterange( >> staff_assign_date, (SELECT >> array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') >> ) FROM staff_assign sa2 WHERE >> sa2.client_id=sa.client_id) FROM >> staff_assign sa >> >> I've done this before with procedures specific to a >> particular >> table, and working with the start and end dates. I'm >> now >> wanting to try to do this once generically that will >> work for >> all my cases. So I'm hoping to do this in a way that >> performance isn't horrible. And it's a little unclear >> to me how >> much and how I might be able to use the daterange >> operators to >> accomplish this efficiently. >> >> >> The operator I use to solve similar problems: >> >> https://www.postgresql.org/docs/10/static/functions-range.html >> <https://www.postgresql.org/docs/10/static/functions-range.html> >> <https://www.postgresql.org/docs/10/static/functions-range.html >> <https://www.postgresql.org/docs/10/static/functions-range.html>> >> >> @> contains element >> '[2011-01-01,2011-03-01)'::tsrange >> @> '2011-01-10'::timestamp t >> >> >> Thanks Adrian. But how would you apply that to this situation, >> where I have a series of (quite possibly discontinuous) >> dateranges? >> >> >> This is going to depend on a more formal definition of the problem >> with some sample data. Right now I am trying to reconcile "what >> period were they continuously in the program that includes X date?" >> with "where I have a series of (quite possibly discontinuous) >> dateranges? " Maybe its just me, I can't see how discontinuous can >> also be continuously. >> >> >> >> *Start End* >> -- Episode 1 >> 1/1/16 3/30/16 >> 4/1/16 4/30/16 >> -- Episode 2 >> 1/1/18 1/31/18 >> 2/1/18 NULL >> >> Given 1/15/18, should return 1/1/18-->NULL >> Given 1/15/16, should return 1/1/16-->4/30/16 > > Just thinking out loud here, in a function: > > 1) For a client_id you can find the min(staff_assign_date). > > 2) You can create a max(staff_assign_date_end) by using > COALESCE(staff_assign_date_end, current_date) > > 3) You now have the outer range for the episodes. > > 4) In that range of dates for each staff_assign_date you find the > staff_assign_date_end that greater then the staff_assign_date but less > then the next staff_assign_date. That will give you your episodes. > > 5) You can then test to see if the X date is in the calculated episodes. > > 6) As a shortcut you can find min(staff_assign_date) that is less then > the X date and see if the staff_assign_date_end is greater then the X date. > Correction to 6) 6) As a shortcut you can find max(staff_assign_date) that is less then the X date and see if the staff_assign_date_end is greater then the X date. -- Adrian Klaver adrian.klaver@aklaver.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Paul Jungwirth
Date:
On 02/22/2018 04:44 PM, Ken Tanzer wrote: > Hi, hoping to get some help with this. I'm needing to take a specific > date, a series of dateranges and, given a specific date, return a > single conitinuous daterange that includes that date. The part about joining multiple touching dateranges to give a single continuous daterange is what Richard Snodgrass calls "coalescing" in *Developing Time-Oriented Database Applications in SQL*, pages 159 - 169, available printed or as a free PDF at http://www2.cs.arizona.edu/~rts/publications.html (His approach also supports overlapping ranges, but it sounds like you don't need that.) If you had a coalesced view (or maybe a set-returning function), you could do this: SELECT term FROM coalesced_staff_assign WHERE client_id = 5 AND term @> '2018-15-01' ; I can't think of any way to avoid scanning all of a given client's records, but hopefully client_id alone would be selective enough to still give you good performance. Oh also: in reading Snodgrass's SQL, note that he assumes closed-open ranges (i.e. '[)'), so you'll need to adjust some things to fit with your closed-closed ranges (or always use `staff_assign_date_end - INTERVAL '1 day'` if every assignment is at least 1 day long). On the other hand with built-in range types you might be able to simplify his pure-SQL solutions. -- Paul ~{:-) pj@illuminatedcomputing.com
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
mariusz
Date:
On Thu, 2018-02-22 at 17:23 -0800, Ken Tanzer wrote: > > > On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: > On 02/22/2018 04:58 PM, Ken Tanzer wrote: > > > > On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver > <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 02/22/2018 04:44 PM, Ken Tanzer wrote: > > Hi, hoping to get some help with this. I'm > needing to take a > specific date, a series of dateranges and, > given a specific > date, return a single conitinuous daterange > that includes that date. > > To elaborate a bit, I've got lots of tables > that include start > and end dates. For example: > > CREATE TABLE tbl_staff_assign ( > staff_assign_id SERIAL PRIMARY > KEY, > client_id INTEGER NOT NULL > REFERENCES > tbl_client (client_id), > staff_id INTEGER > REFERENCES > tbl_staff(staff_id), > staff_assign_type_code VARCHAR(10) > NOT NULL > REFERENCES tbl_l_staff_assign_type > (staff_assign_type_code), > staff_assign_date DATE NOT > NULL, > staff_assign_date_end DATE, > ... > > So a client might leave a progrma and then > return later, or they > might simply switch to another staff_id. (In > which case one > record will have and end date, and the next > record will start on > the next day.) In this case I need to know > "what period were > they continuously in the program that includes > X date?" So I'd > like to be able to do something like: > > "SELECT > staff_assign_date,continuous_daterange( > staff_assign_date, (SELECT > > array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]') > ) FROM staff_assign sa2 WHERE > sa2.client_id=sa.client_id) FROM > staff_assign sa > > I've done this before with procedures specific > to a particular > table, and working with the start and end > dates. I'm now > wanting to try to do this once generically > that will work for > all my cases. So I'm hoping to do this in a > way that > performance isn't horrible. And it's a little > unclear to me how > much and how I might be able to use the > daterange operators to > accomplish this efficiently. > > > The operator I use to solve similar problems: > > > https://www.postgresql.org/docs/10/static/functions-range.html > > <https://www.postgresql.org/docs/10/static/functions-range.html> > > @> contains element > '[2011-01-01,2011-03-01)'::tsrange > @> '2011-01-10'::timestamp t > > > Thanks Adrian. But how would you apply that to this > situation, where I have a series of (quite possibly > discontinuous) dateranges? > > > This is going to depend on a more formal definition of the > problem with some sample data. Right now I am trying to > reconcile "what period were they continuously in the program > that includes X date?" with "where I have a series of (quite > possibly discontinuous) dateranges? " Maybe its just me, I > can't see how discontinuous can also be continuously. > > > > > Start End > -- Episode 1 > 1/1/16 3/30/16 > 4/1/16 4/30/16 > -- Episode 2 > > 1/1/18 1/31/18 > 2/1/18 NULL > > > Given 1/15/18, should return 1/1/18-->NULL > Given 1/15/16, should return 1/1/16-->4/30/16 > i guess, you can easily get max continuous range for each row with something like this: CREATE OR REPLACE FUNCTION append_daterange(d1 daterange, d2 daterange) RETURNS daterange LANGUAGE sql AS $$ SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END; $$; CREATE AGGREGATE agg_daterange (daterange) ( sfunc = append_daterange, stype = daterange ); SELECT dr, lower(agg_daterange(dr) OVER (ORDER BY dr ASC)), upper(agg_daterange(dr) OVER (ORDER BY dr DESC)) FROM ... above example is simplified to selecting only daterange column "dr" for readability, which in your case should be something like daterange(staff_assign_date,staff_assign_date_end,'[)') please note that daterange would be normalized to [) format so upper() above returns exactly your max "staff_assign_date_end" for each continuous range when dateranges are created with '[)' format. the key point is ... ELSE d2 in append_daterange() which starts with new value each time that new value is discontinuous with agg's state value and order in which rows are processed (ASC for lower of daterange, DESC for upper of daterange). unfortunately this involves reading all rows for "client_id" and additional sorting for each window. i recall reading that you already pay the price of reading all rows for client_id anyway, so the only question is the cost of two additional sorts (maybe extracting dateranges to subset on which to do windows and rejoining result of continuous ranges to original set would help to lower the cost). another way would be recursive cte, easy to append continuous dateranges one by one from a set of dateranges staring with some date of choice, but i doubt it would be cheaper, not to mention additional postprocessing to get what you really want. and definitely would be less readable. i guess, from that point where for each row you have start and end date of max continuous range including given row, you can easily format or filter whatever you want > Thinking about this a bit more, I'm wondering if a window function > could be used for this? I've used them a bit, but definitely never > wrote one. Something like > > > continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') OVER (PARTION BY client_id) > > > But then a window function can only see the rows included in the > query, yes? In which case this would only work if you queried for all > the rows for a particular client_id? I guess in the example I gave I > was doing that anyway, so maybe this would be no worse. > > > Ken > -- > > AGENCY Software > A Free Software data system > By and for non-profits > http://agency-software.org/ > https://demo.agency-software.org/client > > ken.tanzer@agency-software.org > (253) 245-3801 > > > Subscribe to the mailing list to > learn more about AGENCY or > follow the discussion.
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Viktor Fougstedt
Date:
Hi.
This answer is perhaps useful if I understand your problem correctly. But I might have interpreted it wrongly. :-)
I would probably start with merging intervals so that overlapping and adjacent intervals become single continuous intervals, then select from those merged intervals.
We have an application with a lot of interval handling in PostgreSQL, and we use many variants of algorithms based on one by Itzik Ben Gan which he calls “packing intervals”. The post we started with was an old reader’s challenge from SQL Magazine. It has since been updated when MS-SQL started supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).
Basically, it is a few CTE:s which convert the intervals into “start” (+1) and “stop” (-1) events, then keeps a running sum of these, and finally creates new intervals by noting that the merged (or “packed”) intervals starts with events that had sum=0 before them and stops with events which have sum=0 after them.
It involves both CTE:s and window functions and is quite a beautiful example of SQL, IMHO.
I think it’s best to google it, he does a good job of explaining how it works.
Hope that helps a bit at least.
/Viktor
On 23 Feb 2018, at 01:44, Ken Tanzer <ken.tanzer@gmail.com> wrote:Hi, hoping to get some help with this. I'm needing to take a specific date, a series of dateranges and, given a specific date, return a single conitinuous daterange that includes that date.To elaborate a bit, I've got lots of tables that include start and end dates. For example:CREATE TABLE tbl_staff_assign (staff_assign_id SERIAL PRIMARY KEY,client_id INTEGER NOT NULL REFERENCES tbl_client (client_id),staff_id INTEGER REFERENCES tbl_staff(staff_id),staff_assign_type_code VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),staff_assign_date DATE NOT NULL,staff_assign_date_end DATE,...So a client might leave a progrma and then return later, or they might simply switch to another staff_id. (In which case one record will have and end date, and the next record will start on the next day.) In this case I need to know "what period were they continuously in the program that includes X date?" So I'd like to be able to do something like:"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign saI've done this before with procedures specific to a particular table, and working with the start and end dates. I'm now wanting to try to do this once generically that will work for all my cases. So I'm hoping to do this in a way that performance isn't horrible. And it's a little unclear to me how much and how I might be able to use the daterange operators to accomplish this efficiently.Any advice or suggestions or ways to go about this appreciated. Thanks!Kenp.s., Another small wrinkle is these records aren't always perfect, and ideally I'd allow for an optional fudge factor that would allow small gaps to be ignored. I could just add that in every query (start_date+2,end_date-2), but it might be nice to have the function do it, if it didn't badly hurt performance.--AGENCY SoftwareA Free Software data systemBy and for non-profits(253) 245-3801learn more about AGENCY orfollow the discussion.
Re: Given a set of daterange, finding the continuous range thatincludes a particular date
From
Ken Tanzer
Date:
On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt <viktor@chalmers.se> wrote:
BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);
INSERT INTO sample_data VALUES
Hi.This answer is perhaps useful if I understand your problem correctly. But I might have interpreted it wrongly. :-)I would probably start with merging intervals so that overlapping and adjacent intervals become single continuous intervals, then select from those merged intervals.We have an application with a lot of interval handling in PostgreSQL, and we use many variants of algorithms based on one by Itzik Ben Gan which he calls “packing intervals”. The post we started with was an old reader’s challenge from SQL Magazine. It has since been updated when MS-SQL started supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).Basically, it is a few CTE:s which convert the intervals into “start” (+1) and “stop” (-1) events, then keeps a running sum of these, and finally creates new intervals by noting that the merged (or “packed”) intervals starts with events that had sum=0 before them and stops with events which have sum=0 after them.It involves both CTE:s and window functions and is quite a beautiful example of SQL, IMHO.I think it’s best to google it, he does a good job of explaining how it works.Hope that helps a bit at least./Viktor
Hi. Thanks for the many helpful responses! Although I didn't end up with exactly what I was looking for, I think I got to something that works good enough for now. I did it with CTEs, in a way I think similar to what you were suggesting (my printer is out of ink, so I didn't actually get to look at that book yet!) I ended up having to do 4 passes:
1) Identify the starts and ends of continuous ranges
2) Eliminate the middle-point records (non stops/starts)
3) Merge the stop date in with the starts
4) Eliminate the stops
I couldn't see how to do it in less steps, but if there's a way...
I posted the query along with some sample data below.
Thanks again!
Ken
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);
INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',0),
(1,'1991-01-01','1991-12-31',0),
(1,'1995-01-01','1995-06-30',1),
(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),
(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),
(2,'2014-02-01','2015-01-31',0),
(2,'2015-02-01','2015-12-31',0),
(2,'2017-09-30','2018-01-31',1),
(2,'2018-02-01','2018-02-14',1)
;
WITH peek3 AS (
WITH peek2 AS (
WITH peek AS (
SELECT
client_id,
episode,
daterange(start_date,end_date,'[]') AS range,
COALESCE(daterange(start_date,end_date+1,'[]') &&
lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS continues,
COALESCE(daterange(start_date,end_date,'[]') &&
lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS is_continued
FROM
sample_data
)
SELECT
*
FROM peek
WHERE NOT (is_continued AND continues)
)
SELECT client_id,episode,range,
daterange(lower(range),
CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range) THEN
lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range)
ELSE upper(range)
END) AS full_range
,continues,is_continued
FROM peek2
)
SELECT * FROM peek3
WHERE NOT is_continued
;
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.