Thread: GROUP BY overlapping (tsrange) entries
Hi all.
I'm trying to count() overlapping entries (timestamp-ranges, tsrange) and have the following test-data:
create table event( id SERIAL PRIMARY KEY, start_time timestamp NOT NULL, end_time TIMESTAMP, tsrange TSRANGE NOT NULL
);
CREATE INDEX event_range_idx ON event USING gist (tsrange);
-- Populate tsrange in this trigger
CREATE OR REPLACE FUNCTION event_update_tf() returns TRIGGER AS $$
BEGIN
if NEW.end_time IS NOT NULL then
NEW.tsrange = tsrange(NEW.start_time, NEW.end_time, '[]');
else
NEW.tsrange = tsrange(NEW.start_time, null, '[)');
end if;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER event_update_t BEFORE INSERT OR UPDATE ON event
FOR EACH ROW EXECUTE PROCEDURE event_update_tf();
insert into event(start_time, end_time) values('2015-12-20', NULL) , ('2015-12-20', '2015-12-31') , ('2015-12-25', '2016-01-01') , ('2015-11-20', '2015-11-24') , ('2016-02-01', '2016-02-03') , ('2016-02-01', '2016-02-04') , ('2016-02-01', NULL)
;
What I'd like is output like this:
count
───────
1
3
3
(3 rows)
Something like:
SELECT count(*) FROM event group by (tsrange with &&);
PS: In my real query the tsrange and other data is the result of a query involving multile tables, this is just a simplified example to deal with the "group by tsquery"
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.
On Thu, Jan 28, 2016 at 4:05 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi all.I'm trying to count() overlapping entries (timestamp-ranges, tsrange) and have the following test-data:create table event( id SERIAL PRIMARY KEY, start_time timestamp NOT NULL, end_time TIMESTAMP, tsrange TSRANGE NOT NULL ); CREATE INDEX event_range_idx ON event USING gist (tsrange); -- Populate tsrange in this trigger CREATE OR REPLACE FUNCTION event_update_tf() returns TRIGGER AS $$ BEGIN if NEW.end_time IS NOT NULL then NEW.tsrange = tsrange(NEW.start_time, NEW.end_time, '[]'); else NEW.tsrange = tsrange(NEW.start_time, null, '[)'); end if; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER event_update_t BEFORE INSERT OR UPDATE ON event FOR EACH ROW EXECUTE PROCEDURE event_update_tf(); insert into event(start_time, end_time) values('2015-12-20', NULL) , ('2015-12-20', '2015-12-31') , ('2015-12-25', '2016-01-01') , ('2015-11-20', '2015-11-24') , ('2016-02-01', '2016-02-03') , ('2016-02-01', '2016-02-04') , ('2016-02-01', NULL) ;
What I'd like is output like this:
count
───────
1
3
3
(3 rows)Something like:SELECT count(*) FROM event group by (tsrange with &&);PS: In my real query the tsrange and other data is the result of a query involving multile tables, this is just a simplified example to deal with the "group by tsquery"Thanks.
Attachment
På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:
It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.
Oh, sorry, the count was in wrong order.
Let me explain,
insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04')
, ('c', '2016-02-01', NULL)
;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:
På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.
Note that the 'name'-column here is just to explain what I'm after and that I have no such column.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:
På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.
Any clever hints anyone?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?
Maybe this will help...?
David J.
På fredag 29. januar 2016 kl. 17:59:52, skrev David G. Johnston <david.g.johnston@gmail.com>:
På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?Maybe this will help...?David J.
Yea, I've seen it, but don't like it.
I was (am) hoping some clever PG-guy would step up and craft som clever GROUP BY stuff like "GROUP BY magic_gist_equals(tsrange with &&)"
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
I was thinking along the lines that David J's link shows. But now I see that it's already been done and documented. So, I got nothing.
On Fri, Jan 29, 2016 at 9:30 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På fredag 29. januar 2016 kl. 17:59:52, skrev David G. Johnston <david.g.johnston@gmail.com>:På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?Maybe this will help...?David J.Yea, I've seen it, but don't like it.I was (am) hoping some clever PG-guy would step up and craft som clever GROUP BY stuff like "GROUP BY magic_gist_equals(tsrange with &&)"
Attachment
På fredag 29. januar 2016 kl. 17:59:52, skrev David G. Johnston <david.g.johnston@gmail.com>:På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?Maybe this will help...?David J.Yea, I've seen it, but don't like it.I was (am) hoping some clever PG-guy would step up and craft som clever GROUP BY stuff like "GROUP BY magic_gist_equals(tsrange with &&)"
"""
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions"""
Which means the only valid comparison for GROUP BY is equals. The processes of finding a single value upon which such an equality comparison can be performed is the subject of the wiki page I linked.
The only other potential query approach that comes to mind is some kind of recursive CTE.
A more structural potential approach would involve triggers and maintaining some form of master range table that evolves as DML is executed against the base table.
David J.
2016-01-29 2:11 GMT+01:00 Andreas Joseph Krogh <andreas@visena.com>:
På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Thanks.
I think data are not correct...
Expected result is the same as count() group by name...
But I guess you have included name column just to different ranges for overlap...
But actually there is just 2 ranges:name b is 1 range, name a & c are second range. all overlaps by first range '2015-12-20, null) - it contains all records named as C ranges
Attachment
På fredag 29. januar 2016 kl. 20:23:01, skrev David G. Johnston <david.g.johnston@gmail.com>:
På fredag 29. januar 2016 kl. 17:59:52, skrev David G. Johnston <david.g.johnston@gmail.com>:På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?Maybe this will help...?David J.Yea, I've seen it, but don't like it.I was (am) hoping some clever PG-guy would step up and craft som clever GROUP BY stuff like "GROUP BY magic_gist_equals(tsrange with &&)""""GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions"""Which means the only valid comparison for GROUP BY is equals. The processes of finding a single value upon which such an equality comparison can be performed is the subject of the wiki page I linked.
Yes, I know but I'm hoping someone has a more clever idea.
The only other potential query approach that comes to mind is some kind of recursive CTE.
Interesting. Care to give an example of how to solve this using recursive CTE?
A more structural potential approach would involve triggers and maintaining some form of master range table that evolves as DML is executed against the base table.
Yes, but I'm using the count() for calculating the cardinality of something, which might change based on some other column in another table, which is JOIN'ed in in my real query. So a de-normalized piggy-backing table would be quite non-trivial to maintain.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
You could create a temporary table like
CREATE TABLE rng_counts (
cnt int,
during tsrange, EXCLUDE USING gist (during WITH &&) );
FOR cursor over event table LOOP
INSERT INTO reservation VALUES (1, event. tsrange )
on conflict (during) do update set cnt = cnt + 1
;
LOOP END;
This is very rough but should be enough to give you an idea of what I driving at.
On Fri, Jan 29, 2016 at 3:34 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På fredag 29. januar 2016 kl. 20:23:01, skrev David G. Johnston <david.g.johnston@gmail.com>:På fredag 29. januar 2016 kl. 17:59:52, skrev David G. Johnston <david.g.johnston@gmail.com>:På fredag 29. januar 2016 kl. 02:43:37, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:11:52, skrev Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Note that the 'name'-column here is just to explain what I'm after and that I have no such column.Any clever hints anyone?Maybe this will help...?David J.Yea, I've seen it, but don't like it.I was (am) hoping some clever PG-guy would step up and craft som clever GROUP BY stuff like "GROUP BY magic_gist_equals(tsrange with &&)""""GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions"""Which means the only valid comparison for GROUP BY is equals. The processes of finding a single value upon which such an equality comparison can be performed is the subject of the wiki page I linked.Yes, I know but I'm hoping someone has a more clever idea.The only other potential query approach that comes to mind is some kind of recursive CTE.Interesting. Care to give an example of how to solve this using recursive CTE?A more structural potential approach would involve triggers and maintaining some form of master range table that evolves as DML is executed against the base table.Yes, but I'm using the count() for calculating the cardinality of something, which might change based on some other column in another table, which is JOIN'ed in in my real query. So a de-normalized piggy-backing table would be quite non-trivial to maintain.
Attachment
På fredag 29. januar 2016 kl. 20:23:01, skrev David G. Johnston <david.g.johnston@gmail.com>:The only other potential query approach that comes to mind is some kind of recursive CTE.Interesting. Care to give an example of how to solve this using recursive CTE?
My skill with recursive CTEs is abyssmal, so no, that is not a challenge that I wish to take on at this time. I'm not positive there is a meaningful one to even be had but the idea of choosing a parent time and then either expanding its covered range or adding a child is at least compelling enough to make it worth exploring if current solutions are undesirable.
David J.
2016-01-30 0:25 GMT+01:00 Andreas Joseph Krogh <andreas@visena.com>:
På fredag 29. januar 2016 kl. 20:33:08, skrev Misa Simic <misa.simic@gmail.com>:2016-01-29 2:11 GMT+01:00 Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Thanks.I think data are not correct...Expected result is the same as count() group by name...But I guess you have included name column just to different ranges for overlap...Yes, as I worte in the followup:"Note that the 'name'-column here is just to explain what I'm after and that I have no such column."But actually there is just 2 ranges:name b is 1 range, name a & c are second range. all overlaps by first range '2015-12-20, null) - it contains all records named as C rangesYes, my bad. Pretend the first range for 'a' was '2015-12-20" - "2015-12-27".
In that case, the result you can get by:
SELECT COUNT(1) FROM (
SELECT (SELECT tsrange(min(start_time), max(COALESCE(end_time, 'infinity'))) FROM event e WHERE e.tsrange && main.tsrange) as full_range
FROM event main
) t
GROUP BY full_range
Attachment
På lørdag 30. januar 2016 kl. 13:45:11, skrev Misa Simic <misa.simic@gmail.com>:
2016-01-30 0:25 GMT+01:00 Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 20:33:08, skrev Misa Simic <misa.simic@gmail.com>:2016-01-29 2:11 GMT+01:00 Andreas Joseph Krogh <andreas@visena.com>:På fredag 29. januar 2016 kl. 02:02:46, skrev Michael Moore <michaeljmoore@gmail.com>:It is unclear to me how you got from your input data to your expected output. If you are "trying to count() overlapping entries" then it would seem to me that you would only have only one value for the count. Either a range overlaps or it does not.Oh, sorry, the count was in wrong order.Let me explain,insert into event(name, start_time, end_time) values('a', '2015-12-20', NULL) , ('a', '2015-12-20', '2015-12-31') , ('a', '2015-12-25', '2016-01-01') , ('b', '2015-11-20', '2015-11-24') , ('c', '2016-02-01', '2016-02-03') , ('c', '2016-02-01', '2016-02-04') , ('c', '2016-02-01', NULL) ;
All 'a', 'b' and 'c' have points in common, with count a=3, b=1, c=3.Thanks.I think data are not correct...Expected result is the same as count() group by name...But I guess you have included name column just to different ranges for overlap...Yes, as I worte in the followup:"Note that the 'name'-column here is just to explain what I'm after and that I have no such column."But actually there is just 2 ranges:name b is 1 range, name a & c are second range. all overlaps by first range '2015-12-20, null) - it contains all records named as C rangesYes, my bad. Pretend the first range for 'a' was '2015-12-20" - "2015-12-27".In that case, the result you can get by:SELECT COUNT(1) FROM (SELECT (SELECT tsrange(min(start_time), max(COALESCE(end_time, 'infinity'))) FROM event e WHERE e.tsrange && main.tsrange) as full_rangeFROM event main) tGROUP BY full_range
This is the clever guy I'm taking about:-) Thanks, works great!
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963