Re: Group by range in hour of day - Mailing list pgsql-general
| From | Israel Brewster |
|---|---|
| Subject | Re: Group by range in hour of day |
| Date | |
| Msg-id | 467F41CB-F555-4BB6-9585-90DC64237E8F@ravnalaska.net Whole thread Raw |
| In response to | Re: Group by range in hour of day (Adrian Klaver <adrian.klaver@aklaver.com>) |
| Responses |
Re: Group by range in hour of day
|
| List | pgsql-general |
> On Mar 16, 2015, at 3:46 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 03/16/2015 04:16 PM, Israel Brewster wrote:
>> On Mar 16, 2015, at 2:22 PM, David G. Johnston
>> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
>>>
>>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
>>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote:
>>>
>>> On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>>
>>> I have a table with two timestamp columns for the start time
>>> and end
>>> time of each record (call them start and end).I'm trying to
>>> figure out
>>> if there is a way to group these records by "hour of day",
>>> that is the
>>> record should be included in the group if the hour of the day
>>> for the
>>> group falls anywhere in the range [start,end]. Obviously each
>>> record may
>>> well fall into multiple groups under this scenario.
>>>
>>> The goal here is to figure out, for each hour of the day, a)
>>> what is the
>>> total number of "active" records for that hour, and b) what is
>>> the total
>>> "active" time for those records during the hour, with an
>>> ultimate goal
>>> of figuring out the average active time per record per hour.
>>>
>>> So, for simplified example, if the table contained three records:
>>>
>>> start | end
>>> ------------------------------__-----------------------
>>> 2015-03-15 08:15 | 2015-03-15 10:45
>>> 2015-03-15 09:30 | 2015-03-15 10:15
>>> 2015-03-15 10:30 | 2015-03-15 11:30
>>>
>>>
>>> Then the results should break out something like this:
>>>
>>> hour | count | sum
>>> -----------------------------
>>> 8 | 1 | 0.75
>>> 9 | 2 | 1.5
>>> 10 | 3 | 1.5
>>> 11 | 1 | 0.5
>>>
>>> I can then easily manipulate these values to get my ultimate
>>> goal of the
>>> average, which would of course always be less than or equal to
>>> 1. Is
>>> this doable in postgress? Or would it be a better idea to
>>> simply pull
>>> the raw data and post-process in code? Thanks.
>>>
>>>
>>> Do not have an answer for you, but a question:
>>>
>>> What version of Postgres are you on?
>>>
>>> This will help determine what tools are available to work with.
>>>
>>>
>>> The following will give you endpoints for your bounds. Version is
>>> important since "range types" could be very useful in this situation -
>>> but you'd still need to generate the bounds info regardless.
>>>
>>>
>>> SELECT *
>>> FROM
>>> (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>>> CROSS JOIN
>>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>>> '1 hour'::interval) e (end_ts)) AS e
>>>
>>> You would join this using an ON condition with an OR (start BETWEEN
>>> [...] OR end BETWEEN [...]) - range logic will be better and you may
>>> want to adjust the upper bound by negative 1 (nano-second?) to allow
>>> for easier "<=" logic if using BETWEEN.
>>>
>>
>> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
>> wanted here? using that, I get a 625 row result set where each row from
>> the first SELECT is paired up with EVERY row from the second select. I
>> would think I would want the first row of the first SELECT paired up
>> with only the first row of the second, second row of the first paired
>> with the second row of the second, etc - i.e. 24 start and end bounds.
>> Or am I missing something?
>
> Given this:
>
> test=> select * from start_end ;
> id | start_time | end_time
> ----+------------------------+------------------------
> 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
> 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
> 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07
>
> using Pauls hints I got:
>
> test=> select * from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and
extract(hourfrom end_time) ;
>
> id | start_time | end_time | h
> ----+------------------------+------------------------+----
> 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
> 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
> 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
> 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
> 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
> 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
> 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
> 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
> 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11
>
>
> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from
start_time)and extract(hour from end_time) group by h order by h;
>
> h | count
> ----+-------
> 8 | 2
> 9 | 3
> 10 | 2
> 11 | 2
>
Awesome! That looks perfect. I still need to incorporate it into the bigger picture, of course, but I'm sure I can
handlethat. Thanks!
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
>>
>>
>>
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
>>
>>> David J.
>>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
Attachment
pgsql-general by date: