Re: Summing activity intervals without any obvious column to group by - Mailing list pgsql-general

From David Johnston
Subject Re: Summing activity intervals without any obvious column to group by
Date
Msg-id 0AEE1CE9-1D54-4DDA-A1F8-561DDEB8D73A@yahoo.com
Whole thread Raw
In response to Summing activity intervals without any obvious column to group by  (Carey Tilden <carey.tilden@gmail.com>)
Responses Re: Summing activity intervals without any obvious column to group by
List pgsql-general
On Aug 13, 2012, at 20:28, Carey Tilden <carey.tilden@gmail.com> wrote:

> Apologies for the awkward title.  I haven't quite thought of the right way to describe my problem, which may be why
I'vehad a hard time figuring out how to solve it.  I have a list of program start/stop times, and I want to know how
longeach run takes to complete.  The thing that's really tripping me up is there are gaps in the sequence.  I've
figuredout how to collapse the results down to a single row per attempt, but I can't quite figure out how to further
collapsedown each full run to its own row.  It'd be easy if I had a session_id or something to group on, but I don't.
AllI have are the start/stop times. 
>
> Here's some sample data.  Hopefully this clarifies what I'm talking about:
>
>     drop table if exists program_runs;
>
>     create temporary table program_runs (
>         id serial,
>         time_stamp timestamptz,
>         action text
>     );
>
>     insert into program_runs (time_stamp, action) values
>         ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 PST', 'stopped early'),
>         ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 PST', 'stopped early'),
>         ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 PST', 'completed'),
>         ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 PST', 'stopped early'),
>         ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 PST', 'completed'),
>         ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 PST', 'stopped early'),
>         ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 PST', 'stopped early'),
>         ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 PST', 'stopped early'),
>         ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 PST', 'completed');
>
>     select
>         this_time_stamp as starting_time_stamp,
>         next_time_stamp - this_time_stamp as time_elapsed,
>         next_action as closing_action
>     from (
>         select
>             time_stamp as this_time_stamp, lead(time_stamp) over (order by id) as next_time_stamp,
>             action as this_action, lead(action) over (order by id) as next_action,
>             id as this_id, lead(id) over (order by id) as next_id
>         from program_runs
>     ) q
>     where this_action = 'started';
>
> Note that each run has a pair of entries in the table.  The first is always "started", but the second may be either
"stoppedearly" or "completed".  The final results I'd like to see are: 
>
>       starting_time_stamp   | total_time_elapsed
>     ------------------------+--------------------
>      2012-01-01 10:00:00-08 | 00:27:00
>      2012-01-01 10:50:00-08 | 00:13:00
>      2012-01-01 11:20:00-08 | 00:39:00
>
> Hope that's enough detail.  Any ideas or suggestions gladly accepted!
>
> Regards,
> Carey

First artificially generate row (pair) identifiers by integer dividing the ordered row number by 2.

Using window or sub-queries identify the bookends for each group (i.e., the identifier for each completed and the prior
completed). Give these groups artificial session identifiers/row numbers. 

Assign the artificial session id to each transaction row by using the bookends.

Now you have identifiers with which to group.

This makes a number of assumptions regarding the form of the input data. It will solve for your example data but it may
notgeneralize.  In particular it assumes non-overlapping sessions. 

HTH

David J.

pgsql-general by date:

Previous
From: Carey Tilden
Date:
Subject: Re: Summing activity intervals without any obvious column to group by
Next
From: Carey Tilden
Date:
Subject: Re: Summing activity intervals without any obvious column to group by