Re: [GENERAL] time series data - Mailing list pgsql-general

From Khalil Khamlichi
Subject Re: [GENERAL] time series data
Date
Msg-id CAEK98WYwPa__BvTpxzh-PouE2gY+s=inqqxvGd7u6tePoKbJMw@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] time series data  (Schneider <schneider@ardentperf.com>)
List pgsql-general
Thanks a lot Jeremy, we ended up integrating the code you provided
into our software (just before you patent it) :)

Best regards,

Kkh

On Tue, Oct 3, 2017 at 7:58 PM, Schneider <schneider@ardentperf.com> wrote:
> On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi
> <khamlichi.khalil@gmail.com> wrote:
>> we have records like this
>>
>> ccdb1=# select user_name, agent_status, event_time from cc_events ;
>>
>>  user_name | agent_status |     event_time
>> -----------+--------------+---------------------
>>  user1     | ready        | 2017-01-01 10:00:00
>>  user1     | talking      | 2017-01-01 10:02:00
>>  user1     | after_call   | 2017-01-01 10:08:00
>>  user1     | ready        | 2017-01-01 10:10:00
>>  user1     | talking      | 2017-01-01 10:12:00
>>  user1     | after_call   | 2017-01-01 10:15:00
>>  user1     | paused       | 2017-01-01 10:17:00
>>  user1     | ready        | 2017-01-01 10:25:00
>>  user1     | talking      | 2017-01-01 10:26:00
>> (9 rows)
>>
>>
>> so user1 was READY at  2017-01-01 10:00:00 then he received a call
>> that he attended at 2017-01-01 10:02:00 and so on ...
>> so user1 was ready for 2 minutes, then he was talking for 6 minutes
>> then he was in after_call (doing after call work)  for 2 minutes and
>> this is the kind of information we want to query.
>>
>> my solution so far that I came with, is in my table I have 1 more
>> field : end_time
>> so when an event comes in and before the insert I do :
>> update cc_events set end_time = current_timestamp  where user_name =
>> 'user_of_event' and end_time is null;
>>
>> then I insert new event leaving the end_time as null so that next
>> event will update it and so on.
>>
>> its working fine, I have the start and end times for each event, its
>> not too painful to query (sum(end-start) while grouping by user_name,
>> agent_status), but its one more update on the table and also limited
>> in what you can query about,
>>
>> I know this must be a common problem in every software that deals with
>> events, so I suppose something is already built-in in postgres to deal
>> with it.
>
> Khalil, changing your schema is one solution with certain benefits -
> but it's definitely not necessary when you have the power of
> PostgreSQL at your fingertips.  You can solve your problem without
> changing anything at all.  :)
>
> All you need is a window function:
> https://www.postgresql.org/docs/9.6/static/tutorial-window.html
>
> Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and
> 9.3.17 (all the latest minors currently supported on RDS).  You can
> try these queries on your own system; they should work anywhere.
>
> First, I added a second user to your data set to make sure we were
> handling that case correctly.
>
> ==========
> create table cc_events (user_name varchar(8), agent_status
> varchar(20), event_time timestamp);
>
> insert into cc_events values
> ('user1', 'ready', '2017-01-01 10:00:00'),
> ('user1', 'talking', '2017-01-01 10:02:00'),
> ('user2', 'ready', '2017-01-01 10:04:00'),
> ('user2', 'talking', '2017-01-01 10:05:00'),
> ('user1', 'after_call', '2017-01-01 10:07:00'),
> ('user1', 'ready', '2017-01-01 10:08:00'),
> ('user1', 'talking', '2017-01-01 10:10:00'),
> ('user1', 'after_call', '2017-01-01 10:15:00'),
> ('user2', 'after_call', '2017-01-01 10:18:00'),
> ('user1', 'paused', '2017-01-01 10:20:00'),
> ('user2', 'paused', '2017-01-01 10:21:00');
>
> select * from cc_events order by user_name, event_time;
>
> ==========
> here's a basic window function in action:
>
> ==========
> select user_name, agent_status, event_time,
>   lead(event_time) over (partition by user_name order by event_time)
> next_event_time
> from cc_events order by event_time;
>
>  user_name | agent_status |     event_time      |   next_event_time
> -----------+--------------+---------------------+---------------------
>  user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00
>  user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00
>  user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00
>  user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00
>  user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00
>  user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00
>  user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00
>  user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
>  user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
>  user1     | paused       | 2017-01-01 10:20:00 |
>  user2     | paused       | 2017-01-01 10:21:00 |
>
> ==========
> and now we just add one more column which does the subtraction to
> calculate the duration:
>
> ==========
> select user_name, agent_status, event_time,
>   lead(event_time) over (partition by user_name order by event_time)
> next_event_time,
>   (lead(event_time) over (partition by user_name order by event_time))
> - event_time as duration
> from cc_events order by event_time;
>
> user_name | agent_status |     event_time      |   next_event_time   | duration
> -----------+--------------+---------------------+---------------------+----------
>  user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
>  user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
>  user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
>  user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
>  user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
>  user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
>  user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
>  user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
>  user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
>  user1     | paused       | 2017-01-01 10:20:00 |                     |
>  user2     | paused       | 2017-01-01 10:21:00 |                     |
>
> ==========
> it might also be convenient to wrap the window function in a common
> table expression
>
> https://www.postgresql.org/docs/9.6/static/queries-with.html
>
> ==========
> with calculate_next_events as (
>   select user_name, agent_status, event_time,
>     lead(event_time) over (partition by user_name order by event_time)
> next_event_time
>   from cc_events order by event_time
> )
> select user_name, agent_status, next_event_time-event_time duration
> from calculate_next_events order by event_time;
>
> ==========
> Finally, if you really want to supercharge this and power-up even
> more, besides temporal databases you might check out this recent blog
> post about implementing a state machine in postgresql... it's really
> interesting and closely related to what you're solving.
>
> https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html
>
> Hope this is helpful. Great to see that you're working on PostgreSQL -
> it's a powerful engine to build with!
>
> -Jeremy
>
> --
> http://about.me/jeremy_schneider


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: "Igal @ Lucee.org"
Date:
Subject: Re: [GENERAL] Delete Duplicates with Using
Next
From: Sam Gendler
Date:
Subject: Re: [GENERAL] REASSIGN OWNED simply doesn't work