Re: [GENERAL] time series data - Mailing list pgsql-general
From | Khalil Khamlichi |
---|---|
Subject | Re: [GENERAL] time series data |
Date | |
Msg-id | CAEK98WYwPa__BvTpxzh-PouE2gY+s=inqqxvGd7u6tePoKbJMw@mail.gmail.com Whole thread Raw |
In response to | Re: [GENERAL] time series data (Schneider <schneider@ardentperf.com>) |
List | pgsql-general |
Thanks a lot Jeremy, we ended up integrating the code you provided into our software (just before you patent it) :) Best regards, Kkh On Tue, Oct 3, 2017 at 7:58 PM, Schneider <schneider@ardentperf.com> wrote: > On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi > <khamlichi.khalil@gmail.com> wrote: >> we have records like this >> >> ccdb1=# select user_name, agent_status, event_time from cc_events ; >> >> user_name | agent_status | event_time >> -----------+--------------+--------------------- >> user1 | ready | 2017-01-01 10:00:00 >> user1 | talking | 2017-01-01 10:02:00 >> user1 | after_call | 2017-01-01 10:08:00 >> user1 | ready | 2017-01-01 10:10:00 >> user1 | talking | 2017-01-01 10:12:00 >> user1 | after_call | 2017-01-01 10:15:00 >> user1 | paused | 2017-01-01 10:17:00 >> user1 | ready | 2017-01-01 10:25:00 >> user1 | talking | 2017-01-01 10:26:00 >> (9 rows) >> >> >> so user1 was READY at 2017-01-01 10:00:00 then he received a call >> that he attended at 2017-01-01 10:02:00 and so on ... >> so user1 was ready for 2 minutes, then he was talking for 6 minutes >> then he was in after_call (doing after call work) for 2 minutes and >> this is the kind of information we want to query. >> >> my solution so far that I came with, is in my table I have 1 more >> field : end_time >> so when an event comes in and before the insert I do : >> update cc_events set end_time = current_timestamp where user_name = >> 'user_of_event' and end_time is null; >> >> then I insert new event leaving the end_time as null so that next >> event will update it and so on. >> >> its working fine, I have the start and end times for each event, its >> not too painful to query (sum(end-start) while grouping by user_name, >> agent_status), but its one more update on the table and also limited >> in what you can query about, >> >> I know this must be a common problem in every software that deals with >> events, so I suppose something is already built-in in postgres to deal >> with it. > > Khalil, changing your schema is one solution with certain benefits - > but it's definitely not necessary when you have the power of > PostgreSQL at your fingertips. You can solve your problem without > changing anything at all. :) > > All you need is a window function: > https://www.postgresql.org/docs/9.6/static/tutorial-window.html > > Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and > 9.3.17 (all the latest minors currently supported on RDS). You can > try these queries on your own system; they should work anywhere. > > First, I added a second user to your data set to make sure we were > handling that case correctly. > > ========== > create table cc_events (user_name varchar(8), agent_status > varchar(20), event_time timestamp); > > insert into cc_events values > ('user1', 'ready', '2017-01-01 10:00:00'), > ('user1', 'talking', '2017-01-01 10:02:00'), > ('user2', 'ready', '2017-01-01 10:04:00'), > ('user2', 'talking', '2017-01-01 10:05:00'), > ('user1', 'after_call', '2017-01-01 10:07:00'), > ('user1', 'ready', '2017-01-01 10:08:00'), > ('user1', 'talking', '2017-01-01 10:10:00'), > ('user1', 'after_call', '2017-01-01 10:15:00'), > ('user2', 'after_call', '2017-01-01 10:18:00'), > ('user1', 'paused', '2017-01-01 10:20:00'), > ('user2', 'paused', '2017-01-01 10:21:00'); > > select * from cc_events order by user_name, event_time; > > ========== > here's a basic window function in action: > > ========== > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time > from cc_events order by event_time; > > user_name | agent_status | event_time | next_event_time > -----------+--------------+---------------------+--------------------- > user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 > user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 > user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 > user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 > user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 > user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 > user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 > user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 > user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 > user1 | paused | 2017-01-01 10:20:00 | > user2 | paused | 2017-01-01 10:21:00 | > > ========== > and now we just add one more column which does the subtraction to > calculate the duration: > > ========== > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time, > (lead(event_time) over (partition by user_name order by event_time)) > - event_time as duration > from cc_events order by event_time; > > user_name | agent_status | event_time | next_event_time | duration > -----------+--------------+---------------------+---------------------+---------- > user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00 > user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00 > user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00 > user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00 > user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00 > user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00 > user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00 > user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00 > user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00 > user1 | paused | 2017-01-01 10:20:00 | | > user2 | paused | 2017-01-01 10:21:00 | | > > ========== > it might also be convenient to wrap the window function in a common > table expression > > https://www.postgresql.org/docs/9.6/static/queries-with.html > > ========== > with calculate_next_events as ( > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time > from cc_events order by event_time > ) > select user_name, agent_status, next_event_time-event_time duration > from calculate_next_events order by event_time; > > ========== > Finally, if you really want to supercharge this and power-up even > more, besides temporal databases you might check out this recent blog > post about implementing a state machine in postgresql... it's really > interesting and closely related to what you're solving. > > https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html > > Hope this is helpful. Great to see that you're working on PostgreSQL - > it's a powerful engine to build with! > > -Jeremy > > -- > http://about.me/jeremy_schneider -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
pgsql-general by date: