Re: [GENERAL] time series data - Mailing list pgsql-general

From Khalil Khamlichi
Subject Re: [GENERAL] time series data
Date
Msg-id CAEK98WY1hfjiLKZAwUdRk-pLuNgnetFgRUqdqA3JFW6fo1_8Bw@mail.gmail.com
In response to Re: [GENERAL] time series data  (Melvin Davidson <melvin6925@gmail.com>)
Responses Re: [GENERAL] time series data  (Schneider <schneider@ardentperf.com>)
List pgsql-general
Hi Melvin,

Thanks a lot for your help,

Let me explain my problem.

We have records like this:

ccdb1=# select user_name, agent_status, event_time from cc_events ;
 user_name | agent_status |     event_time
-----------+--------------+---------------------
 user1     | ready        | 2017-01-01 10:00:00
 user1     | talking      | 2017-01-01 10:02:00
 user1     | after_call   | 2017-01-01 10:08:00
 user1     | ready        | 2017-01-01 10:10:00
 user1     | talking      | 2017-01-01 10:12:00
 user1     | after_call   | 2017-01-01 10:15:00
 user1     | paused       | 2017-01-01 10:17:00
 user1     | ready        | 2017-01-01 10:25:00
 user1     | talking      | 2017-01-01 10:26:00
(9 rows)


So user1 was ready at 2017-01-01 10:00:00, then answered a call at
2017-01-01 10:02:00, and so on. In other words, user1 was ready for
2 minutes, then talking for 6 minutes, then in after_call (doing
after-call work) for 2 minutes, and this is the kind of information
we want to query.

The solution I have come up with so far: my table has one more
field, end_time. When an event comes in, before the insert I do:

update cc_events set end_time = current_timestamp where user_name =
'user_of_event' and end_time is null;

Then I insert the new event with end_time left null, so that the
next event will update it, and so on.
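For completeness, the insert step looks like this (a sketch; the
'talking' status and the user name are placeholders):

```sql
-- Insert the new event with end_time left NULL;
-- the next event's update will fill it in.
INSERT INTO cc_events (user_name, agent_status, event_time)
VALUES ('user_of_event', 'talking', current_timestamp);
```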

It's working fine: I have the start and end times for each event,
and it's not too painful to query (sum(end_time - event_time) while
grouping by user_name and agent_status), but it costs one extra
update on the table and is also limited in what you can query.
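Spelled out, the query I described looks something like this
(assuming the extra end_time column):

```sql
-- Total time per user and status; open events (NULL end_time)
-- are excluded.
SELECT user_name,
       agent_status,
       sum(end_time - event_time) AS total_duration
FROM cc_events
WHERE end_time IS NOT NULL
GROUP BY user_name, agent_status;
```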

I know this must be a common problem in every piece of software that
deals with events, so I suppose PostgreSQL already has something
built in to deal with it.
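For what it's worth, one built-in feature that might fit is the
lead() window function (available since PostgreSQL 8.4): it pairs
each event with the same user's next event, so the duration can be
derived at query time without the second update. A sketch against
the cc_events table above:

```sql
-- Each event's duration is the gap to the user's next event;
-- the last (still open) event gets a NULL duration.
SELECT user_name,
       agent_status,
       lead(event_time) OVER (PARTITION BY user_name
                              ORDER BY event_time) - event_time
         AS duration
FROM cc_events;
```

Wrapping this in a subquery would then allow grouping by user_name
and agent_status for totals and averages.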

I looked at your solution, it's very clever and we use something
similar but on another module where we manage live calls and route
them to available agents.

kkh


On Mon, Oct 2, 2017 at 4:06 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
>
>
>
> On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote:
>>
>> I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.
>>
>> Clifford
>>
>> On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:
>>>
>>>
>>>
>>> On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I have a data stream of a call center application coming in  to postgres in this format :
>>>>
>>>> user_name, user_status, event_time
>>>>
>>>> 'user1', 'ready', '2017-01-01 10:00:00'
>>>> 'user1', 'talking', '2017-01-01 10:02:00'
>>>> 'user1', 'after_call', '2017-01-01 10:07:00'
>>>> 'user1', 'ready', '2017-01-01 10:08:00'
>>>> 'user1', 'talking', '2017-01-01 10:10:00'
>>>> 'user1', 'after_call', '2017-01-01 10:15:00'
>>>> 'user1', 'paused', '2017-01-01 10:20:00'
>>>> ...
>>>> ...
>>>>
>>>> so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one, so it should be used to calculate the duration of that previous one.
>>>>
>>>> What is the best way to get user_status statistics like total duration, frequency, avg ...etc? Does anybody have any experience with this sort of data stream?
>>>>
>>>>
>>>> Thanks in advance.
>>>
>>>
>>> Just a suggestion, but here is what I would do.
>>> First, create your tables similar to as follows
>>>
>>> CREATE TABLE status
>>> (
>>>  call_status    varchar(10) NOT NULL,
>>>  CONSTRAINT status_pk PRIMARY KEY (call_status)
>>> );
>>>
>>> INSERT INTO status
>>> (call_status)
>>> VALUES
>>> ('ready'),
>>> ('talking'),
>>> ('after_call');
>>>
>>> CREATE TABLE user_sessions
>>> (
>>>  username        name        NOT NULL,
>>>  session_id        bigint        NOT NULL,
>>>  call_status     varchar(10) NOT NULL,
>>>  call_time        timestamp NOT NULL,
>>>  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
>>>  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
>>>     REFERENCES status(call_status)
>>> );
>>>
>>> Next, you will need to generate a unique session_id for each
>>> user, but only for when call_status is 'ready'. So probably
>>> a table of the form:
>>>
>>> CREATE TABLE current_session
>>> (
>>> username        name        NOT NULL,
>>> session_id        serial        NOT NULL,
>>> CONSTRAINT current_session_pk PRIMARY KEY (username)
>>> );
>>>
>>> Then all you need to do is:
>>> 1. Update current_session and get the new session_id each time a user connects (call_status = 'ready').
>>>    Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.
>>>
>>> 2. You can then do
>>>
>>> SELECT age ( (SELECT call_time FROM user_sessions
>>>                 WHERE username = 'actual_user_name'
>>>                   AND session_id = actual_session_id
>>>                   AND call_status = 'after_call'),
>>>               (SELECT call_time FROM user_sessions
>>>                 WHERE username = 'actual_user_name'
>>>                   AND session_id = actual_session_id
>>>                   AND call_status = 'talking')
>>>             ) AS duration;
>>>
>>> You can use similar queries for avg and frequency.
>>>
>>> --
>>> Melvin Davidson
>>> I reserve the right to fantasize.  Whether or not you
>>> wish to share my fantasy is entirely up to you.
>>
>>
>>
>>
>> --
>> @osm_seattle
>> osm_seattle.snowandsnow.us
>> OpenStreetMap: Maps with a human touch
>
>
> I thought about the table design for user_sessions and came up
> with a better one:
>
> CREATE TABLE user_sessions
> (
>  username      name          NOT NULL,
>  session_id     bigserial     NOT NULL,
>  call_status     varchar(10) NOT NULL,
>  call_ready      timestamp   NOT NULL,
>  call_talking     timestamp,
>  call_after_call timestamp,
>  call_duration   interval,
>  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
>  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
>     REFERENCES status(call_status)
> );
>
> So in essence, when the call starts, just do:
>
> INSERT INTO user_sessions
> (username, call_status, call_ready)
> VALUES
> ('actual_user_name', 'ready', now() );
>
> Then
> SELECT max(session_id) AS current_session
>   FROM user_sessions
>  WHERE username = 'actual_user_name';
>
> When talking starts:
> UPDATE user_sessions
>    SET call_status = 'talking',
>        call_talking = now()
>  WHERE username = 'actual_user_name'
>    AND session_id = current_session;
>
> When call ends:
> UPDATE user_sessions
>    SET call_status = 'after_call',
>        call_after_call = now()
>  WHERE username = 'actual_user_name'
>    AND session_id = current_session;
>
> Now all you have to do to get call length is:
>
> SELECT username,
>        age ( call_after_call, call_talking ) as duration
>   FROM user_sessions
>  WHERE username = 'actual_user_name'
>    AND session_id =  current_session;
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
