Re: [GENERAL] time series data - Mailing list pgsql-general

From Melvin Davidson
Subject Re: [GENERAL] time series data
Date
Msg-id CANu8FiyxwQ6S0T-2dWhp0GksvJ85Ec9yqfpoeZ8Yb9LjaMs45A@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] time series data  (Clifford Snow <clifford@snowandsnow.us>)
Responses Re: [GENERAL] time series data  (Khalil Khamlichi <khamlichi.khalil@gmail.com>)
List pgsql-general


On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote:
I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.

Clifford

On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
Hi everyone,

I have a data stream of a call center application coming in  to postgres in this format :

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.

What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?


Thanks in advance.

Just a suggestion, but here is what I would do.
First, create your tables similar to as follows

CREATE TABLE status
(
 call_status    varchar(10) NOT NULL,
 CONSTRAINT status_pk PRIMARY KEY (call_status)
);

INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');

CREATE TABLE user_sessions
(
 username        name        NOT NULL,
 session_id        bigint        NOT NULL,
 call_status     varchar(10) NOT NULL,
 call_time        timestamp NOT NULL,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:

CREATE TABLE current_session
(
username        name        NOT NULL,
session_id        serial        NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);

Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
   Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.

2. You can then do

SELECT username,
       age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
              ( SELECT call_time FROM current_session WHERE call_status = 'after_call')
           ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id =  actual_session_id;  

You can use similar queries for avg and frequency.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.




--
@osm_seattle
OpenStreetMap: Maps with a human touch

I thought about the table design for user_sessions and came up
with a better one:

CREATE TABLE user_sessions
(
 username      name          NOT NULL,
 session_id     bigint           NOT NULL,
 call_status     varchar(10) NOT NULL,
 call_ready      timestamp   NOT NULL,
 call_talking     timestamp,
 call_after_call timestamp,
 call_duration   interval,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

So in essence, when the call starts, just do:

INSERT INTO user_sessions
(username, call_ready)
VALUES
('actual_user_name', now() );

Then
SELECT max(session_id) AS current_session
  FROM user_sessions
 WHERE username = 'actual_user_name';
   
When talking starts:
UPDATE user_sessions
   SET call_status = 'talking',
       call_talking = now()
 WHERE username = 'actual_user_name'
   AND session_id = current_session;

When call ends:
UPDATE user_sessions
   SET call_status = 'after_call',
       call_after_call = now()
 WHERE username = 'actual_user_name'
   AND session_id = current_session;
   
Now all you have to do to get call length is:

SELECT username,
       age ( call_after_call, call_talking ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id =  current_session;

pgsql-general by date:

Previous
From: Guyren Howe
Date:
Subject: [GENERAL] Setting search_path ignored
Next
From: "David G. Johnston"
Date:
Subject: Re: [GENERAL] Setting search_path ignored