Thread: [GENERAL] time series data
Hi everyone,
I have a data stream of a call center application coming in to postgres in this format :
user_name, user_status, event_time
'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...
so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.
What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?
Thanks in advance.
On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
Hi everyone,I have a data stream of a call center application coming in to postgres in this format :user_name, user_status, event_time'user1', 'ready', '2017-01-01 10:00:00''user1', 'talking', '2017-01-01 10:02:00''user1', 'after_call', '2017-01-01 10:07:00''user1', 'ready', '2017-01-01 10:08:00''user1', 'talking', '2017-01-01 10:10:00''user1', 'after_call', '2017-01-01 10:15:00''user1', 'paused', '2017-01-01 10:20:00'......so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?Thanks in advance.
Just a suggestion, but here is what I would do.
First, create your tables similar to as follows
CREATE TABLE status
(
call_status varchar(10) NOT NULL,
CONSTRAINT status_pk PRIMARY KEY (call_status)
);
INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');
CREATE TABLE user_sessions
(
username name NOT NULL,
session_id bigint NOT NULL,
call_status varchar(10) NOT NULL,
call_time timestamp NOT NULL,
CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
REFERENCES status(call_status)
);
Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:
CREATE TABLE current_session
(
username name NOT NULL,
session_id serial NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);
Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.
2. You can then do
SELECT username,
age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
( SELECT call_time FROM current_session WHERE call_status = 'after_call')
) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = actual_session_id;
SELECT username,
age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
( SELECT call_time FROM current_session WHERE call_status = 'after_call')
) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = actual_session_id;
You can use similar queries for avg and frequency.
--
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.
Clifford
On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:
On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:Hi everyone,I have a data stream of a call center application coming in to postgres in this format :user_name, user_status, event_time'user1', 'ready', '2017-01-01 10:00:00''user1', 'talking', '2017-01-01 10:02:00''user1', 'after_call', '2017-01-01 10:07:00''user1', 'ready', '2017-01-01 10:08:00''user1', 'talking', '2017-01-01 10:10:00''user1', 'after_call', '2017-01-01 10:15:00''user1', 'paused', '2017-01-01 10:20:00'......so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?Thanks in advance.
Just a suggestion, but here is what I would do.
First, create your tables similar to as follows
CREATE TABLE status
(
call_status varchar(10) NOT NULL,
CONSTRAINT status_pk PRIMARY KEY (call_status)
);
INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');
CREATE TABLE user_sessions
(
username name NOT NULL,
session_id bigint NOT NULL,
call_status varchar(10) NOT NULL,
call_time timestamp NOT NULL,
CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
REFERENCES status(call_status)
);
Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:
CREATE TABLE current_session
(
username name NOT NULL,
session_id serial NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);
Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.2. You can then do
SELECT username,
age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
( SELECT call_time FROM current_session WHERE call_status = 'after_call')
) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = actual_session_id;You can use similar queries for avg and frequency.
--Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote:
I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.Clifford--On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:Hi everyone,I have a data stream of a call center application coming in to postgres in this format :user_name, user_status, event_time'user1', 'ready', '2017-01-01 10:00:00''user1', 'talking', '2017-01-01 10:02:00''user1', 'after_call', '2017-01-01 10:07:00''user1', 'ready', '2017-01-01 10:08:00''user1', 'talking', '2017-01-01 10:10:00''user1', 'after_call', '2017-01-01 10:15:00''user1', 'paused', '2017-01-01 10:20:00'......so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?Thanks in advance.
Just a suggestion, but here is what I would do.
First, create your tables similar to as follows
CREATE TABLE status
(
call_status varchar(10) NOT NULL,
CONSTRAINT status_pk PRIMARY KEY (call_status)
);
INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');
CREATE TABLE user_sessions
(
username name NOT NULL,
session_id bigint NOT NULL,
call_status varchar(10) NOT NULL,
call_time timestamp NOT NULL,
CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
REFERENCES status(call_status)
);
Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:
CREATE TABLE current_session
(
username name NOT NULL,
session_id serial NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);
Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.2. You can then do
SELECT username,
age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
( SELECT call_time FROM current_session WHERE call_status = 'after_call')
) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = actual_session_id;You can use similar queries for avg and frequency.
--Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I thought about the table design for user_sessions and came up
with a better one:
CREATE TABLE user_sessions
(
username name NOT NULL,
session_id bigint NOT NULL,
call_status varchar(10) NOT NULL,
call_ready timestamp NOT NULL,
call_talking timestamp,
call_after_call timestamp,
call_duration interval,
CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
REFERENCES status(call_status)
);
So in essence, when the call starts, just do:
INSERT INTO user_sessions
(username, call_ready)
VALUES
('actual_user_name', now() );
Then
SELECT max(session_id) AS current_session
FROM user_sessions
WHERE username = 'actual_user_name';
When talking starts:
UPDATE user_sessions
SET call_status = 'talking',
call_talking = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;
When call ends:
UPDATE user_sessions
SET call_status = 'after_call',
call_after_call = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;
Now all you have to do to get call length is:
SELECT username,
age ( call_after_call, call_talking ) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = current_session;
On Sun, Oct 1, 2017 at 2:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote: > Hi everyone, > > I have a data stream of a call center application coming in to postgres in > this format : > > user_name, user_status, event_time > > 'user1', 'ready', '2017-01-01 10:00:00' > 'user1', 'talking', '2017-01-01 10:02:00' > 'user1', 'after_call', '2017-01-01 10:07:00' > 'user1', 'ready', '2017-01-01 10:08:00' > 'user1', 'talking', '2017-01-01 10:10:00' > 'user1', 'after_call', '2017-01-01 10:15:00' > 'user1', 'paused', '2017-01-01 10:20:00' > ... > ... > > so as you see each new insert of an "event" is in fact the start_time of > that event and also the end_time of the previous one so should be used to > calculate the duration of this previous one. > > What is the best way to get user_status statistics like total duration, > frequency, avg ...etc , does any body have an experience with this sort of > data streams ? Have you looked at temporal_tables extension? It seems custom made for what you're trying to do. http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
interesting proposition, I am reading the docs.
On Mon, Oct 2, 2017 at 6:08 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Oct 1, 2017 at 2:17 AM, Khalil Khamlichi
<khamlichi.khalil@gmail.com> wrote:
> Hi everyone,
>
> I have a data stream of a call center application coming in to postgres in
> this format :
>
> user_name, user_status, event_time
>
> 'user1', 'ready', '2017-01-01 10:00:00'
> 'user1', 'talking', '2017-01-01 10:02:00'
> 'user1', 'after_call', '2017-01-01 10:07:00'
> 'user1', 'ready', '2017-01-01 10:08:00'
> 'user1', 'talking', '2017-01-01 10:10:00'
> 'user1', 'after_call', '2017-01-01 10:15:00'
> 'user1', 'paused', '2017-01-01 10:20:00'
> ...
> ...
>
> so as you see each new insert of an "event" is in fact the start_time of
> that event and also the end_time of the previous one so should be used to
> calculate the duration of this previous one.
>
> What is the best way to get user_status statistics like total duration,
> frequency, avg ...etc , does any body have an experience with this sort of
> data streams ?
Have you looked at temporal_tables extension? It seems custom made for
what you're trying to do.
http://clarkdave.net/2015/02/historical-records-with- postgresql-and-temporal- tables-and-sql-2011/
Hi Melvin, Thanks a lot for your help, let me explain to you my problem. we have records like this ccdb1=# select user_name, agent_status, event_time from cc_events ; user_name | agent_status | event_time -----------+--------------+---------------------user1 | ready | 2017-01-01 10:00:00user1 | talking |2017-01-01 10:02:00user1 | after_call | 2017-01-01 10:08:00user1 | ready | 2017-01-01 10:10:00user1 | talking | 2017-01-01 10:12:00user1 | after_call | 2017-01-01 10:15:00user1 | paused | 2017-01-0110:17:00user1 | ready | 2017-01-01 10:25:00user1 | talking | 2017-01-01 10:26:00 (9 rows) so user1 was READY at 2017-01-01 10:00:00 then he received a call that he attended at 2017-01-01 10:02:00 and so on ... so user1 was ready for 2 minutes, then he was talking for 6 minutes then he was in after_call (doing after call work) for 2 minutes and this is the kind of information we want to query. my solution so far that I came with, is in my table I have 1 more field : end_time so when an event comes in and before the insert I do : update cc_events set end_time = current_timestamp where user_name = 'user_of_event' and end_time is null; then I insert new event leaving the end_time as null so that next event will update it and so on. its working fine, I have the start and end times for each event, its not too painful to query (sum(end-start) while grouping by user_name, agent_status), but its one more update on the table and also limited in what you can query about, I know this must be a common problem in every software that deals with events, so I suppose something is already built-in in postgres to deal with it. I looked at your solution, it's very clever and we use something similar but on another module where we manage live calls and route them to available agents. kkh On Mon, Oct 2, 2017 at 4:06 PM, Melvin Davidson <melvin6925@gmail.com> wrote: > > > > On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote: >> >> I have a stream that updates every minute with a trigger that updates another table with information from the stream.That way I'm constantly updated with no need to run a script to update before I want a report. >> >> Clifford >> >> On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote: >>> >>> >>> >>> On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote: >>>> >>>> Hi everyone, >>>> >>>> I have a data stream of a call center application coming in to postgres in this format : >>>> >>>> user_name, user_status, event_time >>>> >>>> 'user1', 'ready', '2017-01-01 10:00:00' >>>> 'user1', 'talking', '2017-01-01 10:02:00' >>>> 'user1', 'after_call', '2017-01-01 10:07:00' >>>> 'user1', 'ready', '2017-01-01 10:08:00' >>>> 'user1', 'talking', '2017-01-01 10:10:00' >>>> 'user1', 'after_call', '2017-01-01 10:15:00' >>>> 'user1', 'paused', '2017-01-01 10:20:00' >>>> ... >>>> ... >>>> >>>> so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previousone so should be used to calculate the duration of this previous one. >>>> >>>> What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have anexperience with this sort of data streams ? >>>> >>>> >>>> Thanks in advance. >>> >>> >>> Just a suggestion, but here is what I would do. >>> First, create your tables similar to as follows >>> >>> CREATE TABLE status >>> ( >>> call_status varchar(10) NOT NULL, >>> CONSTRAINT status_pk PRIMARY KEY (call_status) >>> ); >>> >>> INSERT INTO status >>> (call_status) >>> VALUES >>> ('ready'), >>> ('talking'), >>> ('after_call'); >>> >>> CREATE TABLE user_sessions >>> ( >>> username name NOT NULL, >>> session_id bigint NOT NULL, >>> call_status varchar(10) NOT NULL, >>> call_time timestamp NOT NULL, >>> CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status), >>> CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status) >>> REFERENCES status(call_status) >>> ); >>> >>> Next, you will need to generate a unique session_id for each >>> user, but only for when call_status is 'ready'. So probably >>> a table of the form: >>> >>> CREATE TABLE current_session >>> ( >>> username name NOT NULL, >>> session_id serial NOT NULL, >>> CONSTRAINT current_session_pk PRIMARY KEY (username) >>> ); >>> >>> Then all you need to do is: >>> 1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'. >>> Probably best to use a BEFORE trigger to do this, but you will need to code it yourself. >>> >>> 2. You can then do >>> >>> SELECT username, >>> age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'), >>> ( SELECT call_time FROM current_session WHERE call_status = 'after_call') >>> ) as duration >>> FROM user_sessions >>> WHERE username = 'actual_user_name' >>> AND session_id = actual_session_id; >>> >>> You can use similar queries for avg and frequency. >>> >>> -- >>> Melvin Davidson >>> I reserve the right to fantasize. Whether or not you >>> wish to share my fantasy is entirely up to you. >> >> >> >> >> -- >> @osm_seattle >> osm_seattle.snowandsnow.us >> OpenStreetMap: Maps with a human touch > > > I thought about the table design for user_sessions and came up > with a better one: > > CREATE TABLE user_sessions > ( > username name NOT NULL, > session_id bigint NOT NULL, > call_status varchar(10) NOT NULL, > call_ready timestamp NOT NULL, > call_talking timestamp, > call_after_call timestamp, > call_duration interval, > CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id), > CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status) > REFERENCES status(call_status) > ); > > So in essence, when the call starts, just do: > > INSERT INTO user_sessions > (username, call_ready) > VALUES > ('actual_user_name', now() ); > > Then > SELECT max(session_id) AS current_session > FROM user_sessions > WHERE username = 'actual_user_name'; > > When talking starts: > UPDATE user_sessions > SET call_status = 'talking', > call_talking = now() > WHERE username = 'actual_user_name' > AND session_id = current_session; > > When call ends: > UPDATE user_sessions > SET call_status = 'after_call', > call_after_call = now() > WHERE username = 'actual_user_name' > AND session_id = current_session; > > Now all you have to do to get call length is: > > SELECT username, > age ( call_after_call, call_talking ) as duration > FROM user_sessions > WHERE username = 'actual_user_name' > AND session_id = current_session; > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
You have these choices: - turn events into INSERTs and UPDATES on a table that represents a single call You might have an events VIEW with INSTED OF insert/update triggers so you can insert events as the interface for updatingcalls. - store the events and have a VIEW on the events table that gives you rows that summarize each call - both: store the events and the summaries of the calls You might have an events table with AFTER INSERT triggers to insert or update the corresponding rows in the calls table. Nico -- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote: > we have records like this > > ccdb1=# select user_name, agent_status, event_time from cc_events ; > > user_name | agent_status | event_time > -----------+--------------+--------------------- > user1 | ready | 2017-01-01 10:00:00 > user1 | talking | 2017-01-01 10:02:00 > user1 | after_call | 2017-01-01 10:08:00 > user1 | ready | 2017-01-01 10:10:00 > user1 | talking | 2017-01-01 10:12:00 > user1 | after_call | 2017-01-01 10:15:00 > user1 | paused | 2017-01-01 10:17:00 > user1 | ready | 2017-01-01 10:25:00 > user1 | talking | 2017-01-01 10:26:00 > (9 rows) > > > so user1 was READY at 2017-01-01 10:00:00 then he received a call > that he attended at 2017-01-01 10:02:00 and so on ... > so user1 was ready for 2 minutes, then he was talking for 6 minutes > then he was in after_call (doing after call work) for 2 minutes and > this is the kind of information we want to query. > > my solution so far that I came with, is in my table I have 1 more > field : end_time > so when an event comes in and before the insert I do : > update cc_events set end_time = current_timestamp where user_name = > 'user_of_event' and end_time is null; > > then I insert new event leaving the end_time as null so that next > event will update it and so on. > > its working fine, I have the start and end times for each event, its > not too painful to query (sum(end-start) while grouping by user_name, > agent_status), but its one more update on the table and also limited > in what you can query about, > > I know this must be a common problem in every software that deals with > events, so I suppose something is already built-in in postgres to deal > with it. Khalil, changing your schema is one solution with certain benefits - but it's definitely not necessary when you have the power of PostgreSQL at your fingertips. You can solve your problem without changing anything at all. :) All you need is a window function: https://www.postgresql.org/docs/9.6/static/tutorial-window.html Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and 9.3.17 (all the latest minors currently supported on RDS). You can try these queries on your own system; they should work anywhere. First, I added a second user to your data set to make sure we were handling that case correctly. ========== create table cc_events (user_name varchar(8), agent_status varchar(20), event_time timestamp); insert into cc_events values ('user1', 'ready', '2017-01-01 10:00:00'), ('user1', 'talking', '2017-01-01 10:02:00'), ('user2', 'ready', '2017-01-01 10:04:00'), ('user2', 'talking', '2017-01-01 10:05:00'), ('user1', 'after_call', '2017-01-01 10:07:00'), ('user1', 'ready', '2017-01-01 10:08:00'), ('user1', 'talking', '2017-01-01 10:10:00'), ('user1', 'after_call', '2017-01-01 10:15:00'), ('user2', 'after_call', '2017-01-01 10:18:00'), ('user1', 'paused', '2017-01-01 10:20:00'), ('user2', 'paused', '2017-01-01 10:21:00'); select * from cc_events order by user_name, event_time; ========== here's a basic window function in action: ========== select user_name, agent_status, event_time, lead(event_time) over (partition by user_name order by event_time) next_event_time from cc_events order by event_time; user_name | agent_status | event_time | next_event_time -----------+--------------+---------------------+---------------------user1 | ready | 2017-01-01 10:00:00 | 2017-01-0110:02:00user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00user2 | ready | 2017-01-0110:04:00 | 2017-01-01 10:05:00user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00user1 | after_call | 2017-01-01 10:15:00| 2017-01-01 10:20:00user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00user1 | paused | 2017-01-01 10:20:00 |user2 | paused | 2017-01-01 10:21:00 | ========== and now we just add one more column which does the subtraction to calculate the duration: ========== select user_name, agent_status, event_time, lead(event_time) over (partition by user_name order by event_time) next_event_time, (lead(event_time) over (partition by user_name order by event_time)) - event_time as duration from cc_events order by event_time; user_name | agent_status | event_time | next_event_time | duration -----------+--------------+---------------------+---------------------+----------user1 | ready | 2017-01-01 10:00:00| 2017-01-01 10:02:00 | 00:02:00user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00user2 | talking | 2017-01-0110:05:00 | 2017-01-01 10:18:00 | 00:13:00user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 |00:01:00user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00user1 | talking | 2017-01-0110:10:00 | 2017-01-01 10:15:00 | 00:05:00user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 |00:05:00user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00user1 | paused | 2017-01-0110:20:00 | |user2 | paused | 2017-01-01 10:21:00 | | ========== it might also be convenient to wrap the window function in a common table expression https://www.postgresql.org/docs/9.6/static/queries-with.html ========== with calculate_next_events as ( select user_name, agent_status, event_time, lead(event_time) over (partition by user_nameorder by event_time) next_event_time from cc_events order by event_time ) select user_name, agent_status, next_event_time-event_time duration from calculate_next_events order by event_time; ========== Finally, if you really want to supercharge this and power-up even more, besides temporal databases you might check out this recent blog post about implementing a state machine in postgresql... it's really interesting and closely related to what you're solving. https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html Hope this is helpful. Great to see that you're working on PostgreSQL - it's a powerful engine to build with! -Jeremy -- http://about.me/jeremy_schneider -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 10/01/2017 01:17 AM, Khalil Khamlichi wrote: > Hi everyone, Take a look at TimescaleDB they have an extension to Postgres that makes this awesome (and yes its free and open source). jD > > I have a data stream of a call center application coming in to postgres > in this format : > > user_name, user_status, event_time > > 'user1', 'ready', '2017-01-01 10:00:00' > 'user1', 'talking', '2017-01-01 10:02:00' > 'user1', 'after_call', '2017-01-01 10:07:00' > 'user1', 'ready', '2017-01-01 10:08:00' > 'user1', 'talking', '2017-01-01 10:10:00' > 'user1', 'after_call', '2017-01-01 10:15:00' > 'user1', 'paused', '2017-01-01 10:20:00' > ... > ... > > so as you see each new insert of an "event" is in fact the start_time of > that event and also the end_time of the previous one so should be used > to calculate the duration of this previous one. > > What is the best way to get user_status statistics like total duration, > frequency, avg ...etc , does any body have an experience with this sort > of data streams ? > > > Thanks in advance. -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Thanks, I'll check it out.
Sent via mobile, please forgive typos and brevity
On Oct 14, 2017 3:23 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
On 10/01/2017 01:17 AM, Khalil Khamlichi wrote:Hi everyone,
Take a look at TimescaleDB they have an extension to Postgres that makes this awesome (and yes its free and open source).
jD
I have a data stream of a call center application coming in to postgres in this format :
user_name, user_status, event_time
'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...
so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.
What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?
Thanks in advance.
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Thanks a lot Jeremy, we ended up integrating the code you provided into our software (just before you patent it) :) Best regards, Kkh On Tue, Oct 3, 2017 at 7:58 PM, Schneider <schneider@ardentperf.com> wrote: > On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi > <khamlichi.khalil@gmail.com> wrote: >> we have records like this >> >> ccdb1=# select user_name, agent_status, event_time from cc_events ; >> >> user_name | agent_status | event_time >> -----------+--------------+--------------------- >> user1 | ready | 2017-01-01 10:00:00 >> user1 | talking | 2017-01-01 10:02:00 >> user1 | after_call | 2017-01-01 10:08:00 >> user1 | ready | 2017-01-01 10:10:00 >> user1 | talking | 2017-01-01 10:12:00 >> user1 | after_call | 2017-01-01 10:15:00 >> user1 | paused | 2017-01-01 10:17:00 >> user1 | ready | 2017-01-01 10:25:00 >> user1 | talking | 2017-01-01 10:26:00 >> (9 rows) >> >> >> so user1 was READY at 2017-01-01 10:00:00 then he received a call >> that he attended at 2017-01-01 10:02:00 and so on ... >> so user1 was ready for 2 minutes, then he was talking for 6 minutes >> then he was in after_call (doing after call work) for 2 minutes and >> this is the kind of information we want to query. >> >> my solution so far that I came with, is in my table I have 1 more >> field : end_time >> so when an event comes in and before the insert I do : >> update cc_events set end_time = current_timestamp where user_name = >> 'user_of_event' and end_time is null; >> >> then I insert new event leaving the end_time as null so that next >> event will update it and so on. >> >> its working fine, I have the start and end times for each event, its >> not too painful to query (sum(end-start) while grouping by user_name, >> agent_status), but its one more update on the table and also limited >> in what you can query about, >> >> I know this must be a common problem in every software that deals with >> events, so I suppose something is already built-in in postgres to deal >> with it. > > Khalil, changing your schema is one solution with certain benefits - > but it's definitely not necessary when you have the power of > PostgreSQL at your fingertips. You can solve your problem without > changing anything at all. :) > > All you need is a window function: > https://www.postgresql.org/docs/9.6/static/tutorial-window.html > > Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and > 9.3.17 (all the latest minors currently supported on RDS). You can > try these queries on your own system; they should work anywhere. > > First, I added a second user to your data set to make sure we were > handling that case correctly. > > ========== > create table cc_events (user_name varchar(8), agent_status > varchar(20), event_time timestamp); > > insert into cc_events values > ('user1', 'ready', '2017-01-01 10:00:00'), > ('user1', 'talking', '2017-01-01 10:02:00'), > ('user2', 'ready', '2017-01-01 10:04:00'), > ('user2', 'talking', '2017-01-01 10:05:00'), > ('user1', 'after_call', '2017-01-01 10:07:00'), > ('user1', 'ready', '2017-01-01 10:08:00'), > ('user1', 'talking', '2017-01-01 10:10:00'), > ('user1', 'after_call', '2017-01-01 10:15:00'), > ('user2', 'after_call', '2017-01-01 10:18:00'), > ('user1', 'paused', '2017-01-01 10:20:00'), > ('user2', 'paused', '2017-01-01 10:21:00'); > > select * from cc_events order by user_name, event_time; > > ========== > here's a basic window function in action: > > ========== > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time > from cc_events order by event_time; > > user_name | agent_status | event_time | next_event_time > -----------+--------------+---------------------+--------------------- > user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 > user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 > user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 > user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 > user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 > user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 > user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 > user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 > user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 > user1 | paused | 2017-01-01 10:20:00 | > user2 | paused | 2017-01-01 10:21:00 | > > ========== > and now we just add one more column which does the subtraction to > calculate the duration: > > ========== > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time, > (lead(event_time) over (partition by user_name order by event_time)) > - event_time as duration > from cc_events order by event_time; > > user_name | agent_status | event_time | next_event_time | duration > -----------+--------------+---------------------+---------------------+---------- > user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00 > user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00 > user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00 > user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00 > user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00 > user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00 > user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00 > user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00 > user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00 > user1 | paused | 2017-01-01 10:20:00 | | > user2 | paused | 2017-01-01 10:21:00 | | > > ========== > it might also be convenient to wrap the window function in a common > table expression > > https://www.postgresql.org/docs/9.6/static/queries-with.html > > ========== > with calculate_next_events as ( > select user_name, agent_status, event_time, > lead(event_time) over (partition by user_name order by event_time) > next_event_time > from cc_events order by event_time > ) > select user_name, agent_status, next_event_time-event_time duration > from calculate_next_events order by event_time; > > ========== > Finally, if you really want to supercharge this and power-up even > more, besides temporal databases you might check out this recent blog > post about implementing a state machine in postgresql... it's really > interesting and closely related to what you're solving. > > https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html > > Hope this is helpful. Great to see that you're working on PostgreSQL - > it's a powerful engine to build with! > > -Jeremy > > -- > http://about.me/jeremy_schneider -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general