Re: partitioning a dataset + employing hysteresis condition - Mailing list pgsql-general

From David Johnston
Subject Re: partitioning a dataset + employing hysteresis condition
Date
Msg-id B7D14493-1D12-4178-B357-20E79EAB22F5@yahoo.com
Whole thread Raw
In response to Re: partitioning a dataset + employing hysteresis condition  (Amit Dor-Shifer <amit.dor.shifer@gmail.com>)
List pgsql-general
On Nov 14, 2011, at 0:35, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);

with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.

I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.

Would appreciate some guidance.
10x,

... fixed data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);

While a query is doable how about having another table which you update via a trigger on this table?  Whenever you insert a zero for a device you reset the support table.  Upon inserting a non-zero value you update a second timestamp with when the error occurred. At any point you can query this table for all devices whose error duration is longer than desired.  If you include a counter field to track log entry counts as well.  Build a third table where you can define N and T on a per-device basis and maybe have the support table use a trigger to send out a NOTIFY instead of constantly polling the table.

For a raw query you want the most recent 0 timestamp for each device and then, in the main query, select and count any later entries for the same device.  Use the MAX aggregate on those same records and compare it to the 0 timestamp.

David J.

pgsql-general by date:

Previous
From: Venkat Balaji
Date:
Subject: Re: Incremental backup with RSYNC or something?
Next
From: Frank van Vugt
Date:
Subject: Re: = any((select )) needs cast, why?