Thread: partitioning a dataset + employing hysteresis condition

partitioning a dataset + employing hysteresis condition

From
Amit Dor-Shifer
Date:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);

with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.

I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.

Would appreciate some guidance.
10x,

Re: partitioning a dataset + employing hysteresis condition

From
Amit Dor-Shifer
Date:

On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);

with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.

I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.

Would appreciate some guidance.
10x,

... fixed data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);

Re: partitioning a dataset + employing hysteresis condition

From
David Johnston
Date:
On Nov 14, 2011, at 0:35, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);

with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.

I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.

Would appreciate some guidance.
10x,

... fixed data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);

While a query is doable how about having another table which you update via a trigger on this table?  Whenever you insert a zero for a device you reset the support table.  Upon inserting a non-zero value you update a second timestamp with when the error occurred. At any point you can query this table for all devices whose error duration is longer than desired.  If you include a counter field to track log entry counts as well.  Build a third table where you can define N and T on a per-device basis and maybe have the support table use a trigger to send out a NOTIFY instead of constantly polling the table.

For a raw query you want the most recent 0 timestamp for each device and then, in the main query, select and count any later entries for the same device.  Use the MAX aggregate on those same records and compare it to the 0 timestamp.

David J.

Re: partitioning a dataset + employing hysteresis condition

From
Gavin Flower
Date:
On 14/11/11 18:35, Amit Dor-Shifer wrote:

On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);

with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.

I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.

Would appreciate some guidance.
10x,

... fixed data set:

INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);
Hi ,

This is my attempt...

DROP TABLE IF EXISTS phone_call;


CREATE TABLE phone_call
(
    device_id   int         NOT NULL,
    start_time  timestamptz NOT NULL,
    term_status int         NOT NULL,
    PRIMARY KEY (device_id, start_time, term_status)
);


INSERT INTO
    phone_call
    (
        device_id,
        start_time,
        term_status
    )
VALUES
    (10, '20100701T151433', 0),
    (20, '20100701T151533', 0),
    (20, '20100701T151633', 2),
    (30, '20100701T151433', 0),
    (30, '20100701T151533', 2),
    (30, '20100701T151633', 2),
    (40, '20100701T004022', 0),
    (40, '20100701T004122', 2),
    (40, '20100701T004622', 2),
    (40, '20100701T010022', 2),
    (40, '20100701T012122', 2),
    (50, '20100701T120000', 0),
    (50, '20100701T120100', 2),
    (50, '20100701T120200', 2),
    (50, '20100701T120300', 2),
    (60, '20100701T090000', 0),
    (60, '20100701T090200', 2),
    (60, '20100701T100000', 0),
    (60, '20100701T100100', 2),
    (60, '20100701T100200', 2),
    (60, '20100701T100300', 2),
    (60, '20100701T101000', 2),
    (60, '20100701T102000', 2),
    (60, '20100701T104000', 2),
    (60, '20100701T105000', 2),
    (60, '20100701T105200', 2),
    (60, '20100701T105600', 2),
    (60, '20100701T500300', 0),
    (60, '20100701T501400', 2);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_call(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_call(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);   

TABLE phone_call;

SELECT DISTINCT
    pc.device_id
FROM
    phone_call  pc
WHERE
    pc.term_status = 0
    AND 3 <=
    (
        SELECT count(*)
        FROM phone_call  pc1
        WHERE
                pc1.device_id = pc.device_id
            AND pc1.term_status = 2
            AND pc1.start_time > pc.start_time
            AND pc1.start_time <
            COALESCE
            (
                (
                    SELECT min(pc2.start_time)
                    FROM phone_call  pc2
                    WHERE
                            pc2.term_status = 0
                        AND pc2.start_time > pc.start_time
                )
                , '9999-12-31'
            )
        GROUP BY
            pc1.device_id
        HAVING
            max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'
    )
ORDER BY
    pc.device_id
/**/;/**/






Regards,
Gavin

Re: partitioning a dataset + employing hysteresis condition

From
Gavin Flower
Date:
On 14/11/11 18:35, Amit Dor-Shifer wrote:

On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

[…]

3 points

POINT 1:
I should have given the results of my attempt...


[...]
device_id
-----------
2
40
50
60
(4 rows)


POINT 2:
I also realized I left of a condition in the HAVBING part

HAVING
    max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'

I think it should be

HAVING
        max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'
    AND pc1.term_status = 2


POINT 3:
Timestamps should almost always be stored with a time zone (using timestamptz rather than just timestamp) – or you will have problems when Summer time ends or begins, and using timestamptz allows for date&time to be displayed currectly in different locales.


Regards,
Gavin

P.S. Since my post has not shown up yet, I had to attach to its parent!
(I won't offer my first born to get direct posting rights,
as his wife may object -
besides which, it is probably illegal in my jurisdiction!)

Re: partitioning a dataset + employing hysteresis condition

From
David Johnston
Date:
On Nov 15, 2011, at 15:28, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

On 14/11/11 18:35, Amit Dor-Shifer wrote:

On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:


Alternative thought,

Have a Boolean field which is set to true for non-zero entries and false for zeros.  Upon entering a zero into the table, for a given device, set all currently true records to false.  Combine with a partial index on the true and you can quickly get a listing of all devices in error mode and all the recent error entries.

David J.

Re: partitioning a dataset + employing hysteresis condition

From
Amit Dor-Shifer
Date:


On Wed, Nov 16, 2011 at 4:58 PM, David Johnston <polobo@yahoo.com> wrote:
On Nov 15, 2011, at 15:28, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

On 14/11/11 18:35, Amit Dor-Shifer wrote:

On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:
Hi,
I've got this table:
create table phone_calls
(
    start_time timestamp,
    device_id integer,
    term_status integer
);

It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes

For instance, w/the following data set:


Alternative thought,

Have a Boolean field which is set to true for non-zero entries and false for zeros.  Upon entering a zero into the table, for a given device, set all currently true records to false.  Combine with a partial index on the true and you can quickly get a listing of all devices in error mode and all the recent error entries.

David J.

Really appreciate the attention. Thanks!

Here's what I've so-far come up with:

SELECT pc.device_id, MAX(lsc.last_successful) AS last_successful, MIN(pc.start_time) AS faulting_since
FROM (
SELECT MAX(start_time) AS last_successful, device_id FROM phone_calls
WHERE term_status IS NOT NULL AND term_status = 0
GROUP BY device_id
UNION
SELECT NULL AS last_successful, device_id FROM phone_calls
GROUP BY device_id
HAVING EVERY(term_status = 2) = TRUE
) AS lsc,
phone_calls pc
WHERE
pc.device_id=lsc.device_id
AND pc.term_status=2
AND (lsc.last_successful is NULL OR pc.start_time > lsc.last_successful)
GROUP BY pc.device_id
HAVING MIN(pc.start_time) < ?
AND COUNT(term_status) >= ?

The two parameters to the query are T & N, in order, with a slight change: T is a timestamp, so instead of specifying the a time span of 2 minutes, I pass it as NOW() - INTERVAL '2 minute'.

with T=NOW() - INTERVAL '2 minute'  I get the following on the a/m dataset:
N=4:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"

N=3:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"
50;"2010-07-01 12:00:00";"2010-07-01 12:01:00"
2;"2011-11-16 21:56:59.52107";"2011-11-16 21:57:59.52107"

N=2:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"
2;"2011-11-16 21:55:16.88869";"2011-11-16 21:56:16.88869"
30;"2010-07-01 15:14:33";"2010-07-01 15:15:33"
50;"2010-07-01 12:00:00";"2010-07-01 12:01:00"

* The bit with the union is to accommodate for devices which never see a single successful term_status.

Thanks a lot for the helpful hints :)