Re: Statistics query - Mailing list pgsql-general

From Steve Crawford
Subject Re: Statistics query
Date
Msg-id 5165A6B8.1000500@pinpointresearch.com
Whole thread Raw
In response to Re: Statistics query  (Chris Curvey <chris@chriscurvey.com>)
Responses Re: SOLVED Statistics query
List pgsql-general
On 04/10/2013 10:31 AM, Chris Curvey wrote:

On Wed, Apr 10, 2013 at 12:30 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
I'm seeking ideas on the best way to craft the following query. I've stripped everything down to the bare essentials and simplified it below.

Input data has a timestamp (actually an int received from the system in the form of a Unix epoch), a unit identifier and a status:

 event_time | unit_id | status
------------+---------+--------
 1357056011 |      60 |      1
 1357056012 |     178 |      0
 1357056019 |     168 |      0
 1357056021 |       3 |      0
 1357056021 |       4 |      1
 1357056021 |     179 |      0
 1357056022 |       0 |      1
 1357056022 |       1 |      0
 1357056023 |       2 |      0
 1357056024 |       9 |      0
 1357056025 |       5 |      0
 1357056025 |       6 |      0
 1357056026 |       7 |      1
...

A given unit_id cannot have two events at the same time (enforced by constraints).

Given a point in time I would like to:

1. Identify all distinct unit_ids with an entry that exists in the preceding hour then

2. Count both the total events and sum the status=1 events for the most recent 50 events for each unit_id that fall within a limited period (e.g. don't look at data earlier than midnight). So unit_id 60 might have 50 events in the last 15 minutes while unit_id 4 might have only 12 events after midnight.

The output would look something like:

 unit_id | events | status_1_count
---------+--------+----------------
      1  |     50 |             34
      2  |     27 |             18
      1  |     50 |             34
      1  |      2 |              0
...

Each sub-portion is easy and while I could use external processing or set-returning functions I was hoping first to find the secret-sauce to glue everything together into a single query.

Cheers,
Steve


something like 

select unit_id, count(*), sum(status)
from mytable a
where event_time >= [whatever unix epoch translates to "last midnight"]
and exists
(  select *
   from mytable b
   where b.unit_id = a.unit_id
   and b.epoch >= [unix epoch that translates to "one hour ago"])
group by unit _id;

1) I think figuring out the unix epoch should be reasonable...but I don't know how to do it off the top of my head.
2) I could completely be misunderstanding this.  I'm not sure why the example results would have unit id 1 repeated. (which my suggestion WON'T do)

Because I screwed up cutting and pasting to make an example. The unit_id in the output should, in fact, be distinct:

 unit_id | events | status_1_count
---------+--------+----------------
      1  |     50 |             34
      2  |     27 |             18
      3  |     50 |             34
      4  |      2 |              0

You are correct, epoch is easy:
abstime(epoch)
or
extract(epoch from timestamptz)
depending on which direction you are going or for an hour difference just subtract 3600.

The solution, however, misses the important complicating gotcha. The units I want listed are only those that have had at least one event in the last hour. But for each such unit, I only want the statistics to reflect the most-recent 50 events (even if those events occurred earlier than the current hour) provided the event occurred on the current date. So the events column can never be less than 1 nor more than 50.

For example...

One unit might have a single event at the start of the last hour but 49 more in the preceding 10 minutes. I want to see that unit and the stats for those 50 events.

Same thing if a unit has 50 events clustered at the end of an hour - I don't want the earlier ones.

Another might have 50 events early in the day but none this hour. I don't want to see that one.

But I do want to see the one that had an event in the last hour late in the day along with the 48 other events that have accumulated since midnight.

Cheers,
Steve

pgsql-general by date:

Previous
From: Chris Curvey
Date:
Subject: Re: Statistics query
Next
From: John R Pierce
Date:
Subject: Re: How to convert US date format to European date format ?