I'm seeking ideas on the best way to craft the following query. I've
stripped everything down to the bare essentials and simplified it below.
Input data has a timestamp (actually an int received from the system in
the form of a Unix epoch), a unit identifier and a status:
event_time | unit_id | status
------------+---------+--------
1357056011 | 60 | 1
1357056012 | 178 | 0
1357056019 | 168 | 0
1357056021 | 3 | 0
1357056021 | 4 | 1
1357056021 | 179 | 0
1357056022 | 0 | 1
1357056022 | 1 | 0
1357056023 | 2 | 0
1357056024 | 9 | 0
1357056025 | 5 | 0
1357056025 | 6 | 0
1357056026 | 7 | 1
...
A given unit_id cannot have two events at the same time (enforced by
constraints).
Given a point in time I would like to:
1. Identify all distinct unit_ids with an entry that exists in the
preceding hour then
2. Count both the total events and sum the status=1 events for the most
recent 50 events for each unit_id that fall within a limited period
(e.g. don't look at data earlier than midnight). So unit_id 60 might
have 50 events in the last 15 minutes while unit_id 4 might have only 12
events after midnight.
The output would look something like:
unit_id | events | status_1_count
---------+--------+----------------
1 | 50 | 34
2 | 27 | 18
1 | 50 | 34
1 | 2 | 0
...
Each sub-portion is easy and while I could use external processing or
set-returning functions I was hoping first to find the secret-sauce to
glue everything together into a single query.
Cheers,
Steve