Thread: Window function?
Hi,
I have time series data from multiple inputs with start and
I have time series data from multiple inputs with start and
end timestamps.
Can anyone suggest an elegant way to coalesce consecutive rows so only the
Can anyone suggest an elegant way to coalesce consecutive rows so only the
first start time and last end time for each group of events (by input) is returned.
Sample from and to below where the rows for Input number 4 could be massaged.
(Timestamps selected with timestamp(0) for convenience)
From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53
To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53
Thanks in advance to anyone who can help!
Robert
From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53
To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53
Thanks in advance to anyone who can help!
Robert
Hi Robert
Interesting problem. I need to think about it.
You need to figure out when Input changes. You can achieve this by using lead or lag (depending of the sort direction over start) https://www.postgresql.org/docs/current/functions-window.html .
Hope this nudges you to a solution.
Kind regards
Thiemo
Am 04.06.22 um 10:18 schrieb Robert Stanford:
Hi,
I have time series data from multiple inputs with start andend timestamps.
Can anyone suggest an elegant way to coalesce consecutive rows so only thefirst start time and last end time for each group of events (by input) is returned.
Sample from and to below where the rows for Input number 4 could be massaged.(Timestamps selected with timestamp(0) for convenience)
From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53
To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53
Thanks in advance to anyone who can help!
Robert
On Sat, 4 Jun 2022 at 22:56, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
Hi Robert
Interesting problem. I need to think about it.
You need to figure out when Input changes. You can achieve this by using lead or lag (depending of the sort direction over start) https://www.postgresql.org/docs/current/functions-window.html .
After a bit of poking around this seems to get things going in the right direction.
select row_number() over (order by start) - row_number() over (partition by input order by start)
Robert Stanford <rstanford@gmail.com> writes: > Can anyone suggest an elegant way to coalesce consecutive rows so only the > first start time and last end time for each group of events (by input) is > returned. Window functions don't change the number of query result rows, so that idea is a dead end. The standard way to solve this is GROUP BY: SELECT Input, MIN(Start) AS Start, MAX(End) AS End FROM ... GROUP BY Input ORDER BY Input -- optional If MIN/MAX don't quite express what you need to happen, then you might need to write custom aggregates. Point though is that you want aggregation within GROUP BY groups, not windowing. regards, tom lane
On Sat, Jun 4, 2022 at 7:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Stanford <rstanford@gmail.com> writes:
> Can anyone suggest an elegant way to coalesce consecutive rows so only the
> first start time and last end time for each group of events (by input) is
> returned.
Window functions don't change the number of query result rows, so that
idea is a dead end. The standard way to solve this is GROUP BY:
SELECT Input, MIN(Start) AS Start, MAX(End) AS End
FROM ...
GROUP BY Input
ORDER BY Input -- optional
Assuming that input group 4 really is supposed to show up twice in the output then the final solution is going to have to be a combination of this and window functions. You need the later in order to be able to say "input 4, appearance 1" and "input 4, appearance 2" - which lead/lag tends to help with. Then the group by becomes {input, appearance}.
David J.
On Sat, Jun 4, 2022 at 1:18 AM Robert Stanford <rstanford@gmail.com> wrote:
Hi,
From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53
To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53
lag is indeed your friend here - assuming times is your table name
with times_cte as (select *, lag(input, 1) over () from times)
select input, start, end from times_cte where input != coalesce(lag, -1);
The coalesce to -1 is needed at the end to get the first row which has null for the lag value because it's the first row.
John