Thread: [QUESTION] Window function with partition by and order by
Hello, While looking at aggregates in window function, I found something unusual and would be glad I could get some clarification. Consider following table (mytable): id, name 1, A 1, A 2, B 3, A 1, A select *, avg(id) over (partition by name, order by id) from mytable; Output: id, name, avg 1, A, 1 1, A, 1 1, A, 1 3, A, 1.5 2, B, 2 Question is: Average of id for partition name (A) should be 6/4 = 1.5 for all rows in that partition but this result is seen only at the last one row in partition (A). Am I missing here something? Thanks -- Regards, Ankit Kumar Pandey
Hello Ankit.
This behavior is correct. This is because you are using ORDER BY in your aggregation function.
Looking at the documentation you will find the following quote:
"You can also control the order in which rows are processed by window functions by using ORDER BY within OVER. (The window ORDER BY doesn't even have to match the order in which the rows are produced)."
So, if you use ORDER BY you are controlling how the row are processed.
If you don't use, the result is like this:
postgres=# select *, avg(id) over (partition by name) from my_teste ;
id | name | avg
----+------+--------------------
1 | A | 1.5000000000000000
1 | A | 1.5000000000000000
3 | A | 1.5000000000000000
1 | A | 1.5000000000000000
2 | B | 2.0000000000000000
(5 rows)
This behavior is correct. This is because you are using ORDER BY in your aggregation function.
Looking at the documentation you will find the following quote:
"You can also control the order in which rows are processed by window functions by using ORDER BY within OVER. (The window ORDER BY doesn't even have to match the order in which the rows are produced)."
So, if you use ORDER BY you are controlling how the row are processed.
If you don't use, the result is like this:
postgres=# select *, avg(id) over (partition by name) from my_teste ;
id | name | avg
----+------+--------------------
1 | A | 1.5000000000000000
1 | A | 1.5000000000000000
3 | A | 1.5000000000000000
1 | A | 1.5000000000000000
2 | B | 2.0000000000000000
(5 rows)
--
Regards,
William Alves
On 27 Nov 2022 13:08 -0300, Ankit Kumar Pandey <itsankitkp@gmail.com>, wrote:
Hello,
While looking at aggregates in window function, I found something
unusual and would be glad I could get some clarification.
Consider following table (mytable):
id, name
1, A
1, A
2, B
3, A
1, A
select *, avg(id) over (partition by name, order by id) from mytable;
Output:
id, name, avg
1, A, 1
1, A, 1
1, A, 1
3, A, 1.5
2, B, 2
Question is: Average of id for partition name (A) should be 6/4 = 1.5
for all rows in that partition but this result is seen only at the last
one row in partition (A). Am I missing here something?
Thanks
--
Regards,
Ankit Kumar Pandey
Hello Ankit,
It is absolutely expected behaviour of a window function with ORDER BY clause. The default frame clause of window definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you add an ORDER BY clause in a window definition, PostgreSQL takes the current row and all rows before it within the partition into calculation. If you don't add, it means all rows within the partition are peers, and PostgreSQL uses all rows for calculation. I'm putting the related part from the documentation and its link below.
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer (a row that the window's ORDER BY clause considers equivalent to the current row; all rows are peers if there is no ORDER BY).
Best regards.
Samed YILDIRIM
On Sun, 27 Nov 2022 at 18:08, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Hello,
While looking at aggregates in window function, I found something
unusual and would be glad I could get some clarification.
Consider following table (mytable):
id, name
1, A
1, A
2, B
3, A
1, A
select *, avg(id) over (partition by name, order by id) from mytable;
Output:
id, name, avg
1, A, 1
1, A, 1
1, A, 1
3, A, 1.5
2, B, 2
Question is: Average of id for partition name (A) should be 6/4 = 1.5
for all rows in that partition but this result is seen only at the last
one row in partition (A). Am I missing here something?
Thanks
--
Regards,
Ankit Kumar Pandey
Ankit Kumar Pandey <itsankitkp@gmail.com> writes: > Question is: Average of id for partition name (A) should be 6/4 = 1.5 > for all rows in that partition but this result is seen only at the last > one row in partition (A). Am I missing here something? https://www.postgresql.org/docs/current/tutorial-window.html particularly: By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. regards, tom lane
On 27/11/22 21:58, Tom Lane wrote: > Ankit Kumar Pandey <itsankitkp@gmail.com> writes: >> Question is: Average of id for partition name (A) should be 6/4 = 1.5 >> for all rows in that partition but this result is seen only at the last >> one row in partition (A). Am I missing here something? > https://www.postgresql.org/docs/current/tutorial-window.html > > particularly: > > By default, if ORDER BY is supplied then the frame consists of all > rows from the start of the partition up through the current row, plus > any following rows that are equal to the current row according to the > ORDER BY clause. > > regards, tom lane Thanks, I understand the behavior now. -- Regards, Ankit Kumar Pandey
On 27/11/22 21:53, Samed YILDIRIM wrote:
Thanks, this makes sense.Hello Ankit,It is absolutely expected behaviour of a window function with ORDER BY clause. The default frame clause of window definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you add an ORDER BY clause in a window definition, PostgreSQL takes the current row and all rows before it within the partition into calculation. If you don't add, it means all rows within the partition are peers, and PostgreSQL uses all rows for calculation. I'm putting the related part from the documentation and its link below.The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer (a row that the window's ORDER BY clause considers equivalent to the current row; all rows are peers if there is no ORDER BY).Best regards.Samed YILDIRIMOn Sun, 27 Nov 2022 at 18:08, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:Hello,
While looking at aggregates in window function, I found something
unusual and would be glad I could get some clarification.
Consider following table (mytable):
id, name
1, A
1, A
2, B
3, A
1, A
select *, avg(id) over (partition by name, order by id) from mytable;
Output:
id, name, avg
1, A, 1
1, A, 1
1, A, 1
3, A, 1.5
2, B, 2
Question is: Average of id for partition name (A) should be 6/4 = 1.5
for all rows in that partition but this result is seen only at the last
one row in partition (A). Am I missing here something?
Thanks
--
Regards,
Ankit Kumar Pandey
-- Regards, Ankit Kumar Pandey
On 27/11/22 21:53, Samed YILDIRIM wrote:
Hello Ankit,It is absolutely expected behaviour of a window function with ORDER BY clause. The default frame clause of window definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you add an ORDER BY clause in a window definition, PostgreSQL takes the current row and all rows before it within the partition into calculation. If you don't add, it means all rows within the partition are peers, and PostgreSQL uses all rows for calculation. I'm putting the related part from the documentation and its link below.The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer (a row that the window's ORDER BY clause considers equivalent to the current row; all rows are peers if there is no ORDER BY).Best regards.Samed YILDIRIMOn Sun, 27 Nov 2022 at 18:08, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:Hello,
While looking at aggregates in window function, I found something
unusual and would be glad I could get some clarification.
Consider following table (mytable):
id, name
1, A
1, A
2, B
3, A
1, A
select *, avg(id) over (partition by name, order by id) from mytable;
Output:
id, name, avg
1, A, 1
1, A, 1
1, A, 1
3, A, 1.5
2, B, 2
Question is: Average of id for partition name (A) should be 6/4 = 1.5
for all rows in that partition but this result is seen only at the last
one row in partition (A). Am I missing here something?
Thanks
--
Regards,
Ankit Kumar Pandey
Okay, I understand this now.
-- Regards, Ankit Kumar Pandey