Home > mailing lists

Thread: FIRST_VALUE: need to group by argument?

FIRST_VALUE: need to group by argument?

From

Guyren Howe

Date:

24 May 2016, 03:48:36

I am missing something here.

I have two tables:

orders

delivery_route_segments

id,

order_id,

position,

completed

I want to find the first uncompleted deliver_route_segment for each order, by position. Seems to me I ought to be able to do this:

SELECT
  o.id,
  FIRST_VALUE(drs.id)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
but I'm told I need an over clause.



So I try this:
SELECT
  o.id,
  FIRST_VALUE(drs.id) OVER (ORDER BY position ASC)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
here I'm told "drs.id must appear in the GROUP BY clause". This doesn't make sense to me; I shouldn't need to group by a value that's inside an aggregate function.



Tried this.
SELECT
  o.id,
  FIRST_VALUE(drs.id) OVER (PARTITION BY o.id ORDER BY position ASC)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
but it has the same problem.

I can solve this with a subquery, but:
- I'd still like to know what's wrong; and
- I expect the subquery to be slower (yes?)

Re: FIRST_VALUE: need to group by argument?

From

"David G. Johnston"

Date:

24 May 2016, 12:51:01

On Mon, May 23, 2016 at 11:48 PM, Guyren Howe <guyren@gmail.com> wrote:

I am missing something here.

I have two tables:

orders

delivery_route_segments

id,

order_id,

position,

completed

I want to find the first uncompleted deliver_route_segment for each order, by position. Seems to me I ought to be able to do this:

SELECT
  o.id,
  FIRST_VALUE(drs.id)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
but I'm told I need an over clause.



So I try this:
SELECT
  o.id,
  FIRST_VALUE(drs.id) OVER (ORDER BY position ASC)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
here I'm told "drs.id must appear in the GROUP BY clause". This doesn't make sense to me; I shouldn't need to group by a value that's inside an aggregate function.



Tried this.
SELECT
  o.id,
  FIRST_VALUE(drs.id) OVER (PARTITION BY o.id ORDER BY position ASC)
FROM
  orders o JOIN
  delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
  o.id
but it has the same problem.

I can solve this with a subquery, but:
- I'd still like to know what's wrong; and
- I expect the subquery to be slower (yes?)

Window functions don't provoke grouping. That it their blessing. The docs on them are pretty good. first_value is strictly a window function (I may have missed this point when you first asked the question a few days ago).

SELECT i, sum(i) OVER () FROM ( VALUES (a,1), (a,2), (b,3) ) val (x,i)

yields

1, 6

2, 6

3, 6

You can see how all three rows are still present in the output - thus no grouping.

SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1), (a,2), (b,3) ) val (x,i) GROUP BY x

yields

a, 3, 6

b, 3, 6

Here you have to write sum(sum(i)) in the window expression since i is not in the group by and must first be aggregated somehow to satisfy the group by and then the corresponding window evaluation sums those aggregated i's.

The query you are looking for doesn't involve aggregate functions at all.

SELECT DISTINCT ON (order_id) order_id, drs.id

FROM delivery_route_segments drs

WHERE NOT drs.completed

ORDER BY order_id, position ASC

A semantically equivalent query using a window function would be.

WITH ordered_routes AS (

SELECT drs.order_id, drs.id AS drsid, row_number() OVER (PARTITION BY order_id ORDER BY position ASC) AS order_row

FROM delivery_route_segments drs

)

SELECT order_id, drsid

FROM ordered_routes

WHERE order_row = 1;

As I mentioned in the other thread if you are using group by to simply remove duplicates introduced by a join, like you are here, you are probably doing something wrong.

David J.

Re: FIRST_VALUE: need to group by argument?

From

Manuel Gómez

Date:

24 May 2016, 16:13:44

On Tue, May 24, 2016 at 8:50 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1),
> (a,2), (b,3) ) val (x,i) GROUP BY x
> yields
> a, 3, 6
> b, 3, 6

Thank you for this enlightening explanation!  I was, however, very
confused from this specific bit, so I tested it and indeed there must
have been a typo:

postgres=# SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM
(VALUES ('a', 1), ('a', 2), ('b', 3)) val (x,i) GROUP BY x;
 x | sum | sum
---+-----+-----
 a |   3 |   3
 b |   3 |   3
(2 rows)

Re: FIRST_VALUE: need to group by argument?

From

"David G. Johnston"

Date:

24 May 2016, 16:17:47

On Tue, May 24, 2016 at 12:12 PM, Manuel Gómez <targen@gmail.com> wrote:

On Tue, May 24, 2016 at 8:50 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1),
> (a,2), (b,3) ) val (x,i) GROUP BY x
> yields
> a, 3, 6
> b, 3, 6

Thank you for this enlightening explanation! I was, however, very
confused from this specific bit, so I tested it and indeed there must
have been a typo:

postgres=# SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM
(VALUES ('a', 1), ('a', 2), ('b', 3)) val (x,i) GROUP BY x;
x | sum | sum
---+-----+-----
a | 3 | 3
b | 3 | 3
(2 rows)

Yes, that was me being careless. The partition in the window makes the sum(sum(i)) evaluate to 3

David J.