Home > mailing lists

Thread: Grouping, Aggregate, Min, Max

Grouping, Aggregate, Min, Max

From

Misa Simic

Date:

13 December 2013, 11:12:36

Hi All,

I am not sure how to define with words what I want to accomplish (so can't ask google the right question :) )

So will try to explain with sample data and expected result:

Scenario 1)

id	thing_id	category	period_id
1	1	A	1
2	1	A	2
3	1	A	3
4	1	A	4
5	1	A	5
6	1	A	6
7	1	A	7
8	1	A	8
9	1	A	9
10	2	A	1
11	2	A	2
12	2	A	3
13	2	A	4

Expected result:

thing_id	category	periods
1	A	1-9
2	A	1-4

(Sounds easy, group by, thing_id, category use Min and Max for period id - but further scenarios makes it a bit complicated...)

Scenario 2)

id	thing_id	category	period_id
1	1	A	1
2	1	A	2
3	1	A	3
4	1	A	4
5	1	B	5
6	1	B	6
7	1	B	7
8	1	A	8
9	1	A	9
10	2	A	1
11	2	A	2
12	2	A	3
13	2	A	4

Expected result:

thing_id	category	periods
1	A	1-4, 8-9
1	B	5-7
2	A	1-4

Scenario 3)

id	thing_id	category	period_id
1	1	A	1
2	1	A	2
3	1	A	3
4	1	A	7
5	1	A	8
6	1	A	9
7	2	A	1
8	2	A	2
9	2	A	3
10	2	A	4

Expected result:

thing_id	category	periods
1	A	1-3, 7-9
2	A	1-4

So goal is, to group by thing_id, category id - but if period_id is interupted (not in incremented by 1) to have aggregated spans...

To desired results we have came up using several CTE's (what makes a query a bit big, and more "procedural way": make cte what calculated diff between current and previous row, next cte uses previous one to define groupings, next cte to make aggregates etc...)

So I wonder - is there some kind of aggregate window function what does desired results?

Many Thanks,

Misa

Re: Grouping, Aggregate, Min, Max

From

Rémi Cura

Date:

13 December 2013, 11:55:28

There is a trick to simplify the thing and avoid using aggregates :

I think it will give you your answer.

http://postgresql.1045698.n5.nabble.com/Count-of-records-in-a-row-td5775363i20.html

Cheers,

Rémi-C

2013/12/13 Misa Simic <misa.simic@gmail.com>

Hi All,

I am not sure how to define with words what I want to accomplish (so can't ask google the right question :) )

So will try to explain with sample data and expected result:

Scenario 1)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 4
5 1 A 5
6 1 A 6
7 1 A 7
8 1 A 8
9 1 A 9
10 2 A 1
11 2 A 2
12 2 A 3
13 2 A 4

Expected result:

thing_id category periods
1 A 1-9
2 A 1-4

(Sounds easy, group by, thing_id, category use Min and Max for period id - but further scenarios makes it a bit complicated...)

Scenario 2)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 4
5 1 B 5
6 1 B 6
7 1 B 7
8 1 A 8
9 1 A 9
10 2 A 1
11 2 A 2
12 2 A 3
13 2 A 4

Expected result:
thing_id category periods
1 A 1-4, 8-9
1 B 5-7
2 A 1-4

Scenario 3)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 7
5 1 A 8
6 1 A 9
7 2 A 1
8 2 A 2
9 2 A 3
10 2 A 4

Expected result:

thing_id category periods
1 A 1-3, 7-9
2 A 1-4

So goal is, to group by thing_id, category id - but if period_id is interupted (not in incremented by 1) to have aggregated spans...

To desired results we have came up using several CTE's (what makes a query a bit big, and more "procedural way": make cte what calculated diff between current and previous row, next cte uses previous one to define groupings, next cte to make aggregates etc...)

So I wonder - is there some kind of aggregate window function what does desired results?

Many Thanks,

Misa

Re: Grouping, Aggregate, Min, Max

From

Misa Simic

Date:

13 December 2013, 12:35:41

Thanks Rémi-C,

Well, not sure is it a goal to avoid aggregates...

Bellow problem/solution even works (not sure) I guess would produce (if we imagine instead of count it use min and max in a row, though this case a bit complicated because of it should take real values from the source table, because of simplicity it starts in examples from 1 - what is not always the case...)

1-4

5-7

8-9

so next step, is to aggregate1-4,8-9 in one row

I think we have used similar approach, what with a few CTE's provides desired result, just think would be simpler with 1 windowed aggregate function.... i.e.

for:

c1, c2

A 1

A 2

A 3

SELECT DISTINCT c1, custom_agg_function(c2) OVER (PARTITION BY c1 ORDER BY c2)

result

a, 1 -3

in case

c1, c2

A 1

A 2

A 3

A 5

result:

A , 1-3, 5-5

thanks,

Misa

2013/12/13 Rémi Cura <remi.cura@gmail.com>

There is a trick to simplify the thing and avoid using aggregates :
I think it will give you your answer.

http://postgresql.1045698.n5.nabble.com/Count-of-records-in-a-row-td5775363i20.html

Cheers,
Rémi-C

2013/12/13 Misa Simic <misa.simic@gmail.com>
Hi All,

I am not sure how to define with words what I want to accomplish (so can't ask google the right question :) )

So will try to explain with sample data and expected result:

Scenario 1)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 4
5 1 A 5
6 1 A 6
7 1 A 7
8 1 A 8
9 1 A 9
10 2 A 1
11 2 A 2
12 2 A 3
13 2 A 4

Expected result:

thing_id category periods
1 A 1-9
2 A 1-4

(Sounds easy, group by, thing_id, category use Min and Max for period id - but further scenarios makes it a bit complicated...)

Scenario 2)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 4
5 1 B 5
6 1 B 6
7 1 B 7
8 1 A 8
9 1 A 9
10 2 A 1
11 2 A 2
12 2 A 3
13 2 A 4

Expected result:
thing_id category periods
1 A 1-4, 8-9
1 B 5-7
2 A 1-4

Scenario 3)

id thing_id category period_id
1 1 A 1
2 1 A 2
3 1 A 3
4 1 A 7
5 1 A 8
6 1 A 9
7 2 A 1
8 2 A 2
9 2 A 3
10 2 A 4

Expected result:

thing_id category periods
1 A 1-3, 7-9
2 A 1-4

So goal is, to group by thing_id, category id - but if period_id is interupted (not in incremented by 1) to have aggregated spans...

To desired results we have came up using several CTE's (what makes a query a bit big, and more "procedural way": make cte what calculated diff between current and previous row, next cte uses previous one to define groupings, next cte to make aggregates etc...)

So I wonder - is there some kind of aggregate window function what does desired results?

Many Thanks,

Misa

Re: Grouping, Aggregate, Min, Max

From

David Johnston

Date:

13 December 2013, 15:42:42

Re:custom aggregate:

I'd probably try building a two dimensional array in the state transition
function.  Take the new value and check if it is adjacent to the last value
in the last bin of the current state. If so add it to that bin.  If not
create a new bin and store it there.  Requires sorted input.

You could also just store all the values encountered and at the end group
them into bins after sorting internally.  That way you just need to get the
partition right - not the order by and sub-groups.

The final output is just a call to string_agg though you would have to
unnest the array in a custom manner since unnest() flattens
multiple-dimensional arrays.  See a recent thread for specifics.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Grouping-Aggregate-Min-Max-tp5783279p5783318.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Grouping, Aggregate, Min, Max

From

Kevin Grittner

Date:

13 December 2013, 20:13:46

Misa Simic <misa.simic@gmail.com> wrote:

> So I wonder - is there some kind of aggregate window function
> what does desired results?

Not built in, but PostgreSQL makes it pretty easy to do so.  With a
little effort to define your own aggregate function, your query can
look like this:

SELECT
    thing_id,
    category,
    int4range_list(period_id)
  FROM thing
  GROUP BY thing_id, category
  ORDER BY 1, 2;

I've attached a couple files -- one which creates the desired
aggregate function, and the other loads a table with two of your
sample data sets and runs the above.  This is just intended as a
quick example of the capabilities available to you.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Grouping, Aggregate, Min, Max

From

Misa Simic

Date:

16 December 2013, 15:26:44

2013/12/13 Kevin Grittner <kgrittn@ymail.com>

Misa Simic <misa.simic@gmail.com> wrote:

> So I wonder - is there some kind of aggregate window function
> what does desired results?

Not built in, but PostgreSQL makes it pretty easy to do so. With a
little effort to define your own aggregate function, your query can
look like this:

SELECT
    thing_id,
    category,
    int4range_list(period_id)
FROM thing
GROUP BY thing_id, category
ORDER BY 1, 2;

I've attached a couple files -- one which creates the desired
aggregate function, and the other loads a table with two of your
sample data sets and runs the above. This is just intended as a
quick example of the capabilities available to you.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Many thanks Kevin! :)

It simply works - perfect! :)

Many thanks,

Misa