Thread: how to use aggregate functions in this case

how to use aggregate functions in this case

From

"Janek Sendrowski"

Date:

25 August 2013, 21:59:09

Hi,

Thats my code snipped:
 
SELECT  v_rec1.user,
        sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as "0 to 25",
        sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END) as "25 to 50",
        sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END) as "50 to 100"
        INTO v_rec2
        GROUP BY user;

Now I want to summuarize the "0 to 25" values and the others in the same query.
Somehow like this: count("0 to 25")
But I want to do it with every single user and I don't know how to do that

A result should look like this:

user      percentage    count
smith     "0 to 25"     5
smith     "25 to 50"    7
smith     "50 to 75"    2
jones     "0 to 25"     11
jones     "25 to 50"    1
jones     "50 to 75"    3

Hope someone who can help me

Janek Sendrowski

Re: how to use aggregate functions in this case

From

David Johnston

Date:

25 August 2013, 22:10:43

Janek Sendrowski wrote
> Hi,
>
> Thats my code snipped:
>  
> SELECT  v_rec1.user,
>         sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as
> "0 to 25",
>         sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END)
> as "25 to 50",
>         sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END)
> as "50 to 100"
>         INTO v_rec2
>         GROUP BY user;
>
> Now I want to summuarize the "0 to 25" values and the others in the same
> query.
> Somehow like this: count("0 to 25")
> But I want to do it with every single user and I don't know how to do that
>
> A result should look like this:
>
> user      percentage    count
> smith     "0 to 25"     5
> smith     "25 to 50"    7
> smith     "50 to 75"    2
> jones     "0 to 25"     11
> jones     "25 to 50"    1
> jones     "50 to 75"    3
>
> Hope someone who can help me
>
> Janek Sendrowski
>
>
> --
> Sent via pgsql-general mailing list (

> pgsql-general@

> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

You have to write three queries (select statements) and then "UNION ALL"
them together.  Each sub-query has a where clause matching the range you
wish to aggregate for.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768523.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: how to use aggregate functions in this case

From

"Janek Sendrowski"

Date:

25 August 2013, 23:26:29

Sorry, I formulated it wrong.

My problem is, that I want to count the ranges for every user, but if I use count(range), it counts the ranges of
allusers. 

 

Janek Sendrowski

Re: how to use aggregate functions in this case

From

David Johnston

Date:

26 August 2013, 00:07:21

Janek Sendrowski wrote
> Sorry, I formulated it wrong.
>
> My problem is, that I want to count the ranges for every user, but if I
> use count(range), it counts the ranges of all users.

Assuming your example output is indeed what you desire:

SELECT user, '0 to 25'::varchar AS percentage, count(*) FROM tbl WHERE val
BETWEEN 0 AND 25 GROUP BY user

UNION ALL

SELECT user, '25 to 50'::varchar, count(*) FROM tbl WHERE val BETWEEN 25 AND
50 GROUP BY user

...and so forth

Note that BETWEEN has inclusive end-points so anything with exactly 25 (for
example) is being double-counted.

Your other option is something like:

SELECT user, percentage_range_category, count(*) FROM (
   SELECT recordid, user, CASE .... END::varchar AS
percentage_range_category
) categorize
GROUP BY user, percentage_range_category;

In short for every record you simply categorize the value then add that
category to your group-by.

Both are equally valid and the second one is probably easier to comprehend;
the first option just happened to occur to me first.  I have no idea which
one would perform better in theory nor specifically with your data.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768525.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: how to use aggregate functions in this case

From

BladeOfLight16

Date:

26 August 2013, 00:37:46

On Sun, Aug 25, 2013 at 5:59 PM, Janek Sendrowski <janek12@web.de> wrote:

SELECT v_rec1.user,
   sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as "0 to 25",
   sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END) as "25 to 50",
sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END) as "50 to 100"
   INTO v_rec2
   GROUP BY user;

Now I want to summuarize the "0 to 25" values and the others in the same query.
Somehow like this: count("0 to 25")
But I want to do it with every single user and I don't know how to do that

A result should look like this:

user percentage count
smith "0 to 25" 5
smith "25 to 50" 7
smith "50 to 75" 2
jones "0 to 25" 11
jones "25 to 50" 1
jones "50 to 75" 3

This appears to be some kind of equal interval problem.

SELECT v_rec1.user,

WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4) AS bucket

COUNT(*) as count,
FROM v_rec2
GROUP BY user, bucket;

(Untested, but this should be the gist.)

Bucket 1 would be 0 to 25, bucket 2 is 25 to 50, 3 is 50 to 75, 4 is 75 to 100. If you really need to change the bucket number to some kind of text, you can probably nest this query inside another that uses a CASE to pick the text based on on the bucket number.

Good luck.

Re: how to use aggregate functions in this case

From

BladeOfLight16

Date:

26 August 2013, 00:49:40

On Sun, Aug 25, 2013 at 8:36 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:

This appears to be some kind of equal interval problem.

SELECT v_rec1.user,
WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4) AS bucket
COUNT(*) as count,
FROM v_rec2
GROUP BY user, bucket;

(Untested, but this should be the gist.)

Bucket 1 would be 0 to 25, bucket 2 is 25 to 50, 3 is 50 to 75, 4 is 75 to 100. If you really need to change the bucket number to some kind of text, you can probably nest this query inside another that uses a CASE to pick the text based on on the bucket number.

Good luck.

Then again, I guess you don't need a nested query.

SELECT v_rec1.user,
CASE WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4)

WHEN 1 THEN '0 to 25'

WHEN 2 THEN '25 to 50'

WHEN 3 THEN '50 to 75'

WHEN 4 THEN '75 to 100'

ELSE 'But how?'
END CASE AS quarter_percentage

COUNT(*) as count,
FROM v_rec2
GROUP BY user, quarter_percentage;

Re: how to use aggregate functions in this case

From

David Johnston

Date:

26 August 2013, 13:59:33

BladeOfLight16 wrote
> Then again, I guess you don't need a nested query.
>
> SELECT  v_rec1.user,
>         CASE WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4)
>               WHEN 1 THEN '0 to 25'
>               WHEN 2 THEN '25 to 50'
>               WHEN 3 THEN '50 to 75'
>               WHEN 4 THEN '75 to 100'
>               ELSE 'But how?'
>         END CASE AS quarter_percentage
>          COUNT(*) as count,
> FROM v_rec2
> GROUP BY user, quarter_percentage;

This is clean but requires the use of equal intervals.

Another option, though I am unfamiliar with the exact syntax, is to use the
contains operator and an "intrange" (range type, for integer or whatever
actual type is needed).

SELECT CASE WHEN val @> '(0, 25]' -- '(25,50]', etc...

This allows for uneven bucket sizes and avoid the double-inclusive endpoints
problem that results from using BETWEEN.

Requires 9.2

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768573.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: how to use aggregate functions in this case

From

"Janek Sendrowski"

Date:

26 August 2013, 22:11:08

Hi,

 

thanks for all your answers.

I'll have a try with the contains operator and the intrange, but before I'd like to know if this would work:

 

CASE WHEN a >= 0 AND a < 25

CASE WHEN a >= 25 AND a < 50

 

There wouldn't be a double endpoint. I just have to decide which range the endpoint includes.

 

Janek Sendrowski

Re: how to use aggregate functions in this case

From

David Johnston

Date:

27 August 2013, 01:15:33

Janek Sendrowski wrote
> Hi,
>
>  
>
> thanks for all your answers.
>
> I'll have a try with the contains operator and the intrange, but
> before I'd like to know if this would work:
>
>  
>
> CASE WHEN a >= 0 AND a < 25
>
> CASE WHEN a >= 25 AND a < 50
>
>  
>
> There wouldn't be a double endpoint. I just have to decide which range
> the endpoint includes.
>
>  
>
> Janek Sendrowski

Yes, using explicit comparison operators with "AND" will work just fine; its
just a little more verbose so the other options, if available, are preferred
from a readability standpoint.  I do not know whether the different options
may have different performance characteristics.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768636.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: how to use aggregate functions in this case

From

David Johnston

Date:

27 August 2013, 18:27:12

David Johnston wrote
>
> Janek Sendrowski wrote
>> Hi,
>>
>>  
>>
>> thanks for all your answers.
>>
>> I'll have a try with the contains operator and the intrange, but
>> before I'd like to know if this would work:
>>
>>  
>>
>> CASE WHEN a >= 0 AND a < 25
>>
>> CASE WHEN a >= 25 AND a < 50
>>
>>  
>>
>> There wouldn't be a double endpoint. I just have to decide which
>> range the endpoint includes.
>>
>>  
>>
>> Janek Sendrowski
> Yes, using explicit comparison operators with "AND" will work just fine;
> its just a little more verbose so the other options, if available, are
> preferred from a readability standpoint.  I do not know whether the
> different options may have different performance characteristics.
>
> David J.

And just for completeness:

WITH range_def (low, high, label) AS (
VALUES (0, 25, '0 to 25'::varchar), (25, 50, '25 to 50')
)
SELECT ...
FROM source_data
JOIN range_def ON (val >= low AND val < high)
[the rest of the query]

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768721.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.