Thread: group by range of values

group by range of values

From
Carol Cheung
Date:
Hello,

Here's my table:

db=# select * from tester order by birth_year; birth_year | salary
------------+--------       1946 |  78000       1949 |  61000       1951 |  58000       1953 |  56000       1958 |
52000      1962 |  50000       1965 |  45000       1967 |  60000       1968 |  57000       1970 |  47000       1972 |
32000      1973 |  42000
 
(12 rows)

How can I display the average salary grouped by decade of birth year?
That is, is it possible to display the average salary of those born in 
the 1940's, the average salary of those born in the 1950's, average 
salary of those born in the 1960's, and those born in the 1970's, all in 
one result table?
Something like:

decade | average(salary)
-------+-----------------  1940 |  69500  1950 |  53333.33  1960 |  53000  1970 |  40333.33

Thanks in advance,
C


Re: group by range of values

From
Jon Sime
Date:
Carol Cheung wrote:
> Hello,
> 
> Here's my table:
> 
> db=# select * from tester order by birth_year;
>  birth_year | salary
> ------------+--------
>        1946 |  78000
>        1949 |  61000
>        1951 |  58000
>        1953 |  56000
>        1958 |  52000
>        1962 |  50000
>        1965 |  45000
>        1967 |  60000
>        1968 |  57000
>        1970 |  47000
>        1972 |  32000
>        1973 |  42000
> (12 rows)
> 
> How can I display the average salary grouped by decade of birth year?
> That is, is it possible to display the average salary of those born in 
> the 1940's, the average salary of those born in the 1950's, average 
> salary of those born in the 1960's, and those born in the 1970's, all in 
> one result table?
> Something like:
> 
> decade | average(salary)
> -------+-----------------
>   1940 |  69500
>   1950 |  53333.33
>   1960 |  53000
>   1970 |  40333.33
> 

Assuming birth_year is an integer (if it's not, then just change the 
query to cast it to one before the division), one possible approach 
might be:
    select birth_year / 10 || '0' as decade,        avg(salary::numeric) as average_salary    from tester    group by
decade   order by decade asc;
 

-Jon

-- 
Senior Systems Developer
Media Matters for America
http://mediamatters.org/


Re: group by range of values

From
"Pavel Stehule"
Date:
Hello

you can use CASE like

SELECT CASE WHEN birth_year BETWEEN 1940 AND 1949 THEN 1940 WHEN birth_year BETWEEN 1950 AND 1959 THEN 1950 WHEN
birth_yearBETWEEN 1960 AND 1969 THEN 1960 WHEN birth_year BETWEEN 1970 AND 1979  THEN 1979 END, AVG(salary) FROM tester
GROUPBY 1 ORDER BY 1;
 

Regards
Pavel Stehule


2007/7/27, Carol Cheung <cacheung@consumercontact.com>:
> Hello,
>
> Here's my table:
>
> db=# select * from tester order by birth_year;
>   birth_year | salary
> ------------+--------
>         1946 |  78000
>         1949 |  61000
>         1951 |  58000
>         1953 |  56000
>         1958 |  52000
>         1962 |  50000
>         1965 |  45000
>         1967 |  60000
>         1968 |  57000
>         1970 |  47000
>         1972 |  32000
>         1973 |  42000
> (12 rows)
>
> How can I display the average salary grouped by decade of birth year?
> That is, is it possible to display the average salary of those born in
> the 1940's, the average salary of those born in the 1950's, average
> salary of those born in the 1960's, and those born in the 1970's, all in
> one result table?
> Something like:
>
> decade | average(salary)
> -------+-----------------
>    1940 |  69500
>    1950 |  53333.33
>    1960 |  53000
>    1970 |  40333.33
>
> Thanks in advance,
> C
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>


Re: group by range of values

From
"Rodrigo De León"
Date:
On 7/27/07, Carol Cheung <cacheung@consumercontact.com> wrote:
> Something like:
>
> decade | average(salary)
> -------+-----------------
>    1940 |  69500
>    1950 |  53333.33
>    1960 |  53000
>    1970 |  40333.33

CREATE TABLE tester ( birth_year integer, salary numeric(10,2)
);

SELECT
SUBSTRING(TO_CHAR(BIRTH_YEAR,'FM9999') FROM 1 FOR 3)||'0'
AS DECADE
, AVG(SALARY) AS AVG_SALARY
FROM TESTER
GROUP BY
SUBSTRING(TO_CHAR(BIRTH_YEAR,'FM9999') FROM 1 FOR 3)||'0'
ORDER BY DECADE;


Re: group by range of values

From
"Rodrigo De León"
Date:
On 7/27/07, I wrote:
> On 7/27/07, Carol Cheung <cacheung@consumercontact.com> wrote:
> > Something like:
> >
> > decade | average(salary)
> > -------+-----------------
> >    1940 |  69500
> >    1950 |  53333.33
> >    1960 |  53000
> >    1970 |  40333.33
>
> CREATE TABLE tester (
>   birth_year integer,
>   salary numeric(10,2)
> );
>
> SELECT
> SUBSTRING(TO_CHAR(BIRTH_YEAR,'FM9999') FROM 1 FOR 3)||'0'
> AS DECADE
> , AVG(SALARY) AS AVG_SALARY
> FROM TESTER
> GROUP BY
> SUBSTRING(TO_CHAR(BIRTH_YEAR,'FM9999') FROM 1 FOR 3)||'0'
> ORDER BY DECADE;

A bit simpler, if year is int:

SELECT
BIRTH_YEAR/10*10
AS DECADE
, AVG(SALARY) AS AVG_SALARY
FROM TESTER
GROUP BY
BIRTH_YEAR/10*10
ORDER BY DECADE;


Re: group by range of values

From
Michael Glaesemann
Date:
2007/7/27, Carol Cheung <cacheung@consumercontact.com>:

> db=# select * from tester order by birth_year;
>   birth_year | salary
> ------------+--------
>         1946 |  78000
>         1949 |  61000

What is the data type of the birth_year column? I'd suggest using  
date if you can, as what it is is a date with year precision. You  
can't specify such a precision, but you can decide that all  
birth_year's will have month and year of January 1 (enforced by a  
CHECK constraint, if you wish), or you could just choose to ignore  
the month and year part in your calculations.

> How can I display the average salary grouped by decade of birth year?
> That is, is it possible to display the average salary of those born in
> the 1940's, the average salary of those born in the 1950's, average
> salary of those born in the 1960's, and those born in the 1970's,  
> all in
> one result table?
> Something like:
>
> decade | average(salary)
> -------+-----------------
>    1940 |  69500
>    1950 |  53333.33
>    1960 |  53000
>    1970 |  40333.33


Here's an example:

CREATE TABLE salaries (birth_year DATE PRIMARY KEY, salary NUMERIC  
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES    ('1946-01-01',78000), ('1949-01-01',61000), ('1951-01-01',58000)
,('1953-01-01',56000), ('1958-01-01',52000), ('1962-01-01',50000)    , ('1965-01-01',45000), ('1967-01-01',60000),
('1968-01-01',57000)   , ('1970-01-01',47000), ('1972-01-01',32000), ('1973-01-01',42000);
 

SELECT birth_decade, AVG(salary)
FROM (    SELECT birth_year        , date_trunc('decade', birth_year)::date as birth_decade        , salary    FROM
salaries)as salaries_with_decades
 
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade |        avg
--------------+--------------------
1940-01-01   | 69500.000000000000
1950-01-01   | 55333.333333333333
1960-01-01   | 53000.000000000000
1970-01-01   | 40333.333333333333
(4 rows)

If birth_year is an integer column, here's another way to do it,  
taking advantage of the fact that integer division truncates.

CREATE TABLE salaries (birth_year INTEGER PRIMARY KEY, salary NUMERIC  
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES    (1946,78000), (1949,61000), (1951,58000), (1953,56000),  
(1958,52000)    , (1962,50000), (1965,45000), (1967,60000), (1968,57000),  
(1970,47000)    , (1972,32000), (1973,42000);

SELECT birth_decade, AVG(salary)
FROM (    SELECT birth_year        , birth_year / 10 * 10 as birth_decade        , salary    FROM salaries) as
salaries_with_decades
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade |        avg
--------------+--------------------         1940 | 69500.000000000000         1950 | 55333.333333333333         1960 |
53000.000000000000        1970 | 40333.333333333333
 
(4 rows)

Hope this gives you some options.

Michael Glaesemann
grzm seespotcode net