Re: group by range of values - Mailing list pgsql-sql

From Michael Glaesemann
Subject Re: group by range of values
Date
Msg-id 6F7EBD6A-A1FD-44A8-B4E6-EA415DFA3535@seespotcode.net
Whole thread Raw
In response to Re: group by range of values  ("Pavel Stehule" <pavel.stehule@gmail.com>)
List pgsql-sql
2007/7/27, Carol Cheung <cacheung@consumercontact.com>:

> db=# select * from tester order by birth_year;
>   birth_year | salary
> ------------+--------
>         1946 |  78000
>         1949 |  61000

What is the data type of the birth_year column? I'd suggest using  
date if you can, as what it is is a date with year precision. You  
can't specify such a precision, but you can decide that all  
birth_year's will have month and year of January 1 (enforced by a  
CHECK constraint, if you wish), or you could just choose to ignore  
the month and year part in your calculations.

> How can I display the average salary grouped by decade of birth year?
> That is, is it possible to display the average salary of those born in
> the 1940's, the average salary of those born in the 1950's, average
> salary of those born in the 1960's, and those born in the 1970's,  
> all in
> one result table?
> Something like:
>
> decade | average(salary)
> -------+-----------------
>    1940 |  69500
>    1950 |  53333.33
>    1960 |  53000
>    1970 |  40333.33


Here's an example:

CREATE TABLE salaries (birth_year DATE PRIMARY KEY, salary NUMERIC  
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES    ('1946-01-01',78000), ('1949-01-01',61000), ('1951-01-01',58000)
,('1953-01-01',56000), ('1958-01-01',52000), ('1962-01-01',50000)    , ('1965-01-01',45000), ('1967-01-01',60000),
('1968-01-01',57000)   , ('1970-01-01',47000), ('1972-01-01',32000), ('1973-01-01',42000);
 

SELECT birth_decade, AVG(salary)
FROM (    SELECT birth_year        , date_trunc('decade', birth_year)::date as birth_decade        , salary    FROM
salaries)as salaries_with_decades
 
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade |        avg
--------------+--------------------
1940-01-01   | 69500.000000000000
1950-01-01   | 55333.333333333333
1960-01-01   | 53000.000000000000
1970-01-01   | 40333.333333333333
(4 rows)

If birth_year is an integer column, here's another way to do it,  
taking advantage of the fact that integer division truncates.

CREATE TABLE salaries (birth_year INTEGER PRIMARY KEY, salary NUMERIC  
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES    (1946,78000), (1949,61000), (1951,58000), (1953,56000),  
(1958,52000)    , (1962,50000), (1965,45000), (1967,60000), (1968,57000),  
(1970,47000)    , (1972,32000), (1973,42000);

SELECT birth_decade, AVG(salary)
FROM (    SELECT birth_year        , birth_year / 10 * 10 as birth_decade        , salary    FROM salaries) as
salaries_with_decades
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade |        avg
--------------+--------------------         1940 | 69500.000000000000         1950 | 55333.333333333333         1960 |
53000.000000000000        1970 | 40333.333333333333
 
(4 rows)

Hope this gives you some options.

Michael Glaesemann
grzm seespotcode net




pgsql-sql by date:

Previous
From: "Rodrigo De León"
Date:
Subject: Re: group by range of values
Next
From: "Marcin Krawczyk"
Date:
Subject: raise exception and transaction handling