Re: aggregate query - Mailing list pgsql-sql

From Andrew Kroeger
Subject Re: aggregate query
Date
Msg-id 465BDE9F.4010605@sprocks.gotdns.com
Whole thread Raw
In response to aggregate query  ("Raj A" <raj.ayappan@gmail.com>)
Responses Re: aggregate query  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-sql
Raj A wrote:
> I have a table
> 
> CREATE TABLE survey_load
> (
>  meter_id character(5) NOT NULL,
>  number_of_bays integer NOT NULL,
>  bay_1_use integer,
>  bay_2_use integer,
>  bay_3_use integer,
>  bay_4_use integer,
>  bay_5_use integer,
>  date date NOT NULL,
>  inspection_id integer NOT NULL DEFAULT,
> )
> 
> How do i present an aggregate query
> 
> inspection_id  |  meter_id  |  bay_use
> 1                 12345        (value of bay_1_use)
> 1                 12345        (value of bay_2_use)
> 1                 12345        (value of bay_3_use)
> 2                 23456        (value of bay_1_use)
> 2                 23456        (value of bay_2_use)
> 2                 23456        (value of bay_3_use)
> 2                 23456        (value of bay_4_use)
> 2                 23456        (value of bay_5_use)

Firstly, the table definition you provided has an invalid specification
for the inspection_id column.  For my examples, I simply used a "NOT
NULL" constraint, and did not implement any DEFAULT clause.

If I understand your issue correctly, it seems like the denormalized
nature of your table is causing you some problems.  With the table
definition and output you provided, I am assuming the following contents
for the survey_load table:
meter_id | number_of_bays | bay_1_use | bay_2_use | bay_3_use |
bay_4_use | bay_5_use |    date    | inspection_id

----------+----------------+-----------+-----------+-----------+-----------+-----------+------------+---------------12345
  |              3 |        11 |        12 |        13 |  |           | 2007-05-29 |             123456    |
 5 |        21 |        22 |        23 |
 
24 |        25 | 2007-05-29 |             2
(2 rows)

The value of the number_of_bays column is driving the number of results
in your summary query, as opposed to allowing the data relations to
drive that on their own with a normalized representation.  You could
normalize your data representation into 2 tables:

tmp1=> \d+ new_survey_load             Table "new_survey_load"   Column     |     Type     | Modifiers | Description
---------------+--------------+-----------+-------------survey_id     | integer      | not null  |meter_id      |
character(5)| not null  |date          | date         | not null  |inspection_id | integer      | not null  |
 
Indexes:   "new_survey_load_pkey" PRIMARY KEY, btree (survey_id)
Has OIDs: no

tmp1=> \d+ new_bay_use           Table "new_bay_use"  Column   |  Type   | Modifiers | Description
------------+---------+-----------+-------------survey_id  | integer | not null  |bay_number | integer | not null
|bay_use   | integer | not null  |
 
Indexes:   "ind_new_bay_use__survey_id" btree (survey_id)
Foreign-key constraints:   "new_bay_use_survey_id_fkey" FOREIGN KEY (survey_id) REFERENCES
new_survey_load(survey_id)
Has OIDs: no

This normalization also makes it easier in the future to track
additional bays without having to alter the table structure.

Following your example from above, those tables would contain the
following values:

tmp1=> select * from new_survey_load ;survey_id | meter_id |    date    | inspection_id
-----------+----------+------------+---------------       13 | 12345    | 2007-05-29 |             1       14 | 23456
| 2007-05-29 |             2
 
(2 rows)

tmp1=> select * from new_bay_use ;survey_id | bay_number | bay_use
-----------+------------+---------       13 |          1 |      11       13 |          2 |      12       13 |
3|      13       14 |          1 |      21       14 |          2 |      22       14 |          3 |      23       14 |
      4 |      24       14 |          5 |      25
 
(8 rows)

The query to get the summary results you are looking for would then be:

selectl.inspection_id,l.meter_id,u.bay_use
fromnew_survey_load l
join new_bay_use u onl.survey_id = u.survey_id
order byl.inspection_id, l.meter_id, u.bay_number;

You could also use a view to join the 2 new tables together to match the
format of your original table definition:

selectl.meter_id,count(u.bay_number) as number_of_bays,u1.bay_use as bay_1_use,u2.bay_use as bay_2_use,u3.bay_use as
bay_3_use,u4.bay_useas bay_4_use,u5.bay_use as bay_5_use,l.date,l.inspection_id
 
fromnew_survey_load l
join new_bay_use u onl.survey_id = u.survey_id
left join new_bay_use u1 onl.survey_id = u1.survey_id and u1.bay_number = 1
left join new_bay_use u2 onl.survey_id = u2.survey_id and u2.bay_number = 2
left join new_bay_use u3 onl.survey_id = u3.survey_id and u3.bay_number = 3
left join new_bay_use u4 onl.survey_id = u4.survey_id and u4.bay_number = 4
left join new_bay_use u5 onl.survey_id = u5.survey_id and u5.bay_number = 5
group by1,3,4,5,6,7,8,9;

If you add more bays for tracking, you can simply add additional joins
against new_bay_use to mimic what you would have had in your original
table structure.

Hope this helps.

Andrew



pgsql-sql by date:

Previous
From: "Raj A"
Date:
Subject: aggregate query
Next
From: Gregory Stark
Date:
Subject: Re: aggregate query