Thread: Join query help

Join query help

From
novice
Date:
Hi, We have the following three tables.

safety=> SELECT record_id, record_date FROM record;record_id |      record_date
-----------+------------------------        1 | 2007-07-23 11:30:37+10        2 | 2007-07-27 11:30:14+10        3 |
2007-07-1713:15:03+10
 
(3 rows)

safety=> SELECT observation_id, record_id, score_id FROM observation;observation_id | record_id | score_id
----------------+-----------+----------          3240 |         1 |        1          3239 |         1 |        1
  3238 |         1 |        2          3237 |         1 |        1          2872 |         2 |        1          2869 |
       2 |        2          2870 |         2 |        1          2871 |         2 |        1          3218 |         3
|       2          3217 |         3 |        1
 
(10 rows)

safety=> SELECT * FROM SCORE;score_id | score_description
----------+-------------------       0 | NA       1 | SAFE       2 | AT RISK
(3 rows)


What query do I write to generate the following?

week_no | count(record_id | count(observation_id) | sum(score_id) where = '1'
2007, 30  |   2   |   8   |   6
2007, 29  |   1   |   2   |   1


Re: [GENERAL] Join query help

From
novice
Date:
Try 2: Here are my DDLs & DMLs

--  Start

CREATE TABLE record
(
  record_id integer PRIMARY KEY,
  record_date timestamp with time zone NOT NULL
);

INSERT INTO record(record_id, record_date) VALUES ('1', '2007-07-23 11:30:37');
INSERT INTO record(record_id, record_date) VALUES ('2', '2007-07-27 11:30:14');
INSERT INTO record(record_id, record_date) VALUES ('3', '2007-07-17 13:15:03');


CREATE TABLE score
(
  score_id integer PRIMARY KEY,
  score_description character(7) NOT NULL

);

INSERT INTO score(score_id, score_description) VALUES ('0', 'NA');
INSERT INTO score(score_id, score_description) VALUES ('1', 'SAFE');
INSERT INTO score(score_id, score_description) VALUES ('2', 'AT RISK');


CREATE TABLE observation
(
  observation_id integer PRIMARY KEY,
  record_id integer REFERENCES record (record_id),
  score_id integer REFERENCES score (score_id)
);

INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3240', '1', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3239', '1', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3238', '1', '2');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3237', '1', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('2872', '2', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('2869', '2', '2');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('2870', '2', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('2871', '2', '1');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3218', '3', '2');
INSERT INTO observation(observation_id, record_id, score_id) VALUES
('3217', '3', '1');

-- End

By executing

SELECT week,
COUNT(record) as records,
SUM(inspection) as inspections

  FROM
    (
    SELECT TO_CHAR(record.record_date, 'YYYY, WW') as week,
    RECORD.RECORD_ID AS RECORD,
    COUNT(OBSERVATION_ID) AS INSPECTION
      FROM OBSERVATION
      LEFT JOIN record ON record.record_id = observation.record_id
      GROUP BY RECORD, WEEK
         ) A

  GROUP BY WEEK;

I get

   week   | records | inspections
----------+---------+--------------
 2007, 30 |       2 |            8
 2007, 29 |       1 |            2
(2 rows)


I like to query for a result set that will also have the sum(score_id)
where score_id = '1' like the following

   week   | records | inspections | score
----------+---------+-------------+------
 2007, 30 |       2 |           8 |    6
 2007, 29 |       1 |           2 |    1
(2 rows)

This will help identify that there were 6 SAFE observations found from
the 8 inspections on week 30.

I hope this is not too confusing that I have to re-design the whole schema

Re: Join query help

From
Michael Glaesemann
Date:
On Aug 18, 2007, at 0:35 , novice wrote:

> What query do I write to generate the following?
>
> week_no | count(record_id | count(observation_id) | sum(score_id)  
> where = '1'
> 2007, 30  |   2   |   8   |   6
> 2007, 29  |   1   |   2   |   1

Okay: let's take a look at what you're trying to get:

first column is year and week of year (which might be better labeled  
"week" rather than "week_no")
Checking the available date/time functions, I see extract can get at  
both of these attributes of a date value. That could be helpful.  
However, this splits the week and year into two separate values,  
which are two things to keep track of. As well, those values are now  
integers, which have little to do with dates. date_trunc allows you  
to truncate timestamps to the week, returning a timestamp value. This  
mean's we can look at timestamps with week-precision: and they're  
still timestamps, which means we can rely on them to act as dates in  
terms of ordering (Some would go ahead and format the year, week  
column as text right a way using TO_CHAR and rely on string  
comparisons for grouping and ordering, but there's no reason to do  
this, and you can no longer handle the value easily as a datetime  
value. If you need a specific output format, do it at the end or in  
your middleware.)

If you're going to group records by week (rather than timestamp)  
often, I'd go ahead and create a VIEW:

CREATE VIEW record_with_week AS
SELECT record_id       , date_trunc('week', record_date) AS record_week
FROM record;
SELECT *, extract(week from record_week) as week_number
test-# FROM record_with_week;
record_id |      record_week       | week_number
-----------+------------------------+-------------         1 | 2007-07-16 00:00:00-05 |          29         2 |
2007-07-2300:00:00-05 |          30         3 | 2007-07-16 00:00:00-05 |          29
 

You're also looking for the count of distinct records and  
observations per week, so let's do that:

SELECT record_week    , count(DISTINCT record_id) AS record_count    , count(DISTINCT observation_id) AS
observation_count
FROM record_with_week
NATURAL JOIN observation
GROUP BY record_week;      record_week       | record_count | observation_count
------------------------+--------------+-------------------
2007-07-16 00:00:00-05 |            2 |                 6
2007-07-23 00:00:00-05 |            1 |                 4
(2 rows)

Now here's where I started having trouble. I can't figure out how to  
get 2 observations for week 29 (record_id 1 & 3) and 8 for week 30  
(record_id 2). Assuming the data is wrong (which is admittedly a poor  
assumption), I moved ahead.

I'm interpreting "sum(score_id) where = '1'" as the sum of scores for  
observations of that week where score_description is 'SAFE' (note  
that '1' is text, and it appears that your score_id column is an  
integer). So, let's get the observations that were safe:

SELECT record_week, sum(score_id) as safe_score_sum
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN score
WHERE score_description = 'SAFE'
GROUP BY record_week;      record_week       | safe_score_sum
------------------------+----------------
2007-07-23 00:00:00-05 |              3
2007-07-16 00:00:00-05 |              4
(2 rows)

Note I'm using score_description = 'SAFE' rather than score_id = 1,  
as this is much more descriptive of what you're actually doing. It  
makes the query easier to read as well.

Again, I can't get my numbers to agree with yours, so I'm probably  
misinterpreting something, but I can't figure out another way to  
interpret what you've described.

So, let's join this to the query:

SELECT record_week    , count(DISTINCT record_id) AS record_count    , count(DISTINCT observation_id) AS
observation_count   , safe_score_sum
 
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (    SELECT record_week, sum(score_id) as safe_score_sum    FROM record_with_week    NATURAL JOIN
observation   NATURAL JOIN score    WHERE score_description = 'SAFE'    GROUP BY record_week    ) safe_observation
 
GROUP BY record_week, safe_score_sum;      record_week       | record_count | observation_count |  
safe_score_sum
------------------------+--------------+------------------- 
+----------------
2007-07-16 00:00:00-05 |            2 |                 6  
|              4
2007-07-23 00:00:00-05 |            1 |                 4  
|              3
(2 rows)

One thing that struck me as odd is that you're summing an ID column.  
I'm guessing you're doing this as an attempt to count the total safe  
observations, taking advantage of the coincidence that the score_id  
is 1, so sum of observations with score_id = 1 is the same as the  
count of observations where score_id is 1. If this is indeed what's  
happening, I think I've got a better way to do it:

SELECT record_week    , count(DISTINCT record_id) AS record_count    , count(DISTINCT observation_id) AS
observation_count   , count(DISTINCT safe_observation_id) as safe_observation_count
 
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (    SELECT record_week        , observation_id as safe_observation_id    FROM record_with_week    NATURAL
JOINobservation    NATURAL JOIN score    WHERE score_description = 'SAFE'    ) safe_observation
 
GROUP BY record_week;      record_week       | record_count | observation_count |  
safe_observation_count
------------------------+--------------+------------------- 
+------------------------
2007-07-16 00:00:00-05 |            2 |                 6  
|                      4
2007-07-23 00:00:00-05 |            1 |                 4  
|                      3
(2 rows)

And finally, formatting for output:

SELECT TO_CHAR(record_week, 'YYYY, IW') AS formatted_record_week    , count(DISTINCT record_id) AS record_count    ,
count(DISTINCTobservation_id) AS observation_count    , count(DISTINCT safe_observation_id) as safe_observation_count
 
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (    SELECT record_week        , observation_id as safe_observation_id    FROM record_with_week    NATURAL
JOINobservation    NATURAL JOIN score    WHERE score_description = 'SAFE'    ) safe_observation
 
GROUP BY record_week;
formatted_record_week | record_count | observation_count |  
safe_observation_count
-----------------------+--------------+------------------- 
+------------------------
2007, 29              |            2 |                 6  
|                      4
2007, 30              |            1 |                 4  
|                      3
(2 rows)

I've got data below so others may play along at home.

Does this help?

Michael Glaesemann
grzm seespotcode net

CREATE TABLE record
(    record_id INTEGER PRIMARY KEY    , record_date TIMESTAMP(0) WITH TIME ZONE
);

INSERT INTO record (record_id, record_date) VALUES
(1, '2007-07-23 11:30:37+10')
, (2, '2007-07-27 11:30:14+10')
, (3, '2007-07-17 13:15:03+10');

CREATE TABLE score
(    score_id INTEGER PRIMARY KEY    , score_description TEXT NOT NULL UNIQUE
);
INSERT INTO score (score_id, score_description) VALUES
(0, 'NA')
, (1, 'SAFE')
, (2, 'AT RISK');

CREATE TABLE observation
(    observation_id INTEGER PRIMARY KEY    , record_id INTEGER NOT NULL    , score_id INTEGER NOT NULL
REFERENCESscore
 
);
INSERT INTO observation (observation_id, record_id, score_id) VALUES
(3240,1,1)
, (3239,1,1)
, (3238,1,2)
, (3237,1,1)
, (2872,2,1)
, (2869,2,2)
, (2870,2,1)
, (2871,2,1)
, (3218,3,2)
, (3217,3,1);



Re: Join query help

From
novice
Date:
On 21/08/07, Michael Glaesemann <grzm@seespotcode.net> wrote:

> Now here's where I started having trouble. I can't figure out how to
> get 2 observations for week 29 (record_id 1 & 3) and 8 for week 30
> (record_id 2). Assuming the data is wrong (which is admittedly a poor
> assumption), I moved ahead.

> > week_no | count(record_id | count(observation_id) | sum(score_id)
> > where = '1'
> > 2007, 30  |   2   |   8   |   6
> > 2007, 29  |   1   |   2   |   1

For week 29, there were 1 SAFE observation out of the 2 observations and
for week 30, there were 6 SAFE obseravations out of the 8
observations.  Hope this helps.

Thanks!


Re: [GENERAL] Join query help

From
Michael Glaesemann
Date:
On Aug 20, 2007, at 19:52 , novice wrote:

> Try 2: Here are my DDLs & DMLs

Thanks for the data. It looks like the data you provided in the first
set was a little different, and the queries I supplied in my previous
message give you the results you want.

> CREATE TABLE record
> (
>   record_id integer PRIMARY KEY,
>   record_date timestamp with time zone NOT NULL
> );
>
> INSERT INTO record(record_id, record_date) VALUES ('1', '2007-07-23
> 11:30:37');

Note: record_id is in integer, yet you're quoting the value ('1').
This causes the server to cast the text value to an integer. Here
it's not going to cause much of a problem, just a couple CPU cycles.
In table definitions (and possibly queries?) it can cause the server
to ignore otherwise usable indexes when planning queries. (I see
below your other inserts also quote integer values: you should drop
the quotes there as well.)

> CREATE TABLE score
> (
>   score_id integer PRIMARY KEY,
>   score_description character(7) NOT NULL
>
> );

Unless you have a strict business rule that score_description can
have no more than 7 characters, you should just use text instead of
character(7). It provides you more freedom (for example, you don't
have to change the column type if you ever want to use longer
descriptions). character(7) does not gain you anything in terms of
performance.

I'd also recommend adding a UNIQUE constraint to score_description.
It appears your score_id has no meaning other than use as a surrogate
key. There's nothing to prevent INSERT INTO score (score_id,
score_description) VALUES (5, 'SAFE') which would be quite confusing,
I should think. And if you don't have any particular reason to use a
surrogate key, you could just use score_description as the primary
key  of the table, dropping score_id altogether. it would reduce the
number of joins you needed to do to have easily interpretable query
results. (This all goes for your record table as well.)

> I like to query for a result set that will also have the sum(score_id)
> where score_id = '1' like the following
>
>    week   | records | inspections | score
> ----------+---------+-------------+------
>  2007, 30 |       2 |           8 |    6
>  2007, 29 |       1 |           2 |    1
> (2 rows)
>

(repeating from earlier post)

SELECT to_char(record_week, 'YYYY, IW') as formatted_record_week
     , count(DISTINCT record_id) AS record_count
     , count(DISTINCT observation_id) AS observation_count
     , safe_score_sum
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (
     SELECT record_week, sum(score_id) as safe_score_sum
     FROM record_with_week
     NATURAL JOIN observation
     NATURAL JOIN score
     WHERE score_description = 'SAFE'
     GROUP BY record_week
     ) safe_observation
GROUP BY record_week, safe_score_sum;
formatted_record_week | record_count | observation_count |
safe_score_sum
-----------------------+--------------+-------------------
+----------------
2007, 29              |            1 |                 2
|              1
2007, 30              |            2 |                 8
|              6
(2 rows)


> This will help identify that there were 6 SAFE observations found from
> the 8 inspections on week 30.

Yeah, I thought so: you're actually looking for the *count* of SAFE
observations, not the sum of the score_id for 'SAFE'. So what you
really want is:

SELECT TO_CHAR(record_week, 'YYYY, IW') AS formatted_record_week
     , count(DISTINCT record_id) AS record_count
     , count(DISTINCT observation_id) AS observation_count
     , count(DISTINCT safe_observation_id) as safe_observation_count
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (
     SELECT record_week
         , observation_id as safe_observation_id
     FROM record_with_week
     NATURAL JOIN observation
     NATURAL JOIN score
     WHERE score_description = 'SAFE'
     ) safe_observation
GROUP BY record_week;
formatted_record_week | record_count | observation_count |
safe_observation_count
-----------------------+--------------+-------------------
+------------------------
2007, 29              |            1 |                 2
|                      1
2007, 30              |            2 |                 8
|                      6
(2 rows)

Michael Glaesemann
grzm seespotcode net




Re: [GENERAL] Join query help

From
novice
Date:
Many many thanks for all the advice =)


On 21/08/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Aug 20, 2007, at 19:52 , novice wrote:
>
> > Try 2: Here are my DDLs & DMLs
>
> Thanks for the data. It looks like the data you provided in the first
> set was a little different, and the queries I supplied in my previous
> message give you the results you want.
>
> > CREATE TABLE record
> > (
> >   record_id integer PRIMARY KEY,
> >   record_date timestamp with time zone NOT NULL
> > );
> >
> > INSERT INTO record(record_id, record_date) VALUES ('1', '2007-07-23
> > 11:30:37');
>
> Note: record_id is in integer, yet you're quoting the value ('1').
> This causes the server to cast the text value to an integer. Here
> it's not going to cause much of a problem, just a couple CPU cycles.
> In table definitions (and possibly queries?) it can cause the server
> to ignore otherwise usable indexes when planning queries. (I see
> below your other inserts also quote integer values: you should drop
> the quotes there as well.)
>
> > CREATE TABLE score
> > (
> >   score_id integer PRIMARY KEY,
> >   score_description character(7) NOT NULL
> >
> > );
>
> Unless you have a strict business rule that score_description can
> have no more than 7 characters, you should just use text instead of
> character(7). It provides you more freedom (for example, you don't
> have to change the column type if you ever want to use longer
> descriptions). character(7) does not gain you anything in terms of
> performance.
>
> I'd also recommend adding a UNIQUE constraint to score_description.
> It appears your score_id has no meaning other than use as a surrogate
> key. There's nothing to prevent INSERT INTO score (score_id,
> score_description) VALUES (5, 'SAFE') which would be quite confusing,
> I should think. And if you don't have any particular reason to use a
> surrogate key, you could just use score_description as the primary
> key  of the table, dropping score_id altogether. it would reduce the
> number of joins you needed to do to have easily interpretable query
> results. (This all goes for your record table as well.)
>
> > I like to query for a result set that will also have the sum(score_id)
> > where score_id = '1' like the following
> >
> >    week   | records | inspections | score
> > ----------+---------+-------------+------
> >  2007, 30 |       2 |           8 |    6
> >  2007, 29 |       1 |           2 |    1
> > (2 rows)
> >
>
> (repeating from earlier post)
>
> SELECT to_char(record_week, 'YYYY, IW') as formatted_record_week
>      , count(DISTINCT record_id) AS record_count
>      , count(DISTINCT observation_id) AS observation_count
>      , safe_score_sum
> FROM record_with_week
> NATURAL JOIN observation
> NATURAL JOIN (
>      SELECT record_week, sum(score_id) as safe_score_sum
>      FROM record_with_week
>      NATURAL JOIN observation
>      NATURAL JOIN score
>      WHERE score_description = 'SAFE'
>      GROUP BY record_week
>      ) safe_observation
> GROUP BY record_week, safe_score_sum;
> formatted_record_week | record_count | observation_count |
> safe_score_sum
> -----------------------+--------------+-------------------
> +----------------
> 2007, 29              |            1 |                 2
> |              1
> 2007, 30              |            2 |                 8
> |              6
> (2 rows)
>
>
> > This will help identify that there were 6 SAFE observations found from
> > the 8 inspections on week 30.
>
> Yeah, I thought so: you're actually looking for the *count* of SAFE
> observations, not the sum of the score_id for 'SAFE'. So what you
> really want is:
>
> SELECT TO_CHAR(record_week, 'YYYY, IW') AS formatted_record_week
>      , count(DISTINCT record_id) AS record_count
>      , count(DISTINCT observation_id) AS observation_count
>      , count(DISTINCT safe_observation_id) as safe_observation_count
> FROM record_with_week
> NATURAL JOIN observation
> NATURAL JOIN (
>      SELECT record_week
>          , observation_id as safe_observation_id
>      FROM record_with_week
>      NATURAL JOIN observation
>      NATURAL JOIN score
>      WHERE score_description = 'SAFE'
>      ) safe_observation
> GROUP BY record_week;
> formatted_record_week | record_count | observation_count |
> safe_observation_count
> -----------------------+--------------+-------------------
> +------------------------
> 2007, 29              |            1 |                 2
> |                      1
> 2007, 30              |            2 |                 8
> |                      6
> (2 rows)
>
> Michael Glaesemann
> grzm seespotcode net
>
>
>
>

Re: [GENERAL] Join query help

From
Michael Glaesemann
Date:
On Aug 20, 2007, at 20:27 , Michael Glaesemann wrote:

> Note: record_id is in integer, yet you're quoting the value ('1').
> This causes the server to cast the text value to an integer. Here
> it's not going to cause much of a problem, just a couple CPU
> cycles. In table definitions (and possibly queries?) it can cause
> the server to ignore otherwise usable indexes when planning
> queries. (I see below your other inserts also quote integer values:
> you should drop the quotes there as well.)

Some clarification: what I meant by "table definitions" is if you're
actually defining text columns that reference integer columns. For
example:

CREATE TABLE foos (foo_id INTEGER PRIMARY KEY, foo text NOT NULL
UNIQUE);
CREATE TABLE bars (bar text PRIMARY KEY, foo_id text NOT NULL
REFERENCES foos (foo_id));

Michael Glaesemann
grzm seespotcode net



Re: [GENERAL] Join query help

From
Michael Glaesemann
Date:
On Aug 20, 2007, at 20:33 , novice wrote:

> Many many thanks for all the advice =)

Glad to help. Good luck!

Michael Glaesemann
grzm seespotcode net