Re: Querying multiple tables - Mailing list pgsql-novice

From Michael Glaesemann
Subject Re: Querying multiple tables
Date
Msg-id 63CC1079-5011-4EF2-B1F8-2626A5BAEE05@seespotcode.net
Whole thread Raw
In response to Re: Querying multiple tables  (Paul Malherbe <paul@tartan.co.za>)
Responses Re: Querying multiple tables
List pgsql-novice
On Aug 1, 2007, at 14:04 , Paul Malherbe wrote:

> Sorry about that, here is an example of what I am doing and what I
> would like to achieve.

Thanks. Providing an example of the results you're seeking is very
useful, as is providing example data.

 From your definitions (and the fact that you're posting to the -
novice list), I suspect you're quite new to SQL. I'll point out some
things I think you should do to improve your code.

> Create table rcaowm (rom_owner varchar(7), rom_name varchar(30));
> Create unique index rcaowm_key1 on rcaowm (rom_owner);

SQL syntax is case insensitive (except for identifiers if they're
quoted). Capitalizing the first letter of the statement is
unnecessary. It doesn't matter, but it looks a little odd.

Also, I think you should take some time to learn about database
normalization constraints such as NOT NULL, PRIMARY KEY and FOREIGN
KEY. These are really fundamental to using a relational database
server. I would rewrite the above statements as:

CREATE TABLE rcaowm
(
     rom_owner TEXT PRIMARY KEY
     , rom_name TEXT NOT NULL
);

For the most part, PRIMARY KEY is shorthand for UNIQUE NOT NULL. I
recommend setting at NOT NULL constraint on your columns by default
as well, as it will prevent perhaps unexpected query results,
especially if you're just starting out.

As for using TEXT instead of varchar(7), there's no performance
advantage to specifying the length of the varchar column. Unless you
have a compelling business reason to restrict the number of
characters, I'd leave it as TEXT.

> Insert into rcaowm values ('OOOOOOO', 'Owners Name');

Be explicit when naming columns in an INSERT statement. This
increases readability and prevents unexpected bugs caused by changes
in schema definition:

INSERT INTO rcaowm (rom_owner, rom_name)  VALUES  ('OOOOOOO', 'Owners
Name');

> Create table rcaowt (rot_owner varchar(7), rot_date int4, rot_tramt
> decimal);

CREATE TABLE rcaowt (
     rot_owner TEXT NOT NULL
         REFERENCES rcaowm
     , rot_date DATE NOT NULL
     , rot_tramt DECIMAL NOT NULL
);

I assume that the value of the rot_owner column must appear in the
rcaowm table in the rom_owner column. If this is so, set a foreign
key constraint so the database server can enforce it. That's what the
"REFERENCES rcaowm" clause does. If no column is listed in the
REFERENCES clause, it references the primary key of the referenced
table, in this case rom_owner. To be explicit, it could have been
written REFERENCES rcaowm (rom_owner).

Also, use appropriate datatypes. rot_date is a date, so use a DATE
column. This adds an additional constraint to be sure only valid date
values can be submitted. Right now, there's nothing to prevent
someone from doing something like:

INSERT INTO rcaowt (rot_owner, rot_date, rot_tramt) VALUES
('OOOOOOO', '20074533', 5000);

Using a date column will prevent that. Also, there are a lot of
functions that are available for handling dates which you might find
useful.

> Create index rcaowt_key1 on rcaowt (rot_owner);

A plain index doesn't really add anything to the schema definition,
though it can be used by the database server to look up information
in the table.

> Create table rcatnm (rtm_owner varchar(7), rtm_tenant varchar(7),
> rtm_name varchar(30));
> Create unique index rcatnm_key1 on rcatnm (rtm_owner, rtm_tenant);
> Insert into rcatnm values ('OOOOOOO', 'TTTTTTT', 'Tenants Name');

Same as above: use TEXT instead of VARCHAR(n), use NOT NULL, PRIMARY
KEY and REFERENCES, list columns in INSERT statements.

> Create table rcatnt (rtt_owner varchar(7), rtt_tenant varchar(7),
> rtt_date int4, rtt_tramt decimal);

This is a question I had in my first response: can rtt_owner be
anything other than the rot_owner for rtt_tenant? If so, your table
definition should probably be

CREATE TABLE rcatnt
(
     rtt_owner TEXT NOT NULL
         REFERENCES rcaowm
     , rtt_tenant TEXT NOT NULL
         REFERENCES rcaowt
     , rtt_date DATE NOT NULL
     , rtt_tramt DECIMAL NOT NULL
);

If the rtt_owner must be the same as rot_owner, just leave it out.
You can always look up the owner by joining through the rtt_tenant
table.

CREATE TABLE rcatnt
(
     rtt_tenant TEXT NOT NULL
         REFERENCES rcaowt
     , rtt_date DATE NOT NULL
     , rtt_tramt DECIMAL NOT NULL
);

In either case, it's not obvious to me what the primary key on this
table should be. You need some way to uniquely identify each row in
the table.

> Create index rcatnt_key1 on rcatnt (rtt_owner, rtt_tenant);

Again, this index doesn't do anything for your schema definition.

> Select rom_name as "Owner_Name", rtm_name as "Tenant_Name", sum
> (rot_tramt) as "Sum_Owner_Value" , sum(rtt_tramt) as
> "Sum_Tenant_Value" from rcaowm join rcaowt on (rom_owner =
> rot_owner) join rcatnt on (rom_owner = rtt_owner) join (select
> rtm_tenant, rtm_name from rcatnm) as tenants on (rtm_tenant =
> rtt_tenant) group by rom_name, rtm_name;

Do you find this easy to read? Personally, I don't. It helps if you
can format the query to make it easy for others (and yourself!) to read.

> This query returns:
>
> Owner_Name  | Tenant_Name  | Sum_Owner_Value | Sum_Tenant_Value
> -------------+--------------+-----------------+------------------
> Owners Name | Tenants Name |         -7202.0 |          12000.0
>
>
> I would like to return:
>
> Owner_Name  | Tenant_Name  | Sum_Owner_Value | Sum_Tenant_Value
> -------------+--------------+-----------------+------------------
> Owners Name | Tenants Name |         -3601.0 |           4000.0

This should get you most of the way there, as it gives the amounts
you want. However, it's not clear to me that this is *really* the
result you're looking for because of the ambiguity as to what owner
you're looking for (see my questions about the rcatnt table).

SELECT rot_owner, sum_rot_tramt, rtt_tenant, sum_rtt_tramt
FROM rcatnm
JOIN (
     SELECT rot_owner,  sum(rot_tramt) as sum_rot_tramt
     FROM rcaowt
     GROUP BY rot_owner) o ON (rtm_owner = rot_owner)
JOIN (
     SELECT rtt_tenant, sum(rtt_tramt) as sum_rtt_tramt
     FROM rcatnt
     GROUP BY rtt_tenant ) t ON (rtm_tenant = rtt_tenant);
  rot_owner | sum_rot_tramt | rtt_tenant | sum_rtt_tramt
-----------+---------------+------------+---------------
OOOOOOO   |       -3601.0 | TTTTTTT    |        4000.0
(1 row)

Hope this gets you pointed in the right direction. And I still
recommend naming your tables and columns with less cryptic names. I
know I had to repeatedly look up the table and column names when
working on this, and that's not time well spent. Letters are cheap!

Good luck!

Michael Glaesemann
grzm seespotcode net



pgsql-novice by date:

Previous
From: Paul Malherbe
Date:
Subject: Re: Querying multiple tables
Next
From: Paul Malherbe
Date:
Subject: Re: Querying multiple tables