Thread: Critique needed for contact-DB draft

Critique needed for contact-DB draft

From
"Felix E. Klee"
Date:
Hi,

I'm a bit under temporal pressure and I don't have much experience with
data base design (only did some stuff with DBase a couple of years ago)
and am a total newbie to PostgreSQL. That's why I bluntly ask you to
criticize the design attached below. The database is for storing
contacts to persons and organizations, basic relations between them (who
is a member of whom), and events organized by them. More tables/fields
may be added later, but that's pretty much what I need right now.

I'm especially interested on your take of my use of arrays. They avoid
the need for additional tables, but maybe they are not good,
nevertheless?

BTW, what I'm really missing is as a newbie is a way to define new data
types as "structures" as known from e.g. the C programming language (or
is this supported by PostgreSQL?). That would IMHO be a more natural
concept for storing data (for example, then I could create an array of
addresses for each contact).

Felix

The draft:

"Major" tables:

    CONTACTS (not used "standalone"):
    contact_id, locations (location_id[]), phones (phone_id[]), emails
    (email_id[]), urls (url_id[]), description

        PERSONS (inherited from CONTACTS):
        surname, given_names, pseudonyms, prefix, suffix

        ORGANIZATIONS (inherited from CONTACTS):
        type, name

    CONTACT_ORGANIZATION_RELATIONS:
    member (contact_id), umbrella_organization (contact_id of organization),
    position_in_organization (position_id)

    EVENTS:
    event_id, name, date, location (location_id), description, organizers
    (contact_id[])

"Minor" tables:

    LOCATION:
    location_id, street1, street2, street3, city, postal_code,
    country (iso3166), subcountry (iso3166)

    PHONES:
    phone_id, number, type (tel|fax|mob|etc.), location (location_id|NULL)

    EMAILS:
    email_id, email_address, send_info (yes|no), put_in_ml (yes|no)

    URLS:
    url_id, url, put_in_links_list (yes|no)

    POSITIONS:
    position_id, position (CEO|CTO|subsidiary|...)

Re: Critique needed for contact-DB draft

From
Bruno Wolff III
Date:
On Thu, Jul 15, 2004 at 02:27:37 +0200,
  "Felix E. Klee" <felix.klee@inka.de> wrote:
>
> I'm especially interested on your take of my use of arrays. They avoid
> the need for additional tables, but maybe they are not good,
> nevertheless?
>
> BTW, what I'm really missing is as a newbie is a way to define new data
> types as "structures" as known from e.g. the C programming language (or
> is this supported by PostgreSQL?). That would IMHO be a more natural
> concept for storing data (for example, then I could create an array of
> addresses for each contact).

You don't want to do that. You want to have a person table and a table
that contains a record for each contact address combination. Depending
on how you are using the addresses you might also want a separate address
table.

Re: Critique needed for contact-DB draft

From
Steve Crawford
Date:
On Wednesday 14 July 2004 5:27 pm, Felix E. Klee wrote:
> ...I bluntly ask you to criticize the design attached below...
>
> I'm especially interested on your take of my use of arrays. They
> avoid the need for additional tables, but maybe they are not good,
> nevertheless?

Correct, they are not. Failure to create a well thought out database
structure will cause you to lose the ability to realize the power of
a SQL database and will be much harder to correct later than earlier.
I know you are under "temporal pressure" but this is truely a case of
"pay me now or pay me later".

By way of example, say you want to write a query showing all contacts
for a particular location. With an array of 0..n location elements
that is very hard to do.

Additionally, basic integrity checks that are easy to establish at the
database level if the data structure is sound will be difficult to
impossible with this array structure. An example here is ensuring
that a location appears only once for a given contact and that all
locations for a contact exist in the location table (and further,
that the location record cannot be deleted if it is referenced by any
other table).

> BTW, what I'm really missing is as a newbie is a way to define new
> data types as "structures" as known from e.g. the C programming
> language (or is this supported by PostgreSQL?). That would IMHO be
> a more natural concept for storing data (for example, then I could
> create an array of addresses for each contact).

No, what you are actually missing is experience "thinking in SQL." You
are trying to develop in SQL while thinking in C. This will yield
less than satisfactory results.

Generally you should ask of each piece of data is it multivariate (an
address field instead of separate fields for streetaddress, city,
state and zip for example) and is it multivalued (ie. a field with
0..n phone numbers). If either of these exist then there is almost
certainly something wrong with the design as this is basically the
first step in data normalization.

You may want to check out "Database Design for Mere Mortals".

SQL has its roots in set theory. I can't try to write your entire
schema but will offer an example to get you started. Say you have
some people (name, sex, birthdate) and those people have zero or more
phone numbers. You also have some organizations (name, president) to
which those people belong. You might consider the following:

people:
person_id integer (primary key==> not null, unique - probably use a
sequence number)
name text,
sex char(1) (validated as M or F)
birthdate date (validated as appropriate for your data)

phones:
phone_id integer (primary key ==> not null, unique - use a sequence)
phone number (validate as appropriate or even split into ac, prefix,
number if necessary for your app_
number type (home, work, fax, cell, etc. - this can be hard coded but
could also be validated against a "phone types" table)

organizations:
org_id integer (primary key ==> not null, unique, use a sequence)
org_name
president integer (links to people database)

people_organizations (a "linking table")
person_id integer
org_id integer

people_phones (another "linking table")
person_id integer
phone_id integer

The linking table is the key to your array problem. It allows you to
set up all sorts of many-to-many relationships and still enforce data
integrity and perform efficient queries.

A person can belong to several organizations and organizations will
presumably have many members but organizing the data this way allows
you to easily answer questions like how many organizations does Bob
belong to, how many members does Fooorg have, who belongs to more
than two organizations, etc.

Organize your data right and you can run all sorts of queries easily.
Do it wrong and you will suffer.

Hope this helps.

Cheers,
Steve


Re: Critique needed for contact-DB draft

From
"Felix E. Klee"
Date:
On Wed, 14 Jul 2004 22:30:26 -0700 Steve Crawford wrote:
> [arrays are bad]
> I know you are under "temporal pressure" but this is truely a case of
> "pay me now or pay me later".

That's why I'm asking for advice on this list. It's not that I have to
have the database today, but I don't want to spend days reorganizing it
over and over again. For now, I'm aiming at the perfect database for our
current needs. Planing for all possible future needs, OTOH, is is a bad
idea, IMHO. It would probably complicate things considerably and in the
end one has to reorganize anyways.

> By way of example, say you want to write a query showing all contacts
> for a particular location. With an array of 0..n location elements
> that is very hard to do.

Well, it is possible even for me as a newbie, but it is probably not as
elegant and efficient as otherwise. Here's an example that I just built:

SELECT * FROM contacts WHERE (SELECT location_id FROM locations WHERE name = 'Hannover') = ANY(locations)

> You may want to check out "Database Design for Mere Mortals".

Hm, I'm somewhat sceptical that a book on that topic needs to have 672
pages. I guess that *basic* database design principles and maybe also
basic common naming conventions can be summarized on a dozen or so
pages. If database design proves to be of further importance for me,
I'll first investigate what other books there are.

> [An example for a contacts scheme]

Thanks a lot for your recommendations. I just build a new draft (see
below). Note that I removed the inheritance scheme used in my last draft
because I first like to get a feeling for basic database design
features. Also note that in the current design an organization is an
umbrella term that also includes single persons. I did that because
persons and companies, etc. share many properties.

Felix

The new draft:

"Major" tables:
    ORGANIZATIONS:
    contact_id, offers, demands, description, type (person|non-person),
    type_id (a person_id or a non-person_id)

    PERSONS:
    person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F)

    NON-PERSONS:
    non-person_id, name, type (company|non-profit|...)

    EVENTS:
    event_id, name, date, address (an address_id), description

"Minor" tables:
    ADDRESSES:
    address_id, street1, street2, street3, city, postal_code,
    country (iso3166), subcountry (iso3166)

    EMAIL_ADDS:
    email_id, address, send_info (yes|no), put_in_ml (yes|no)

    PHONE_ADDS:
    phone_id, number, type (tel|fax|mob|etc.), address (an address_id|NULL)

    URLS:
    url_id, url, list (yes|no)

Link tables:
    ORGANIZATION_ADDRESSES:
    organization, address

    ORGANIZATION_EMAIL_ADDS:
    organization, email

    ORGANIZATION_PHONE_ADDS:
    organization, phone

    ORGANIZATION_URLS:
    organization, url

    EVENT_ORGANIZORS:
    event, organization

    PERSON_NON-PERSONS:
    person, non-person, type (CEO|CTO|Supporter|...)

    NON-PERSON_NON-PERSONS:
    non-person, non-person, type (subsidiary|member|...)

Re: Critique needed for contact-DB draft

From
"Felix E. Klee"
Date:
I forgot to create a relation that reflects that a person/non-person is
an organization:

On Thu, 15 Jul 2004 16:25:54 +0200 Felix E. Klee wrote:
> "Major" tables:
>     ORGANIZATIONS:
>     contact_id, offers, demands, description, type (person|non-person),
>     type_id (a person_id or a non-person_id)

contact_id should be substituted by organization_id.

>     PERSONS:
>     person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F)

The field organization is missing.

>     NON-PERSONS:
>     non-person_id, name, type (company|non-profit|...)

The field organization is missing.

Felix

Re: Critique needed for contact-DB draft

From
Bruno Wolff III
Date:
On Thu, Jul 15, 2004 at 16:25:54 +0200,
  "Felix E. Klee" <felix.klee@inka.de> wrote:
> The new draft:
>
> "Major" tables:
>     ORGANIZATIONS:
>     contact_id, offers, demands, description, type (person|non-person),
>     type_id (a person_id or a non-person_id)

You probably want to have two separate type_id fields and use constraints
to enforce exactly one of them being NULL. They should both REFERENCE the
appropiate table. This makes the type field redundant, but if you find it
simpler to have it then adjust the above constraints to make it consistant
with which type_id field is not NULL.

>
>     PERSONS:
>     person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F)
>
>     NON-PERSONS:
>     non-person_id, name, type (company|non-profit|...)

Re: Critique needed for contact-DB draft

From
"Felix E. Klee"
Date:
On Thu, 15 Jul 2004 20:24:12 +0200 Felix E. Klee wrote:
> I forgot to create a relation that reflects that a person/non-person is
> an organization:

*Bonk*, I already have the type_id field for that purpose:

> > "Major" tables:
> >     ORGANIZATIONS:
> >     contact_id, offers, demands, description, type (person|non-person),
> >     type_id (a person_id or a non-person_id)

Felix

Re: Critique needed for contact-DB draft

From
Bruno Wolff III
Date:
On Thu, Jul 15, 2004 at 20:24:12 +0200,
  "Felix E. Klee" <felix.klee@inka.de> wrote:
> I forgot to create a relation that reflects that a person/non-person is
> an organization:
>
> On Thu, 15 Jul 2004 16:25:54 +0200 Felix E. Klee wrote:
> > "Major" tables:
> >     ORGANIZATIONS:
> >     contact_id, offers, demands, description, type (person|non-person),
> >     type_id (a person_id or a non-person_id)
>
> contact_id should be substituted by organization_id.

That doesn't sound right. This table looks a lot like a contact table.
Unless there is exactly one contact per organzation you probably want two
tables.

>
> >     PERSONS:
> >     person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F)
>
> The field organization is missing.

You don't need that.

>
> >     NON-PERSONS:
> >     non-person_id, name, type (company|non-profit|...)
>
> The field organization is missing.

You don't need that.

If you are trying to use non-person to name the company rather than to
list a contact address, then this should probably be an organization
table.

Re: Critique needed for contact-DB draft

From
"Felix E. Klee"
Date:
On Thu, 15 Jul 2004 13:52:38 -0500 Bruno Wolff III wrote:
> > > "Major" tables:
> > >     ORGANIZATIONS:
> > >     contact_id, offers, demands, description, type (person|non-person),
> > >     type_id (a person_id or a non-person_id)
> >
> > contact_id should be substituted by organization_id.
>
> That doesn't sound right.

It's only a cosmetic change. I used to use to name that table
CONTACTS. But after renaming it to ORGANIZATIONS, I forgot to rename the
field contact_id into organization_id in order to keep notation
consistent.

> This table looks a lot like a contact table.
> Unless there is exactly one contact per organzation you probably want
> two tables.

An organization can be used as a contact (but it has other uses as well,
therefore the name change). It *is* either a person (a "one man
organization") or a non-person (a company, a non-profit organization,
etc.). Actual contact informations (addresses, email addresses, etc.)
are stored in different tables.

Now, I wonder what is the best scheme to create an "is a" relation:

Scheme A:
    ORGANIZATIONS: organization_id, ...
    PERSONS: person_id, organization_id, ...
    NON-PERSONS: non-person_id, organization_id, ...
Scheme B:
    ORGANIZATIONS: organization_id, type (person|non-person), type_id
        (a person_id or a non-person_id)
    PERSONS: person_id, ...
    NON-PERSONS: non-person_id, ...
Scheme C:
    ORGANIZATIONS: organization_id, person_id (or NULL, if it is a
        non-person), non-person_id (or NULL, if it is a person), ...
    PERSONS: person_id, ...
    NON-PERSONS: non-person_id, ...
Scheme D:
    ORGANIZATIONS: organization_id, ...
    PERSONS (inherits from ORGANIZATIONS): person_id, ...
    NON-PERSONS (inherits from ORGANIZATIONS): non-person_id, ...
Scheme E:
    <something else>

Actually, the person_id and non-person_id field may be superfluous for
most of the schemes.

Sorry, for bothering you with this trivial stuff. I think I should
really get some manual to teach me the basics (but which one?).

Felix