Thread: Critique needed for contact-DB draft
Hi, I'm a bit under temporal pressure and I don't have much experience with data base design (only did some stuff with DBase a couple of years ago) and am a total newbie to PostgreSQL. That's why I bluntly ask you to criticize the design attached below. The database is for storing contacts to persons and organizations, basic relations between them (who is a member of whom), and events organized by them. More tables/fields may be added later, but that's pretty much what I need right now. I'm especially interested on your take of my use of arrays. They avoid the need for additional tables, but maybe they are not good, nevertheless? BTW, what I'm really missing is as a newbie is a way to define new data types as "structures" as known from e.g. the C programming language (or is this supported by PostgreSQL?). That would IMHO be a more natural concept for storing data (for example, then I could create an array of addresses for each contact). Felix The draft: "Major" tables: CONTACTS (not used "standalone"): contact_id, locations (location_id[]), phones (phone_id[]), emails (email_id[]), urls (url_id[]), description PERSONS (inherited from CONTACTS): surname, given_names, pseudonyms, prefix, suffix ORGANIZATIONS (inherited from CONTACTS): type, name CONTACT_ORGANIZATION_RELATIONS: member (contact_id), umbrella_organization (contact_id of organization), position_in_organization (position_id) EVENTS: event_id, name, date, location (location_id), description, organizers (contact_id[]) "Minor" tables: LOCATION: location_id, street1, street2, street3, city, postal_code, country (iso3166), subcountry (iso3166) PHONES: phone_id, number, type (tel|fax|mob|etc.), location (location_id|NULL) EMAILS: email_id, email_address, send_info (yes|no), put_in_ml (yes|no) URLS: url_id, url, put_in_links_list (yes|no) POSITIONS: position_id, position (CEO|CTO|subsidiary|...)
On Thu, Jul 15, 2004 at 02:27:37 +0200, "Felix E. Klee" <felix.klee@inka.de> wrote: > > I'm especially interested on your take of my use of arrays. They avoid > the need for additional tables, but maybe they are not good, > nevertheless? > > BTW, what I'm really missing is as a newbie is a way to define new data > types as "structures" as known from e.g. the C programming language (or > is this supported by PostgreSQL?). That would IMHO be a more natural > concept for storing data (for example, then I could create an array of > addresses for each contact). You don't want to do that. You want to have a person table and a table that contains a record for each contact address combination. Depending on how you are using the addresses you might also want a separate address table.
On Wednesday 14 July 2004 5:27 pm, Felix E. Klee wrote: > ...I bluntly ask you to criticize the design attached below... > > I'm especially interested on your take of my use of arrays. They > avoid the need for additional tables, but maybe they are not good, > nevertheless? Correct, they are not. Failure to create a well thought out database structure will cause you to lose the ability to realize the power of a SQL database and will be much harder to correct later than earlier. I know you are under "temporal pressure" but this is truely a case of "pay me now or pay me later". By way of example, say you want to write a query showing all contacts for a particular location. With an array of 0..n location elements that is very hard to do. Additionally, basic integrity checks that are easy to establish at the database level if the data structure is sound will be difficult to impossible with this array structure. An example here is ensuring that a location appears only once for a given contact and that all locations for a contact exist in the location table (and further, that the location record cannot be deleted if it is referenced by any other table). > BTW, what I'm really missing is as a newbie is a way to define new > data types as "structures" as known from e.g. the C programming > language (or is this supported by PostgreSQL?). That would IMHO be > a more natural concept for storing data (for example, then I could > create an array of addresses for each contact). No, what you are actually missing is experience "thinking in SQL." You are trying to develop in SQL while thinking in C. This will yield less than satisfactory results. Generally you should ask of each piece of data is it multivariate (an address field instead of separate fields for streetaddress, city, state and zip for example) and is it multivalued (ie. a field with 0..n phone numbers). If either of these exist then there is almost certainly something wrong with the design as this is basically the first step in data normalization. You may want to check out "Database Design for Mere Mortals". SQL has its roots in set theory. I can't try to write your entire schema but will offer an example to get you started. Say you have some people (name, sex, birthdate) and those people have zero or more phone numbers. You also have some organizations (name, president) to which those people belong. You might consider the following: people: person_id integer (primary key==> not null, unique - probably use a sequence number) name text, sex char(1) (validated as M or F) birthdate date (validated as appropriate for your data) phones: phone_id integer (primary key ==> not null, unique - use a sequence) phone number (validate as appropriate or even split into ac, prefix, number if necessary for your app_ number type (home, work, fax, cell, etc. - this can be hard coded but could also be validated against a "phone types" table) organizations: org_id integer (primary key ==> not null, unique, use a sequence) org_name president integer (links to people database) people_organizations (a "linking table") person_id integer org_id integer people_phones (another "linking table") person_id integer phone_id integer The linking table is the key to your array problem. It allows you to set up all sorts of many-to-many relationships and still enforce data integrity and perform efficient queries. A person can belong to several organizations and organizations will presumably have many members but organizing the data this way allows you to easily answer questions like how many organizations does Bob belong to, how many members does Fooorg have, who belongs to more than two organizations, etc. Organize your data right and you can run all sorts of queries easily. Do it wrong and you will suffer. Hope this helps. Cheers, Steve
On Wed, 14 Jul 2004 22:30:26 -0700 Steve Crawford wrote: > [arrays are bad] > I know you are under "temporal pressure" but this is truely a case of > "pay me now or pay me later". That's why I'm asking for advice on this list. It's not that I have to have the database today, but I don't want to spend days reorganizing it over and over again. For now, I'm aiming at the perfect database for our current needs. Planing for all possible future needs, OTOH, is is a bad idea, IMHO. It would probably complicate things considerably and in the end one has to reorganize anyways. > By way of example, say you want to write a query showing all contacts > for a particular location. With an array of 0..n location elements > that is very hard to do. Well, it is possible even for me as a newbie, but it is probably not as elegant and efficient as otherwise. Here's an example that I just built: SELECT * FROM contacts WHERE (SELECT location_id FROM locations WHERE name = 'Hannover') = ANY(locations) > You may want to check out "Database Design for Mere Mortals". Hm, I'm somewhat sceptical that a book on that topic needs to have 672 pages. I guess that *basic* database design principles and maybe also basic common naming conventions can be summarized on a dozen or so pages. If database design proves to be of further importance for me, I'll first investigate what other books there are. > [An example for a contacts scheme] Thanks a lot for your recommendations. I just build a new draft (see below). Note that I removed the inheritance scheme used in my last draft because I first like to get a feeling for basic database design features. Also note that in the current design an organization is an umbrella term that also includes single persons. I did that because persons and companies, etc. share many properties. Felix The new draft: "Major" tables: ORGANIZATIONS: contact_id, offers, demands, description, type (person|non-person), type_id (a person_id or a non-person_id) PERSONS: person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F) NON-PERSONS: non-person_id, name, type (company|non-profit|...) EVENTS: event_id, name, date, address (an address_id), description "Minor" tables: ADDRESSES: address_id, street1, street2, street3, city, postal_code, country (iso3166), subcountry (iso3166) EMAIL_ADDS: email_id, address, send_info (yes|no), put_in_ml (yes|no) PHONE_ADDS: phone_id, number, type (tel|fax|mob|etc.), address (an address_id|NULL) URLS: url_id, url, list (yes|no) Link tables: ORGANIZATION_ADDRESSES: organization, address ORGANIZATION_EMAIL_ADDS: organization, email ORGANIZATION_PHONE_ADDS: organization, phone ORGANIZATION_URLS: organization, url EVENT_ORGANIZORS: event, organization PERSON_NON-PERSONS: person, non-person, type (CEO|CTO|Supporter|...) NON-PERSON_NON-PERSONS: non-person, non-person, type (subsidiary|member|...)
I forgot to create a relation that reflects that a person/non-person is an organization: On Thu, 15 Jul 2004 16:25:54 +0200 Felix E. Klee wrote: > "Major" tables: > ORGANIZATIONS: > contact_id, offers, demands, description, type (person|non-person), > type_id (a person_id or a non-person_id) contact_id should be substituted by organization_id. > PERSONS: > person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F) The field organization is missing. > NON-PERSONS: > non-person_id, name, type (company|non-profit|...) The field organization is missing. Felix
On Thu, Jul 15, 2004 at 16:25:54 +0200, "Felix E. Klee" <felix.klee@inka.de> wrote: > The new draft: > > "Major" tables: > ORGANIZATIONS: > contact_id, offers, demands, description, type (person|non-person), > type_id (a person_id or a non-person_id) You probably want to have two separate type_id fields and use constraints to enforce exactly one of them being NULL. They should both REFERENCE the appropiate table. This makes the type field redundant, but if you find it simpler to have it then adjust the above constraints to make it consistant with which type_id field is not NULL. > > PERSONS: > person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F) > > NON-PERSONS: > non-person_id, name, type (company|non-profit|...)
On Thu, 15 Jul 2004 20:24:12 +0200 Felix E. Klee wrote: > I forgot to create a relation that reflects that a person/non-person is > an organization: *Bonk*, I already have the type_id field for that purpose: > > "Major" tables: > > ORGANIZATIONS: > > contact_id, offers, demands, description, type (person|non-person), > > type_id (a person_id or a non-person_id) Felix
On Thu, Jul 15, 2004 at 20:24:12 +0200, "Felix E. Klee" <felix.klee@inka.de> wrote: > I forgot to create a relation that reflects that a person/non-person is > an organization: > > On Thu, 15 Jul 2004 16:25:54 +0200 Felix E. Klee wrote: > > "Major" tables: > > ORGANIZATIONS: > > contact_id, offers, demands, description, type (person|non-person), > > type_id (a person_id or a non-person_id) > > contact_id should be substituted by organization_id. That doesn't sound right. This table looks a lot like a contact table. Unless there is exactly one contact per organzation you probably want two tables. > > > PERSONS: > > person_id, surname, given_names, pseudonyms, prefix, suffix, sex (M|F) > > The field organization is missing. You don't need that. > > > NON-PERSONS: > > non-person_id, name, type (company|non-profit|...) > > The field organization is missing. You don't need that. If you are trying to use non-person to name the company rather than to list a contact address, then this should probably be an organization table.
On Thu, 15 Jul 2004 13:52:38 -0500 Bruno Wolff III wrote: > > > "Major" tables: > > > ORGANIZATIONS: > > > contact_id, offers, demands, description, type (person|non-person), > > > type_id (a person_id or a non-person_id) > > > > contact_id should be substituted by organization_id. > > That doesn't sound right. It's only a cosmetic change. I used to use to name that table CONTACTS. But after renaming it to ORGANIZATIONS, I forgot to rename the field contact_id into organization_id in order to keep notation consistent. > This table looks a lot like a contact table. > Unless there is exactly one contact per organzation you probably want > two tables. An organization can be used as a contact (but it has other uses as well, therefore the name change). It *is* either a person (a "one man organization") or a non-person (a company, a non-profit organization, etc.). Actual contact informations (addresses, email addresses, etc.) are stored in different tables. Now, I wonder what is the best scheme to create an "is a" relation: Scheme A: ORGANIZATIONS: organization_id, ... PERSONS: person_id, organization_id, ... NON-PERSONS: non-person_id, organization_id, ... Scheme B: ORGANIZATIONS: organization_id, type (person|non-person), type_id (a person_id or a non-person_id) PERSONS: person_id, ... NON-PERSONS: non-person_id, ... Scheme C: ORGANIZATIONS: organization_id, person_id (or NULL, if it is a non-person), non-person_id (or NULL, if it is a person), ... PERSONS: person_id, ... NON-PERSONS: non-person_id, ... Scheme D: ORGANIZATIONS: organization_id, ... PERSONS (inherits from ORGANIZATIONS): person_id, ... NON-PERSONS (inherits from ORGANIZATIONS): non-person_id, ... Scheme E: <something else> Actually, the person_id and non-person_id field may be superfluous for most of the schemes. Sorry, for bothering you with this trivial stuff. I think I should really get some manual to teach me the basics (but which one?). Felix