Thread: one-to-one

one-to-one

From
Michael Grant
Date:
Does anyone have any opinions or comments on one-to-one relationships? I'm
working on developing a single-table database I inherited into something
that I hope will be a little more useful. The database contains records of
individuals including both members and non-members of our organization.
There are a number of attributes that apply only to members--orientation
date, officer status and such--and I'm wondering whether it might make sense
to have a separate table just for members with those fields rather than
cluttering up the general table for individuals with attributes that won't
apply to many of them. Then I'd use a one-to-one join when I need to
retrieve both general and membership-specific data. What are the pros and
cons of this approach?

Thanks for any comments.

Michael

--
"Place a dirty shirt or some rags in an open pot or barrel containing a few
grains of wheat or some wheat bran, and in 21 days, mice will appear. There
will be adult males and females present, and they will be capable of mating
and reproducing more mice."

- Jan Baptista van Helmont, 16th century


Re: one-to-one

From
Josh Berkus
Date:
Micheal,

> won't apply to many of them. Then I'd use a one-to-one join when I need to
> retrieve both general and membership-specific data. What are the pros and
> cons of this approach?

This is the releational, 3NF approach.  I recommend it highly.

Do (NOT real SQL, just an example, consult the docs for accurate syntax)

create table people (
    id SERIAL PRIMARY KEY
    name
    address ....
);

create table member_info (
    id INT REFERENCES people(id) PRIMARY KEY
    orientation
    officer ...
);

Using ID both as PK and as an FK in the member_info table creates a 1:0-1
relationship.  Then, through the magic of PostgreSQL rules and views, you can
create a pseudo table called members:

CREATE VIEW members AS SELECT * FROM people JOIN members USING(id)

CREATE RULE member_insert ON INSERT INTO members
DO INSTEAD ( INSERT INTO people (name, address ....)
            VALUES (NEW.name, NEW.address, ... );
    INSERT INTO members (id, officer, orientation ...)
         (currval('people_id_seq'), NEW.officer, NEW.orientation)
;

And do the same for update and delete.
(see "CREATE RULE" in the docs, and this page:
http://www.postgresql.com/docs/7.3/interactive/rules.html)

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: one-to-one

From
Adrian Holovaty
Date:
Josh Berkus wrote:
> create table people (
>     id SERIAL PRIMARY KEY
>     name
>     address ....
> );
> create table member_info (
>     id INT REFERENCES people(id) PRIMARY KEY
>     orientation
>     officer ...
> );
> Using ID both as PK and as an FK in the member_info table creates a 1:0-1
> relationship.

What are the advantages and disadvantages of this approach over table
inheritance?

create table people (
   id SERIAL PRIMARY KEY
   name
   address ....
);
create table members (
   orientation
   officer ...
) inherits people;

Adrian

Re: one-to-one

From
Jason Hihn
Date:
Warning: Novice answering a question. ;-)

The way he did it, you'll have two tables, a master list of people and a
table just members & their member data. The way you did it, you'll have one,
and always store all the info. Inherits just adds existing schema to the
table, IIRC.



> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org
> [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of Adrian Holovaty
> Sent: Thursday, August 21, 2003 2:32 PM
> To: pgsql-novice@postgresql.org
> Subject: Re: [NOVICE] one-to-one
>
>
> Josh Berkus wrote:
> > create table people (
> >     id SERIAL PRIMARY KEY
> >     name
> >     address ....
> > );
> > create table member_info (
> >     id INT REFERENCES people(id) PRIMARY KEY
> >     orientation
> >     officer ...
> > );
> > Using ID both as PK and as an FK in the member_info table
> creates a 1:0-1
> > relationship.
>
> What are the advantages and disadvantages of this approach over table
> inheritance?
>
> create table people (
>    id SERIAL PRIMARY KEY
>    name
>    address ....
> );
> create table members (
>    orientation
>    officer ...
> ) inherits people;
>
> Adrian
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>


Re: one-to-one

From
Josh Berkus
Date:
Adrian,

> What are the advantages and disadvantages of this approach over table
> inheritance?

Well, I've always had a problem with table inheritance from a theory
perspective ... it's orthagonal to the SQL pseudo-relational model, and as an
orthagonal structure can cause consistency issues.

Postgresql has a few of these as implementation problems; particularly, the
inability to index over inherited tables, or to properly maintain a list of
dependancies for pg_dump and schema management.  And gods forbid you should
start playing with the structure of the parent table after inheritance.
These are all surmountable, but are current drawbacks to the inheritance
implementation.

In Michael's case, he has another issue; non-members will often *become*
members.  Using an inheritance model, he would then have to delete the person
from the "people" table, and insert them into "members", a less efficient
operation than simply adding member information to the member_info table.
Futher, it would be likely to disrupt whatever historical auditing mechanism
he has in place by adding a false delete and insert.

Finally, the creation of inherited child tables almost inevitably lead to the
desire for grandchild tables, which I don't think PostgreSQL supports.  For
example, imagine if Micheal realized that he needed an even more specific
group of members, "board_members".     Or it can lead to a desire for
"multiple-inheritance", another extremely problematic area.

Take this all with a grain of salt; as I said, I have an ideological issue
with table inheritance that certainly biases my viewpoint.  From my
perspective, table inheritance is an effort to extend inappropriate OOP
thinking into the SQL database space, and encourages many developers to
remain ignorant of SQL database architecture theory.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: one-to-one

From
Michael Grant
Date:
On 8/21/03 2:43 PM, "Josh Berkus" <josh@agliodbs.com> wrote:

> Well, I've always had a problem with table inheritance from a theory
> perspective ... it's orthagonal to the SQL pseudo-relational model, and as an
> orthagonal structure can cause consistency issues.

Thanks Josh, Jason and Adrian for your comments!
Josh, I'm curious why you refer to a "pseudo-relational model"?

Michael
--
"We're lost but we're making good time."
- Yogi Berra


Re: Database theory (was Re: one-to-one)

From
Josh Berkus
Date:
Ron,

> God bless you, Tiny Tim!  I had a similar discussion on [general]
> last week regarding arrays.

<grin>  sure.  Although arrays can be atomic and thus properly relational; it
just depends on how they are used.   Examples:

array of ordered molicules making up a genome: atomic & relational

array of points making up a graph:  atomic & relational

array of languages spoken by a salesperson:  non-atomic, non-relational

array of products offered by a supplier: non-atomic, non-relational

The real question to ask oneself when using an array data type is: "is the
data I am representing an ordered set which does not have meaning as
individual elements, and does not have the same meaning in a different
order?"  If the answer is yes, then please use an array.  If the answer is
no, then you should be using a related "child" table instead.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: one-to-one

From
Josh Berkus
Date:
Michael,

> Thanks Josh, Jason and Adrian for your comments!
> Josh, I'm curious why you refer to a "pseudo-relational model"?

Well, SQL is a very imperfect implementation of relational theory.  SQL fails
to implement a lot of Codd's relational calculus, such as distributed keys
and database constraints, as well as making some set operations (such as "is
not a member of" and "is the greatest(col) record of") far more difficult and
inefficient than they need to be.

For more detail on this, see Fabian Pascal's web page, particularly the
articles written by CJ Date:
www.dbdebunk.com

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Almost relational PostgreSQL (was: one-to-one)

From
Michael Glaesmann
Date:
Josh,

Even though I'm very much a novice at PostgreSQL, SQL in general, and
the full ramifications of the relational model, I've read as much as I
can of the writings of Pascal, Date, and Darwen (short of buying some
of their books, which I'm hoping to do when I get back to the States
during the winter holidays). So my curiosity was piqued when I read
these comments of yours:

On Friday, Aug 22, 2003, at 10:40 Asia/Tokyo, Josh Berkus wrote:

> Well, SQL is a very imperfect implementation of relational theory.
> SQL fails
> to implement a lot of Codd's relational calculus, such as distributed
> keys
> and database constraints, as well as making some set operations (such
> as "is
> not a member of" and "is the greatest(col) record of") far more
> difficult and
> inefficient than they need to be.

 From what I've read of PostgreSQL, it's a bit closer to true relational
than SQL, or at least you can limit yourself from using some of the
non-relational bits of SQL, such as always using NOT NULL in column
definitions, and using DISTINCT to eliminate duplicates in results
relations. I was wondering if you would share some of techniques you
would recommend using in PostgreSQL to make a database 'more
relational', or know of any sources on the web that I might find more
about this specifically.

Also, I often read about denormalization a database for performance
reasons. My understanding of this is not that normalization is a
problem in and of itself, but that the DBMS hasn't been properly
designed to handle highly normalized databases, which I gather has to
do with the fact the number of joins tends to increase with
normalization. However, I haven't seen much about this with respect to
PostgreSQL, or any particular DBMS, for that matter. Discussion seems
to always be on the theoretical level, though there should be
differences in performance as the engines underlying the different
DBMSs are different. I'd be interested in hearing your views on
practical normalization of PostgreSQL databases, or alternatively know
of places where I might find such discussion.

These are probably pretty big questions. I'm interested in anyone's
views on this, actually. I've addressed it specifically to Josh only
because it was his post that prompted me to ask.

Regards,

Michael


Re: Almost relational PostgreSQL (was: one-to-one)

From
Josh Berkus
Date:
Michael,

>  From what I've read of PostgreSQL, it's a bit closer to true relational
> than SQL, or at least you can limit yourself from using some of the
> non-relational bits of SQL, such as always using NOT NULL in column
> definitions, and using DISTINCT to eliminate duplicates in results
> relations. I was wondering if you would share some of techniques you
> would recommend using in PostgreSQL to make a database 'more
> relational', or know of any sources on the web that I might find more
> about this specifically.

Well, if you've already read Date, Darwin & Pascal, I think you've pretty much
covered things.  If you don't have it already, Pascal's "Practical Issues in
Database Management" is  a good place to start; for one thing, it's the
clearest description of the Normal Forms I've seen.

Unfortunately, a lot of the relational functionality that SQL (and PostgreSQL)
is missing can only be awkwardly implemented using PostgreSQL Triggers, Views
and Rules.  More is coming, though; when we have fully-functional
statement-level triggers in 7.5, and when/if Neil finishes implementing an
updatable view default, implementing distributed keys and set-based keys will
become a lot more possible.

As much as it is consistent with Codd, though, I don't recommend using
DISTINCT with everything; it's a performance-killer.   Neil and I were just
hashing this out on IRC.

Another tip:  beware of over-reliance on surrogate autoincrement keys.  They
are convenient and necessary some of the time, but use real keys where you
can.    The current industry reliance on "ID" primary keys encourages sloppy
thinking .... and sloppy schema ... by DBAs.  I have caught myself building
tables without any real keys this way.

For that matter, the term "Primary Key" is inherently meaningless anyway.
Don't get attached to it.

Maybe more later if I feel like it .... more likely, you've inspired me to a
magazine article.

> Also, I often read about denormalization a database for performance
> reasons. My understanding of this is not that normalization is a
> problem in and of itself, but that the DBMS hasn't been properly
> designed to handle highly normalized databases, which I gather has to
> do with the fact the number of joins tends to increase with
> normalization. However, I haven't seen much about this with respect to
> PostgreSQL, or any particular DBMS, for that matter. Discussion seems
> to always be on the theoretical level, though there should be
> differences in performance as the engines underlying the different
> DBMSs are different. I'd be interested in hearing your views on
> practical normalization of PostgreSQL databases, or alternatively know
> of places where I might find such discussion.

My attitude?  Construct your database according to strict normal principles,
and then see if you have a performance problem.   IMNSHO, 90% of the people
advocating denormalization haven't really tested; they are either making
assumptions based on outdated knowledge, or using "performance" as an excuse
for bad database design.

My personal limit of denormalization stops at a few NULL columns and using
cache tables to hold copies of views which are too slow.

Want to discuss this further?  Take it to the PGSQL-SQL list and/or the
#postgresql channel on irc.freenode.net.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Almost relational PostgreSQL (was: one-to-one)

From
Michael Glaesmann
Date:
Josh,

>  If you don't have it already, Pascal's "Practical Issues in
> Database Management" is  a good place to start;

Thanks for your comments. Looks like I've got another book to add to my
reading list!

On Wednesday, Oct 22, 2003, at 05:10 Asia/Tokyo, Josh Berkus wrote:

> Maybe more later if I feel like it .... more likely, you've inspired
> me to a
> magazine article.

I'd be grateful to read it.

> Want to discuss this further?  Take it to the PGSQL-SQL list and/or the
> #postgresql channel on irc.freenode.net.

I'll start lurking.

Thanks again!

Michael


Surrogate vs natural keys (Was: Almost relational PostgreSQL (was: one-to-one))

From
Antonios Christofides
Date:
Josh Berkus wrote:
>  Another tip:  beware of over-reliance on surrogate autoincrement
>  keys.  They are convenient and necessary some of the time, but use
>  real keys where you can.    The current industry reliance on "ID"
>  primary keys encourages sloppy thinking .... and sloppy schema ... by
>  DBAs.  I have caught myself building tables without any real keys
>  this way.

Speaking of this, I'm finishing the design of a database and I tried to
follow this rule, but there were very few cases where I could. For
example, I have a table with device types, the key to which could be the
composite (manufacturer [a numeric id], modelname [a string]). I didn't
want to use that, however, because the user might type the wrong
modelname and later want to correct it. Not to mention that I've
frequently seen a different model name on the cover and on the label of
some machines.

I decided to use natural primary keys only for four entity types (I have
21 in total): languages, countries, states of the US, and
mime types (examples of keys are, respectively, "en", "UK", "CA",
"image/jpeg").

You think that 4 out of 21 is a good score? Should I allow primary key
changes?  Since the RDBMS does not support cascade updates, I'd need to
write triggers to do the job, and I would also have to make the
constraints deferred. And I doubt it would apply to more than one or two
entity types.

Antonios,

> Speaking of this, I'm finishing the design of a database and I tried to
> follow this rule, but there were very few cases where I could. For
> example, I have a table with device types, the key to which could be the
> composite (manufacturer [a numeric id], modelname [a string]). I didn't
> want to use that, however, because the user might type the wrong
> modelname and later want to correct it. Not to mention that I've
> frequently seen a different model name on the cover and on the label of
> some machines.

Yes, this is exactly the problem with trying to use real keys consistently.

> I decided to use natural primary keys only for four entity types (I have
> 21 in total): languages, countries, states of the US, and
> mime types (examples of keys are, respectively, "en", "UK", "CA",
> "image/jpeg").

All reference lists.  This is frequently how it goes.

> You think that 4 out of 21 is a good score? Should I allow primary key
> changes?

It's not about scores, and most of use are forced to use autonumber surrogate
keys a lot of the time to simplify database design.  The issue is to beware
of the "surrogate key trap" where you grow so used to autonumber keys that
you create a table with no *real* keys.   The problem with tables with no
real keys is that it's impossible to detect duplicates ....

Also, keep in mind that the term "primary key" is inherently meaningless, and
exists only as a convenience for legacy database developers and database
guis.  Within the world of relational design ... and relational calculus ...
there are only keys, which are all "equal".

> Since the RDBMS does not support cascade updates,

We don't?  That's news to me.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco