Thread: one-to-one
Does anyone have any opinions or comments on one-to-one relationships? I'm working on developing a single-table database I inherited into something that I hope will be a little more useful. The database contains records of individuals including both members and non-members of our organization. There are a number of attributes that apply only to members--orientation date, officer status and such--and I'm wondering whether it might make sense to have a separate table just for members with those fields rather than cluttering up the general table for individuals with attributes that won't apply to many of them. Then I'd use a one-to-one join when I need to retrieve both general and membership-specific data. What are the pros and cons of this approach? Thanks for any comments. Michael -- "Place a dirty shirt or some rags in an open pot or barrel containing a few grains of wheat or some wheat bran, and in 21 days, mice will appear. There will be adult males and females present, and they will be capable of mating and reproducing more mice." - Jan Baptista van Helmont, 16th century
Micheal, > won't apply to many of them. Then I'd use a one-to-one join when I need to > retrieve both general and membership-specific data. What are the pros and > cons of this approach? This is the releational, 3NF approach. I recommend it highly. Do (NOT real SQL, just an example, consult the docs for accurate syntax) create table people ( id SERIAL PRIMARY KEY name address .... ); create table member_info ( id INT REFERENCES people(id) PRIMARY KEY orientation officer ... ); Using ID both as PK and as an FK in the member_info table creates a 1:0-1 relationship. Then, through the magic of PostgreSQL rules and views, you can create a pseudo table called members: CREATE VIEW members AS SELECT * FROM people JOIN members USING(id) CREATE RULE member_insert ON INSERT INTO members DO INSTEAD ( INSERT INTO people (name, address ....) VALUES (NEW.name, NEW.address, ... ); INSERT INTO members (id, officer, orientation ...) (currval('people_id_seq'), NEW.officer, NEW.orientation) ; And do the same for update and delete. (see "CREATE RULE" in the docs, and this page: http://www.postgresql.com/docs/7.3/interactive/rules.html) -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > create table people ( > id SERIAL PRIMARY KEY > name > address .... > ); > create table member_info ( > id INT REFERENCES people(id) PRIMARY KEY > orientation > officer ... > ); > Using ID both as PK and as an FK in the member_info table creates a 1:0-1 > relationship. What are the advantages and disadvantages of this approach over table inheritance? create table people ( id SERIAL PRIMARY KEY name address .... ); create table members ( orientation officer ... ) inherits people; Adrian
Warning: Novice answering a question. ;-) The way he did it, you'll have two tables, a master list of people and a table just members & their member data. The way you did it, you'll have one, and always store all the info. Inherits just adds existing schema to the table, IIRC. > -----Original Message----- > From: pgsql-novice-owner@postgresql.org > [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of Adrian Holovaty > Sent: Thursday, August 21, 2003 2:32 PM > To: pgsql-novice@postgresql.org > Subject: Re: [NOVICE] one-to-one > > > Josh Berkus wrote: > > create table people ( > > id SERIAL PRIMARY KEY > > name > > address .... > > ); > > create table member_info ( > > id INT REFERENCES people(id) PRIMARY KEY > > orientation > > officer ... > > ); > > Using ID both as PK and as an FK in the member_info table > creates a 1:0-1 > > relationship. > > What are the advantages and disadvantages of this approach over table > inheritance? > > create table people ( > id SERIAL PRIMARY KEY > name > address .... > ); > create table members ( > orientation > officer ... > ) inherits people; > > Adrian > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
Adrian, > What are the advantages and disadvantages of this approach over table > inheritance? Well, I've always had a problem with table inheritance from a theory perspective ... it's orthagonal to the SQL pseudo-relational model, and as an orthagonal structure can cause consistency issues. Postgresql has a few of these as implementation problems; particularly, the inability to index over inherited tables, or to properly maintain a list of dependancies for pg_dump and schema management. And gods forbid you should start playing with the structure of the parent table after inheritance. These are all surmountable, but are current drawbacks to the inheritance implementation. In Michael's case, he has another issue; non-members will often *become* members. Using an inheritance model, he would then have to delete the person from the "people" table, and insert them into "members", a less efficient operation than simply adding member information to the member_info table. Futher, it would be likely to disrupt whatever historical auditing mechanism he has in place by adding a false delete and insert. Finally, the creation of inherited child tables almost inevitably lead to the desire for grandchild tables, which I don't think PostgreSQL supports. For example, imagine if Micheal realized that he needed an even more specific group of members, "board_members". Or it can lead to a desire for "multiple-inheritance", another extremely problematic area. Take this all with a grain of salt; as I said, I have an ideological issue with table inheritance that certainly biases my viewpoint. From my perspective, table inheritance is an effort to extend inappropriate OOP thinking into the SQL database space, and encourages many developers to remain ignorant of SQL database architecture theory. -- -Josh Berkus Aglio Database Solutions San Francisco
On 8/21/03 2:43 PM, "Josh Berkus" <josh@agliodbs.com> wrote: > Well, I've always had a problem with table inheritance from a theory > perspective ... it's orthagonal to the SQL pseudo-relational model, and as an > orthagonal structure can cause consistency issues. Thanks Josh, Jason and Adrian for your comments! Josh, I'm curious why you refer to a "pseudo-relational model"? Michael -- "We're lost but we're making good time." - Yogi Berra
Ron, > God bless you, Tiny Tim! I had a similar discussion on [general] > last week regarding arrays. <grin> sure. Although arrays can be atomic and thus properly relational; it just depends on how they are used. Examples: array of ordered molicules making up a genome: atomic & relational array of points making up a graph: atomic & relational array of languages spoken by a salesperson: non-atomic, non-relational array of products offered by a supplier: non-atomic, non-relational The real question to ask oneself when using an array data type is: "is the data I am representing an ordered set which does not have meaning as individual elements, and does not have the same meaning in a different order?" If the answer is yes, then please use an array. If the answer is no, then you should be using a related "child" table instead. -- -Josh Berkus Aglio Database Solutions San Francisco
Michael, > Thanks Josh, Jason and Adrian for your comments! > Josh, I'm curious why you refer to a "pseudo-relational model"? Well, SQL is a very imperfect implementation of relational theory. SQL fails to implement a lot of Codd's relational calculus, such as distributed keys and database constraints, as well as making some set operations (such as "is not a member of" and "is the greatest(col) record of") far more difficult and inefficient than they need to be. For more detail on this, see Fabian Pascal's web page, particularly the articles written by CJ Date: www.dbdebunk.com -- -Josh Berkus Aglio Database Solutions San Francisco
Josh, Even though I'm very much a novice at PostgreSQL, SQL in general, and the full ramifications of the relational model, I've read as much as I can of the writings of Pascal, Date, and Darwen (short of buying some of their books, which I'm hoping to do when I get back to the States during the winter holidays). So my curiosity was piqued when I read these comments of yours: On Friday, Aug 22, 2003, at 10:40 Asia/Tokyo, Josh Berkus wrote: > Well, SQL is a very imperfect implementation of relational theory. > SQL fails > to implement a lot of Codd's relational calculus, such as distributed > keys > and database constraints, as well as making some set operations (such > as "is > not a member of" and "is the greatest(col) record of") far more > difficult and > inefficient than they need to be. From what I've read of PostgreSQL, it's a bit closer to true relational than SQL, or at least you can limit yourself from using some of the non-relational bits of SQL, such as always using NOT NULL in column definitions, and using DISTINCT to eliminate duplicates in results relations. I was wondering if you would share some of techniques you would recommend using in PostgreSQL to make a database 'more relational', or know of any sources on the web that I might find more about this specifically. Also, I often read about denormalization a database for performance reasons. My understanding of this is not that normalization is a problem in and of itself, but that the DBMS hasn't been properly designed to handle highly normalized databases, which I gather has to do with the fact the number of joins tends to increase with normalization. However, I haven't seen much about this with respect to PostgreSQL, or any particular DBMS, for that matter. Discussion seems to always be on the theoretical level, though there should be differences in performance as the engines underlying the different DBMSs are different. I'd be interested in hearing your views on practical normalization of PostgreSQL databases, or alternatively know of places where I might find such discussion. These are probably pretty big questions. I'm interested in anyone's views on this, actually. I've addressed it specifically to Josh only because it was his post that prompted me to ask. Regards, Michael
Michael, > From what I've read of PostgreSQL, it's a bit closer to true relational > than SQL, or at least you can limit yourself from using some of the > non-relational bits of SQL, such as always using NOT NULL in column > definitions, and using DISTINCT to eliminate duplicates in results > relations. I was wondering if you would share some of techniques you > would recommend using in PostgreSQL to make a database 'more > relational', or know of any sources on the web that I might find more > about this specifically. Well, if you've already read Date, Darwin & Pascal, I think you've pretty much covered things. If you don't have it already, Pascal's "Practical Issues in Database Management" is a good place to start; for one thing, it's the clearest description of the Normal Forms I've seen. Unfortunately, a lot of the relational functionality that SQL (and PostgreSQL) is missing can only be awkwardly implemented using PostgreSQL Triggers, Views and Rules. More is coming, though; when we have fully-functional statement-level triggers in 7.5, and when/if Neil finishes implementing an updatable view default, implementing distributed keys and set-based keys will become a lot more possible. As much as it is consistent with Codd, though, I don't recommend using DISTINCT with everything; it's a performance-killer. Neil and I were just hashing this out on IRC. Another tip: beware of over-reliance on surrogate autoincrement keys. They are convenient and necessary some of the time, but use real keys where you can. The current industry reliance on "ID" primary keys encourages sloppy thinking .... and sloppy schema ... by DBAs. I have caught myself building tables without any real keys this way. For that matter, the term "Primary Key" is inherently meaningless anyway. Don't get attached to it. Maybe more later if I feel like it .... more likely, you've inspired me to a magazine article. > Also, I often read about denormalization a database for performance > reasons. My understanding of this is not that normalization is a > problem in and of itself, but that the DBMS hasn't been properly > designed to handle highly normalized databases, which I gather has to > do with the fact the number of joins tends to increase with > normalization. However, I haven't seen much about this with respect to > PostgreSQL, or any particular DBMS, for that matter. Discussion seems > to always be on the theoretical level, though there should be > differences in performance as the engines underlying the different > DBMSs are different. I'd be interested in hearing your views on > practical normalization of PostgreSQL databases, or alternatively know > of places where I might find such discussion. My attitude? Construct your database according to strict normal principles, and then see if you have a performance problem. IMNSHO, 90% of the people advocating denormalization haven't really tested; they are either making assumptions based on outdated knowledge, or using "performance" as an excuse for bad database design. My personal limit of denormalization stops at a few NULL columns and using cache tables to hold copies of views which are too slow. Want to discuss this further? Take it to the PGSQL-SQL list and/or the #postgresql channel on irc.freenode.net. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh, > If you don't have it already, Pascal's "Practical Issues in > Database Management" is a good place to start; Thanks for your comments. Looks like I've got another book to add to my reading list! On Wednesday, Oct 22, 2003, at 05:10 Asia/Tokyo, Josh Berkus wrote: > Maybe more later if I feel like it .... more likely, you've inspired > me to a > magazine article. I'd be grateful to read it. > Want to discuss this further? Take it to the PGSQL-SQL list and/or the > #postgresql channel on irc.freenode.net. I'll start lurking. Thanks again! Michael
Surrogate vs natural keys (Was: Almost relational PostgreSQL (was: one-to-one))
From
Antonios Christofides
Date:
Josh Berkus wrote: > Another tip: beware of over-reliance on surrogate autoincrement > keys. They are convenient and necessary some of the time, but use > real keys where you can. The current industry reliance on "ID" > primary keys encourages sloppy thinking .... and sloppy schema ... by > DBAs. I have caught myself building tables without any real keys > this way. Speaking of this, I'm finishing the design of a database and I tried to follow this rule, but there were very few cases where I could. For example, I have a table with device types, the key to which could be the composite (manufacturer [a numeric id], modelname [a string]). I didn't want to use that, however, because the user might type the wrong modelname and later want to correct it. Not to mention that I've frequently seen a different model name on the cover and on the label of some machines. I decided to use natural primary keys only for four entity types (I have 21 in total): languages, countries, states of the US, and mime types (examples of keys are, respectively, "en", "UK", "CA", "image/jpeg"). You think that 4 out of 21 is a good score? Should I allow primary key changes? Since the RDBMS does not support cascade updates, I'd need to write triggers to do the job, and I would also have to make the constraints deferred. And I doubt it would apply to more than one or two entity types.
Re: Surrogate vs natural keys (Was: Almost relational PostgreSQL (was: one-to-one))
From
Josh Berkus
Date:
Antonios, > Speaking of this, I'm finishing the design of a database and I tried to > follow this rule, but there were very few cases where I could. For > example, I have a table with device types, the key to which could be the > composite (manufacturer [a numeric id], modelname [a string]). I didn't > want to use that, however, because the user might type the wrong > modelname and later want to correct it. Not to mention that I've > frequently seen a different model name on the cover and on the label of > some machines. Yes, this is exactly the problem with trying to use real keys consistently. > I decided to use natural primary keys only for four entity types (I have > 21 in total): languages, countries, states of the US, and > mime types (examples of keys are, respectively, "en", "UK", "CA", > "image/jpeg"). All reference lists. This is frequently how it goes. > You think that 4 out of 21 is a good score? Should I allow primary key > changes? It's not about scores, and most of use are forced to use autonumber surrogate keys a lot of the time to simplify database design. The issue is to beware of the "surrogate key trap" where you grow so used to autonumber keys that you create a table with no *real* keys. The problem with tables with no real keys is that it's impossible to detect duplicates .... Also, keep in mind that the term "primary key" is inherently meaningless, and exists only as a convenience for legacy database developers and database guis. Within the world of relational design ... and relational calculus ... there are only keys, which are all "equal". > Since the RDBMS does not support cascade updates, We don't? That's news to me. -- -Josh Berkus Aglio Database Solutions San Francisco