Re: Column order in multi column primary key - Mailing list pgsql-general

From Craig Boucher
Subject Re: Column order in multi column primary key
Date
Msg-id 0e2d01d1f1c8$9225a970$b670fc50$@wesvic.com
Whole thread Raw
In response to Re: Column order in multi column primary key  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: Column order in multi column primary key
List pgsql-general
Thanks Kevin for your response.  I've Googled and debated natural vs surrogate keys and I just find surrogate keys
easierto work with (maybe I'm just being lazy).  It just seems that a description or name is most often the natural
key. I just can't see, In my case, using a department description as part of the primary key in the department table
andhaving it repeated in millions of rows.  Though I always look for ways to use natural keys where they work well. 

Thanks,
Craig

-----Original Message-----
From: Kevin Grittner [mailto:kgrittn@gmail.com]
Sent: Monday, August 8, 2016 2:44 PM
To: Craig Boucher <craig@wesvic.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David G. Johnston <david.g.johnston@gmail.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Column order in multi column primary key

On Mon, Aug 8, 2016 at 4:01 PM, Craig Boucher <craig@wesvic.com> wrote:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

>> I'm pretty skeptical of the notion of redefining what your PK is on
>> performance grounds.  With this definition, you'd allow two entries
>> with the same work_session_id, if they chanced to have different
>> customer_ids.  Is that really OK?

> It could actually be beneficial if we need to migrate a customer from
> one database to another because wouldn't have to worry about pk
> constraint violations.

Isn't "wouldn't have to worry about pk constraint violations"
another way of saying "we're OK with allowing the same logical row to be present multiple times in the table with
inconsistentdata among the various copies"?  You may want to read up on data normalization and shoot for 3rd normal
formand its benefits. 

The short version you can use as a memory device when mentally testing whether a key is write is that every non-key
columnshould be functionally dependent on the key, the whole key, and nothing but the key ("so help you Codd").  That
meansthat: 
 (1) given the key values you can determine the values of all other columns (1st normal form),
 (2) if you omit any of the key columns from the key you cannot determine the values of all other columns (2nd normal
form),and 
 (3) no column's value may be determined from non-key columns, and therefore only determined from the key indirectly
(3rdnormal form). 

You are talking about degrading your normalization to 1st normal form.  There is a wealth of literature on the problems
thatcan introduce.  What would be in line with well-established theory and practice is looking for a "natural key" in
eachtable -- some combination of columns which naturally occur in the table which uniquely identify the rows.  In some
casesit is necessary to add some "tie-breaker" column to the end of the key when you do this -- like a sequence within
thegroup or a timestamp. 

If you search the archives you will find periodic discussions of the relative merits of this approach versus adding a
meaninglesssynthetic key (often called "id" in every table) to use by itself as the primary key.  This is essentially
thesame as adding "pointers" among the various records and constraining how "navigation" among tables can happen.  It
hasa big performance downside in generating statistics and large reports because it requires a lot of pointer chasing.
Proponentswill point out how convenient it is to be able to change human-visible identifying values, potentially on a
verylarge scale, by modifying one column of one row.  That is, of course, a double-edged sword -- in discussing design
alternativeswith the CPAs who were going to be auditing financial data stored in a database, they didn't tend to see
thatas nearly as much of a plus as some programmers do. 

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: select array_remove(ARRAY[NULL,NULL,NULL],NULL); returns {} instead of {NULL,NULL,NULL}
Next
From: Tom Lane
Date:
Subject: Re: select array_remove(ARRAY[NULL,NULL,NULL],NULL); returns {} instead of {NULL,NULL,NULL}