Thread: primary key and existing unique fields
Hi all, I am wandering about the pros and cons of creating a separate serial field for a primary key when I already have a single unique field. This existing unique field will have to be a character of fixed length (VARCHAR(12)) because although it's a numeric value there will be leading zeroes. There are a couple more tables with similar unique fields and one of them would need to reference the others. Does anybody see any good reason for adding a separate autoincrement primary key field for each table? or either way is not a big deal. Sally _________________________________________________________________ Don�t just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/
Since you already have the unique field I see no point in adding a sequence to the table, unless of course the sequence of the data inserts is of importance at some point.
Duane
-----Original Message-----
From: Sally Sally [mailto:dedeb17@hotmail.com]
Sent: Tuesday, October 26, 2004 9:25 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] primary key and existing unique fields
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
Sally
_________________________________________________________________
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Sally Sally wrote: > Hi all, > I am wandering about the pros and cons of creating a separate serial > field for a primary key when I already have a single unique field. This > existing unique field will have to be a character of fixed length > (VARCHAR(12)) because although it's a numeric value there will be > leading zeroes. There are a couple more tables with similar unique > fields and one of them would need to reference the others. Does anybody > see any good reason for adding a separate autoincrement primary key > field for each table? or either way is not a big deal. Your primary key should not be directly related to the data being stored. Outside of the fact that it is the primary reference or the row. Sincerely, Joshua D. Drake > Sally > > _________________________________________________________________ > Don’t just search. Find. Check out the new MSN Search! > http://search.msn.click-url.com/go/onm00200636ave/direct/01/ > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
Can you please elaborate on the point you just made as to why the primary key should not relate to the data (even for a case when there is an existing unique field that can be used to identify the record) >From: "Joshua D. Drake" <jd@commandprompt.com> >To: Sally Sally <dedeb17@hotmail.com> >CC: pgsql-general@postgresql.org >Subject: Re: [GENERAL] primary key and existing unique fields >Date: Tue, 26 Oct 2004 09:48:50 -0700 > >Sally Sally wrote: >>Hi all, >>I am wandering about the pros and cons of creating a separate serial field >>for a primary key when I already have a single unique field. This existing >>unique field will have to be a character of fixed length (VARCHAR(12)) >>because although it's a numeric value there will be leading zeroes. There >>are a couple more tables with similar unique fields and one of them would >>need to reference the others. Does anybody see any good reason for adding >>a separate autoincrement primary key field for each table? or either way >>is not a big deal. > >Your primary key should not be directly related to the data being stored. >Outside of the fact that it is the primary reference or the row. > >Sincerely, > >Joshua D. Drake > > > >>Sally >> >>_________________________________________________________________ >>Dont just search. Find. Check out the new MSN Search! >>http://search.msn.click-url.com/go/onm00200636ave/direct/01/ >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 8: explain analyze is your friend > > >-- >Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. >Postgresql support, programming shared hosting and dedicated hosting. >+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com >Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL ><< jd.vcf >> > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > > http://archives.postgresql.org _________________________________________________________________ Check out Election 2004 for up-to-date election news, plus voter tools and more! http://special.msn.com/msn/election2004.armx
Sally Sally wrote: > Can you please elaborate on the point you just made as to why the > primary key should not relate to the data (even for a case when there is > an existing unique field that can be used to identify the record) > Here is a good article on the topic: http://www.devx.com/ibm/Article/20702 Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
Joshua D. Drake wrote: > Sally Sally wrote: > >> Can you please elaborate on the point you just made as to why the >> primary key should not relate to the data (even for a case when there >> is an existing unique field that can be used to identify the record) >> > > Here is a good article on the topic: > > http://www.devx.com/ibm/Article/20702 That article makes me want to vomit uncontrollably! ;-) "Business data might also simply be bad -- glitches in the Social Security Administration's system may lead to different persons having the same Social Security Number. A surrogate key helps to isolate the system from such problems." The surrogate key isn't solving the underlying logical inconsistency problem. It is being used as a work-around to cover one up. I suspect the author of being a MySQL user. Mike Mascari
>> Here is a good article on the topic: >> >> http://www.devx.com/ibm/Article/20702 > > The surrogate key isn't solving the underlying logical inconsistency > problem. It is being used as a work-around to cover one up. I suspect > the author of being a MySQL user. Actually he is a software project consultant for IEEE.org and he holds a Ph.D. in Theoretical Physics. Sincerely, Joshua D. Drake > > Mike Mascari > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
Joshua D. Drake wrote: > >>> Here is a good article on the topic: >>> >>> http://www.devx.com/ibm/Article/20702 >> >> >> The surrogate key isn't solving the underlying logical inconsistency >> problem. It is being used as a work-around to cover one up. I suspect >> the author of being a MySQL user. > > Actually he is a software project consultant for IEEE.org and he holds > a Ph.D. in Theoretical Physics. <joking> Apparently gamma functions and string theory have little to do with understanding the relational model of data. </joking> Seriously, my only point was that Date & Darwen and other relational purists do not use surrogate keys. The surrogate-key vs. speaking-key debate devolves quickly. I'd only argue that it is wrong as the author of the article implied that the speaking-key side of the debate is without merit. He begins: "For the purpose of data modeling, the plumbing should be largely transparent. In fact, purist DB lore makes no distinction between data and plumbing. However, you will see that it is more efficient for administration and maintenance, as well as in terms of runtime performance, to have some additional fields to serve as DB keys." So he dismisses the speaking-key argument in one sentence as "purist DB lore." He then proceeds with a poor example: "The requirements for a primary key are very strict. It must: Exist Be unique Not change over time Surrogate keys help to mitigate the fact that real business data never reliably fulfills these requirements. Not every person has a Social Security Number (think of those outside the U.S.), people change their names, and other important information." 1. The reason we have ON UPDATE CASCADE is to handle changes in primary keys. 2. If not everyone has a social security number than the design should be sufficiently normalized to reflect that fact. I am not saying that Chris Date and Hugh Darwen are right and that Philipp Janert is wrong. I am only saying that both sides should be investigated and judged on the weight of their arguments. Personally, I've found over time that when I deviate from "purist DB lore" I get punished in long run. Could be wrong, though. :-) Mike Mascari
On Tue, 26 Oct 2004 16:24:44 +0000, Sally Sally <dedeb17@hotmail.com> wrote: > Hi all, > I am wandering about the pros and cons of creating a separate serial field > for a primary key when I already have a single unique field. This existing > unique field will have to be a character of fixed length (VARCHAR(12)) > because although it's a numeric value there will be leading zeroes. There > are a couple more tables with similar unique fields and one of them would > need to reference the others. Does anybody see any good reason for adding a > separate autoincrement primary key field for each table? or either way is > not a big deal. I see three possible advantages: 1. having varchar(12) in every referencing table, takes more storage space, which may mean something if you have tons of gigabytes of rows. ;) 2. if any of your varchar(12) row's data is likely to change in future, you'll make update of one table, not an update which will CASCADE over many tables. 3. If you'll need things like "last 50 keys", you can SELECT * FROM foo ORDER BY yourserialkey DESC LIMIT 50; And a disadvantage: if you'll need to access the data by your varchar(12) key, you'll need to perform JOIN on two tables. If you used varchar(12) as your key, you don't. :) Regards, Dawid
Look at the database design in terms of data retrieval. If I add a sequence number as my primary key, when I get ready to retrieve that record "directly" how do I know what that sequence number is. For instance, my employee number is 123456789, and it is unique within my company and my sequence number is 375. I will more likely know the employee number to query than I will the sequence number. Sure saves time in data access. I know, you can always create a unique index on the employee number as well as a primary index on the sequence number but WHY would I want to take up room for a field in the record as well as the useless index space for no purpose.
Codd said "the key, the whole key and nothing but the key." In database design we cannot always do this for query performance but why add something to a record that will serve no real purpose. Keys are updateable, hopefully they are not changed all that often but the ability should be there. If they are not updateable then the database engine is not one I would choose for my application.
Duane
-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Tuesday, October 26, 2004 2:26 PM
To: Joshua D. Drake
Cc: Sally Sally; pgsql-general@postgresql.org
Subject: Re: [GENERAL] primary key and existing unique fields
Joshua D. Drake wrote:
>
>>> Here is a good article on the topic:
>>>
>>> http://www.devx.com/ibm/Article/20702
>>
>>
>> The surrogate key isn't solving the underlying logical inconsistency
>> problem. It is being used as a work-around to cover one up. I suspect
>> the author of being a MySQL user.
>
> Actually he is a software project consultant for IEEE.org and he holds
> a Ph.D. in Theoretical Physics.
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.
</joking>
Seriously, my only point was that Date & Darwen and other relational
purists do not use surrogate keys. The surrogate-key vs. speaking-key
debate devolves quickly. I'd only argue that it is wrong as the author
of the article implied that the speaking-key side of the debate is
without merit.
He begins:
"For the purpose of data modeling, the plumbing should be largely
transparent. In fact, purist DB lore makes no distinction between data
and plumbing. However, you will see that it is more efficient for
administration and maintenance, as well as in terms of runtime
performance, to have some additional fields to serve as DB keys."
So he dismisses the speaking-key argument in one sentence as "purist DB
lore." He then proceeds with a poor example:
"The requirements for a primary key are very strict. It must:
Exist
Be unique
Not change over time
Surrogate keys help to mitigate the fact that real business data never
reliably fulfills these requirements. Not every person has a Social
Security Number (think of those outside the U.S.), people change their
names, and other important information."
1. The reason we have ON UPDATE CASCADE is to handle changes in primary
keys.
2. If not everyone has a social security number than the design should
be sufficiently normalized to reflect that fact.
I am not saying that Chris Date and Hugh Darwen are right and that
Philipp Janert is wrong. I am only saying that both sides should be
investigated and judged on the weight of their arguments.
Personally, I've found over time that when I deviate from "purist DB
lore" I get punished in long run.
Could be wrong, though. :-)
Mike Mascari
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
> That article makes me want to vomit uncontrollably! ;-) > > "Business data might also simply be bad -- glitches in the Social > Security Administration's system may lead to different persons having > the same Social Security Number. A surrogate key helps to isolate the > system from such problems." > > The surrogate key isn't solving the underlying logical inconsistency > problem. It is being used as a work-around to cover one up. I suspect > the author of being a MySQL user. > I think what he's saying is that an application bug, or a business process problem, should not interfere with your database system. Granted, two identical SSNs seems far fetched. However, if your business screws up and you need to change someone's primary key, you've just violated the principle of a primary key. You better be REALLY sure the primary key will NEVER change over time for a given record, and that it really is unique. An SSN might fit that description, but there are always strange situations. What if someone sues to have their SSN changed and a judge orders it? If that's their PK, the social security administration is up a creek (at least in the DB theory world, it probably wouldn't matter much in practice). Regards, Jeff Davis
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote: > <joking> > > Apparently gamma functions and string theory have little to do with > understanding the relational model of data. > > </joking> mmmmm.. string theory. :-) -- /*************************************** * Robby Russell | Owner.Developer.Geek * PLANET ARGON | www.planetargon.com * Portland, OR | robby@planetargon.com * 503.351.4730 | blog.planetargon.com * PHP/PostgreSQL Hosting & Development ****************************************/
Attachment
On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote: > On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote: > > <joking> > > > > Apparently gamma functions and string theory have little to do with > > understanding the relational model of data. > > > > </joking> > > mmmmm.. string theory. :-) > Ya you know the theory that states that the Database is really made up of a large amount of strings. Some are even null terminated strings, although most strings really have a quanta that can be found immediate before the string. :-)
On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote: > On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote: > > On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote: > > > <joking> > > > > > > Apparently gamma functions and string theory have little to do with > > > understanding the relational model of data. > > > > > > </joking> > > > > mmmmm.. string theory. :-) > > > > Ya you know the theory that states that the Database is really made up > of a large amount of strings. Some are even null terminated strings, > although most strings really have a quanta that can be found immediate > before the string. :-) How do we SELECT the string so that we can observe it then? ;-) -- /*************************************** * Robby Russell | Owner.Developer.Geek * PLANET ARGON | www.planetargon.com * Portland, OR | robby@planetargon.com * 503.351.4730 | blog.planetargon.com * PHP/PostgreSQL Hosting & Development ****************************************/
Attachment
-----Original Message----- From: Robby Russell [mailto:robby@planetargon.com] Sent: Tue 10/26/2004 9:08 PM To: Kevin Barnard Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] primary key and existing unique fields On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote: > On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote: > > On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote: > > > <joking> > > > > > > Apparently gamma functions and string theory have little to do with > > > understanding the relational model of data. > > > > > > </joking> > > > > mmmmm.. string theory. :-) > > > > Ya you know the theory that states that the Database is really made up > of a large amount of strings. Some are even null terminated strings, > although most strings really have a quanta that can be found immediate > before the string. :-) How do we SELECT the string so that we can observe it then? ;-) -- /*************************************** * Robby Russell | Owner.Developer.Geek * PLANET ARGON | www.planetargon.com * Portland, OR | robby@planetargon.com * 503.351.4730 | blog.planetargon.com * PHP/PostgreSQL Hosting & Development ****************************************/ You can't observe it ... only *infer* it.
Sally Sally wrote: > This > existing unique field will have to be a character of fixed length > (VARCHAR(12)) because although it's a numeric value there will be > leading zeroes. Plenty of people are contributing their tuppence-worth regarding the choice of surrogate vs natural primary key. Can I just point out that your existing unique field is EITHER a numeric value OR it has a fixed number of characters - numbers don't have leading zeros. If what you have is a number, then perhaps consider int8/numeric types and format appropriately when you display the values. -- Richard Huxton Archonet Ltd
On Wed, Oct 27, 2004 at 00:10:27 +0200, Dawid Kuroczko <qnex42@gmail.com> wrote: > 3. If you'll need things like "last 50 keys", you can SELECT * FROM > foo ORDER BY yourserialkey DESC LIMIT 50; You really shouldn't be doing that if you are using sequences to generate the key. Sequences are just guarenteed to return unique values, not to return them in order. Because groups of sequences can be allocated to a backend at once depending on a setting settable by a client, you can get assignments out of order. Also for overlapping transactions what the application means by the last 50 entries may not match what you get when you get the 50 highest serial values.
Dawid, I am interested in the first point you made that: having varchar(12) in every referencing table, takes more storage space. The thing is though, if I have a serial primary key then it would be an additional column. Or you are saying the space taken by a VARCHAR(12) field is more than two INT fields? ( or is it the fact that when it is referenced it will appear several times?) I guess the reason I am resisting the idea of an additional primary key field is to avoid the additional lookup in some queries. Perhaps it's a minor almost irrelevant performance factor. Thanks Sally _________________________________________________________________ Is your PC infected? Get a FREE online computer virus scan from McAfee� Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963
On Thu, Oct 28, 2004 at 14:31:32 +0000, Sally Sally <dedeb17@hotmail.com> wrote: > Dawid, > I am interested in the first point you made that: > having varchar(12) in every referencing table, takes more storage > space. > The thing is though, if I have a serial primary key then it would be an > additional column. Or you are saying the space taken by a VARCHAR(12) field > is more than two INT fields? ( or is it the fact that when it is referenced > it will appear several times?) I guess the reason I am resisting the idea > of an additional primary key field is to avoid the additional lookup in > some queries. Perhaps it's a minor almost irrelevant performance factor. I think it is better to worry about what is going to make it easiest to have clean data and to support future changes than worry about performance. Over the long run hardware is cheaper than people.
I think the same too but sometimes it seems in the real world performance is given more value than a properly designed db. Or the long term flexiblity is not taken into account given the short term requirements. regards Sally >From: Bruno Wolff III <bruno@wolff.to> >To: Sally Sally <dedeb17@hotmail.com> >CC: pgsql-general@postgresql.org, qnex42@gmail.com >Subject: Re: [GENERAL] primary key and existing unique fields >Date: Thu, 28 Oct 2004 12:44:00 -0500 > >On Thu, Oct 28, 2004 at 14:31:32 +0000, > Sally Sally <dedeb17@hotmail.com> wrote: > > Dawid, > > I am interested in the first point you made that: > > having varchar(12) in every referencing table, takes more storage > > space. > > The thing is though, if I have a serial primary key then it would be an > > additional column. Or you are saying the space taken by a VARCHAR(12) >field > > is more than two INT fields? ( or is it the fact that when it is >referenced > > it will appear several times?) I guess the reason I am resisting the >idea > > of an additional primary key field is to avoid the additional lookup in > > some queries. Perhaps it's a minor almost irrelevant performance factor. > >I think it is better to worry about what is going to make it easiest to >have clean data and to support future changes than worry about performance. >Over the long run hardware is cheaper than people. > >---------------------------(end of broadcast)--------------------------- >TIP 8: explain analyze is your friend _________________________________________________________________ Check out Election 2004 for up-to-date election news, plus voter tools and more! http://special.msn.com/msn/election2004.armx