Thread: primary key and existing unique fields

primary key and existing unique fields

From
"Sally Sally"
Date:
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
Sally

_________________________________________________________________
Don�t just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


Re: primary key and existing unique fields

From
Duane Lee - EGOVX
Date:

Since you already have the unique field I see no point in adding a sequence to the table, unless of course the sequence of the data inserts is of importance at some point.

Duane

-----Original Message-----
From: Sally Sally [mailto:dedeb17@hotmail.com]
Sent: Tuesday, October 26, 2004 9:25 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] primary key and existing unique fields

Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
Sally

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: primary key and existing unique fields

From
"Joshua D. Drake"
Date:
Sally Sally wrote:
> Hi all,
> I am wandering about the pros and cons of creating a separate serial
> field for a primary key when I already have a single unique field. This
> existing unique field will have to be a character of fixed length
> (VARCHAR(12)) because although it's a numeric value there will be
> leading zeroes. There are a couple more tables with similar unique
> fields and one of them would need to reference the others. Does anybody
> see any good reason for adding a separate autoincrement primary key
> field for each table? or either way is not a big deal.

Your primary key should not be directly related to the data being
stored. Outside of the fact that it is the primary reference or the row.

Sincerely,

Joshua D. Drake



> Sally
>
> _________________________________________________________________
> Don’t just search. Find. Check out the new MSN Search!
> http://search.msn.click-url.com/go/onm00200636ave/direct/01/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: primary key and existing unique fields

From
"Sally Sally"
Date:
Can you please elaborate on the point you just made as to why the primary
key should not relate to the data (even for a case when there is an existing
unique field that can be used to identify the record)

>From: "Joshua D. Drake" <jd@commandprompt.com>
>To: Sally Sally <dedeb17@hotmail.com>
>CC: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] primary key and existing unique fields
>Date: Tue, 26 Oct 2004 09:48:50 -0700
>
>Sally Sally wrote:
>>Hi all,
>>I am wandering about the pros and cons of creating a separate serial field
>>for a primary key when I already have a single unique field. This existing
>>unique field will have to be a character of fixed length (VARCHAR(12))
>>because although it's a numeric value there will be leading zeroes. There
>>are a couple more tables with similar unique fields and one of them would
>>need to reference the others. Does anybody see any good reason for adding
>>a separate autoincrement primary key field for each table? or either way
>>is not a big deal.
>
>Your primary key should not be directly related to the data being stored.
>Outside of the fact that it is the primary reference or the row.
>
>Sincerely,
>
>Joshua D. Drake
>
>
>
>>Sally
>>
>>_________________________________________________________________
>>Dont just search. Find. Check out the new MSN Search!
>>http://search.msn.click-url.com/go/onm00200636ave/direct/01/
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 8: explain analyze is your friend
>
>
>--
>Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
>Postgresql support, programming shared hosting and dedicated hosting.
>+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
>Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
><< jd.vcf >>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

_________________________________________________________________
Check out Election 2004 for up-to-date election news, plus voter tools and
more! http://special.msn.com/msn/election2004.armx


Re: primary key and existing unique fields

From
"Joshua D. Drake"
Date:
Sally Sally wrote:
> Can you please elaborate on the point you just made as to why the
> primary key should not relate to the data (even for a case when there is
> an existing unique field that can be used to identify the record)
>

Here is a good article on the topic:

http://www.devx.com/ibm/Article/20702


Sincerely,

Joshua D. Drake



--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: primary key and existing unique fields

From
Mike Mascari
Date:
Joshua D. Drake wrote:
> Sally Sally wrote:
>
>> Can you please elaborate on the point you just made as to why the
>> primary key should not relate to the data (even for a case when there
>> is an existing unique field that can be used to identify the record)
>>
>
> Here is a good article on the topic:
>
> http://www.devx.com/ibm/Article/20702

That article makes me want to vomit uncontrollably! ;-)

"Business data might also simply be bad -- glitches in the Social
Security Administration's system may lead to different persons having
the same Social Security Number. A surrogate key helps to isolate the
system from such problems."

The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.

Mike Mascari

Re: primary key and existing unique fields

From
"Joshua D. Drake"
Date:
>> Here is a good article on the topic:
>>
>> http://www.devx.com/ibm/Article/20702
>
> The surrogate key isn't solving the underlying logical inconsistency
> problem. It is being used as a work-around to cover one up. I suspect
> the author of being a MySQL user.

Actually he is a software project consultant for IEEE.org and he holds
a Ph.D. in Theoretical Physics.

Sincerely,

Joshua D. Drake


>
> Mike Mascari
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: primary key and existing unique fields

From
Mike Mascari
Date:
Joshua D. Drake wrote:
>
>>> Here is a good article on the topic:
>>>
>>> http://www.devx.com/ibm/Article/20702
>>
>>
>> The surrogate key isn't solving the underlying logical inconsistency
>> problem. It is being used as a work-around to cover one up. I suspect
>> the author of being a MySQL user.
>
> Actually he is a software project consultant for IEEE.org and he holds
> a Ph.D. in Theoretical Physics.

<joking>

Apparently gamma functions and string theory have little to do with
understanding the relational model of data.

</joking>

Seriously, my only point was that Date & Darwen and other relational
purists do not use surrogate keys. The surrogate-key vs. speaking-key
debate devolves quickly. I'd only argue that it is wrong as the author
of the article implied that the speaking-key side of the debate is
without merit.

He begins:

"For the purpose of data modeling, the plumbing should be largely
transparent. In fact, purist DB lore makes no distinction between data
and plumbing. However, you will see that it is more efficient for
administration and maintenance, as well as in terms of runtime
performance, to have some additional fields to serve as DB keys."

So he dismisses the speaking-key argument in one sentence as "purist DB
lore." He then proceeds with a poor example:

"The requirements for a primary key are very strict. It must:

Exist
Be unique
Not change over time
Surrogate keys help to mitigate the fact that real business data never
reliably fulfills these requirements. Not every person has a Social
Security Number (think of those outside the U.S.), people change their
names, and other important information."

1. The reason we have ON UPDATE CASCADE is to handle changes in primary
keys.

2. If not everyone has a social security number than the design should
be sufficiently normalized to reflect that fact.

I am not saying that Chris Date and Hugh Darwen are right and that
Philipp Janert is wrong. I am only saying that both sides should be
investigated and judged on the weight of their arguments.

Personally, I've found over time that when I deviate from "purist DB
lore" I get punished in long run.

Could be wrong, though. :-)

Mike Mascari








Re: primary key and existing unique fields

From
Dawid Kuroczko
Date:
On Tue, 26 Oct 2004 16:24:44 +0000, Sally Sally <dedeb17@hotmail.com> wrote:
> Hi all,
> I am wandering about the pros and cons of creating a separate serial field
> for a primary key when I already have a single unique field. This existing
> unique field will have to be a character of fixed length (VARCHAR(12))
> because although it's a numeric value there will be leading zeroes. There
> are a couple more tables with similar unique fields and one of them would
> need to reference the others. Does anybody see any good reason for adding a
> separate autoincrement primary key field for each table? or either way is
> not a big deal.

I see three possible advantages:

1. having varchar(12) in every referencing table, takes more storage
space, which may mean something if you have tons of gigabytes of rows.
;)
2. if any of your varchar(12) row's data is likely to change in
future, you'll make update of one table, not an update which will
CASCADE over many tables.
3. If you'll need things like "last 50 keys", you can SELECT * FROM
foo ORDER BY yourserialkey DESC LIMIT 50;

And a disadvantage:
if you'll need to access the data by your varchar(12) key, you'll need
to perform JOIN on two tables.  If you used varchar(12) as your key,
you don't. :)

   Regards,
      Dawid

Re: primary key and existing unique fields

From
Duane Lee - EGOVX
Date:

Look at the database design in terms of data retrieval.  If I add a sequence number as my primary key, when I get ready to retrieve that record "directly" how do I know what that sequence number is.  For instance, my employee number is 123456789, and it is unique within my company and my sequence number is 375.  I will more likely know the employee number to query than I will the sequence number.  Sure saves time in data access.  I know, you can always create a unique index on the employee number as well as a primary index on the sequence number but WHY would I want to take up room for a field in the record as well as the useless index space for no purpose.

Codd said "the key, the whole key and nothing but the key."  In database design we cannot always do this for query performance but why add something to a record that will serve no real purpose.  Keys are updateable, hopefully they are not changed all that often but the ability should be there.  If they are not updateable then the database engine is not one I would choose for my application.

Duane
-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Tuesday, October 26, 2004 2:26 PM
To: Joshua D. Drake
Cc: Sally Sally; pgsql-general@postgresql.org
Subject: Re: [GENERAL] primary key and existing unique fields

Joshua D. Drake wrote:
>
>>> Here is a good article on the topic:
>>>
>>> http://www.devx.com/ibm/Article/20702
>>
>>
>> The surrogate key isn't solving the underlying logical inconsistency
>> problem. It is being used as a work-around to cover one up. I suspect
>> the author of being a MySQL user.
>
> Actually he is a software project consultant for IEEE.org and he holds
> a Ph.D. in Theoretical Physics.

<joking>

Apparently gamma functions and string theory have little to do with
understanding the relational model of data.

</joking>

Seriously, my only point was that Date & Darwen and other relational
purists do not use surrogate keys. The surrogate-key vs. speaking-key
debate devolves quickly. I'd only argue that it is wrong as the author
of the article implied that the speaking-key side of the debate is
without merit.

He begins:

"For the purpose of data modeling, the plumbing should be largely
transparent. In fact, purist DB lore makes no distinction between data
and plumbing. However, you will see that it is more efficient for
administration and maintenance, as well as in terms of runtime
performance, to have some additional fields to serve as DB keys."

So he dismisses the speaking-key argument in one sentence as "purist DB
lore." He then proceeds with a poor example:

"The requirements for a primary key are very strict. It must:

Exist
Be unique
Not change over time
Surrogate keys help to mitigate the fact that real business data never
reliably fulfills these requirements. Not every person has a Social
Security Number (think of those outside the U.S.), people change their
names, and other important information."

1. The reason we have ON UPDATE CASCADE is to handle changes in primary
keys.

2. If not everyone has a social security number than the design should
be sufficiently normalized to reflect that fact.

I am not saying that Chris Date and Hugh Darwen are right and that
Philipp Janert is wrong. I am only saying that both sides should be
investigated and judged on the weight of their arguments.

Personally, I've found over time that when I deviate from "purist DB
lore" I get punished in long run.

Could be wrong, though. :-)

Mike Mascari






---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

Re: primary key and existing unique fields

From
Jeff Davis
Date:
> That article makes me want to vomit uncontrollably! ;-)
>
> "Business data might also simply be bad -- glitches in the Social
> Security Administration's system may lead to different persons having
> the same Social Security Number. A surrogate key helps to isolate the
> system from such problems."
>
> The surrogate key isn't solving the underlying logical inconsistency
> problem. It is being used as a work-around to cover one up. I suspect
> the author of being a MySQL user.
>

I think what he's saying is that an application bug, or a business
process problem, should not interfere with your database system.
Granted, two identical SSNs seems far fetched. However, if your business
screws up and you need to change someone's primary key, you've just
violated the principle of a primary key. You better be REALLY sure the
primary key will NEVER change over time for a given record, and that it
really is unique.

An SSN might fit that description, but there are always strange
situations. What if someone sues to have their SSN changed and a judge
orders it? If that's their PK, the social security administration is up
a creek (at least in the DB theory world, it probably wouldn't matter
much in practice).

Regards,
    Jeff Davis



Re: primary key and existing unique fields

From
Robby Russell
Date:
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
> <joking>
>
> Apparently gamma functions and string theory have little to do with
> understanding the relational model of data.
>
> </joking>

mmmmm.. string theory. :-)



--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON  | www.planetargon.com
* Portland, OR  | robby@planetargon.com
* 503.351.4730  | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/


Attachment

Re: primary key and existing unique fields

From
Kevin Barnard
Date:
On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
> On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
> > <joking>
> >
> > Apparently gamma functions and string theory have little to do with
> > understanding the relational model of data.
> >
> > </joking>
>
> mmmmm.. string theory. :-)
>

Ya you know the theory that states that the Database is really made up
of a large amount of strings.  Some are even null terminated strings,
although most strings really have a quanta that can be found immediate
before the string. :-)

Re: primary key and existing unique fields

From
Robby Russell
Date:
On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote:
> On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
> > On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
> > > <joking>
> > >
> > > Apparently gamma functions and string theory have little to do with
> > > understanding the relational model of data.
> > >
> > > </joking>
> >
> > mmmmm.. string theory. :-)
> >
>
> Ya you know the theory that states that the Database is really made up
> of a large amount of strings.  Some are even null terminated strings,
> although most strings really have a quanta that can be found immediate
> before the string. :-)

How do we SELECT the string so that we can observe it then? ;-)



--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON  | www.planetargon.com
* Portland, OR  | robby@planetargon.com
* 503.351.4730  | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/


Attachment

Re: primary key and existing unique fields

From
"Gregory S. Williamson"
Date:
-----Original Message-----
From:    Robby Russell [mailto:robby@planetargon.com]
Sent:    Tue 10/26/2004 9:08 PM
To:    Kevin Barnard
Cc:    pgsql-general@postgresql.org
Subject:    Re: [GENERAL] primary key and existing unique fields
On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote:
> On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
> > On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
> > > <joking>
> > >
> > > Apparently gamma functions and string theory have little to do with
> > > understanding the relational model of data.
> > >
> > > </joking>
> >
> > mmmmm.. string theory. :-)
> >
>
> Ya you know the theory that states that the Database is really made up
> of a large amount of strings.  Some are even null terminated strings,
> although most strings really have a quanta that can be found immediate
> before the string. :-)

How do we SELECT the string so that we can observe it then? ;-)



--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON  | www.planetargon.com
* Portland, OR  | robby@planetargon.com
* 503.351.4730  | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/

You can't observe it ... only *infer* it.



Re: primary key and existing unique fields

From
Richard Huxton
Date:
Sally Sally wrote:
>  This
> existing unique field will have to be a character of fixed length
> (VARCHAR(12)) because although it's a numeric value there will be
> leading zeroes.

Plenty of people are contributing their tuppence-worth regarding the
choice of surrogate vs natural primary key.

Can I just point out that your existing unique field is EITHER a numeric
value OR it has a fixed number of characters - numbers don't have
leading zeros.

If what you have is a number, then perhaps consider int8/numeric types
and format appropriately when you display the values.

--
   Richard Huxton
   Archonet Ltd

Re: primary key and existing unique fields

From
Bruno Wolff III
Date:
On Wed, Oct 27, 2004 at 00:10:27 +0200,
  Dawid Kuroczko <qnex42@gmail.com> wrote:
> 3. If you'll need things like "last 50 keys", you can SELECT * FROM
> foo ORDER BY yourserialkey DESC LIMIT 50;

You really shouldn't be doing that if you are using sequences to generate
the key. Sequences are just guarenteed to return unique values, not to
return them in order. Because groups of sequences can be allocated to
a backend at once depending on a setting settable by a client, you can
get assignments out of order. Also for overlapping transactions what
the application means by the last 50 entries may not match what you
get when you get the 50 highest serial values.

Re: primary key and existing unique fields

From
"Sally Sally"
Date:
Dawid,
I am interested in the first point you made that:
having varchar(12) in every referencing table, takes more storage
space.
The thing is though, if I have a serial primary key then it would be an
additional column. Or you are saying the space taken by a VARCHAR(12) field
is more than two INT fields? ( or is it the fact that when it is referenced
it will appear several times?) I guess the reason I am resisting the idea of
an additional primary key field is to avoid the additional lookup in some
queries. Perhaps it's a minor almost irrelevant performance factor.
Thanks
Sally

_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee�
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963


Re: primary key and existing unique fields

From
Bruno Wolff III
Date:
On Thu, Oct 28, 2004 at 14:31:32 +0000,
  Sally Sally <dedeb17@hotmail.com> wrote:
> Dawid,
> I am interested in the first point you made that:
> having varchar(12) in every referencing table, takes more storage
> space.
> The thing is though, if I have a serial primary key then it would be an
> additional column. Or you are saying the space taken by a VARCHAR(12) field
> is more than two INT fields? ( or is it the fact that when it is referenced
> it will appear several times?) I guess the reason I am resisting the idea
> of an additional primary key field is to avoid the additional lookup in
> some queries. Perhaps it's a minor almost irrelevant performance factor.

I think it is better to worry about what is going to make it easiest to
have clean data and to support future changes than worry about performance.
Over the long run hardware is cheaper than people.

Re: primary key and existing unique fields

From
"Sally Sally"
Date:
I think the same too but sometimes it seems in the real world performance is
given more value than a properly designed db. Or the long term flexiblity is
not taken into account given the short term requirements.
regards
Sally

>From: Bruno Wolff III <bruno@wolff.to>
>To: Sally Sally <dedeb17@hotmail.com>
>CC: pgsql-general@postgresql.org, qnex42@gmail.com
>Subject: Re: [GENERAL] primary key and existing unique fields
>Date: Thu, 28 Oct 2004 12:44:00 -0500
>
>On Thu, Oct 28, 2004 at 14:31:32 +0000,
>   Sally Sally <dedeb17@hotmail.com> wrote:
> > Dawid,
> > I am interested in the first point you made that:
> > having varchar(12) in every referencing table, takes more storage
> > space.
> > The thing is though, if I have a serial primary key then it would be an
> > additional column. Or you are saying the space taken by a VARCHAR(12)
>field
> > is more than two INT fields? ( or is it the fact that when it is
>referenced
> > it will appear several times?) I guess the reason I am resisting the
>idea
> > of an additional primary key field is to avoid the additional lookup in
> > some queries. Perhaps it's a minor almost irrelevant performance factor.
>
>I think it is better to worry about what is going to make it easiest to
>have clean data and to support future changes than worry about performance.
>Over the long run hardware is cheaper than people.
>
>---------------------------(end of broadcast)---------------------------
>TIP 8: explain analyze is your friend

_________________________________________________________________
Check out Election 2004 for up-to-date election news, plus voter tools and
more! http://special.msn.com/msn/election2004.armx