Thread: primary keys

primary keys

From
Grant Maxwell
Date:
Hi Folks

I'm looking for a bit of advice regarding alpha primary keys.

I have a table (designed by someone else) that has a numeric primary
key and also a unique non-null email address field.

The use of the primary key is causing me some headaches in that in
multiple database server environment each server allocates a unique
number from a range
(and that works fine) but when the table is replicated (master-master-
master) the exception handling is a bit tricky because each database
server may have
records that are duplicate at the email address field - with a
different primary key number.

I don't know why it was done this way but it seems to me that the
email addresses are unique, non null and could be used as the primary
key. This would make the
replication much faster and simpler.

Does anyone out there think the change (number to email address as
primary key) would be a bad idea ?

your thoughts would be appreciated.

regards
Grant

Re: primary keys

From
Merlin Moncure
Date:
On Sat, Sep 12, 2009 at 12:35 PM, Grant Maxwell
<grant.maxwell@maxan.com.au> wrote:
> Hi Folks
>
> I'm looking for a bit of advice regarding alpha primary keys.
>
> I have a table (designed by someone else) that has a numeric primary key and
> also a unique non-null email address field.
>
> The use of the primary key is causing me some headaches in that in  multiple
> database server environment each server allocates a unique number from a
> range
> (and that works fine) but when the table is replicated
> (master-master-master) the exception handling is a bit tricky because each
> database server may have
> records that are duplicate at the email address field - with a different
> primary key number.
>
> I don't know why it was done this way but it seems to me that the email
> addresses are unique, non null and could be used as the primary key. This
> would make the
> replication much faster and simpler.
>
> Does anyone out there think the change (number to email address as primary
> key) would be a bad idea ?
>
> your thoughts would be appreciated.

I think it's a fine idea so long as you understand:
*) the index(es) will be larger
*) referring tables will have to be updated (via RI) if/when the email
address changes
*) lots of people will tell you that you are doing it wrong :-)
*) if you need to specialize p-key further (add another field), it can
be a big change

on the plus side:
*) you get to drop an index because you obviously had to index the key
serparately
*) if you query a referring table and are interested in email address
(and no dependent props), you get to skip a join
*) sorting be email address can be free
*) less data transfer headaches

merlin

Re: primary keys

From
Bill Moran
Date:
On Sun, 13 Sep 2009 02:35:02 +1000
Grant Maxwell <grant.maxwell@maxan.com.au> wrote:

> Hi Folks
>
> I'm looking for a bit of advice regarding alpha primary keys.
>
> I have a table (designed by someone else) that has a numeric primary
> key and also a unique non-null email address field.
>
> The use of the primary key is causing me some headaches in that in
> multiple database server environment each server allocates a unique
> number from a range
> (and that works fine) but when the table is replicated (master-master-
> master) the exception handling is a bit tricky because each database
> server may have
> records that are duplicate at the email address field - with a
> different primary key number.
>
> I don't know why it was done this way but it seems to me that the
> email addresses are unique, non null and could be used as the primary
> key. This would make the
> replication much faster and simpler.
>
> Does anyone out there think the change (number to email address as
> primary key) would be a bad idea ?
>
> your thoughts would be appreciated.

I think you have two issues here, and if you identify them as such, you'll
come up with a better decision.

The first is, "Should the email address be my primary key".  And the answer
to that really depends on whether you expect it to change a lot, whether you
have a lot of FK relations and how easy/difficult it will be to manage those.

The second, which may influence the first, is "how do I manage conflicts
when data from different sources is merged?" which is a huge, complex
question.  One potential answer is to replace your integer surrogate key
with a GUID key, which will be unique from all the different sources.  You
could also use a "prefix" system to ensure uniqueness.  Another option is
to use the email address itself.  I'm surprised by your approach, as
personally, I've seen a LOT of people who share a single email address
(husband/wife, for example).  It's not horribly common, since getting
addresses is easy, but it happens, and it's something to consider when
setting this up: if two people share an email address and try to add their
records at different places, how will the system handle it?

In any event, if all those factors tell you that you should switch to using
the email address as the PK, I doubt you'll regret your decision.  Just be
sure to take into account the foreign key factor early on and you shouldn't
have too many problems (in my experience, a lot of people are unaware of
ON DELETE CASCADE and ON UPDATE CASCADE).

Hope this helps.

-Bill

Re: primary keys

From
Merlin Moncure
Date:
On Sat, Sep 12, 2009 at 2:11 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Sun, 13 Sep 2009 02:35:02 +1000
> Grant Maxwell <grant.maxwell@maxan.com.au> wrote:
>
>> Hi Folks
>>
>> I'm looking for a bit of advice regarding alpha primary keys.
>>
>> I have a table (designed by someone else) that has a numeric primary
>> key and also a unique non-null email address field.
>>
>> The use of the primary key is causing me some headaches in that in
>> multiple database server environment each server allocates a unique
>> number from a range
>> (and that works fine) but when the table is replicated (master-master-
>> master) the exception handling is a bit tricky because each database
>> server may have
>> records that are duplicate at the email address field - with a
>> different primary key number.
>>
>> I don't know why it was done this way but it seems to me that the
>> email addresses are unique, non null and could be used as the primary
>> key. This would make the
>> replication much faster and simpler.
>>
>> Does anyone out there think the change (number to email address as
>> primary key) would be a bad idea ?
>>
>> your thoughts would be appreciated.
>
> I think you have two issues here, and if you identify them as such, you'll
> come up with a better decision.
>
> The first is, "Should the email address be my primary key".  And the answer
> to that really depends on whether you expect it to change a lot, whether you
> have a lot of FK relations and how easy/difficult it will be to manage those.
>
> The second, which may influence the first, is "how do I manage conflicts
> when data from different sources is merged?" which is a huge, complex
> question.  One potential answer is to replace your integer surrogate key
> with a GUID key, which will be unique from all the different sources.  You

guid solves the surrogate issue wrt data transfer, but glosses over
what happens when you have duplicates.

> could also use a "prefix" system to ensure uniqueness.  Another option is
> to use the email address itself.  I'm surprised by your approach, as
> personally, I've seen a LOT of people who share a single email address
> (husband/wife, for example).  It's not horribly common, since getting

that's going to depend on how it's defined in the app. couple of
different approaches:

email _must_ be unique, and we care not to whom it belongs:
email (email primary key)
contact (email references email [...])

we do care option 1:
email (email, memo text, primary key(email, memo))
contact (email, memo, references email(email, memo))

storing info in memo field to distinguish the different users

we do care option 2:
email (email primary key)
contact (contact_id)
email_contact_map
(
  contact_id references contact,
  email references email,
  memo, -- 'person a', 'person b', etc
  primary key(contact_id, email),
)

The point is this: if you (the user) needs to distinguish between
email adress users, that information should be in the
database...allowing multiple entry of email addresses via serial or
guid allows you to sneak by this requirement

merlin

Re: primary keys

From
Tom Lane
Date:
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> I don't know why it was done this way but it seems to me that the
> email addresses are unique, non null and could be used as the primary
> key. This would make the replication much faster and simpler.

> Does anyone out there think the change (number to email address as
> primary key) would be a bad idea ?

One thing that's often considered a useful attribute of a primary key is
that it be immutable.  In your application, do users ever change their
email addresses?  If so, what should happen --- is it okay to treat that
as effectively a new entry?

Also, if you have any other tables referencing this one via foreign
keys, you'd have to have them storing the email address instead of
the serial number; it'll be bulkier and address updates will be that
much more expensive.

You can find lots and lots and lots of discussion of this topic if
you search the archives for talk about natural versus surrogate keys.

            regards, tom lane

Re: primary keys

From
Grant Maxwell
Date:
On 13/09/2009, at 2:46 AM, Tom Lane wrote:

> Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>> I don't know why it was done this way but it seems to me that the
>> email addresses are unique, non null and could be used as the primary
>> key. This would make the replication much faster and simpler.
>
>> Does anyone out there think the change (number to email address as
>> primary key) would be a bad idea ?
>
> One thing that's often considered a useful attribute of a primary
> key is
> that it be immutable.  In your application, do users ever change their
> email addresses?  If so, what should happen --- is it okay to treat
> that
> as effectively a new entry?
>
This would be ok. The table keeps a list of email addresses and some
stats on them
users may have multiple addresses but would never alter a specific
record.

> Also, if you have any other tables referencing this one via foreign
> keys, you'd have to have them storing the email address instead of
> the serial number; it'll be bulkier and address updates will be that
> much more expensive.
so text PKs will be less efficient than numeric ?
>
> You can find lots and lots and lots of discussion of this topic if
> you search the archives for talk about natural versus surrogate keys.
>
Good pointer = thanks Tom
>             regards, tom lane