Thread: primary keys
Hi Folks I'm looking for a bit of advice regarding alpha primary keys. I have a table (designed by someone else) that has a numeric primary key and also a unique non-null email address field. The use of the primary key is causing me some headaches in that in multiple database server environment each server allocates a unique number from a range (and that works fine) but when the table is replicated (master-master- master) the exception handling is a bit tricky because each database server may have records that are duplicate at the email address field - with a different primary key number. I don't know why it was done this way but it seems to me that the email addresses are unique, non null and could be used as the primary key. This would make the replication much faster and simpler. Does anyone out there think the change (number to email address as primary key) would be a bad idea ? your thoughts would be appreciated. regards Grant
On Sat, Sep 12, 2009 at 12:35 PM, Grant Maxwell <grant.maxwell@maxan.com.au> wrote: > Hi Folks > > I'm looking for a bit of advice regarding alpha primary keys. > > I have a table (designed by someone else) that has a numeric primary key and > also a unique non-null email address field. > > The use of the primary key is causing me some headaches in that in multiple > database server environment each server allocates a unique number from a > range > (and that works fine) but when the table is replicated > (master-master-master) the exception handling is a bit tricky because each > database server may have > records that are duplicate at the email address field - with a different > primary key number. > > I don't know why it was done this way but it seems to me that the email > addresses are unique, non null and could be used as the primary key. This > would make the > replication much faster and simpler. > > Does anyone out there think the change (number to email address as primary > key) would be a bad idea ? > > your thoughts would be appreciated. I think it's a fine idea so long as you understand: *) the index(es) will be larger *) referring tables will have to be updated (via RI) if/when the email address changes *) lots of people will tell you that you are doing it wrong :-) *) if you need to specialize p-key further (add another field), it can be a big change on the plus side: *) you get to drop an index because you obviously had to index the key serparately *) if you query a referring table and are interested in email address (and no dependent props), you get to skip a join *) sorting be email address can be free *) less data transfer headaches merlin
On Sun, 13 Sep 2009 02:35:02 +1000 Grant Maxwell <grant.maxwell@maxan.com.au> wrote: > Hi Folks > > I'm looking for a bit of advice regarding alpha primary keys. > > I have a table (designed by someone else) that has a numeric primary > key and also a unique non-null email address field. > > The use of the primary key is causing me some headaches in that in > multiple database server environment each server allocates a unique > number from a range > (and that works fine) but when the table is replicated (master-master- > master) the exception handling is a bit tricky because each database > server may have > records that are duplicate at the email address field - with a > different primary key number. > > I don't know why it was done this way but it seems to me that the > email addresses are unique, non null and could be used as the primary > key. This would make the > replication much faster and simpler. > > Does anyone out there think the change (number to email address as > primary key) would be a bad idea ? > > your thoughts would be appreciated. I think you have two issues here, and if you identify them as such, you'll come up with a better decision. The first is, "Should the email address be my primary key". And the answer to that really depends on whether you expect it to change a lot, whether you have a lot of FK relations and how easy/difficult it will be to manage those. The second, which may influence the first, is "how do I manage conflicts when data from different sources is merged?" which is a huge, complex question. One potential answer is to replace your integer surrogate key with a GUID key, which will be unique from all the different sources. You could also use a "prefix" system to ensure uniqueness. Another option is to use the email address itself. I'm surprised by your approach, as personally, I've seen a LOT of people who share a single email address (husband/wife, for example). It's not horribly common, since getting addresses is easy, but it happens, and it's something to consider when setting this up: if two people share an email address and try to add their records at different places, how will the system handle it? In any event, if all those factors tell you that you should switch to using the email address as the PK, I doubt you'll regret your decision. Just be sure to take into account the foreign key factor early on and you shouldn't have too many problems (in my experience, a lot of people are unaware of ON DELETE CASCADE and ON UPDATE CASCADE). Hope this helps. -Bill
On Sat, Sep 12, 2009 at 2:11 PM, Bill Moran <wmoran@potentialtech.com> wrote: > On Sun, 13 Sep 2009 02:35:02 +1000 > Grant Maxwell <grant.maxwell@maxan.com.au> wrote: > >> Hi Folks >> >> I'm looking for a bit of advice regarding alpha primary keys. >> >> I have a table (designed by someone else) that has a numeric primary >> key and also a unique non-null email address field. >> >> The use of the primary key is causing me some headaches in that in >> multiple database server environment each server allocates a unique >> number from a range >> (and that works fine) but when the table is replicated (master-master- >> master) the exception handling is a bit tricky because each database >> server may have >> records that are duplicate at the email address field - with a >> different primary key number. >> >> I don't know why it was done this way but it seems to me that the >> email addresses are unique, non null and could be used as the primary >> key. This would make the >> replication much faster and simpler. >> >> Does anyone out there think the change (number to email address as >> primary key) would be a bad idea ? >> >> your thoughts would be appreciated. > > I think you have two issues here, and if you identify them as such, you'll > come up with a better decision. > > The first is, "Should the email address be my primary key". And the answer > to that really depends on whether you expect it to change a lot, whether you > have a lot of FK relations and how easy/difficult it will be to manage those. > > The second, which may influence the first, is "how do I manage conflicts > when data from different sources is merged?" which is a huge, complex > question. One potential answer is to replace your integer surrogate key > with a GUID key, which will be unique from all the different sources. You guid solves the surrogate issue wrt data transfer, but glosses over what happens when you have duplicates. > could also use a "prefix" system to ensure uniqueness. Another option is > to use the email address itself. I'm surprised by your approach, as > personally, I've seen a LOT of people who share a single email address > (husband/wife, for example). It's not horribly common, since getting that's going to depend on how it's defined in the app. couple of different approaches: email _must_ be unique, and we care not to whom it belongs: email (email primary key) contact (email references email [...]) we do care option 1: email (email, memo text, primary key(email, memo)) contact (email, memo, references email(email, memo)) storing info in memo field to distinguish the different users we do care option 2: email (email primary key) contact (contact_id) email_contact_map ( contact_id references contact, email references email, memo, -- 'person a', 'person b', etc primary key(contact_id, email), ) The point is this: if you (the user) needs to distinguish between email adress users, that information should be in the database...allowing multiple entry of email addresses via serial or guid allows you to sneak by this requirement merlin
Grant Maxwell <grant.maxwell@maxan.com.au> writes: > I don't know why it was done this way but it seems to me that the > email addresses are unique, non null and could be used as the primary > key. This would make the replication much faster and simpler. > Does anyone out there think the change (number to email address as > primary key) would be a bad idea ? One thing that's often considered a useful attribute of a primary key is that it be immutable. In your application, do users ever change their email addresses? If so, what should happen --- is it okay to treat that as effectively a new entry? Also, if you have any other tables referencing this one via foreign keys, you'd have to have them storing the email address instead of the serial number; it'll be bulkier and address updates will be that much more expensive. You can find lots and lots and lots of discussion of this topic if you search the archives for talk about natural versus surrogate keys. regards, tom lane
On 13/09/2009, at 2:46 AM, Tom Lane wrote: > Grant Maxwell <grant.maxwell@maxan.com.au> writes: >> I don't know why it was done this way but it seems to me that the >> email addresses are unique, non null and could be used as the primary >> key. This would make the replication much faster and simpler. > >> Does anyone out there think the change (number to email address as >> primary key) would be a bad idea ? > > One thing that's often considered a useful attribute of a primary > key is > that it be immutable. In your application, do users ever change their > email addresses? If so, what should happen --- is it okay to treat > that > as effectively a new entry? > This would be ok. The table keeps a list of email addresses and some stats on them users may have multiple addresses but would never alter a specific record. > Also, if you have any other tables referencing this one via foreign > keys, you'd have to have them storing the email address instead of > the serial number; it'll be bulkier and address updates will be that > much more expensive. so text PKs will be less efficient than numeric ? > > You can find lots and lots and lots of discussion of this topic if > you search the archives for talk about natural versus surrogate keys. > Good pointer = thanks Tom > regards, tom lane