Re: Alternative to serial primary key - Mailing list pgsql-sql

From Scott Marlowe
Subject Re: Alternative to serial primary key
Date
Msg-id 1152196912.13851.119.camel@state.g2switchworks.com
Whole thread Raw
In response to Alternative to serial primary key  ("David Clarke" <pigwin32@gmail.com>)
Responses Re: Alternative to serial primary key
List pgsql-sql
On Thu, 2006-07-06 at 05:16, David Clarke wrote:
> I posted a couple of weeks back a question regarding the use of a 100
> char column as a primary key and the responses uniformily advised the
> use of a serial column. My concern is that the key is effectively
> abstract and I want to use the column as a foreign key in other
> tables. It occurred to me that if I used a hash function on insert to
> generate another column and used that column as the primary key then I
> have a value that meets a lot of the requirements for a good key,
> including that I can regenerate the exact value from my data,
> something that is impossible with a serial id. I also don't have to
> index the 100 char column in order to search on the table, I just need
> to calculate the hash value and check that against the calculated
> column. It does violate the rule that a table shouldn't contain a
> column that is calculated from another column in the table but I think
> it would still be more effective than a serial id.
> 
> Is this a reasonable/normal thing to do? I know postgres contains an
> md5() hash function, is this likely to be fast enough to make this an
> effective choice? Are there other options? Am I just a noob barking up
> the wrong tree? It is getting kind of late and my brain is starting to
> hurt.

I've read the responses, I'm starting a new answer thread because I
think that this is "premature optimization".

Please note that there seemed to be a misunderstanding in a few
responses that this gentleman had 100 columns to key.  According to this
post it is one column, with 100 characters in it.

My guess is that any test you come up with will find no gain in md5ing a
100 char column.  1,000 or 10,000 maybe.  But not 100.

And since it was intimated this is an address, I would assume it's a
varchar(100) not a char(100) since there's no reason for it to be
padded, so it will not always even be 100 characters long.

I would imagine initing your database for the C (Ascii) locale would be
a much bigger performance gain than any md5 hashing would.


pgsql-sql by date:

Previous
From: Chris Browne
Date:
Subject: Re: Alternative to serial primary key
Next
From: "Aaron Bono"
Date:
Subject: Re: week ending