On Thu, 2006-07-06 at 16:43, Aaron Bono wrote:
> On 7/6/06, David Clarke <pigwin32@gmail.com> wrote:
> To recap, yes there is only a single column, yes it is
> varchar. I need
> to do a lookup on the address column which is unique and use
> it as a
> foreign key in other tables. Using a serial id would obviously
> work
> and has been recommended. But having a hash function over the
> address
> column as the primary key means I can always regenerate my
> primary key
> from the data which is impossible with a serial key. I believe
> the
> risk of collision using md5 is effectively zero on this data
> and I can
> put a unique index over it.
I'll repeat my previous statement that this is premature optimization,
and the hash is kind the wrong direction.
If you store an int and the 1 to 100 characters in a varchar, you'll
have about 4 to 8 bytes for the int (6 I think, but it's been a while)
plus 1 to 200 or possibly more for the characters in the address.
If you use C local with ASCII encoding, you can get single byte.
If you switch to an md5 hash, you'll need ~50 bytes (average address
about 1/2 max length, just a guess) plus 32 bytes, plus the extra bytes
to keep track of the length of the fields.
The table now becomes wider itself, and the md5 is generally about as
big as the address, or fairly close to it.
And you've got the possibility of md5 collisions to deal with.
I'd say just FK off of the address field. It's a natural key, fairly
small (100 bytes ain't really that big) and your primary key never needs
any kind of regenerating or anything, because it's already there.
Just set it up with cascading updates and deletes in case you need to
edit it in the future.
The first rule of optimization: Don't