Re: Alternative to serial primary key - Mailing list pgsql-sql

From Scott Marlowe
Subject Re: Alternative to serial primary key
Date
Msg-id 1152290820.22269.18.camel@state.g2switchworks.com
Whole thread Raw
In response to Re: Alternative to serial primary key  ("David Clarke" <pigwin32@gmail.com>)
List pgsql-sql
On Fri, 2006-07-07 at 03:07, David Clarke wrote:

> Yep, this was pretty much where I started from and I totally agree
> with you regarding premature optimisation. I would point out that md5
> hash is 128 bits or 16 bytes and not 32 

Unless you're going to store them as a binary field, the standard ascii
rep of an md5 is 32 characters (not I didn't say bytes in my original
post, cause bytes and characters aren't always the same thing.)

If you were using multi-byte encoding, and handed it an md5sum and
stored it in UTF-8, it would take 64 bytes to store.  My point being
that paying close attention to locale and encoding can likely save you
more space and give you better performance than using md5 hashes can
here.  

>  My original post a few weeks back was really about the use
> of a natural key of varchar(100) as a foreign key in other tables. The
> response was to not do it and instead use a serial which is basically
> how I was progressing. Celko's comments re the use of autonumbering
> schemes have obviously been gnawing away at me. 

Yeah, like I said earlier (at least I think I did.  :) this is a
religious issue.  I tend towards using the natural keys with a serial
field to pull things out of the fire should they head south.

> The question remains regarding the use of a string value as a primary
> key for the table and as a foreign key in other tables. If I use the
> address column as a foreign key in a differrent table will postgres
> physically duplicate the data or will it simply attach the appropriate
> index magically behind the scenes?

PostgreSQL, and most other dbs as well, will duplicate the data.  but
keep in mind that for the most part, differences of <100 bytes aren't a
big performance issue.  Now, if you're trying to set records running TPC
tests, then yes, you'll want to optimize the heck out of your schema.

But mostly, I've found that there's plenty of low hanging fruit in the
application side before the database becomes the choke point in this
kind of thing.  

Good luck on your implementation.  Let us all know how it goes.


pgsql-sql by date:

Previous
From: Gary Stainburn
Date:
Subject: Re: SELECT substring with regex
Next
From: T E Schmitz
Date:
Subject: Re: SELECT substring with regex