Re: Alternative to serial primary key - Mailing list pgsql-sql

From David Clarke
Subject Re: Alternative to serial primary key
Date
Msg-id 12b7ac1e0607070107t78a808aeuc8aba982d593abff@mail.gmail.com
Whole thread Raw
In response to Re: Alternative to serial primary key  (Scott Marlowe <smarlowe@g2switchworks.com>)
Responses Re: Alternative to serial primary key  ("Aaron Bono" <postgresql@aranya.com>)
Re: Alternative to serial primary key  (Scott Marlowe <smarlowe@g2switchworks.com>)
List pgsql-sql
On 7/7/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Thu, 2006-07-06 at 16:43, Aaron Bono wrote:
> I'll repeat my previous statement that this is premature optimization,
> and the hash is kind the wrong direction.
>
> If you store an int and the 1 to 100 characters in a varchar, you'll
> have about 4 to 8 bytes for the int (6 I think, but it's been a while)
> plus 1 to 200 or possibly more for the characters in the address.
>
> If you use C local with ASCII encoding, you can get single byte.
>
> If you switch to an md5 hash, you'll need ~50 bytes (average address
> about 1/2 max length, just a guess) plus 32 bytes, plus the extra bytes
> to keep track of the length of the fields.
>
> The table now becomes wider itself, and the md5 is generally about as
> big as the address, or fairly close to it.
>
> And you've got the possibility of md5 collisions to deal with.
>
> I'd say just FK off of the address field.  It's a natural key, fairly
> small (100 bytes ain't really that big) and your primary key never needs
> any kind of regenerating or anything, because it's already there.
>
> Just set it up with cascading updates and deletes in case you need to
> edit it in the future.
>
> The first rule of optimization:  Don't
>
Yep, this was pretty much where I started from and I totally agree
with you regarding premature optimisation. I would point out that md5
hash is 128 bits or 16 bytes and not 32 so the difference between the
hash value and the source data is somewhat larger than you've
suggested. My original post a few weeks back was really about the use
of a natural key of varchar(100) as a foreign key in other tables. The
response was to not do it and instead use a serial which is basically
how I was progressing. Celko's comments re the use of autonumbering
schemes have obviously been gnawing away at me. That is why I asked
the question about the hashing approach which I saw as a way to
fulfill a lot of the criteria he has suggested for a primary key and
still have a reasonable value to use as a foreign key. Please ignore
the regeneration comment, it was written in haste and not because I
really anticipate any need to regenerate my primary key at any stage.

The question remains regarding the use of a string value as a primary
key for the table and as a foreign key in other tables. If I use the
address column as a foreign key in a differrent table will postgres
physically duplicate the data or will it simply attach the appropriate
index magically behind the scenes? I agree that the address column is
fairly small and I've heard of others using the likes of GUIDs as key
values which are not a lot smaller than I would expect my average
address to be.

Thanks
Dave

Dave


pgsql-sql by date:

Previous
From: "David Clarke"
Date:
Subject: Re: Alternative to serial primary key
Next
From: "Rodrigo De Leon"
Date:
Subject: Re: Select Maths