Re: Alternative to serial primary key - Mailing list pgsql-sql

From Chris Browne
Subject Re: Alternative to serial primary key
Date
Msg-id 60k66qudc3.fsf@dba2.int.libertyrms.com
Whole thread Raw
In response to Alternative to serial primary key  ("David Clarke" <pigwin32@gmail.com>)
List pgsql-sql
ajs@crankycanuck.ca (Andrew Sullivan) writes:
> On Thu, Jul 06, 2006 at 10:16:42PM +1200, David Clarke wrote:
>> column that is calculated from another column in the table but I think
>> it would still be more effective than a serial id.
>
> There is the problem that the hash is not proved unique (in fact,
> someone has generated collisions on md5).  Primary keys have to be
> unique, of course.
>
> I _think_ with a hundred columns, you could probably prove (using
> brute force, if need be) that the hashes are going to be unique,
> assuming the list of possible values in each column is bounded.

Seems to me that would only happen if those hundred columns consisted
of values that could be fully enumerated before designing the hash.

In effect, if you know that each column selects from a predetermined
set of values that will never change, then you may assign a number to
each value, and then generate a function which amounts to...
 hash := 0 for column from 1 to 100 do    multiple := degree[column]    hash := (hash + enum[column,value[column]]) *
multipledone
 

where degree[column] is the number of possible values for the column,
value[column] is the value found in the column, and enum[column,VALUE]
selects the integer associated with that column.

Unfortunately, this approach to hashing breaks down if there is ever a
reason to add to the list of values that can be stored in a column.
For instance, country codes per ISO 3166-1 are no good as one of these
enumerated columns because the set of countries in the world changes
every so often.

You could allow for that change by, say, allowing for an extra 100
entries that would likely allow the country codes to change for 50
years.

But that's the "good news" part.

If some of the 100 fields contain peoples' names, that's not
particularly usefully enumerable :-(.  New names come up all the time,
even if only because immigration officials transliterate names into
English in inconsistent ways...  Back in the days when I was an
accountant, I did tax returns for three Lebanese brothers who came to
Canada and who all have slightly different surnames in English even
though they were the same in (I presume) Arabic.

It would be nice to have a suitable hash, but I'm not sure it's
attainable...
-- 
output = ("cbbrowne" "@" "ntlug.org")
http://cbbrowne.com/info/spreadsheets.html
Rules of the Evil Overlord #114. "I will never accept a challenge from
the hero." <http://www.eviloverlord.com/>


pgsql-sql by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Alternative to serial primary key
Next
From: Scott Marlowe
Date:
Subject: Re: Alternative to serial primary key