Re: Using varchar primary keys. - Mailing list pgsql-general

From Joe Van Dyk
Subject Re: Using varchar primary keys.
Date
Msg-id CACfv+p+OBgT1DQ1CTP82e-jsBBSZy8XWME0k5NHwcAoog7_pgg@mail.gmail.com
Whole thread Raw
In response to Re: Using varchar primary keys.  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
On Tue, Apr 2, 2013 at 11:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Apr 2, 2013 at 10:34 AM, Joe Van Dyk <joe@tanga.com> wrote:
> On Mon, Apr 1, 2013 at 1:11 PM, Tim Uckun <timuckun@gmail.com> wrote:
>>
>>
>>
>>
>> On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog <jesusthefrog@gmail.com>
>> wrote:
>>>
>>> On the topic of 'natural' versus 'synthetic' primary keys, I am generally
>>> in the camp that an extra ID field won't cost you too much, and while one
>>> may not need it for a simple table (i.e. id, name) one might add any number
>>> of columns later, and you'll be glad to have it.
>>>
>>
>> Nothing prevents you from adding more columns if you use varchar primary
>> keys.
>>
>>>
>>>
>>> My preferred method is to give every table an ID column of UUID type and
>>> generate a UUID using the uuid-ossp contrib module. This also prevents
>>> someone not familiar with the database design from using an ID somewhere
>>> they should not (as is possible with natural PKs) or treating the ID as an
>>> integer, not an identifier (as is all too common with serial integers).
>>>
>>>
>>
>> This would be a concern if you had multi master writes . As far as I know
>> Postgres does not have a true multi master replication system so all the
>> writes have to happen on one server right?
>>
>> As for UUIDs I use them sometimes but I tend to also use one serial column
>> because when I am communicating with people it makes it so much easier to
>> say "dealer number X" than "dealer number SOME_HUGE_UUID".  I often have to
>> talk to people about the data and UUIDs make it very difficult to
>> communicate with humans.
>
>
> I've been wishing for a smaller uuid type for a while. Say you need to
> assign a Order #. Customers might be referencing the number, so you don't
> want it to be too long. But you don't want Order #'s to be guessable or have
> the characters/digits be transposed accidently.
>
> I've been using a unique text column with a default of random_characters(12)
>
> CREATE OR REPLACE FUNCTION public.random_characters(length integer)
>  RETURNS text
>  LANGUAGE sql
>  STABLE
> AS $function$
> SELECT array_to_string(array((
>       SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
>         FROM mod((random()*31)::int, 31)+1 FOR 1)
>       FROM generate_series(1, $1))),'');
> $function$;
>
> This seems to work ok. I don't allow 1's or l's or 0's or O's, as they can
> easily be mistyped or misread.

This is pseudo random and can be guessed, which is maybe dangerous
depending on circumstance.  For stronger random stream go to
pgcrypto.gen_random_bytes().  Also, now you have to worry about
collisions -- the whole point of uuid is to try and keep you from
having to deal with that.

Right, but it's better than using serial's as far as being guessable. 

The probability for collisions are fairly low, if you are using 12 or more characters (with 30 possible characters). Not sure what the math is on the probability of collisions (birthday problem) though.. and you could have a trigger that checked for the existence of a matching key before inserts/updates.

And using UUIDs would be too long for lots of purposes where people are working with the numbers, and where there might be external constraints on how long the IDs can be.

An example use case: 

where "cd8f90c81a" is the 'uuid' for that product. It's non-guessable enough, and I don't want to put a full UUID in the URL.


 

My historical comments in this debate are noted.  To summarize, I
strongly believe that natural keys are often (but not always) better.


 

pgsql-general by date:

Previous
From: Jigar Shah
Date:
Subject: corrupted item pointer in streaming based replication
Next
From: Kevin Grittner
Date:
Subject: Re: in C trigger function find out if column is part of primary key