Re: UUID performance as primary key - Mailing list pgsql-performance

From Craig Ringer
Subject Re: UUID performance as primary key
Date
Msg-id 4CB91515.6020201@postnewspapers.com.au
Whole thread Raw
In response to UUID performance as primary key  (Navkirat Singh <navkirats@gmail.com>)
Responses Re: UUID performance as primary key
Re: UUID performance as primary key
List pgsql-performance
On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
>
> I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID
wouldbe perfect in my case as I will be having many small databases which will link up to a global database using the
UUID.Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me
figurethis out, as it is critical for my project. 

Pro: No need for (serverid,serverseq) pair primary keys or hacks with
modulus based key generation. Doesn't set any pre-determined limit on
how many servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage.
Foreign key relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem
to do this is by assigning a unique instance id to each server/database
out of a maximum "n" instances decided at setup time. Every key
generation sequence increments by "n" whenever it generates a key, with
an offset of the server/database id. That way, if n=100, server 1 will
generate primary keys 001, 101, 201, 301, ..., server 2 will generate
primary keys 002, 102, 202, 302, ... and so on.

That works great until you need more than 100 instances, at which point
you're really, REALLY boned. In really busy systems it also limits the
total amount of primary key space - but with BIGINT primary keys, that's
unlikely to be something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need
for a pre-defined maximum number of servers, but can be slow to index
and can require more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't
offer you a considered opinion. I know that the MS-SQL crowd at least
strongly prefer UUIDs, but they have very strong in-database UUID
support. MySQL folks seem to mostly favour the modulo primary key
generation approach. I don't see much discussion of the issue here - I
get the impression Pg doesn't see heavy use in sharded environments.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Stored procedure declared as VOLATILE => no good optimization is done
Next
From: Greg Smith
Date:
Subject: Re: Slow count(*) again...