Re: UUID performance as primary key - Mailing list pgsql-performance

From Andy
Subject Re: UUID performance as primary key
Date
Msg-id 321028.24609.qm@web111312.mail.gq1.yahoo.com
Whole thread Raw
In response to Re: UUID performance as primary key  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-performance
Wouldn't UUID PK cause a significant drop in insert performance because every insert is now out of order, which leads to a constant re-arranging of the B+ tree? The amount of random IO's that's going to generate would just kill the performance.

--- On Fri, 10/15/10, Craig Ringer <craig@postnewspapers.com.au> wrote:

From: Craig Ringer <craig@postnewspapers.com.au>
Subject: Re: [PERFORM] UUID performance as primary key
To: "Navkirat Singh" <navkirats@gmail.com>
Cc: pgsql-performance@postgresql.org
Date: Friday, October 15, 2010, 10:59 PM

On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
>
> I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me figure this out, as it is critical for my project.

Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus based key generation. Doesn't set any pre-determined limit on how many servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage. Foreign key relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem to do this is by assigning a unique instance id to each server/database out of a maximum "n" instances decided at setup time. Every key generation sequence increments by "n" whenever it generates a key, with an offset of the server/database id. That way, if n=100, server 1 will generate primary keys 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 302, ... and so on.

That works great until you need more than 100 instances, at which point you're really, REALLY boned. In really busy systems it also limits the total amount of primary key space - but with BIGINT primary keys, that's unlikely to be something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need for a pre-defined maximum number of servers, but can be slow to index and can require more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't offer you a considered opinion. I know that the MS-SQL crowd at least strongly prefer UUIDs, but they have very strong in-database UUID support. MySQL folks seem to mostly favour the modulo primary key generation approach. I don't see much discussion of the issue here - I get the impression Pg doesn't see heavy use in sharded environments.

-- Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Slow count(*) again...
Next
From: Alvaro Herrera
Date:
Subject: Re: No hash join across partitioned tables?