Thread: UUID as primary key

UUID as primary key

From
tsuraan
Date:
I have a system where it would be very useful for the primary keys for
a few tables to be UUIDs (actually MD5s of files, but UUID seems to be
the best 128-bit type available).  What is the expected performance of
using a UUID as a primary key which will have numerous foreign
references to it, versus using a 64-bit int (32-bit isn't big enough)?

From the uuid.c in adt, it looks like a UUID is just stored as 8
consecutive bytes, and are compared using memcmp, whereas an int uses
primitive CPU instructions for comparison.  Is that a significant
issue with foreign key performance, or is it mostly just the size that
the key would take in all related tables?

Re: UUID as primary key

From
Mark Mielke
Date:
On 10/09/2009 12:56 PM, tsuraan wrote:
> I have a system where it would be very useful for the primary keys for
> a few tables to be UUIDs (actually MD5s of files, but UUID seems to be
> the best 128-bit type available).  What is the expected performance of
> using a UUID as a primary key which will have numerous foreign
> references to it, versus using a 64-bit int (32-bit isn't big enough)?
>
> > From the uuid.c in adt, it looks like a UUID is just stored as 8
> consecutive bytes, and are compared using memcmp, whereas an int uses
> primitive CPU instructions for comparison.  Is that a significant
> issue with foreign key performance, or is it mostly just the size that
> the key would take in all related tables?
>

The most significant impact is that it takes up twice as much space,
including the primary key index. This means fewer entries per block,
which means slower scans and/or more blocks to navigate through. Still,
compared to the rest of the overhead of an index row or a table row, it
is low - I think it's more important to understand whether you can get
away with using a sequential integer, in which case UUID is unnecessary
overhead - or whether you are going to need UUID anyways. If you need
UUID anyways - having two primary keys is probably not worth it.

Cheers,
mark


--
Mark Mielke<mark@mielke.cc>


Re: UUID as primary key

From
tsuraan
Date:
> The most significant impact is that it takes up twice as much space,
> including the primary key index. This means fewer entries per block,
> which means slower scans and/or more blocks to navigate through. Still,
> compared to the rest of the overhead of an index row or a table row, it
> is low - I think it's more important to understand whether you can get
> away with using a sequential integer, in which case UUID is unnecessary
> overhead - or whether you are going to need UUID anyways. If you need
> UUID anyways - having two primary keys is probably not worth it.

Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?

Re: UUID as primary key

From
Mark Mielke
Date:
On 10/10/2009 01:14 AM, tsuraan wrote:
>> The most significant impact is that it takes up twice as much space,
>> including the primary key index. This means fewer entries per block,
>> which means slower scans and/or more blocks to navigate through. Still,
>> compared to the rest of the overhead of an index row or a table row, it
>> is low - I think it's more important to understand whether you can get
>> away with using a sequential integer, in which case UUID is unnecessary
>> overhead - or whether you are going to need UUID anyways. If you need
>> UUID anyways - having two primary keys is probably not worth it.
>>
> Ok, that's what I was hoping.  Out of curiosity, is there a preferred
> way to store 256-bit ints in postgres?  At that point, is a bytea the
> most reasonable choice, or is there a better way to do it?
>

Do you need to be able to do queries on it? Numeric should be able to
store 256-bit integers.

If you don't need to do queries on it, an option I've considered in the
past is to break it up into 4 x int64. Before UUID was supported, I had
seriously considered storing UUID as 2 x int64. Now that UUID is
supported, you might also abuse UUID where 1 x 256-bit = 2 x UUID.

If you want it to be seemless and fully optimal, you would introduce a
new int256 type (or whatever the name of the type you are trying to
represent). Adding new types to PostgreSQL is not that hard. This would
allow queries (=, <>, <, >) as well.

Cheers,
mark

--
Mark Mielke<mark@mielke.cc>


Re: UUID as primary key

From
decibel
Date:
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote:
> On 10/10/2009 01:14 AM, tsuraan wrote:
>>> The most significant impact is that it takes up twice as much space,
>>> including the primary key index. This means fewer entries per block,
>>> which means slower scans and/or more blocks to navigate through.
>>> Still,
>>> compared to the rest of the overhead of an index row or a table
>>> row, it
>>> is low - I think it's more important to understand whether you
>>> can get
>>> away with using a sequential integer, in which case UUID is
>>> unnecessary
>>> overhead - or whether you are going to need UUID anyways. If you
>>> need
>>> UUID anyways - having two primary keys is probably not worth it.
>>>
>> Ok, that's what I was hoping.  Out of curiosity, is there a preferred
>> way to store 256-bit ints in postgres?  At that point, is a bytea the
>> most reasonable choice, or is there a better way to do it?
>>
>
> Do you need to be able to do queries on it? Numeric should be able
> to store 256-bit integers.
>
> If you don't need to do queries on it, an option I've considered in
> the past is to break it up into 4 x int64. Before UUID was
> supported, I had seriously considered storing UUID as 2 x int64.
> Now that UUID is supported, you might also abuse UUID where 1 x 256-
> bit = 2 x UUID.
>
> If you want it to be seemless and fully optimal, you would
> introduce a new int256 type (or whatever the name of the type you
> are trying to represent). Adding new types to PostgreSQL is not
> that hard. This would allow queries (=, <>, <, >) as well.


If you want an example of that, we had Command Prompt create a full
set of hash datatypes (SHA*, and I think md5). That stuff should be
on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com and
I'll get it added.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: UUID as primary key

From
Alvaro Herrera
Date:
decibel escribió:

> >If you want it to be seemless and fully optimal, you would
> >introduce a new int256 type (or whatever the name of the type you
> >are trying to represent). Adding new types to PostgreSQL is not
> >that hard. This would allow queries (=, <>, <, >) as well.
>
> If you want an example of that, we had Command Prompt create a full
> set of hash datatypes (SHA*, and I think md5). That stuff should be
> on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com
> and I'll get it added.

It's at project "shatypes".

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: UUID as primary key

From
decibel
Date:
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote:
> On 10/10/2009 01:14 AM, tsuraan wrote:
>>> The most significant impact is that it takes up twice as much space,
>>> including the primary key index. This means fewer entries per block,
>>> which means slower scans and/or more blocks to navigate through.
>>> Still,
>>> compared to the rest of the overhead of an index row or a table
>>> row, it
>>> is low - I think it's more important to understand whether you
>>> can get
>>> away with using a sequential integer, in which case UUID is
>>> unnecessary
>>> overhead - or whether you are going to need UUID anyways. If you
>>> need
>>> UUID anyways - having two primary keys is probably not worth it.
>>>
>> Ok, that's what I was hoping.  Out of curiosity, is there a preferred
>> way to store 256-bit ints in postgres?  At that point, is a bytea the
>> most reasonable choice, or is there a better way to do it?
>>
>
> Do you need to be able to do queries on it? Numeric should be able
> to store 256-bit integers.
>
> If you don't need to do queries on it, an option I've considered in
> the past is to break it up into 4 x int64. Before UUID was
> supported, I had seriously considered storing UUID as 2 x int64.
> Now that UUID is supported, you might also abuse UUID where 1 x 256-
> bit = 2 x UUID.
>
> If you want it to be seemless and fully optimal, you would
> introduce a new int256 type (or whatever the name of the type you
> are trying to represent). Adding new types to PostgreSQL is not
> that hard. This would allow queries (=, <>, <, >) as well.


If you want an example of that, we had Command Prompt create a full
set of hash datatypes (SHA*, and I think md5). That stuff should be
on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com and
I'll get it added.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828