Thread: UUID as primary key
I have a system where it would be very useful for the primary keys for a few tables to be UUIDs (actually MD5s of files, but UUID seems to be the best 128-bit type available). What is the expected performance of using a UUID as a primary key which will have numerous foreign references to it, versus using a 64-bit int (32-bit isn't big enough)? From the uuid.c in adt, it looks like a UUID is just stored as 8 consecutive bytes, and are compared using memcmp, whereas an int uses primitive CPU instructions for comparison. Is that a significant issue with foreign key performance, or is it mostly just the size that the key would take in all related tables?
On 10/09/2009 12:56 PM, tsuraan wrote: > I have a system where it would be very useful for the primary keys for > a few tables to be UUIDs (actually MD5s of files, but UUID seems to be > the best 128-bit type available). What is the expected performance of > using a UUID as a primary key which will have numerous foreign > references to it, versus using a 64-bit int (32-bit isn't big enough)? > > > From the uuid.c in adt, it looks like a UUID is just stored as 8 > consecutive bytes, and are compared using memcmp, whereas an int uses > primitive CPU instructions for comparison. Is that a significant > issue with foreign key performance, or is it mostly just the size that > the key would take in all related tables? > The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Cheers, mark -- Mark Mielke<mark@mielke.cc>
> The most significant impact is that it takes up twice as much space, > including the primary key index. This means fewer entries per block, > which means slower scans and/or more blocks to navigate through. Still, > compared to the rest of the overhead of an index row or a table row, it > is low - I think it's more important to understand whether you can get > away with using a sequential integer, in which case UUID is unnecessary > overhead - or whether you are going to need UUID anyways. If you need > UUID anyways - having two primary keys is probably not worth it. Ok, that's what I was hoping. Out of curiosity, is there a preferred way to store 256-bit ints in postgres? At that point, is a bytea the most reasonable choice, or is there a better way to do it?
On 10/10/2009 01:14 AM, tsuraan wrote: >> The most significant impact is that it takes up twice as much space, >> including the primary key index. This means fewer entries per block, >> which means slower scans and/or more blocks to navigate through. Still, >> compared to the rest of the overhead of an index row or a table row, it >> is low - I think it's more important to understand whether you can get >> away with using a sequential integer, in which case UUID is unnecessary >> overhead - or whether you are going to need UUID anyways. If you need >> UUID anyways - having two primary keys is probably not worth it. >> > Ok, that's what I was hoping. Out of curiosity, is there a preferred > way to store 256-bit ints in postgres? At that point, is a bytea the > most reasonable choice, or is there a better way to do it? > Do you need to be able to do queries on it? Numeric should be able to store 256-bit integers. If you don't need to do queries on it, an option I've considered in the past is to break it up into 4 x int64. Before UUID was supported, I had seriously considered storing UUID as 2 x int64. Now that UUID is supported, you might also abuse UUID where 1 x 256-bit = 2 x UUID. If you want it to be seemless and fully optimal, you would introduce a new int256 type (or whatever the name of the type you are trying to represent). Adding new types to PostgreSQL is not that hard. This would allow queries (=, <>, <, >) as well. Cheers, mark -- Mark Mielke<mark@mielke.cc>
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote: > On 10/10/2009 01:14 AM, tsuraan wrote: >>> The most significant impact is that it takes up twice as much space, >>> including the primary key index. This means fewer entries per block, >>> which means slower scans and/or more blocks to navigate through. >>> Still, >>> compared to the rest of the overhead of an index row or a table >>> row, it >>> is low - I think it's more important to understand whether you >>> can get >>> away with using a sequential integer, in which case UUID is >>> unnecessary >>> overhead - or whether you are going to need UUID anyways. If you >>> need >>> UUID anyways - having two primary keys is probably not worth it. >>> >> Ok, that's what I was hoping. Out of curiosity, is there a preferred >> way to store 256-bit ints in postgres? At that point, is a bytea the >> most reasonable choice, or is there a better way to do it? >> > > Do you need to be able to do queries on it? Numeric should be able > to store 256-bit integers. > > If you don't need to do queries on it, an option I've considered in > the past is to break it up into 4 x int64. Before UUID was > supported, I had seriously considered storing UUID as 2 x int64. > Now that UUID is supported, you might also abuse UUID where 1 x 256- > bit = 2 x UUID. > > If you want it to be seemless and fully optimal, you would > introduce a new int256 type (or whatever the name of the type you > are trying to represent). Adding new types to PostgreSQL is not > that hard. This would allow queries (=, <>, <, >) as well. If you want an example of that, we had Command Prompt create a full set of hash datatypes (SHA*, and I think md5). That stuff should be on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com and I'll get it added. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
decibel escribió: > >If you want it to be seemless and fully optimal, you would > >introduce a new int256 type (or whatever the name of the type you > >are trying to represent). Adding new types to PostgreSQL is not > >that hard. This would allow queries (=, <>, <, >) as well. > > If you want an example of that, we had Command Prompt create a full > set of hash datatypes (SHA*, and I think md5). That stuff should be > on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com > and I'll get it added. It's at project "shatypes". -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote: > On 10/10/2009 01:14 AM, tsuraan wrote: >>> The most significant impact is that it takes up twice as much space, >>> including the primary key index. This means fewer entries per block, >>> which means slower scans and/or more blocks to navigate through. >>> Still, >>> compared to the rest of the overhead of an index row or a table >>> row, it >>> is low - I think it's more important to understand whether you >>> can get >>> away with using a sequential integer, in which case UUID is >>> unnecessary >>> overhead - or whether you are going to need UUID anyways. If you >>> need >>> UUID anyways - having two primary keys is probably not worth it. >>> >> Ok, that's what I was hoping. Out of curiosity, is there a preferred >> way to store 256-bit ints in postgres? At that point, is a bytea the >> most reasonable choice, or is there a better way to do it? >> > > Do you need to be able to do queries on it? Numeric should be able > to store 256-bit integers. > > If you don't need to do queries on it, an option I've considered in > the past is to break it up into 4 x int64. Before UUID was > supported, I had seriously considered storing UUID as 2 x int64. > Now that UUID is supported, you might also abuse UUID where 1 x 256- > bit = 2 x UUID. > > If you want it to be seemless and fully optimal, you would > introduce a new int256 type (or whatever the name of the type you > are trying to represent). Adding new types to PostgreSQL is not > that hard. This would allow queries (=, <>, <, >) as well. If you want an example of that, we had Command Prompt create a full set of hash datatypes (SHA*, and I think md5). That stuff should be on pgFoundry; if it's not drop me a note at jnasby@cashnetusa.com and I'll get it added. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828