Thread: UUID - Data type inefficient
The new data type, UUID, is stored as a string -char(16)-: ------------ struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 ------------ but this it's very inefficient as you can read here [1]. The ideal would be use bit(128), but today isn't possible. One possible solution would be create a structure with 2 fields, each one with bit(64). [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
Kless wrote: > The new data type, UUID, is stored as a string -char(16)-: > > ------------ > struct pg_uuid_t > { > unsigned char data[UUID_LEN]; > }; > #define UUID_LEN 16 > ------------ > No it is not. It is stored as 16 binary bytes. As text it won't fit into 16 bytes. > but this it's very inefficient as you can read here [1]. > What on earth makes you assume that MySQL performance characteristics apply to PostgreSQL? If you want to show that our implementation is inefficient, you need to produce PostgreSQL performance tests to demonstrate it. cheers andrew
Kless wrote: > The new data type, UUID, is stored as a string -char(16)-: > > ------------ > struct pg_uuid_t > { > unsigned char data[UUID_LEN]; > }; > #define UUID_LEN 16 > ------------ > > but this it's very inefficient as you can read here [1]. > > The ideal would be use bit(128), but today isn't possible. One > possible solution would be create a structure with 2 fields, each one > with bit(64). > > > [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ > > That's a general page about UUID vs serial integers. What is the complaint? Do you have evidence that it would be noticeably faster as two 64-bits? Note that a UUID is broken into several non-64 bit elements, and managing it as bytes or 64-bit integers, or as a union with the bit-lengths specified, are probably all efficient or inefficient depending on the operation being performed. The hope should be that the optimizer will generate similar best code for each. Cheers, mark -- Mark Mielke <mark@mielke.cc>
Mark Mielke wrote: > Kless wrote: >> The new data type, UUID, is stored as a string -char(16)-: >> struct pg_uuid_t >> { >> unsigned char data[UUID_LEN]; >> }; >> #define UUID_LEN 16 >> > > What is the complaint? Do you have evidence that it would be > noticeably faster as two 64-bits? Note that a UUID is broken into > several non-64 bit elements, and managing it as bytes or 64-bit > integers, or as a union with the bit-lengths specified, are probably > all efficient or inefficient depending on the operation being > performed. The hope should be that the optimizer will generate similar > best code for each. I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. Cheers, mark -- Mark Mielke <mark@mielke.cc>
Mark Mielke <mark@mark.mielke.cc> writes: > Kless wrote: >> [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ > That's a general page about UUID vs serial integers. AFAICT the author of that page thinks that UUIDs are stored in ASCII form (32 hex digits), which would indeed be inefficient. I have no idea whether he knows what he's talking about with respect to mysql, but it's certainly 100% irrelevant to the Postgres datatype. regards, tom lane
Mark Mielke wrote: > I didn't notice that he put 16. Now I'm looking at uuid.c in > PostgreSQL 8.3.3 and I see that it does use 16, and the struct > pg_uuid_t is length 16. I find myself confused now - why does > PostgreSQL define UUID_LEN as 16? > > I will investigate if I have time tonight. There MUST be some mistake > or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. Grrrr.... Kless you've confused me. 32-bit numbers = 4 bytes, 64-bit numbers = 8 bytes, 128-bit numbers = 16 bytes. You are out to lunch and you dragged me with you. Did we have beer at least? :-) Cheers, mark -- Mark Mielke <mark@mielke.cc>
On Thu, 2008-07-10 at 12:05 -0400, Mark Mielke wrote: > Mark Mielke wrote: > > I didn't notice that he put 16. Now I'm looking at uuid.c in > > PostgreSQL 8.3.3 and I see that it does use 16, and the struct > > pg_uuid_t is length 16. I find myself confused now - why does > > PostgreSQL define UUID_LEN as 16? > > > > I will investigate if I have time tonight. There MUST be some mistake > > or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. > > Grrrr.... Kless you've confused me. 32-bit numbers = 4 bytes, 64-bit > numbers = 8 bytes, 128-bit numbers = 16 bytes. > > You are out to lunch and you dragged me with you. Did we have beer at > least? :-) Sounds like at least 4 and a couple of chasers. > > Cheers, > mark > > -- > Mark Mielke <mark@mielke.cc> > >
On Jul 10, 2008, at 09:13, Joshua D. Drake wrote: >> You are out to lunch and you dragged me with you. Did we have beer at >> least? :-) > > Sounds like at least 4 and a couple of chasers. Next time I'd like to be invited to the party, too! :-P David
> You are out to lunch and you dragged me with you. Did we have beer at > least? :-) A bit, and you had a byte of bread. -- Med venlig hilsen Kaare Rasmussen, Jasonic Jasonic Telefon: +45 3816 2582 Nordre Fasanvej 12 2000 Frederiksberg Email: kaare@jasonic.dk
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Mark Mielke <mark@mark.mielke.cc> writes: >> Kless wrote: >>> [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ > >> That's a general page about UUID vs serial integers. > > AFAICT the author of that page thinks that UUIDs are stored in ASCII > form (32 hex digits), which would indeed be inefficient. Well he does say "In fact if you store UUID in binary form you can bring it down to 16 bytes so size is not really the problem." Though I'm unclear why he thinks a 4x increase in space usage is "not really a problem". If you have a highly relational database you can easily have half or more your columns in large tables consisting of foreign keys. If your database is i/o bandwidth limited that would be a huge effect. > I have no idea whether he knows what he's talking about with respect to > mysql, but it's certainly 100% irrelevant to the Postgres datatype. The rest of it seems to be pretty mysql-specific. Some of the problems are universal such as making index inserts especially random and making clustering impossible, but how much they hurt on different databases is going to be very different. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Jul 10, 5:05 pm, m...@mark.mielke.cc (Mark Mielke) wrote: > Mark Mielke wrote: > > I didn't notice that he put 16. Now I'm looking at uuid.c in > > PostgreSQL 8.3.3 and I see that it does use 16, and the struct > > pg_uuid_t is length 16. I find myself confused now - why does > > PostgreSQL define UUID_LEN as 16? > > > I will investigate if I have time tonight. There MUST be some mistake > > or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. > > Grrrr.... Kless you've confused me. 32-bit numbers = 4 bytes, 64-bit > numbers = 8 bytes, 128-bit numbers = 16 bytes. > > You are out to lunch and you dragged me with you. Did we have beer at > least? :-) > > Cheers, > mark > > -- xDxD I see that the PostgreSQL developers have sense of humor :) I like it. It has been a failure mine. I question about that in the IRC, anybody says me that structure but also say me of see here: pgsql/src/backend/utils/adt/uuid.c:45:uuid_out thing that I didn't make. But it's clear that this problem has been well resolved. Greetings!