Thread: UUID - Data type inefficient

UUID - Data type inefficient

From
Kless
Date:
The new data type, UUID, is stored as a string -char(16)-:

------------
struct pg_uuid_t
{       unsigned char data[UUID_LEN];
};
#define UUID_LEN 16
------------

but this it's very inefficient as you can read here [1].

The ideal would be use bit(128), but today isn't possible. One
possible solution would be create a structure with 2 fields, each one
with bit(64).


[1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/


Re: UUID - Data type inefficient

From
Andrew Dunstan
Date:

Kless wrote:
> The new data type, UUID, is stored as a string -char(16)-:
>
> ------------
> struct pg_uuid_t
> {
>         unsigned char data[UUID_LEN];
> };
> #define UUID_LEN 16
> ------------
>   


No it is not. It is stored as 16 binary bytes. As text it won't fit into 
16 bytes.


> but this it's very inefficient as you can read here [1].
>   


What on earth makes you assume that MySQL performance characteristics 
apply to PostgreSQL? If you want to show that our implementation is 
inefficient, you need to produce PostgreSQL performance tests to 
demonstrate it.

cheers

andrew



Re: UUID - Data type inefficient

From
Mark Mielke
Date:
Kless wrote:
> The new data type, UUID, is stored as a string -char(16)-:
>
> ------------
> struct pg_uuid_t
> {
>         unsigned char data[UUID_LEN];
> };
> #define UUID_LEN 16
> ------------
>
> but this it's very inefficient as you can read here [1].
>
> The ideal would be use bit(128), but today isn't possible. One
> possible solution would be create a structure with 2 fields, each one
> with bit(64).
>
>
> [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
>
>   

That's a general page about UUID vs serial integers.

What is the complaint? Do you have evidence that it would be noticeably 
faster as two 64-bits? Note that a UUID is broken into several non-64 
bit elements, and managing it as bytes or 64-bit integers, or as a union 
with the bit-lengths specified, are probably all efficient or 
inefficient depending on the operation being performed. The hope should 
be that the optimizer will generate similar best code for each.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: UUID - Data type inefficient

From
Mark Mielke
Date:
Mark Mielke wrote:
> Kless wrote:
>> The new data type, UUID, is stored as a string -char(16)-:
>> struct pg_uuid_t
>> {
>>         unsigned char data[UUID_LEN];
>> };
>> #define UUID_LEN 16
>>
>
> What is the complaint? Do you have evidence that it would be 
> noticeably faster as two 64-bits? Note that a UUID is broken into 
> several non-64 bit elements, and managing it as bytes or 64-bit 
> integers, or as a union with the bit-lengths specified, are probably 
> all efficient or inefficient depending on the operation being 
> performed. The hope should be that the optimizer will generate similar 
> best code for each.

I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 
8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 
16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16?

I will investigate if I have time tonight. There MUST be some mistake or 
misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: UUID - Data type inefficient

From
Tom Lane
Date:
Mark Mielke <mark@mark.mielke.cc> writes:
> Kless wrote:
>> [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

> That's a general page about UUID vs serial integers.

AFAICT the author of that page thinks that UUIDs are stored in ASCII
form (32 hex digits), which would indeed be inefficient.  I have no
idea whether he knows what he's talking about with respect to mysql,
but it's certainly 100% irrelevant to the Postgres datatype.
        regards, tom lane


Re: UUID - Data type inefficient

From
Mark Mielke
Date:
Mark Mielke wrote:
> I didn't notice that he put 16. Now I'm looking at uuid.c in 
> PostgreSQL 8.3.3 and I see that it does use 16, and the struct 
> pg_uuid_t is length 16. I find myself confused now - why does 
> PostgreSQL define UUID_LEN as 16?
>
> I will investigate if I have time tonight. There MUST be some mistake 
> or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.

Grrrr.... Kless you've confused me.  32-bit numbers = 4 bytes, 64-bit 
numbers = 8 bytes, 128-bit numbers = 16 bytes.

You are out to lunch and you dragged me with you. Did we have beer at 
least? :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: UUID - Data type inefficient

From
"Joshua D. Drake"
Date:

On Thu, 2008-07-10 at 12:05 -0400, Mark Mielke wrote:
> Mark Mielke wrote:
> > I didn't notice that he put 16. Now I'm looking at uuid.c in 
> > PostgreSQL 8.3.3 and I see that it does use 16, and the struct 
> > pg_uuid_t is length 16. I find myself confused now - why does 
> > PostgreSQL define UUID_LEN as 16?
> >
> > I will investigate if I have time tonight. There MUST be some mistake 
> > or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.
> 
> Grrrr.... Kless you've confused me.  32-bit numbers = 4 bytes, 64-bit 
> numbers = 8 bytes, 128-bit numbers = 16 bytes.
> 
> You are out to lunch and you dragged me with you. Did we have beer at 
> least? :-)

Sounds like at least 4 and a couple of chasers.

> 
> Cheers,
> mark
> 
> -- 
> Mark Mielke <mark@mielke.cc>
> 
> 



Re: UUID - Data type inefficient

From
"David E. Wheeler"
Date:
On Jul 10, 2008, at 09:13, Joshua D. Drake wrote:

>> You are out to lunch and you dragged me with you. Did we have beer at
>> least? :-)
>
> Sounds like at least 4 and a couple of chasers.

Next time I'd like to be invited to the party, too! :-P

David


Re: UUID - Data type inefficient

From
Kaare Rasmussen
Date:
> You are out to lunch and you dragged me with you. Did we have beer at
> least? :-)

A bit, and you had a byte of bread.

-- 

Med venlig hilsen
Kaare Rasmussen, Jasonic

Jasonic                 Telefon: +45 3816 2582
Nordre Fasanvej 12
2000 Frederiksberg      Email: kaare@jasonic.dk


Re: UUID - Data type inefficient

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Mark Mielke <mark@mark.mielke.cc> writes:
>> Kless wrote:
>>> [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
>
>> That's a general page about UUID vs serial integers.
>
> AFAICT the author of that page thinks that UUIDs are stored in ASCII
> form (32 hex digits), which would indeed be inefficient.  

Well he does say "In fact if you store UUID in binary form you can bring it
down to 16 bytes so size is not really the problem." Though I'm unclear why he
thinks a 4x increase in space usage is "not really a problem". 

If you have a highly relational database you can easily have half or more your
columns in large tables consisting of foreign keys. If your database is i/o
bandwidth limited that would be a huge effect.

> I have no idea whether he knows what he's talking about with respect to
> mysql, but it's certainly 100% irrelevant to the Postgres datatype.

The rest of it seems to be pretty mysql-specific. Some of the problems are
universal such as making index inserts especially random and making clustering
impossible, but how much they hurt on different databases is going to be very
different.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: UUID - Data type inefficient

From
Kless
Date:
On Jul 10, 5:05 pm, m...@mark.mielke.cc (Mark Mielke) wrote:
> Mark Mielke wrote:
> > I didn't notice that he put 16. Now I'm looking at uuid.c in
> > PostgreSQL 8.3.3 and I see that it does use 16, and the struct
> > pg_uuid_t is length 16. I find myself confused now - why does
> > PostgreSQL define UUID_LEN as 16?
>
> > I will investigate if I have time tonight. There MUST be some mistake
> > or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.
>
> Grrrr.... Kless you've confused me.  32-bit numbers = 4 bytes, 64-bit
> numbers = 8 bytes, 128-bit numbers = 16 bytes.
>
> You are out to lunch and you dragged me with you. Did we have beer at
> least? :-)
>
> Cheers,
> mark
>
> --
xDxD I see that the PostgreSQL developers have sense of humor :) I
like it.

It has been a failure mine. I question about that in the IRC, anybody
says me that structure but also say me of see here:

pgsql/src/backend/utils/adt/uuid.c:45:uuid_out

thing that I didn't make.

But it's clear that this problem has been well resolved.


Greetings!