Re: UUID/GUID discussion leading to request for hexstring bytea? - Mailing list pgsql-hackers

From Gevik Babakhani
Subject Re: UUID/GUID discussion leading to request for hexstring bytea?
Date
Msg-id 1157628441.614.8.camel@voyager.truesoftware.net
Whole thread Raw
In response to UUID/GUID discussion leading to request for hexstring bytea? (was: Re: TODO: GUID datatype)  (mark@mark.mielke.cc)
Responses Re: UUID/GUID discussion leading to request for hexstring bytea?  (Martijn van Oosterhout <kleptog@svana.org>)
Re: UUID/GUID discussion leading to request for hexstring bytea?  (Thomas Hallgren <thomas@tada.se>)
List pgsql-hackers
On Wed, 2006-09-06 at 17:05 -0400, mark@mark.mielke.cc wrote:

> The UUID type itself has value, however, the value it provides is
> limited. Generation of a UUID doesn't have to occur with the database.
> The application inserting the row can generate the UUID. The UUID type
> itself has limited value, in that the difference between a 36 bytes +
> 4 bytes = 40 bytes as a fully expanded char(40), or the packed value
> using the hexstring encode/decode SQL functions of a bytea type at 16
> + 4 bytes = 20 bytes compared to a tightly packed UUID type of 16
> bytes, are very close. The argument can easily be made that if space
> (disk space, index size, I/O bandwidth) is your first priority, than
> a UUID is the wrong type to use. A 64-bit integer may suffice.

Agreed.

> 
> I'm also having trouble with the idea that a UUID deserves special
> treatment. I currently have a desire to store both UUID and MD5
> checksum in my rows. They are both 128 bits = 16 bytes, and fit all
> of the same requirements above.

Agreed.

> This makes it seem as if a generic 128-bit data type would be
> desirable. They both have a compatible representation of a hexadecimal
> string. The extra '-' characters in the UUID can be easily added when
> necessary by a HEXSTRING2UUID() sort of function in plpgsql or in
> the claling application.

Agreed

> 
> But - MD5 isn't the only checksum that is frequently used. Some argue
> that the MD5 can be shown to be weak in some regards, and that perhaps
> other checksum algorithms such as SHA-1 provide a better guarantee of
> uniqueness. SHA-1 isn't 128-bits. It's 160-bits.

To my knowledge most apps use MD5 checksum because it is easier to
implement and use.  
> 
> This is where I start to buy Tom Lane's argument that the 4-byte
> prefix is no big deal. I find it more desirable to have a binary data
> type with a hexadecimal string input and output function. The
> flexibility of being able to use 128-bits or 160-bits is worth this
> 4-byte header to me. What I don't want to do is store double size
> fields, stored as hexadecimal.

Agreed.

I have experimented with GUID/UUID type for a while and here are my
findings when having the need to store GUID/UUID/MD5 in the database.

1. Almost always these values are created outside the database. So to my
opinion having all kinds of functions in the database to create such
values becomes second priority except when wanting to use GUIDs auto
generated for PK like the uuid datatype in MS SQL

2. Yet I haven't seen any kind of arithmetic operations (+ - * / %) on
stored GUIDs hence these operations would be meaningless. I also never
did see the need to change specific parts of a GUID 

To my opinion only some of relational/compare operations like == and !=
apply to such values. comparing guid >= guid or md5 < md5 is also
meaningless.

3. Almost always a GUID/MD5 is not changed when it is generated and
stored except for cases like resetting md5sum of a password that is
changed.

To my opinion GUIDs type need to provide the following in the database.

1. GUID type must accept the correct string format(s), with of without
extra '-'
2. GUID type must internally be stored as small as possible.
3. GUID type must be comparable with == , != , LIKE and (NOT) IS NULL
4. GUID type must have the ability to be indexed, grouped, ordered,
DISTINCT... but not MAX(), MIN() or SUM()....

I think option three would be a good one to experiment with. I am
thinking of a 16 length struct to hold the GUID value and corresponding
functionality to achieve the above however possible.

I also think we should have the GUID/UUID as a datatype and not just
functions handling hexstring.








pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Fixed length data types issue
Next
From: Andrew - Supernews
Date:
Subject: Re: Fixed length data types issue