Thread: New data type: uniqueidentifier

New data type: uniqueidentifier

From
"Dmitry G. Mastrukov"
Date:
Hi, All!

I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit
value claims to be unique across Universe. It depends on libuuid from
e2fsprogs by Theodore Ts'o. Now I use it in my project. Everybody can grab
it from   http://taurussoft.chat.ru/uniqueidentifier-0.1.9.tar.gz

Before announce this new type through pgsql-announce I want to clear for
myself some things.
I've marked "=" operator with HASH clause (and planner has started to use
hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for hash
as for btree method.
So is it desirable to mark "=" as HASH for this type (seems internal 16 byte
representation will be hash well) and if yes how can I create hash sort
method for uniqueidentifier?

regards,
Dmitry

PS. If you decide to install uniqueidentifier look at the date of
uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy
enter "make install" in lib/uuid directory of e2fsprogs.



Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:

> myself some things.
> I've marked "=" operator with HASH clause (and planner has started to use
> hash jons). But as I understand the right way is to create special hash
> function (may be wrapper for hash_any(), isn't it?) and register it for hash
> as for btree method.

No. Currently, there's no way to specify a hash function for a given
operator, it always uses a builtin function that operates on memory
representation of a value.

There's no need (or possibility) to register a hash with btree method.

> So is it desirable to mark "=" as HASH for this type (seems internal 16 byte
> representation will be hash well) and if yes how can I create hash sort
> method for uniqueidentifier?
You can mark it hashable, since two identical uuid values would have
identical memory representation and thus the same hash value. 

I'd look at your code, but that is URL too slow, in 5 minutes downloaded
1000 bytes...

> regards,
> Dmitry
> 
> PS. If you decide to install uniqueidentifier look at the date of
> uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy
> enter "make install" in lib/uuid directory of e2fsprogs.



Re: New data type: uniqueidentifier

From
"Dmitry G. Mastrukov"
Date:
Alex Pilosov <alex@pilosoft.com> wrote:
> 
> I'd look at your code, but that is URL too slow, in 5 minutes downloaded
> 1000 bytes...
> 
It's possible now to grab from another location

http://fitmark.net/taurussoft/uniqueidentifier-0.1.9.tar.gz

regards,
Dmitry



Re: New data type: uniqueidentifier

From
Peter Eisentraut
Date:
Dmitry G. Mastrukov writes:

> I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit
> value claims to be unique across Universe. It depends on libuuid from
> e2fsprogs by Theodore Ts'o.

ISTM that this should be a function, not a data type.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: New data type: uniqueidentifier

From
"Dmitry G. Mastrukov"
Date:
Alex Pilosov <alex@pilosoft.com> wrote:
> On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:
>
> > myself some things.
> > I've marked "=" operator with HASH clause (and planner has started to
use
> > hash jons). But as I understand the right way is to create special hash
> > function (may be wrapper for hash_any(), isn't it?) and register it for
hash
> > as for btree method.
>
> No. Currently, there's no way to specify a hash function for a given
> operator, it always uses a builtin function that operates on memory
> representation of a value.
>
> There's no need (or possibility) to register a hash with btree method.
>
Strange. When I execute following query (slightly modified query from User's
Guide chapter 7.6)

SELECT am.amname AS acc_name,      opc.opcname AS ops_name,      opr.oprname AS ops_comp   FROM pg_am am, pg_amop amop,
      pg_opclass opc, pg_operator opr   WHERE amop.amopid = am.oid AND         amop.amopclaid = opc.oid AND
amop.amopopr= opr.oid   ORDER BY ops_name, ops_comp;
 

I see both hash and btree amname for builtin opclasses. For example
acc_name | ops_name | ops_comp
----------+----------+----------btree    | int4_ops | <btree    | int4_ops | <=btree    | int4_ops | =hash     |
int4_ops| =btree    | int4_ops | >btree    | int4_ops | >=
 

But new type has no hash for "=". Plus I saw hash functions for builtin
types in source code. So can I achieve for created type such intergration
with Postgres as for builtin types? Or am I understanding something wrong?

regards,
Dmitry



Re: New data type: uniqueidentifier

From
Tom Lane
Date:
"Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes:
> Alex Pilosov <alex@pilosoft.com> wrote:
>> On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:
> I've marked "=" operator with HASH clause (and planner has started to
> use
> hash jons). But as I understand the right way is to create special hash
> function (may be wrapper for hash_any(), isn't it?) and register it for
> hash
> as for btree method.
>> 
>> No. Currently, there's no way to specify a hash function for a given
>> operator, it always uses a builtin function that operates on memory
>> representation of a value.

> Strange. When I execute following query (slightly modified query from User's
> Guide chapter 7.6)

You're looking at support for hash indexes, which have nothing to do
with hash joins.

*Why* they have nothing to do with hash joins, I dunno.  You'd think
that using the same hash functions for both would be a good idea.
But that's not how it's set up at the moment.
        regards, tom lane


Re: New data type: uniqueidentifier

From
"Dmitry G. Mastrukov"
Date:
 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes:
> > Alex Pilosov <alex@pilosoft.com> wrote:
> >> On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:
> > I've marked "=" operator with HASH clause (and planner has started to
> > use
> > hash jons). But as I understand the right way is to create special hash
> > function (may be wrapper for hash_any(), isn't it?) and register it for
> > hash
> > as for btree method.
> >>
> >> No. Currently, there's no way to specify a hash function for a given
> >> operator, it always uses a builtin function that operates on memory
> >> representation of a value.
>
> > Strange. When I execute following query (slightly modified query from
User's
> > Guide chapter 7.6)
>
> You're looking at support for hash indexes, which have nothing to do
> with hash joins.
>
> *Why* they have nothing to do with hash joins, I dunno.  You'd think
> that using the same hash functions for both would be a good idea.
> But that's not how it's set up at the moment.
>
OK. It's clear for me now. Thanks.
But should I create support for hash indexes? Since builtin types have such
support I want it too for uniqueidentifier :) How can I make it?

regards,
Dmitry



Re: New data type: uniqueidentifier

From
Thomas Swan
Date:
Peter Eisentraut wrote:<br /><blockquote cite="mid:Pine.LNX.4.30.0106271653340.729-100000@peter.localdomain"
type="cite"><prewrap="">Dmitry G. Mastrukov writes:<br /><br /></pre><blockquote type="cite"><pre wrap="">I've
developednew data type for PostgreSQL - unique identifier - 128-bit<br />value claims to be unique across Universe. It
dependson libuuid from<br />e2fsprogs by Theodore Ts'o.<br /></pre></blockquote><pre wrap=""><br />ISTM that this
shouldbe a function, not a data type.<br /><br /></pre></blockquote> I'd second the function idea: function uuid( )
returnsan int8 value; don't create a bazillion datatypes.  Besides, 128 bit numbers are 7 byte integers.   PostgreSQL
hasan int8 (8 byte integer) datatype.  While I like the UUID function idea, I'd recommend a better solution to creating
an"unique" identifier.  Why not create a serial8 datatype: int8 with an int8 sequence = 256bit "unique" number.  {Yes,
Iknow I'm violating my first sentence.}  Then, you'd have the same thing (or better) AND your not relying on
randomness. <br /><br /> 

Re: Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
On Mon, 2 Jul 2001, Thomas Swan wrote:

> Peter Eisentraut wrote:
> 
> >Dmitry G. Mastrukov writes:
> >
> >>I've developed new data type for PostgreSQL - unique identifier - 128-bit
> >>value claims to be unique across Universe. It depends on libuuid from
> >>e2fsprogs by Theodore Ts'o.
> >>
> >
> >ISTM that this should be a function, not a data type.
> >
> I'd second the function idea: function uuid( ) returns an int8 value; 
> don't create a bazillion datatypes.  Besides, 128 bit numbers are 7 byte 
> integers.   PostgreSQL has an int8 (8 byte integer) datatype.  While I 
> like the UUID function idea, I'd recommend a better solution to creating 
> an "unique" identifier.  Why not create a serial8 datatype: int8 with an 
> int8 sequence = 256bit "unique" number.  {Yes, I know I'm violating my 
> first sentence.}  Then, you'd have the same thing (or better) AND your 
> not relying on randomness.  

I don't think you know what UUID is. It is NOT just a unique randon
number. There are specific rules for construction of such number, specific
rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
is most appropriate answer. 

-alex



Re: Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
On Mon, 2 Jul 2001, Peter Eisentraut wrote:

> Alex Pilosov writes:
> 
> > I don't think you know what UUID is. It is NOT just a unique randon
> > number. There are specific rules for construction of such number, specific
> > rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
> > is most appropriate answer.
> 
> A data type may be appropriate for storing these values, but not for
> generating them.  Functions generate stuff, data types store stuff.

Sorry, apparently we misunderstood each other but are really in full
agreement.

Dmitry's stuff contains both datatype (uniqueidentifier), a function to
generate a new object of that type (newid), and a set of functions to
implement comparison operators for that type.

I don't see anything wrong with that setup, but maybe I'm still missing
something?

-alex



RE: Re: New data type: uniqueidentifier

From
"Christopher Kings-Lynne"
Date:
> don't create a bazillion datatypes.  Besides, 128 bit numbers are 7
> byte integers.

Hang on:  128 div 8 = 16 byte integer

> PostgreSQL has an int8 (8 byte integer) datatype.

And therefore it is a _64_ bit integer and you can't have a 256bit unique
number in it...

> While I like the UUID function idea, I'd recommend a better solution to
> creating an "unique" identifier.  Why not create a serial8 datatype:
> int8 with an int8 sequence = 256bit "unique" number.  {Yes, I know
> violating my first sentence.}  Then, you'd have the same thing (or
> better) AND your not relying on randomness.

Chris



Re: Re: New data type: uniqueidentifier

From
Thomas Swan
Date:
I sit corrected.  <br /><br /> *slightly humbled*<br /><br /> Why not do an unsigned int16 to hold your UUID generated
numbers.  Ultimately, this would seem to be a more general solution and accomplish your goals at the sametime.  Or, am
Icompletely missing something.<br /><br /> Christopher Kings-Lynne wrote:<br /><blockquote
cite="mid:ECEHIKNFIMMECLEBJFIGKECECBAA.chriskl@familyhealth.com.au"type="cite"><blockquote type="cite"><pre
wrap="">don'tcreate a bazillion datatypes.  Besides, 128 bit numbers are 7<br />byte integers.<br
/></pre></blockquote><prewrap=""><br />Hang on:  128 div 8 = 16 byte integer<br /><br /></pre><blockquote
type="cite"><prewrap="">PostgreSQL has an int8 (8 byte integer) datatype.<br /></pre></blockquote><pre wrap=""><br
/>Andtherefore it is a _64_ bit integer and you can't have a 256bit unique<br />number in it...<br /><br
/></pre><blockquotetype="cite"><pre wrap="">While I like the UUID function idea, I'd recommend a better solution to<br
/>creatingan "unique" identifier.  Why not create a serial8 datatype:<br />int8 with an int8 sequence = 256bit "unique"
number. {Yes, I know<br />violating my first sentence.}  Then, you'd have the same thing (or<br />better) AND your not
relyingon randomness.<br /></pre></blockquote><pre wrap=""><br />Chris<br /></pre></blockquote><br /><br /> 

Re: Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
On Mon, 2 Jul 2001, Thomas Swan wrote:

> I sit corrected.  
> 
> *slightly humbled*
> 
> Why not do an unsigned int16 to hold your UUID generated numbers.   
Not a good idea, since rules for comparison of UUID are wierd and are
_definitely_ not same as for comparison of int16.

> Ultimately, this would seem to be a more general solution and accomplish 
> your goals at the sametime.  Or, am I completely missing something.



Re: New data type: uniqueidentifier

From
Thomas Swan
Date:
Where can I find some more information on it?  I'm curious now.<br /><br /> Alex Pilosov wrote:<br /><blockquote
cite="mid:Pine.BSO.4.10.10107022230120.7004-100000@spider.pilosoft.com"type="cite"><pre wrap="">On Mon, 2 Jul 2001,
ThomasSwan wrote:<br /><br /></pre><blockquote type="cite"><pre wrap="">I sit corrected.  <br /><br />*slightly
humbled*<br/><br />Why not do an unsigned int16 to hold your UUID generated numbers.   <br /></pre></blockquote><pre
wrap="">Nota good idea, since rules for comparison of UUID are wierd and are<br />_definitely_ not same as for
comparisonof int16.<br /><br /></pre><blockquote type="cite"><pre wrap="">Ultimately, this would seem to be a more
generalsolution and accomplish <br />your goals at the sametime.  Or, am I completely missing something.<br
/></pre></blockquote><prewrap=""><br /><br />---------------------------(end of
broadcast)---------------------------<br/>TIP 2: you can get off all lists at once with the unregister command<br />
(send"unregister YourEmailAddressHere" to <a class="moz-txt-link-abbreviated"
href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a>)<br/></pre></blockquote><br /><br /> 

Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
If you mean the [proposed?] standard itself, here is a good description of
it:
http://www.ics.uci.edu/pub/ietf/webdav/uuid-guid/draft-leach-uuids-guids-01.txt

It was a proposed IETF standard, however, IETF standardization failed
because ISO already ratified it as a DCE/RPC standard ISO 11578, however,
the above URL provides far better description of UUIDs than ISO standard
itself
On Tue, 3 Jul 2001, Thomas Swan wrote:

> Where can I find some more information on it?  I'm curious now.



Re: Re: New data type: uniqueidentifier

From
Peter Eisentraut
Date:
Alex Pilosov writes:

> Dmitry's stuff contains both datatype (uniqueidentifier), a function to
> generate a new object of that type (newid), and a set of functions to
> implement comparison operators for that type.
>
> I don't see anything wrong with that setup, but maybe I'm still missing
> something?

It would be much simpler if you stored the unique id in varchar or text.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Re: New data type: uniqueidentifier

From
Alex Pilosov
Date:
On Tue, 3 Jul 2001, Peter Eisentraut wrote:


>> Dmitry's stuff contains both datatype (uniqueidentifier), a function to
>> generate a new object of that type (newid), and a set of functions to
>> implement comparison operators for that type.

> It would be much simpler if you stored the unique id in varchar or text.
Peter,

UUIDs have specific rules for comparison of them. Its so much easier to
compare them via a<b than uuid_lt(a,b). If one wanted to make a meaningful
index on uuid value, normal ordering of varchar would not suffice...

-alex




Re: Re: New data type: uniqueidentifier

From
"Dmitry G. Mastrukov"
Date:
Peter Eisentraut wrote:

>Alex Pilosov writes:
>
>>Dmitry's stuff contains both datatype (uniqueidentifier), a function to
>>generate a new object of that type (newid), and a set of functions to
>>implement comparison operators for that type.
>>
>>I don't see anything wrong with that setup, but maybe I'm still missing
>>something?
>>
>
>It would be much simpler if you stored the unique id in varchar or text.
>
Are you sure varchar comparision will be quickly than current 
implementation? Next, varchar will need 36 byte, uniqueidentifier takes 
16. Next, indexing - IMHO current stuff more suitable for indexes. Some 
time ago I saw some stuff which deals with uniqueidentifiers for 
Interbase. It uses your scheme with chars. But it strip "-" from string 
and reverts it to efficiently use indexes (uid sometimes uses 
MAC-address as part of itself, so MAC should go first in string). Weird 
scheme for me!

regards,
Dmitry




Re: Re: New data type: uniqueidentifier

From
Peter Eisentraut
Date:
Alex Pilosov writes:

> I don't think you know what UUID is. It is NOT just a unique randon
> number. There are specific rules for construction of such number, specific
> rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
> is most appropriate answer.

A data type may be appropriate for storing these values, but not for
generating them.  Functions generate stuff, data types store stuff.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter