Thread: New data type: uniqueidentifier
Hi, All! I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit value claims to be unique across Universe. It depends on libuuid from e2fsprogs by Theodore Ts'o. Now I use it in my project. Everybody can grab it from http://taurussoft.chat.ru/uniqueidentifier-0.1.9.tar.gz Before announce this new type through pgsql-announce I want to clear for myself some things. I've marked "=" operator with HASH clause (and planner has started to use hash jons). But as I understand the right way is to create special hash function (may be wrapper for hash_any(), isn't it?) and register it for hash as for btree method. So is it desirable to mark "=" as HASH for this type (seems internal 16 byte representation will be hash well) and if yes how can I create hash sort method for uniqueidentifier? regards, Dmitry PS. If you decide to install uniqueidentifier look at the date of uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy enter "make install" in lib/uuid directory of e2fsprogs.
On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote: > myself some things. > I've marked "=" operator with HASH clause (and planner has started to use > hash jons). But as I understand the right way is to create special hash > function (may be wrapper for hash_any(), isn't it?) and register it for hash > as for btree method. No. Currently, there's no way to specify a hash function for a given operator, it always uses a builtin function that operates on memory representation of a value. There's no need (or possibility) to register a hash with btree method. > So is it desirable to mark "=" as HASH for this type (seems internal 16 byte > representation will be hash well) and if yes how can I create hash sort > method for uniqueidentifier? You can mark it hashable, since two identical uuid values would have identical memory representation and thus the same hash value. I'd look at your code, but that is URL too slow, in 5 minutes downloaded 1000 bytes... > regards, > Dmitry > > PS. If you decide to install uniqueidentifier look at the date of > uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy > enter "make install" in lib/uuid directory of e2fsprogs.
Alex Pilosov <alex@pilosoft.com> wrote: > > I'd look at your code, but that is URL too slow, in 5 minutes downloaded > 1000 bytes... > It's possible now to grab from another location http://fitmark.net/taurussoft/uniqueidentifier-0.1.9.tar.gz regards, Dmitry
Dmitry G. Mastrukov writes: > I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit > value claims to be unique across Universe. It depends on libuuid from > e2fsprogs by Theodore Ts'o. ISTM that this should be a function, not a data type. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Alex Pilosov <alex@pilosoft.com> wrote: > On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote: > > > myself some things. > > I've marked "=" operator with HASH clause (and planner has started to use > > hash jons). But as I understand the right way is to create special hash > > function (may be wrapper for hash_any(), isn't it?) and register it for hash > > as for btree method. > > No. Currently, there's no way to specify a hash function for a given > operator, it always uses a builtin function that operates on memory > representation of a value. > > There's no need (or possibility) to register a hash with btree method. > Strange. When I execute following query (slightly modified query from User's Guide chapter 7.6) SELECT am.amname AS acc_name, opc.opcname AS ops_name, opr.oprname AS ops_comp FROM pg_am am, pg_amop amop, pg_opclass opc, pg_operator opr WHERE amop.amopid = am.oid AND amop.amopclaid = opc.oid AND amop.amopopr= opr.oid ORDER BY ops_name, ops_comp; I see both hash and btree amname for builtin opclasses. For example acc_name | ops_name | ops_comp ----------+----------+----------btree | int4_ops | <btree | int4_ops | <=btree | int4_ops | =hash | int4_ops| =btree | int4_ops | >btree | int4_ops | >= But new type has no hash for "=". Plus I saw hash functions for builtin types in source code. So can I achieve for created type such intergration with Postgres as for builtin types? Or am I understanding something wrong? regards, Dmitry
"Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes: > Alex Pilosov <alex@pilosoft.com> wrote: >> On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote: > I've marked "=" operator with HASH clause (and planner has started to > use > hash jons). But as I understand the right way is to create special hash > function (may be wrapper for hash_any(), isn't it?) and register it for > hash > as for btree method. >> >> No. Currently, there's no way to specify a hash function for a given >> operator, it always uses a builtin function that operates on memory >> representation of a value. > Strange. When I execute following query (slightly modified query from User's > Guide chapter 7.6) You're looking at support for hash indexes, which have nothing to do with hash joins. *Why* they have nothing to do with hash joins, I dunno. You'd think that using the same hash functions for both would be a good idea. But that's not how it's set up at the moment. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes: > > Alex Pilosov <alex@pilosoft.com> wrote: > >> On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote: > > I've marked "=" operator with HASH clause (and planner has started to > > use > > hash jons). But as I understand the right way is to create special hash > > function (may be wrapper for hash_any(), isn't it?) and register it for > > hash > > as for btree method. > >> > >> No. Currently, there's no way to specify a hash function for a given > >> operator, it always uses a builtin function that operates on memory > >> representation of a value. > > > Strange. When I execute following query (slightly modified query from User's > > Guide chapter 7.6) > > You're looking at support for hash indexes, which have nothing to do > with hash joins. > > *Why* they have nothing to do with hash joins, I dunno. You'd think > that using the same hash functions for both would be a good idea. > But that's not how it's set up at the moment. > OK. It's clear for me now. Thanks. But should I create support for hash indexes? Since builtin types have such support I want it too for uniqueidentifier :) How can I make it? regards, Dmitry
Peter Eisentraut wrote:<br /><blockquote cite="mid:Pine.LNX.4.30.0106271653340.729-100000@peter.localdomain" type="cite"><prewrap="">Dmitry G. Mastrukov writes:<br /><br /></pre><blockquote type="cite"><pre wrap="">I've developednew data type for PostgreSQL - unique identifier - 128-bit<br />value claims to be unique across Universe. It dependson libuuid from<br />e2fsprogs by Theodore Ts'o.<br /></pre></blockquote><pre wrap=""><br />ISTM that this shouldbe a function, not a data type.<br /><br /></pre></blockquote> I'd second the function idea: function uuid( ) returnsan int8 value; don't create a bazillion datatypes. Besides, 128 bit numbers are 7 byte integers. PostgreSQL hasan int8 (8 byte integer) datatype. While I like the UUID function idea, I'd recommend a better solution to creating an"unique" identifier. Why not create a serial8 datatype: int8 with an int8 sequence = 256bit "unique" number. {Yes, Iknow I'm violating my first sentence.} Then, you'd have the same thing (or better) AND your not relying on randomness. <br /><br />
On Mon, 2 Jul 2001, Thomas Swan wrote: > Peter Eisentraut wrote: > > >Dmitry G. Mastrukov writes: > > > >>I've developed new data type for PostgreSQL - unique identifier - 128-bit > >>value claims to be unique across Universe. It depends on libuuid from > >>e2fsprogs by Theodore Ts'o. > >> > > > >ISTM that this should be a function, not a data type. > > > I'd second the function idea: function uuid( ) returns an int8 value; > don't create a bazillion datatypes. Besides, 128 bit numbers are 7 byte > integers. PostgreSQL has an int8 (8 byte integer) datatype. While I > like the UUID function idea, I'd recommend a better solution to creating > an "unique" identifier. Why not create a serial8 datatype: int8 with an > int8 sequence = 256bit "unique" number. {Yes, I know I'm violating my > first sentence.} Then, you'd have the same thing (or better) AND your > not relying on randomness. I don't think you know what UUID is. It is NOT just a unique randon number. There are specific rules for construction of such number, specific rules for comparison of numbers (no, its not bit-by-bit), thus a datatype is most appropriate answer. -alex
On Mon, 2 Jul 2001, Peter Eisentraut wrote: > Alex Pilosov writes: > > > I don't think you know what UUID is. It is NOT just a unique randon > > number. There are specific rules for construction of such number, specific > > rules for comparison of numbers (no, its not bit-by-bit), thus a datatype > > is most appropriate answer. > > A data type may be appropriate for storing these values, but not for > generating them. Functions generate stuff, data types store stuff. Sorry, apparently we misunderstood each other but are really in full agreement. Dmitry's stuff contains both datatype (uniqueidentifier), a function to generate a new object of that type (newid), and a set of functions to implement comparison operators for that type. I don't see anything wrong with that setup, but maybe I'm still missing something? -alex
> don't create a bazillion datatypes. Besides, 128 bit numbers are 7 > byte integers. Hang on: 128 div 8 = 16 byte integer > PostgreSQL has an int8 (8 byte integer) datatype. And therefore it is a _64_ bit integer and you can't have a 256bit unique number in it... > While I like the UUID function idea, I'd recommend a better solution to > creating an "unique" identifier. Why not create a serial8 datatype: > int8 with an int8 sequence = 256bit "unique" number. {Yes, I know > violating my first sentence.} Then, you'd have the same thing (or > better) AND your not relying on randomness. Chris
I sit corrected. <br /><br /> *slightly humbled*<br /><br /> Why not do an unsigned int16 to hold your UUID generated numbers. Ultimately, this would seem to be a more general solution and accomplish your goals at the sametime. Or, am Icompletely missing something.<br /><br /> Christopher Kings-Lynne wrote:<br /><blockquote cite="mid:ECEHIKNFIMMECLEBJFIGKECECBAA.chriskl@familyhealth.com.au"type="cite"><blockquote type="cite"><pre wrap="">don'tcreate a bazillion datatypes. Besides, 128 bit numbers are 7<br />byte integers.<br /></pre></blockquote><prewrap=""><br />Hang on: 128 div 8 = 16 byte integer<br /><br /></pre><blockquote type="cite"><prewrap="">PostgreSQL has an int8 (8 byte integer) datatype.<br /></pre></blockquote><pre wrap=""><br />Andtherefore it is a _64_ bit integer and you can't have a 256bit unique<br />number in it...<br /><br /></pre><blockquotetype="cite"><pre wrap="">While I like the UUID function idea, I'd recommend a better solution to<br />creatingan "unique" identifier. Why not create a serial8 datatype:<br />int8 with an int8 sequence = 256bit "unique" number. {Yes, I know<br />violating my first sentence.} Then, you'd have the same thing (or<br />better) AND your not relyingon randomness.<br /></pre></blockquote><pre wrap=""><br />Chris<br /></pre></blockquote><br /><br />
On Mon, 2 Jul 2001, Thomas Swan wrote: > I sit corrected. > > *slightly humbled* > > Why not do an unsigned int16 to hold your UUID generated numbers. Not a good idea, since rules for comparison of UUID are wierd and are _definitely_ not same as for comparison of int16. > Ultimately, this would seem to be a more general solution and accomplish > your goals at the sametime. Or, am I completely missing something.
Where can I find some more information on it? I'm curious now.<br /><br /> Alex Pilosov wrote:<br /><blockquote cite="mid:Pine.BSO.4.10.10107022230120.7004-100000@spider.pilosoft.com"type="cite"><pre wrap="">On Mon, 2 Jul 2001, ThomasSwan wrote:<br /><br /></pre><blockquote type="cite"><pre wrap="">I sit corrected. <br /><br />*slightly humbled*<br/><br />Why not do an unsigned int16 to hold your UUID generated numbers. <br /></pre></blockquote><pre wrap="">Nota good idea, since rules for comparison of UUID are wierd and are<br />_definitely_ not same as for comparisonof int16.<br /><br /></pre><blockquote type="cite"><pre wrap="">Ultimately, this would seem to be a more generalsolution and accomplish <br />your goals at the sametime. Or, am I completely missing something.<br /></pre></blockquote><prewrap=""><br /><br />---------------------------(end of broadcast)---------------------------<br/>TIP 2: you can get off all lists at once with the unregister command<br /> (send"unregister YourEmailAddressHere" to <a class="moz-txt-link-abbreviated" href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a>)<br/></pre></blockquote><br /><br />
If you mean the [proposed?] standard itself, here is a good description of it: http://www.ics.uci.edu/pub/ietf/webdav/uuid-guid/draft-leach-uuids-guids-01.txt It was a proposed IETF standard, however, IETF standardization failed because ISO already ratified it as a DCE/RPC standard ISO 11578, however, the above URL provides far better description of UUIDs than ISO standard itself On Tue, 3 Jul 2001, Thomas Swan wrote: > Where can I find some more information on it? I'm curious now.
Alex Pilosov writes: > Dmitry's stuff contains both datatype (uniqueidentifier), a function to > generate a new object of that type (newid), and a set of functions to > implement comparison operators for that type. > > I don't see anything wrong with that setup, but maybe I'm still missing > something? It would be much simpler if you stored the unique id in varchar or text. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Tue, 3 Jul 2001, Peter Eisentraut wrote: >> Dmitry's stuff contains both datatype (uniqueidentifier), a function to >> generate a new object of that type (newid), and a set of functions to >> implement comparison operators for that type. > It would be much simpler if you stored the unique id in varchar or text. Peter, UUIDs have specific rules for comparison of them. Its so much easier to compare them via a<b than uuid_lt(a,b). If one wanted to make a meaningful index on uuid value, normal ordering of varchar would not suffice... -alex
Peter Eisentraut wrote: >Alex Pilosov writes: > >>Dmitry's stuff contains both datatype (uniqueidentifier), a function to >>generate a new object of that type (newid), and a set of functions to >>implement comparison operators for that type. >> >>I don't see anything wrong with that setup, but maybe I'm still missing >>something? >> > >It would be much simpler if you stored the unique id in varchar or text. > Are you sure varchar comparision will be quickly than current implementation? Next, varchar will need 36 byte, uniqueidentifier takes 16. Next, indexing - IMHO current stuff more suitable for indexes. Some time ago I saw some stuff which deals with uniqueidentifiers for Interbase. It uses your scheme with chars. But it strip "-" from string and reverts it to efficiently use indexes (uid sometimes uses MAC-address as part of itself, so MAC should go first in string). Weird scheme for me! regards, Dmitry
Alex Pilosov writes: > I don't think you know what UUID is. It is NOT just a unique randon > number. There are specific rules for construction of such number, specific > rules for comparison of numbers (no, its not bit-by-bit), thus a datatype > is most appropriate answer. A data type may be appropriate for storing these values, but not for generating them. Functions generate stuff, data types store stuff. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter