Thread: GUIDs

GUIDs

From
David Wheeler
Date:
Hi All,

I want to use GUIDs for object IDs in my application. This means that
they would be used for primary keys in PostgreSQL, and there would of
course be FKs pointing to them. A GUID is 128 bits, and can be in the
following possible formats:

* 16 byte binary (but then I'd have to convert it to hex in my app)
* 32 byte string (e.g., '4162F7121DD211B2B17EC09EFE1DC403')
* hex string (e.g., 0x3271839C163D11D891F785398CC7932E)
* Base 64 string (e.g., 'MnIAdBY9EdiR94U5jMeTLg==')

It looks like the hex option would be the best option, but there's no
native hex format in PostgreSQL. Anyone have suggestions on what the
best approach might be? I can't convert it to a number, really, because
128 bit numbers aren't too portable).

Please Cc any replies to me, as I'm not subscribed to the mail list.

Many TIA,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Peter Eisentraut
Date:
David Wheeler writes:

> It looks like the hex option would be the best option, but there's no
> native hex format in PostgreSQL. Anyone have suggestions on what the
> best approach might be? I can't convert it to a number, really, because
> 128 bit numbers aren't too portable).

Use bytea.  It stores bytes and allows the conversion into several output
formats.

--
Peter Eisentraut   peter_e@gmx.net


Re: GUIDs

From
David Wheeler
Date:
On Thursday, November 13, 2003, at 08:04  PM, Peter Eisentraut wrote:

> Use bytea.  It stores bytes and allows the conversion into several
> output
> formats.

bytea with the binary or the hex? And isn't it a bit of a waste to add
the extra 4 bytes when I'll only ever need 16?

Thanks,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Peter Eisentraut
Date:
David Wheeler writes:

> bytea with the binary or the hex?

Binary or hex what?

> And isn't it a bit of a waste to add the extra 4 bytes when I'll only
> ever need 16?

Sure, but if you want to avoid that, you'll have to implement your own
data type.  Actually, I think someone has already done that for GUID.  If
you search the archives you might find out about it.

--
Peter Eisentraut   peter_e@gmx.net


Re: GUIDs

From
David Wheeler
Date:
On Thursday, November 13, 2003, at 08:30  PM, Peter Eisentraut wrote:

> Bbytea with the binary or the hex?
>
> Binary or hex what?

Representation of the UUID.

>> And isn't it a bit of a waste to add the extra 4 bytes when I'll only
>> ever need 16?
>
> Sure, but if you want to avoid that, you'll have to implement your own
> data type.  Actually, I think someone has already done that for GUID.
> If
> you search the archives you might find out about it.

Sure enough. Josh just mentioned this to me:

   http://gborg.postgresql.org/project/pguuid/projdisplay.php

Looks like 1.0.0 was released June 17, 2003. I wonder how robust it is?

Cheers,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Bruno Wolff III
Date:
On Thu, Nov 13, 2003 at 19:57:33 -0500,
  David Wheeler <david@kineticode.com> wrote:
> Hi All,
>
> I want to use GUIDs for object IDs in my application. This means that
> they would be used for primary keys in PostgreSQL, and there would of
> course be FKs pointing to them. A GUID is 128 bits, and can be in the
> following possible formats:

Couldn't you use numeric? That should be fairly portable.

Re: GUIDs

From
David Wheeler
Date:
On Friday, November 14, 2003, at 12:04  AM, Bruno Wolff III wrote:

> Couldn't you use numeric? That should be fairly portable.

Not really. The GUID is 128 bits, which doesn't work so well on 32-bit
systems.

Regards,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Doug McNaught
Date:
David Wheeler <david@kineticode.com> writes:

> On Friday, November 14, 2003, at 12:04  AM, Bruno Wolff III wrote:
>
> > Couldn't you use numeric? That should be fairly portable.
>
> Not really. The GUID is 128 bits, which doesn't work so well on 32-bit
> systems.

NUMERIC is an arbitrary-precision integer, not a machine word.

-Doug

Re: GUIDs

From
David Wheeler
Date:
On Friday, November 14, 2003, at 06:18  PM, Doug McNaught wrote:

> NUMERIC is an arbitrary-precision integer, not a machine word

Yeah, but that doesn't help me in Perl.

Regards,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Bruno Wolff III
Date:
On Fri, Nov 14, 2003 at 17:52:41 -0500,
  David Wheeler <david@kineticode.com> wrote:
> On Friday, November 14, 2003, at 12:04  AM, Bruno Wolff III wrote:
>
> >Couldn't you use numeric? That should be fairly portable.
>
> Not really. The GUID is 128 bits, which doesn't work so well on 32-bit
> systems.

On postgres numeric isn't limited to what can be represented in 32 bits.
I am not sure about other systems, but I would expect it to not be limited
to just 32 bits on them either.

Re: GUIDs

From
Bruno Wolff III
Date:
On Fri, Nov 14, 2003 at 18:26:27 -0500,
  David Wheeler <david@kineticode.com> wrote:
> On Friday, November 14, 2003, at 06:18  PM, Doug McNaught wrote:
>
> >NUMERIC is an arbitrary-precision integer, not a machine word
>
> Yeah, but that doesn't help me in Perl.

You can probably keep it as a string in perl. There are also large number
handling routines available for perl if you really need to treat it
as a number there.

Re: GUIDs

From
David Wheeler
Date:
On Friday, November 14, 2003, at 11:20  PM, Bruno Wolff III wrote:

>> Yeah, but that doesn't help me in Perl.
>
> You can probably keep it as a string in perl. There are also large
> number
> handling routines available for perl if you really need to treat it
> as a number there.

I can store them as a string in Perl, but the trick is getting the
string representation in the first place. The library I was looking at
using Data::UUID, offers binary and hex representations, as well as a
32 bit alphanumeric string and a Base64-encoded string, but not a
numeric string, unfortunately.

   http://search.cpan.org/dist/Data-UUID/UUID.pm

Regards,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]


Re: GUIDs

From
Doug McNaught
Date:
David Wheeler <david@kineticode.com> writes:

> I can store them as a string in Perl, but the trick is getting the
> string representation in the first place. The library I was looking at
> using Data::UUID, offers binary and hex representations, as well as a
> 32 bit alphanumeric string and a Base64-encoded string, but not a
> numeric string, unfortunately.

I think if I were in your place, I'd store them as TEXT fields using
either the base64 or hex representation.

-Doug

Re: GUIDs

From
Jan Wieck
Date:
Doug McNaught wrote:

> David Wheeler <david@kineticode.com> writes:
>
>> On Friday, November 14, 2003, at 12:04  AM, Bruno Wolff III wrote:
>>
>> > Couldn't you use numeric? That should be fairly portable.
>>
>> Not really. The GUID is 128 bits, which doesn't work so well on 32-bit
>> systems.
>
> NUMERIC is an arbitrary-precision integer, not a machine word.

No, NUMERIC is an arbitrary precision _numeric_ type that uses string
math internally. Want log(2.0) with 200 digits precision?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: GUIDs

From
David Wheeler
Date:
On Saturday, November 15, 2003, at 05:26  PM, Doug McNaught wrote:

> I think if I were in your place, I'd store them as TEXT fields using
> either the base64 or hex representation.

That was my original plan. But because the hex and base64
representations are 32 bytes instead of 16, it ends up being a lot less
efficient. Josh Berkus explains it to me like this:

<agliodbs> I mean, think of, for example, many-to-many join tables, or
tree tables, with millions of rows but just to id columns
<agliodbs> going from 32 bytes per row to 64 bytes will almost double
the size of the table and all of its indexes
<agliodbs> for example, let's take the join table/index example:
1000000 rows, with 2 ID colmuns
<agliodbs> now, usling a 16byte number, you have about 40bytes per row
(16+16+overhead)
<agliodbs> that's 40mb to load the table/index into memory
<agliodbs> but if you go with 32chars, that's about 76mb ... or abut
120mb for unicode
<agliodbs> hopefully you're not loading the whole thing into memory
often, but sometimes seq scans are necessary, and as much as 1/3 of the
table could end up in memory
<agliodbs> in addition to the memory load, it takes longer to get 40 mb
off disk than it does to take 13mb
<agliodbs> and longer to sync it to disk, and longer to vacuum it

So I'm inclined, I think, to use BYTEA as Peter originally suggested (I
can't get pguuid to compile for PostgreSQL 7.3.3 or 7.4RC2) and convert
it to other representations as needed on the API.

Regards,

David

--
David Wheeler                                     AIM: dwTheory
david@kineticode.com                              ICQ: 15726394
http://www.kineticode.com/                     Yahoo!: dew7e
                                                Jabber: Theory@jabber.org
Kineticode. Setting knowledge in motion.[sm]