Thread: GUIDs
Hi All, I want to use GUIDs for object IDs in my application. This means that they would be used for primary keys in PostgreSQL, and there would of course be FKs pointing to them. A GUID is 128 bits, and can be in the following possible formats: * 16 byte binary (but then I'd have to convert it to hex in my app) * 32 byte string (e.g., '4162F7121DD211B2B17EC09EFE1DC403') * hex string (e.g., 0x3271839C163D11D891F785398CC7932E) * Base 64 string (e.g., 'MnIAdBY9EdiR94U5jMeTLg==') It looks like the hex option would be the best option, but there's no native hex format in PostgreSQL. Anyone have suggestions on what the best approach might be? I can't convert it to a number, really, because 128 bit numbers aren't too portable). Please Cc any replies to me, as I'm not subscribed to the mail list. Many TIA, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
David Wheeler writes: > It looks like the hex option would be the best option, but there's no > native hex format in PostgreSQL. Anyone have suggestions on what the > best approach might be? I can't convert it to a number, really, because > 128 bit numbers aren't too portable). Use bytea. It stores bytes and allows the conversion into several output formats. -- Peter Eisentraut peter_e@gmx.net
On Thursday, November 13, 2003, at 08:04 PM, Peter Eisentraut wrote: > Use bytea. It stores bytes and allows the conversion into several > output > formats. bytea with the binary or the hex? And isn't it a bit of a waste to add the extra 4 bytes when I'll only ever need 16? Thanks, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
David Wheeler writes: > bytea with the binary or the hex? Binary or hex what? > And isn't it a bit of a waste to add the extra 4 bytes when I'll only > ever need 16? Sure, but if you want to avoid that, you'll have to implement your own data type. Actually, I think someone has already done that for GUID. If you search the archives you might find out about it. -- Peter Eisentraut peter_e@gmx.net
On Thursday, November 13, 2003, at 08:30 PM, Peter Eisentraut wrote: > Bbytea with the binary or the hex? > > Binary or hex what? Representation of the UUID. >> And isn't it a bit of a waste to add the extra 4 bytes when I'll only >> ever need 16? > > Sure, but if you want to avoid that, you'll have to implement your own > data type. Actually, I think someone has already done that for GUID. > If > you search the archives you might find out about it. Sure enough. Josh just mentioned this to me: http://gborg.postgresql.org/project/pguuid/projdisplay.php Looks like 1.0.0 was released June 17, 2003. I wonder how robust it is? Cheers, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
On Thu, Nov 13, 2003 at 19:57:33 -0500, David Wheeler <david@kineticode.com> wrote: > Hi All, > > I want to use GUIDs for object IDs in my application. This means that > they would be used for primary keys in PostgreSQL, and there would of > course be FKs pointing to them. A GUID is 128 bits, and can be in the > following possible formats: Couldn't you use numeric? That should be fairly portable.
On Friday, November 14, 2003, at 12:04 AM, Bruno Wolff III wrote: > Couldn't you use numeric? That should be fairly portable. Not really. The GUID is 128 bits, which doesn't work so well on 32-bit systems. Regards, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
David Wheeler <david@kineticode.com> writes: > On Friday, November 14, 2003, at 12:04 AM, Bruno Wolff III wrote: > > > Couldn't you use numeric? That should be fairly portable. > > Not really. The GUID is 128 bits, which doesn't work so well on 32-bit > systems. NUMERIC is an arbitrary-precision integer, not a machine word. -Doug
On Friday, November 14, 2003, at 06:18 PM, Doug McNaught wrote: > NUMERIC is an arbitrary-precision integer, not a machine word Yeah, but that doesn't help me in Perl. Regards, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
On Fri, Nov 14, 2003 at 17:52:41 -0500, David Wheeler <david@kineticode.com> wrote: > On Friday, November 14, 2003, at 12:04 AM, Bruno Wolff III wrote: > > >Couldn't you use numeric? That should be fairly portable. > > Not really. The GUID is 128 bits, which doesn't work so well on 32-bit > systems. On postgres numeric isn't limited to what can be represented in 32 bits. I am not sure about other systems, but I would expect it to not be limited to just 32 bits on them either.
On Fri, Nov 14, 2003 at 18:26:27 -0500, David Wheeler <david@kineticode.com> wrote: > On Friday, November 14, 2003, at 06:18 PM, Doug McNaught wrote: > > >NUMERIC is an arbitrary-precision integer, not a machine word > > Yeah, but that doesn't help me in Perl. You can probably keep it as a string in perl. There are also large number handling routines available for perl if you really need to treat it as a number there.
On Friday, November 14, 2003, at 11:20 PM, Bruno Wolff III wrote: >> Yeah, but that doesn't help me in Perl. > > You can probably keep it as a string in perl. There are also large > number > handling routines available for perl if you really need to treat it > as a number there. I can store them as a string in Perl, but the trick is getting the string representation in the first place. The library I was looking at using Data::UUID, offers binary and hex representations, as well as a 32 bit alphanumeric string and a Base64-encoded string, but not a numeric string, unfortunately. http://search.cpan.org/dist/Data-UUID/UUID.pm Regards, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]
David Wheeler <david@kineticode.com> writes: > I can store them as a string in Perl, but the trick is getting the > string representation in the first place. The library I was looking at > using Data::UUID, offers binary and hex representations, as well as a > 32 bit alphanumeric string and a Base64-encoded string, but not a > numeric string, unfortunately. I think if I were in your place, I'd store them as TEXT fields using either the base64 or hex representation. -Doug
Doug McNaught wrote: > David Wheeler <david@kineticode.com> writes: > >> On Friday, November 14, 2003, at 12:04 AM, Bruno Wolff III wrote: >> >> > Couldn't you use numeric? That should be fairly portable. >> >> Not really. The GUID is 128 bits, which doesn't work so well on 32-bit >> systems. > > NUMERIC is an arbitrary-precision integer, not a machine word. No, NUMERIC is an arbitrary precision _numeric_ type that uses string math internally. Want log(2.0) with 200 digits precision? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Saturday, November 15, 2003, at 05:26 PM, Doug McNaught wrote: > I think if I were in your place, I'd store them as TEXT fields using > either the base64 or hex representation. That was my original plan. But because the hex and base64 representations are 32 bytes instead of 16, it ends up being a lot less efficient. Josh Berkus explains it to me like this: <agliodbs> I mean, think of, for example, many-to-many join tables, or tree tables, with millions of rows but just to id columns <agliodbs> going from 32 bytes per row to 64 bytes will almost double the size of the table and all of its indexes <agliodbs> for example, let's take the join table/index example: 1000000 rows, with 2 ID colmuns <agliodbs> now, usling a 16byte number, you have about 40bytes per row (16+16+overhead) <agliodbs> that's 40mb to load the table/index into memory <agliodbs> but if you go with 32chars, that's about 76mb ... or abut 120mb for unicode <agliodbs> hopefully you're not loading the whole thing into memory often, but sometimes seq scans are necessary, and as much as 1/3 of the table could end up in memory <agliodbs> in addition to the memory load, it takes longer to get 40 mb off disk than it does to take 13mb <agliodbs> and longer to sync it to disk, and longer to vacuum it So I'm inclined, I think, to use BYTEA as Peter originally suggested (I can't get pguuid to compile for PostgreSQL 7.3.3 or 7.4RC2) and convert it to other representations as needed on the API. Regards, David -- David Wheeler AIM: dwTheory david@kineticode.com ICQ: 15726394 http://www.kineticode.com/ Yahoo!: dew7e Jabber: Theory@jabber.org Kineticode. Setting knowledge in motion.[sm]