Thread: GUID for postgreSQL

GUID for postgreSQL

From

"Riaan van der Westhuizen"

Date:

27 July 2005, 04:46:06

Hi All,

We are a small developing house in South Africa, which is in the process of
porting our Apps from
MS SQL to PostgreSQL.  We use the newid() [globally unique identifier
(GUID)] function in SQL a lot, and need the same for pg.

Our development platform is .NET using c#.  We also plan to start using Mono
C# in the future.
We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu)
platforms.

We have search the web and found c code that can do this, but we do not have
c programmers.

Are there anybody that can help us compiling these for us, we need it on
both OSs?

Thanks

Regards,

Riaan van der Westhuizen
CEO

Huizensoft (Pty) Ltd
Tel: +27 44 871 5534
Fax: +27 44 871 5098
 
This e-mail and any attachments thereto is confidential and
is intended solely for the use of the addressee's.
If you are not the intended recipient, be advised that any use,
dissemination, forwarding, printing, or copying of this e-mail is strictly
prohibited.

Huizensoft (Pty) Ltd accepts no liability for any views or opinions
expressed in
this e-mail or for any loss or damages that may be suffered by any person
whomsoever, arising from, or in connection with, or caused by, the use of
this e-mail.

Re: GUID for postgreSQL

From

Tino Wildenhain

Date:

27 July 2005, 05:49:47

Am Mittwoch, den 27.07.2005, 09:48 +0200 schrieb Riaan van der
Westhuizen:
> Hi All,
>
> We are a small developing house in South Africa, which is in the process of
> porting our Apps from
> MS SQL to PostgreSQL.  We use the newid() [globally unique identifier
> (GUID)] function in SQL a lot, and need the same for pg.
>
> Our development platform is .NET using c#.  We also plan to start using Mono
> C# in the future.
> We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu)
> platforms.
>
> We have search the web and found c code that can do this, but we do not have
> c programmers.
>
> Are there anybody that can help us compiling these for us, we need it on
> both OS’s?

I'd create a sequence:

CREATE SEQUENCE global_unique_id_seq;

and a function:

CREATE OR REPLACE FUNCTION newid()
  RETURNS text AS
$BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
  LANGUAGE 'sql' VOLATILE;


now every call to newid() returns a garantied unique id for
say the next 18446744073709551616 calls.
Of course you can obfuscate the ID even more using
md5, include servername and so on, but this will not improve
security in any way (unless you mix data with 2nd database)


> This e-mail and any attachments thereto is confidential and
> is intended solely for the use of the addressee's.
> If you are not the intended recipient, be advised that any use,
> dissemination, forwarding, printing, or copying of this e-mail is strictly
> prohibited.
>
> Huizensoft (Pty) Ltd accepts no liability for any views or opinions
> expressed in
> this e-mail or for any loss or damages that may be suffered by any person
> whomsoever, arising from, or in connection with, or caused by, the use of
> this e-mail.
>
I'd skip this disclaimer as it is not relevant to law and makes the
company look a bit moronic (sorry) especially on mailinglists.

Tino

Re: GUID for postgreSQL

From

Edwin Hernán Barrios Núñez

Date:

27 July 2005, 10:55:39

Hi Riaan.!


i think that is a better  solution, to use  the postgres native type
"serial", that  it is  a autoincremental number type. You can see it
on the postgres manual. It's very usefull because of you only need to
redifine the type of your id vars. For example, on this moment you
have

 CREATE TABLE test ( name varchar(50), id int );

 insert into test values ('prueba',newid);

On postgres you can resplace thoses

 CREATE TABLE test ( name varchar(50), id serial primary key);

 insert into test values ('prueba');

Serial type create automatically a sequence asocited to id. and always
you insert something  using its default value, the serial id
autoincrements its value.

thanks , i have not a  good english but i try to help !

Atte:
Edwin Barrios !
Gerente iBand Networks Ltda




On 7/27/05, Riaan van der Westhuizen <riaan@huizensoft.co.za> wrote:
> Hi All,
>
> We are a small developing house in South Africa, which is in the process of
> porting our Apps from
> MS SQL to PostgreSQL.  We use the newid() [globally unique identifier
> (GUID)] function in SQL a lot, and need the same for pg.
>
> Our development platform is .NET using c#.  We also plan to start using Mono
> C# in the future.
> We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu)
> platforms.
>
> We have search the web and found c code that can do this, but we do not have
> c programmers.
>
> Are there anybody that can help us compiling these for us, we need it on
> both OS's?
>
> Thanks
>
> Regards,
>
> Riaan van der Westhuizen
> CEO
>
> Huizensoft (Pty) Ltd
> Tel: +27 44 871 5534
> Fax: +2744 871 5098
>
> This e-mail and any attachments thereto is confidential and
> is intended solely for the use of the addressee's.
> If you are not the intended recipient, be advised that any use,
> dissemination, forwarding, printing, or copying of this e-mail is strictly
> prohibited.
>
> Huizensoft (Pty) Ltd accepts no liability for any views or opinions
> expressed in
> this e-mail or for any loss or damages that may be suffered by any person
> whomsoever, arising from, or in connection with, or caused by, the use of
> this e-mail.
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: GUID for postgreSQL

From

John DeSoi

Date:

27 July 2005, 17:33:28

On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:

> I'd create a sequence:
>
> CREATE SEQUENCE global_unique_id_seq;
>
> and a function:
>
> CREATE OR REPLACE FUNCTION newid()
>   RETURNS text AS
> $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
>   LANGUAGE 'sql' VOLATILE;
>
>
> now every call to newid() returns a garantied unique id for
> say the next 18446744073709551616 calls.
> Of course you can obfuscate the ID even more using
> md5, include servername and so on, but this will not improve
> security in any way (unless you mix data with 2nd database)

This is not really a viable replacement for a GUID == globally unique
identifier. Here global means that if I use the application in
multiple databases, I'm guaranteed that no two identifiers will be
the same. Using a sequence will only support uniqueness for a single
database.

Best,

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Re: GUID for postgreSQL

From

Scott Marlowe

Date:

27 July 2005, 17:46:50

On Wed, 2005-07-27 at 15:32, John DeSoi wrote:
> On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:
>
> > I'd create a sequence:
> >
> > CREATE SEQUENCE global_unique_id_seq;
> >
> > and a function:
> >
> > CREATE OR REPLACE FUNCTION newid()
> >   RETURNS text AS
> > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
> >   LANGUAGE 'sql' VOLATILE;
> >
> >
> > now every call to newid() returns a garantied unique id for
> > say the next 18446744073709551616 calls.
> > Of course you can obfuscate the ID even more using
> > md5, include servername and so on, but this will not improve
> > security in any way (unless you mix data with 2nd database)
>
>
> This is not really a viable replacement for a GUID == globally unique
> identifier. Here global means that if I use the application in
> multiple databases, I'm guaranteed that no two identifiers will be
> the same. Using a sequence will only support uniqueness for a single
> database.

So, how can two databases, not currently talking to one another,
guarantee that their GUIDs don't collide? using a large randomly
generated name space only reduces the chances of collision, it doesn't
actually guarantee it.

Re: GUID for postgreSQL

From

"Dann Corbit"

Date:

27 July 2005, 17:56:35

Windows uses the MAC address in GUID generation.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Scott Marlowe
> Sent: Wednesday, July 27, 2005 1:47 PM
> To: John DeSoi
> Cc: Tino Wildenhain; Riaan van der Westhuizen; Postgresql-General
> Subject: Re: [GENERAL] GUID for postgreSQL
>
> On Wed, 2005-07-27 at 15:32, John DeSoi wrote:
> > On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:
> >
> > > I'd create a sequence:
> > >
> > > CREATE SEQUENCE global_unique_id_seq;
> > >
> > > and a function:
> > >
> > > CREATE OR REPLACE FUNCTION newid()
> > >   RETURNS text AS
> > > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
> > >   LANGUAGE 'sql' VOLATILE;
> > >
> > >
> > > now every call to newid() returns a garantied unique id for
> > > say the next 18446744073709551616 calls.
> > > Of course you can obfuscate the ID even more using
> > > md5, include servername and so on, but this will not improve
> > > security in any way (unless you mix data with 2nd database)
> >
> >
> > This is not really a viable replacement for a GUID == globally
unique
> > identifier. Here global means that if I use the application in
> > multiple databases, I'm guaranteed that no two identifiers will be
> > the same. Using a sequence will only support uniqueness for a single
> > database.
>
> So, how can two databases, not currently talking to one another,
> guarantee that their GUIDs don't collide? using a large randomly
> generated name space only reduces the chances of collision, it doesn't
> actually guarantee it.
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: GUID for postgreSQL

From

John DeSoi

Date:

27 July 2005, 17:57:42

On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote:

> So, how can two databases, not currently talking to one another,
> guarantee that their GUIDs don't collide? using a large randomly
> generated name space only reduces the chances of collision, it doesn't
> actually guarantee it.

Like MD5, there is no 100% guarantee, but the collision possibility
supposed to be is very close to zero.

See http://en.wikipedia.org/wiki/GUID

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Re: GUID for postgreSQL

From

Ben

Date:

27 July 2005, 17:58:29

Yes, this is the problem with GUIDs... you can calculate them by mashing
toghether things like the time, a network address, and some random
numbers, which makes it very unlikely for a collision.... but at the end
of the day that G stand for global, *not* guaranteed.

On Wed, 27 Jul 2005, Scott Marlowe wrote:

> On Wed, 2005-07-27 at 15:32, John DeSoi wrote:
> > On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:
> >
> > > I'd create a sequence:
> > >
> > > CREATE SEQUENCE global_unique_id_seq;
> > >
> > > and a function:
> > >
> > > CREATE OR REPLACE FUNCTION newid()
> > >   RETURNS text AS
> > > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
> > >   LANGUAGE 'sql' VOLATILE;
> > >
> > >
> > > now every call to newid() returns a garantied unique id for
> > > say the next 18446744073709551616 calls.
> > > Of course you can obfuscate the ID even more using
> > > md5, include servername and so on, but this will not improve
> > > security in any way (unless you mix data with 2nd database)
> >
> >
> > This is not really a viable replacement for a GUID == globally unique
> > identifier. Here global means that if I use the application in
> > multiple databases, I'm guaranteed that no two identifiers will be
> > the same. Using a sequence will only support uniqueness for a single
> > database.
>
> So, how can two databases, not currently talking to one another,
> guarantee that their GUIDs don't collide? using a large randomly
> generated name space only reduces the chances of collision, it doesn't
> actually guarantee it.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>

Re: GUID for postgreSQL

From

Scott Marlowe

Date:

27 July 2005, 18:02:52

On Wed, 2005-07-27 at 15:57, John DeSoi wrote:
> On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote:
>
> > So, how can two databases, not currently talking to one another,
> > guarantee that their GUIDs don't collide? using a large randomly
> > generated name space only reduces the chances of collision, it doesn't
> > actually guarantee it.
>
>
> Like MD5, there is no 100% guarantee, but the collision possibility
> supposed to be is very close to zero.

Then I would think a better thought out solution would be one where your
unique ids ARE guaranteed to be unique, where you used something like

select 'astringuniqtothismachine'||nextval('localsequence');

That really would be guaranteed unique as long as you set up each
machine to have a string unique to it.

Re: GUID for postgreSQL

From

"Magnus Hagander"

Date:

27 July 2005, 18:04:42

> > This is not really a viable replacement for a GUID ==
> globally unique
> > identifier. Here global means that if I use the application in
> > multiple databases, I'm guaranteed that no two identifiers
> will be the
> > same. Using a sequence will only support uniqueness for a single
> > database.
>
> So, how can two databases, not currently talking to one
> another, guarantee that their GUIDs don't collide? using a
> large randomly generated name space only reduces the chances
> of collision, it doesn't actually guarantee it.

At least on Windows, the GUID is derived in part from the computers
primary MAC address. No, it's not a guarantee, but it's pretty unlikely
:-)

//Magnus

Re: GUID for postgreSQL

From

Chris Travers

Date:

27 July 2005, 18:20:42

You could guarantee it, for example...

Something like (pseudocode here):
create sequence local_id;
create domain guid AS text default ('54-' || (nextval(local_id))::text);
where 54 is the database id.  In this way, every inserted GUID will be
guaranteed to contain a GUID in two parts:  A database identifier and a
locally unique local identifier.  These could then be parsed in a
reasonable way.

The only way I think one can come up with *guaranteed* globally unique
identifiers is to place such information such as we use with other
things that must be globally unique:  have a locally unique identifier
along with a globally unique location identifieer.  Sort of like we have
with IP addresses, MAC addresses, telephone numbers, etc...

Best Wishes,
Chris Travers
Metatron Technology Consulting

Ben wrote:

>Yes, this is the problem with GUIDs... you can calculate them by mashing
>toghether things like the time, a network address, and some random
>numbers, which makes it very unlikely for a collision.... but at the end
>of the day that G stand for global, *not* guaranteed.
>
>On Wed, 27 Jul 2005, Scott Marlowe wrote:
>
>
>
>>On Wed, 2005-07-27 at 15:32, John DeSoi wrote:
>>
>>
>>>On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:
>>>
>>>
>>>
>>>>I'd create a sequence:
>>>>
>>>>CREATE SEQUENCE global_unique_id_seq;
>>>>
>>>>and a function:
>>>>
>>>>CREATE OR REPLACE FUNCTION newid()
>>>>  RETURNS text AS
>>>>$BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
>>>>  LANGUAGE 'sql' VOLATILE;
>>>>
>>>>
>>>>now every call to newid() returns a garantied unique id for
>>>>say the next 18446744073709551616 calls.
>>>>Of course you can obfuscate the ID even more using
>>>>md5, include servername and so on, but this will not improve
>>>>security in any way (unless you mix data with 2nd database)
>>>>
>>>>
>>>This is not really a viable replacement for a GUID == globally unique
>>>identifier. Here global means that if I use the application in
>>>multiple databases, I'm guaranteed that no two identifiers will be
>>>the same. Using a sequence will only support uniqueness for a single
>>>database.
>>>
>>>
>>So, how can two databases, not currently talking to one another,
>>guarantee that their GUIDs don't collide? using a large randomly
>>generated name space only reduces the chances of collision, it doesn't
>>actually guarantee it.
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: Have you checked our extensive FAQ?
>>
>>               http://www.postgresql.org/docs/faq
>>
>>
>>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: explain analyze is your friend
>
>
>
>

Re: GUID for postgreSQL

From

Chris Travers

Date:

27 July 2005, 18:29:22

Magnus Hagander wrote:

>>
>
>At least on Windows, the GUID is derived in part from the computers
>primary MAC address. No, it's not a guarantee, but it's pretty unlikely
>:-)
>
>
The danger is not that the MAC address will be duplicated, but that
other factors will lead to an MD5 collision.

Unless you can show me that there is a 1:1 correspondence of all
possible unique factors going into the GUID generation and the output,
then I will say it is still no guarantee.

Just because two documents or files have the same MD5 doesn't mean that
they are the same files either.  I.e. you can't go searching all files
by MD5 checksums and expecting to find the right one.  OTOH, MD5
provides reasonable assurance that any given file (once you know its
intended MD5) has not been tampered with.  I.e. MD5 is not meant to
preclude collisions, but rather it is meant to preclude *intentional*
collisions.  Similarly, if we want a guaranteed uniqueness to a GUID we
have to have some sort of unique string to the GUID prepended to it (not
merely used in a hash).

So you could use the Mac address of the machine, I guess, if you wanted
to....

Best Wishes,
Chris Travers
Metatron Technology Consulting

Re: GUID for postgreSQL

From

"Dann Corbit"

Date:

27 July 2005, 18:46:04

There is a "privacy hole" from using the MAC address.  (Read it in the
WIKI article someone else posted).

Probably, it would be better to use a one way hash of the MAC address.


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Chris Travers
> Sent: Wednesday, July 27, 2005 2:27 PM
> To: Magnus Hagander; pgsql-general
> Subject: Re: [GENERAL] GUID for postgreSQL
>
> Magnus Hagander wrote:
>
> >>
> >
> >At least on Windows, the GUID is derived in part from the computers
> >primary MAC address. No, it's not a guarantee, but it's pretty
unlikely
> >:-)
> >
> >
> The danger is not that the MAC address will be duplicated, but that
> other factors will lead to an MD5 collision.
>
> Unless you can show me that there is a 1:1 correspondence of all
> possible unique factors going into the GUID generation and the output,
> then I will say it is still no guarantee.
>
> Just because two documents or files have the same MD5 doesn't mean
that
> they are the same files either.  I.e. you can't go searching all files
> by MD5 checksums and expecting to find the right one.  OTOH, MD5
> provides reasonable assurance that any given file (once you know its
> intended MD5) has not been tampered with.  I.e. MD5 is not meant to
> preclude collisions, but rather it is meant to preclude *intentional*
> collisions.  Similarly, if we want a guaranteed uniqueness to a GUID
we
> have to have some sort of unique string to the GUID prepended to it
(not
> merely used in a hash).
>
> So you could use the Mac address of the machine, I guess, if you
wanted
> to....
>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: GUID for postgreSQL

From

Tino Wildenhain

Date:

27 July 2005, 19:17:16

Am Mittwoch, den 27.07.2005, 23:03 +0200 schrieb Magnus Hagander:
> > > This is not really a viable replacement for a GUID ==
> > globally unique
> > > identifier. Here global means that if I use the application in
> > > multiple databases, I'm guaranteed that no two identifiers
> > will be the
> > > same. Using a sequence will only support uniqueness for a single
> > > database.

> > So, how can two databases, not currently talking to one
> > another, guarantee that their GUIDs don't collide? using a
> > large randomly generated name space only reduces the chances
> > of collision, it doesn't actually guarantee it.

well, then give each database a numer, name or whatever and
pad sequence with it. Nothing simpler then that.
The global uniqueness is just a wishfull dream. Nobody can
garantie it. If you use a distingushed name or number for
each of your databases, its garantied.


> At least on Windows, the GUID is derived in part from the computers
> primary MAC address. No, it's not a guarantee, but it's pretty unlikely
> :-)

Yes, thats one way.. But really you just need a domain (not
related to the internet meaning of domains ;) Anything
that lets you identify (or at least distinguish) _your_
databases. (unlikely you are working with every or random
databases in the world ;)

You dont even need a hash function (md5, sha1) if uniqueness is
all you need. Otoh, the often raised clash argument with md5 does not
count here because collisions dont happen just per coincidence
given the very limited rule (database-identifier + serial)

Re: GUID for postgreSQL

From

Tony Caduto

Date:

27 July 2005, 19:46:06

Use Dblink and do a select off of a sequence on just one of the boxes?
You could set up a view that uses DBlink on all the boxes that points to
the master seq box.

should work.

Scott Marlowe wrote:

>So, how can two databases, not currently talking to one another,
>guarantee that their GUIDs don't collide? using a large randomly
>generated name space only reduces the chances of collision, it doesn't
>actually guarantee it.
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>
>
>

Re: GUID for postgreSQL

From

Alvaro Herrera

Date:

27 July 2005, 20:34:41

On Wed, Jul 27, 2005 at 05:40:11PM -0500, Tony Caduto wrote:
> Use Dblink and do a select off of a sequence on just one of the boxes?
> You could set up a view that uses DBlink on all the boxes that points to
> the master seq box.
>
> should work.

It'll make the whole thing painfully slow.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
<inflex> really, I see PHP as like a stange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"

Re: GUID for postgreSQL

From

John DeSoi

Date:

27 July 2005, 20:43:25

On Jul 27, 2005, at 5:00 PM, Scott Marlowe wrote:

> Then I would think a better thought out solution would be one where
> your
> unique ids ARE guaranteed to be unique, where you used something like
>
> select 'astringuniqtothismachine'||nextval('localsequence');
>
> That really would be guaranteed unique as long as you set up each
> machine to have a string unique to it.

I have implemented this type of approach in distributed systems. The
problem is users who make a copy of their database, continue to use
both copies, and then call you when they try to merge things
together. I would say user opportunity to mess this up is way more
likely than having a GUID collision.

I'm not saying that GUIDs are the ultimate solution to this problem.
The original poster brought up the need to store GUIDs in a database.
There are protocols and standards that require GUIDs and I merely
agree it would be nice to have a GUID data type.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Re: GUID for postgreSQL

From

Alvaro Herrera

Date:

27 July 2005, 21:25:15

On Wed, Jul 27, 2005 at 07:43:08PM -0400, John DeSoi wrote:

> I'm not saying that GUIDs are the ultimate solution to this problem.
> The original poster brought up the need to store GUIDs in a database.
> There are protocols and standards that require GUIDs and I merely
> agree it would be nice to have a GUID data type.

AFAIR there is one on gborg.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."                                    (Michael Brusser)

Re: GUID for postgreSQL

From

Tony Caduto

Date:

28 July 2005, 00:31:54

I don't think it would be that slow, unless running on a modem.
It would really depend on what you are doing, calling dblink 1000 times
a second would be slow, but I could see it being
used succesfully for other applications.

Alvaro Herrera wrote:

>On Wed, Jul 27, 2005 at 05:40:11PM -0500, Tony Caduto wrote:
>
>
>>Use Dblink and do a select off of a sequence on just one of the boxes?
>>You could set up a view that uses DBlink on all the boxes that points to
>>the master seq box.
>>
>>should work.
>>
>>
>
>It'll make the whole thing painfully slow.
>
>
>

Re: GUID for postgreSQL

From

"Stefan 'Kaishakunin' Schumacher"

Date:

28 July 2005, 00:56:33

Also sprach Scott Marlowe (smarlowe@g2switchworks.com)
> On Wed, 2005-07-27 at 15:57, John DeSoi wrote:
> > On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote:
> >
> > > So, how can two databases, not currently talking to one another,
> > > guarantee that their GUIDs don't collide? using a large randomly
> > > generated name space only reduces the chances of collision, it doesn't
> > > actually guarantee it.
> >
> >
> > Like MD5, there is no 100% guarantee, but the collision possibility
> > supposed to be is very close to zero.
>
> Then I would think a better thought out solution would be one where your
> unique ids ARE guaranteed to be unique, where you used something like
>
> select 'astringuniqtothismachine'||nextval('localsequence');
>
> That really would be guaranteed unique as long as you set up each
> machine to have a string unique to it.

I was a little bit confused about the uniqueness of GUID (esp. using
only 16 Bit [1]) and read the article about the UUID:
http://en.wikipedia.org/wiki/Universally_Unique_Identifier

It states:

 Conceptually, the original (version 1) generation scheme for UUIDs was
 to concatenate the UUID version with the MAC address of the computer
 that is generating the UUID, and with the number of 100-nanosecond
 intervals since the adoption of the Gregorian calendar. In practice,
 the actual algorithm is more complicated. This scheme has been
 criticized in that it is not sufficiently 'opaque'; it reveals both
 the identity of the computer that generated the UUID and the time at
 which it did so.

 Several other generation algorithms have been developed and
 incorporated into the standard, including a scheme relying only on
 random numbers (version 4 UUIDs), and a scheme deriving a UUID from a
 URL via MD5 (version 3 UUIDs) or SHA-1 (version 5 UUIDs) hashing.

 The 5.0 release of Java provides a class that will produce 128-bit
 UUIDs. The API documentation for the java.util.UUID class refers to
 ISO/IEC 11578:1996.

So as I understand the GUIDs only apply to the Microsoft Universe[TM], or
are there any other serious apllications using it?

I don't see any value of that, if not all real databases support one
GUID standard (with more than 16 Bit) worldwide. It might also be a
good idea to reserve space of the hash to identify the database
(vendor), just like FCC IDs or MACs do, but that would require a
central authority assigning those names and numbers.

[1] Why do people want to have IPv6?
--
PGP FPR: CF74 D5F2 4871 3E5C FFFE 0130 11F4 C41E B3FB AE33
--
Der Geist des Kriegers sollte mit Beginn des Neujahrstages bis zum Ende
des Jahres vom Gedanken an seinen Tod beherrscht werden.

Daijouji Shigesuke in "Budo Shoshin Shuu"

Attachment

msg-32091-101314.dat

Re: GUID for postgreSQL

From

Chris Browne

Date:

28 July 2005, 16:04:30

chris@travelamericas.com (Chris Travers) writes:
>>At least on Windows, the GUID is derived in part from the computers
>>primary MAC address. No, it's not a guarantee, but it's pretty unlikely
>>:-)

> The danger is not that the MAC address will be duplicated, but that
> other factors will lead to an MD5 collision.

What factors?

The 'random' portion comes _in addition to_ location information and a
timestamp, so that for there to be a collision, you need to be
generating thousands of GUIDs *per millisecond* on a given host.

> Unless you can show me that there is a 1:1 correspondence of all
> possible unique factors going into the GUID generation and the output,
> then I will say it is still no guarantee.

Read the RFC (4122).  It shows how it works.

> So you could use the Mac address of the machine, I guess, if you
> wanted to....

That is one option; section 4.3 of the RFC suggests an alternative
that is also likely to work.
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>

Re: GUID for postgreSQL

From

Chris Browne

Date:

28 July 2005, 16:05:35

stefan@net-tex.de ("Stefan 'Kaishakunin' Schumacher") writes:
> So as I understand the GUIDs only apply to the Microsoft Universe[TM], or
> are there any other serious apllications using it?

No, the RFC (4122) was written by a combination of people including
Rich Salz, and is an encoding of the DCE UUID specification previously
produced as part of DCE.
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>

Re: GUID for postgreSQL

From

Chris Browne

Date:

28 July 2005, 16:05:42

smarlowe@g2switchworks.com (Scott Marlowe) writes:
> On Wed, 2005-07-27 at 15:32, John DeSoi wrote:
>> On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote:
>>
>> > I'd create a sequence:
>> >
>> > CREATE SEQUENCE global_unique_id_seq;
>> >
>> > and a function:
>> >
>> > CREATE OR REPLACE FUNCTION newid()
>> >   RETURNS text AS
>> > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$
>> >   LANGUAGE 'sql' VOLATILE;
>> >
>> >
>> > now every call to newid() returns a garantied unique id for
>> > say the next 18446744073709551616 calls.
>> > Of course you can obfuscate the ID even more using
>> > md5, include servername and so on, but this will not improve
>> > security in any way (unless you mix data with 2nd database)
>>
>>
>> This is not really a viable replacement for a GUID == globally unique
>> identifier. Here global means that if I use the application in
>> multiple databases, I'm guaranteed that no two identifiers will be
>> the same. Using a sequence will only support uniqueness for a single
>> database.
>
> So, how can two databases, not currently talking to one another,
> guarantee that their GUIDs don't collide? using a large randomly
> generated name space only reduces the chances of collision, it doesn't
> actually guarantee it.

Consult RFC 4122...

  <http://www.ietf.org/rfc/rfc4122.txt>

"Abstract

   This specification defines a Uniform Resource Name namespace for
   UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
   Unique IDentifier).  A UUID is 128 bits long, and can guarantee
   uniqueness across space and time.  UUIDs were originally used in the
   Apollo Network Computing System and later in the Open Software
   Foundation's (OSF) Distributed Computing Environment (DCE), and then
   in Microsoft Windows platforms.

   This specification is derived from the DCE specification with the
   kind permission of the OSF (now known as The Open Group).
   Information from earlier versions of the DCE specification have been
   incorporated into this document."

See also the gBorg "pgUUID" project:
  <http://gborg.postgresql.org/project/pguuid/projdisplay.php>
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>

Re: GUID for postgreSQL

From

Bruno Wolff III

Date:

28 July 2005, 16:20:51

On Wed, Jul 27, 2005 at 16:57:21 -0400,
  John DeSoi <desoi@pgedit.com> wrote:
>
> On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote:
>
> >So, how can two databases, not currently talking to one another,
> >guarantee that their GUIDs don't collide? using a large randomly
> >generated name space only reduces the chances of collision, it doesn't
> >actually guarantee it.
>
>
> Like MD5, there is no 100% guarantee, but the collision possibility
> supposed to be is very close to zero.

If you use a large enough space for the number you can reduce that
probability of an accidental collision to much less than that of
catastrophic hardware failure at which point it isn't noticably better
than having no chance of collisions.

Re: GUID for postgreSQL

From

Christopher Weimann

Date:

28 July 2005, 17:12:18

On 07/28/2005-05:55AM, Stefan 'Kaishakunin' Schumacher wrote:
>
> I was a little bit confused about the uniqueness of GUID (esp. using
> only 16 Bit [1]) and read the article about the UUID:
> http://en.wikipedia.org/wiki/Universally_Unique_Identifier
>

16 BYTE not bit.  From the wiki...

   A UUID is essentially a 16-byte number and in its canonical form a UUID may look like this:

       550E8400-E29B-11D4-A716-446655440000

Re: GUID for postgreSQL

From

"John D. Burger"

Date:

29 July 2005, 13:18:47

> If you use a large enough space for the number you can reduce that
> probability of an accidental collision to much less than that of
> catastrophic hardware failure at which point it isn't noticably better
> than having no chance of collisions.

I find the comparison unconvincing - if my hardware crashes, I know it
and can decide how to recover.  If two UIDs collide, my system may
silently do something that may never be detected.

- John Burger
   MITRE

Re: GUID for postgreSQL

From

Greg Stark

Date:

29 July 2005, 13:59:20

"John D. Burger" <john@mitre.org> writes:

> > If you use a large enough space for the number you can reduce that
> > probability of an accidental collision to much less than that of
> > catastrophic hardware failure at which point it isn't noticably better
> > than having no chance of collisions.
>
> I find the comparison unconvincing - if my hardware crashes, I know it and can
> decide how to recover.  If two UIDs collide, my system may silently do
> something that may never be detected.

Alright, say the probability of a cosmic ray flipping one bit (if you're
picky, flipping two or three bits perfectly arranged so ECC ram doesn't detect
it). That would produce similarly subtle and unpredictable behaviour,
especially if it hit in a primary key.

Though I'm inclined to agree. They seem to be a fad with not much reason to
exist.

--
greg

Re: GUID for postgreSQL

From

Bruno Wolff III

Date:

29 July 2005, 14:12:21

On Fri, Jul 29, 2005 at 12:18:30 -0400,
  "John D. Burger" <john@mitre.org> wrote:
> >If you use a large enough space for the number you can reduce that
> >probability of an accidental collision to much less than that of
> >catastrophic hardware failure at which point it isn't noticably better
> >than having no chance of collisions.
>
> I find the comparison unconvincing - if my hardware crashes, I know it
> and can decide how to recover.  If two UIDs collide, my system may
> silently do something that may never be detected.

If it crashes yes, if a bit flips maybe not.

Note that by using a larger hash and more random bits you can make this
probability arbitrarily small. For 512 hashes with 512 bits of entropy,
I doubt you could compare documents fast enough to have a 50-50 chance
of finding a collision before the heat death of the universe.

Re: GUID for postgreSQL

From

Ron Mayer

Date:

31 July 2005, 17:36:58

Dann Corbit wrote:
> There is a "privacy hole" from using the MAC address.  (Read it in the
> WIKI article someone else posted).
>
> Probably, it would be better to use a one way hash of the MAC address.
>

The chances of MAC addresses colliding (through some low-end network
card vendor's sloppy manufacturing process; or through the not
uncommon ability to edit your mac address) is surely much higher than
the chance of big random numbers colliding.   Sure, you won't hurt
anything if you add the MAC address to your entropy sources when
generating a GUID; but thinking a MAC address is more unique than
the bits you'd get from any less human-error-prone source is
almost certainly wrong.