Thread: GUID for postgreSQL
Hi All, We are a small developing house in South Africa, which is in the process of porting our Apps from MS SQL to PostgreSQL. We use the newid() [globally unique identifier (GUID)] function in SQL a lot, and need the same for pg. Our development platform is .NET using c#. We also plan to start using Mono C# in the future. We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu) platforms. We have search the web and found c code that can do this, but we do not have c programmers. Are there anybody that can help us compiling these for us, we need it on both OSs? Thanks Regards, Riaan van der Westhuizen CEO Huizensoft (Pty) Ltd Tel: +27 44 871 5534 Fax: +27 44 871 5098 This e-mail and any attachments thereto is confidential and is intended solely for the use of the addressee's. If you are not the intended recipient, be advised that any use, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. Huizensoft (Pty) Ltd accepts no liability for any views or opinions expressed in this e-mail or for any loss or damages that may be suffered by any person whomsoever, arising from, or in connection with, or caused by, the use of this e-mail.
Am Mittwoch, den 27.07.2005, 09:48 +0200 schrieb Riaan van der Westhuizen: > Hi All, > > We are a small developing house in South Africa, which is in the process of > porting our Apps from > MS SQL to PostgreSQL. We use the newid() [globally unique identifier > (GUID)] function in SQL a lot, and need the same for pg. > > Our development platform is .NET using c#. We also plan to start using Mono > C# in the future. > We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu) > platforms. > > We have search the web and found c code that can do this, but we do not have > c programmers. > > Are there anybody that can help us compiling these for us, we need it on > both OS’s? I'd create a sequence: CREATE SEQUENCE global_unique_id_seq; and a function: CREATE OR REPLACE FUNCTION newid() RETURNS text AS $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ LANGUAGE 'sql' VOLATILE; now every call to newid() returns a garantied unique id for say the next 18446744073709551616 calls. Of course you can obfuscate the ID even more using md5, include servername and so on, but this will not improve security in any way (unless you mix data with 2nd database) > This e-mail and any attachments thereto is confidential and > is intended solely for the use of the addressee's. > If you are not the intended recipient, be advised that any use, > dissemination, forwarding, printing, or copying of this e-mail is strictly > prohibited. > > Huizensoft (Pty) Ltd accepts no liability for any views or opinions > expressed in > this e-mail or for any loss or damages that may be suffered by any person > whomsoever, arising from, or in connection with, or caused by, the use of > this e-mail. > I'd skip this disclaimer as it is not relevant to law and makes the company look a bit moronic (sorry) especially on mailinglists. Tino
Hi Riaan.! i think that is a better solution, to use the postgres native type "serial", that it is a autoincremental number type. You can see it on the postgres manual. It's very usefull because of you only need to redifine the type of your id vars. For example, on this moment you have CREATE TABLE test ( name varchar(50), id int ); insert into test values ('prueba',newid); On postgres you can resplace thoses CREATE TABLE test ( name varchar(50), id serial primary key); insert into test values ('prueba'); Serial type create automatically a sequence asocited to id. and always you insert something using its default value, the serial id autoincrements its value. thanks , i have not a good english but i try to help ! Atte: Edwin Barrios ! Gerente iBand Networks Ltda On 7/27/05, Riaan van der Westhuizen <riaan@huizensoft.co.za> wrote: > Hi All, > > We are a small developing house in South Africa, which is in the process of > porting our Apps from > MS SQL to PostgreSQL. We use the newid() [globally unique identifier > (GUID)] function in SQL a lot, and need the same for pg. > > Our development platform is .NET using c#. We also plan to start using Mono > C# in the future. > We will be deploying PostgreSQL on Windows (2003 Server) and Linux (Ubuntu) > platforms. > > We have search the web and found c code that can do this, but we do not have > c programmers. > > Are there anybody that can help us compiling these for us, we need it on > both OS's? > > Thanks > > Regards, > > Riaan van der Westhuizen > CEO > > Huizensoft (Pty) Ltd > Tel: +27 44 871 5534 > Fax: +2744 871 5098 > > This e-mail and any attachments thereto is confidential and > is intended solely for the use of the addressee's. > If you are not the intended recipient, be advised that any use, > dissemination, forwarding, printing, or copying of this e-mail is strictly > prohibited. > > Huizensoft (Pty) Ltd accepts no liability for any views or opinions > expressed in > this e-mail or for any loss or damages that may be suffered by any person > whomsoever, arising from, or in connection with, or caused by, the use of > this e-mail. > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: > I'd create a sequence: > > CREATE SEQUENCE global_unique_id_seq; > > and a function: > > CREATE OR REPLACE FUNCTION newid() > RETURNS text AS > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ > LANGUAGE 'sql' VOLATILE; > > > now every call to newid() returns a garantied unique id for > say the next 18446744073709551616 calls. > Of course you can obfuscate the ID even more using > md5, include servername and so on, but this will not improve > security in any way (unless you mix data with 2nd database) This is not really a viable replacement for a GUID == globally unique identifier. Here global means that if I use the application in multiple databases, I'm guaranteed that no two identifiers will be the same. Using a sequence will only support uniqueness for a single database. Best, John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL
On Wed, 2005-07-27 at 15:32, John DeSoi wrote: > On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: > > > I'd create a sequence: > > > > CREATE SEQUENCE global_unique_id_seq; > > > > and a function: > > > > CREATE OR REPLACE FUNCTION newid() > > RETURNS text AS > > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ > > LANGUAGE 'sql' VOLATILE; > > > > > > now every call to newid() returns a garantied unique id for > > say the next 18446744073709551616 calls. > > Of course you can obfuscate the ID even more using > > md5, include servername and so on, but this will not improve > > security in any way (unless you mix data with 2nd database) > > > This is not really a viable replacement for a GUID == globally unique > identifier. Here global means that if I use the application in > multiple databases, I'm guaranteed that no two identifiers will be > the same. Using a sequence will only support uniqueness for a single > database. So, how can two databases, not currently talking to one another, guarantee that their GUIDs don't collide? using a large randomly generated name space only reduces the chances of collision, it doesn't actually guarantee it.
Windows uses the MAC address in GUID generation. > -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Scott Marlowe > Sent: Wednesday, July 27, 2005 1:47 PM > To: John DeSoi > Cc: Tino Wildenhain; Riaan van der Westhuizen; Postgresql-General > Subject: Re: [GENERAL] GUID for postgreSQL > > On Wed, 2005-07-27 at 15:32, John DeSoi wrote: > > On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: > > > > > I'd create a sequence: > > > > > > CREATE SEQUENCE global_unique_id_seq; > > > > > > and a function: > > > > > > CREATE OR REPLACE FUNCTION newid() > > > RETURNS text AS > > > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ > > > LANGUAGE 'sql' VOLATILE; > > > > > > > > > now every call to newid() returns a garantied unique id for > > > say the next 18446744073709551616 calls. > > > Of course you can obfuscate the ID even more using > > > md5, include servername and so on, but this will not improve > > > security in any way (unless you mix data with 2nd database) > > > > > > This is not really a viable replacement for a GUID == globally unique > > identifier. Here global means that if I use the application in > > multiple databases, I'm guaranteed that no two identifiers will be > > the same. Using a sequence will only support uniqueness for a single > > database. > > So, how can two databases, not currently talking to one another, > guarantee that their GUIDs don't collide? using a large randomly > generated name space only reduces the chances of collision, it doesn't > actually guarantee it. > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote: > So, how can two databases, not currently talking to one another, > guarantee that their GUIDs don't collide? using a large randomly > generated name space only reduces the chances of collision, it doesn't > actually guarantee it. Like MD5, there is no 100% guarantee, but the collision possibility supposed to be is very close to zero. See http://en.wikipedia.org/wiki/GUID John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL
Yes, this is the problem with GUIDs... you can calculate them by mashing toghether things like the time, a network address, and some random numbers, which makes it very unlikely for a collision.... but at the end of the day that G stand for global, *not* guaranteed. On Wed, 27 Jul 2005, Scott Marlowe wrote: > On Wed, 2005-07-27 at 15:32, John DeSoi wrote: > > On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: > > > > > I'd create a sequence: > > > > > > CREATE SEQUENCE global_unique_id_seq; > > > > > > and a function: > > > > > > CREATE OR REPLACE FUNCTION newid() > > > RETURNS text AS > > > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ > > > LANGUAGE 'sql' VOLATILE; > > > > > > > > > now every call to newid() returns a garantied unique id for > > > say the next 18446744073709551616 calls. > > > Of course you can obfuscate the ID even more using > > > md5, include servername and so on, but this will not improve > > > security in any way (unless you mix data with 2nd database) > > > > > > This is not really a viable replacement for a GUID == globally unique > > identifier. Here global means that if I use the application in > > multiple databases, I'm guaranteed that no two identifiers will be > > the same. Using a sequence will only support uniqueness for a single > > database. > > So, how can two databases, not currently talking to one another, > guarantee that their GUIDs don't collide? using a large randomly > generated name space only reduces the chances of collision, it doesn't > actually guarantee it. > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
On Wed, 2005-07-27 at 15:57, John DeSoi wrote: > On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote: > > > So, how can two databases, not currently talking to one another, > > guarantee that their GUIDs don't collide? using a large randomly > > generated name space only reduces the chances of collision, it doesn't > > actually guarantee it. > > > Like MD5, there is no 100% guarantee, but the collision possibility > supposed to be is very close to zero. Then I would think a better thought out solution would be one where your unique ids ARE guaranteed to be unique, where you used something like select 'astringuniqtothismachine'||nextval('localsequence'); That really would be guaranteed unique as long as you set up each machine to have a string unique to it.
> > This is not really a viable replacement for a GUID == > globally unique > > identifier. Here global means that if I use the application in > > multiple databases, I'm guaranteed that no two identifiers > will be the > > same. Using a sequence will only support uniqueness for a single > > database. > > So, how can two databases, not currently talking to one > another, guarantee that their GUIDs don't collide? using a > large randomly generated name space only reduces the chances > of collision, it doesn't actually guarantee it. At least on Windows, the GUID is derived in part from the computers primary MAC address. No, it's not a guarantee, but it's pretty unlikely :-) //Magnus
You could guarantee it, for example... Something like (pseudocode here): create sequence local_id; create domain guid AS text default ('54-' || (nextval(local_id))::text); where 54 is the database id. In this way, every inserted GUID will be guaranteed to contain a GUID in two parts: A database identifier and a locally unique local identifier. These could then be parsed in a reasonable way. The only way I think one can come up with *guaranteed* globally unique identifiers is to place such information such as we use with other things that must be globally unique: have a locally unique identifier along with a globally unique location identifieer. Sort of like we have with IP addresses, MAC addresses, telephone numbers, etc... Best Wishes, Chris Travers Metatron Technology Consulting Ben wrote: >Yes, this is the problem with GUIDs... you can calculate them by mashing >toghether things like the time, a network address, and some random >numbers, which makes it very unlikely for a collision.... but at the end >of the day that G stand for global, *not* guaranteed. > >On Wed, 27 Jul 2005, Scott Marlowe wrote: > > > >>On Wed, 2005-07-27 at 15:32, John DeSoi wrote: >> >> >>>On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: >>> >>> >>> >>>>I'd create a sequence: >>>> >>>>CREATE SEQUENCE global_unique_id_seq; >>>> >>>>and a function: >>>> >>>>CREATE OR REPLACE FUNCTION newid() >>>> RETURNS text AS >>>>$BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ >>>> LANGUAGE 'sql' VOLATILE; >>>> >>>> >>>>now every call to newid() returns a garantied unique id for >>>>say the next 18446744073709551616 calls. >>>>Of course you can obfuscate the ID even more using >>>>md5, include servername and so on, but this will not improve >>>>security in any way (unless you mix data with 2nd database) >>>> >>>> >>>This is not really a viable replacement for a GUID == globally unique >>>identifier. Here global means that if I use the application in >>>multiple databases, I'm guaranteed that no two identifiers will be >>>the same. Using a sequence will only support uniqueness for a single >>>database. >>> >>> >>So, how can two databases, not currently talking to one another, >>guarantee that their GUIDs don't collide? using a large randomly >>generated name space only reduces the chances of collision, it doesn't >>actually guarantee it. >> >>---------------------------(end of broadcast)--------------------------- >>TIP 3: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faq >> >> >> > > > >---------------------------(end of broadcast)--------------------------- >TIP 6: explain analyze is your friend > > > >
Magnus Hagander wrote: >> > >At least on Windows, the GUID is derived in part from the computers >primary MAC address. No, it's not a guarantee, but it's pretty unlikely >:-) > > The danger is not that the MAC address will be duplicated, but that other factors will lead to an MD5 collision. Unless you can show me that there is a 1:1 correspondence of all possible unique factors going into the GUID generation and the output, then I will say it is still no guarantee. Just because two documents or files have the same MD5 doesn't mean that they are the same files either. I.e. you can't go searching all files by MD5 checksums and expecting to find the right one. OTOH, MD5 provides reasonable assurance that any given file (once you know its intended MD5) has not been tampered with. I.e. MD5 is not meant to preclude collisions, but rather it is meant to preclude *intentional* collisions. Similarly, if we want a guaranteed uniqueness to a GUID we have to have some sort of unique string to the GUID prepended to it (not merely used in a hash). So you could use the Mac address of the machine, I guess, if you wanted to.... Best Wishes, Chris Travers Metatron Technology Consulting
There is a "privacy hole" from using the MAC address. (Read it in the WIKI article someone else posted). Probably, it would be better to use a one way hash of the MAC address. > -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Chris Travers > Sent: Wednesday, July 27, 2005 2:27 PM > To: Magnus Hagander; pgsql-general > Subject: Re: [GENERAL] GUID for postgreSQL > > Magnus Hagander wrote: > > >> > > > >At least on Windows, the GUID is derived in part from the computers > >primary MAC address. No, it's not a guarantee, but it's pretty unlikely > >:-) > > > > > The danger is not that the MAC address will be duplicated, but that > other factors will lead to an MD5 collision. > > Unless you can show me that there is a 1:1 correspondence of all > possible unique factors going into the GUID generation and the output, > then I will say it is still no guarantee. > > Just because two documents or files have the same MD5 doesn't mean that > they are the same files either. I.e. you can't go searching all files > by MD5 checksums and expecting to find the right one. OTOH, MD5 > provides reasonable assurance that any given file (once you know its > intended MD5) has not been tampered with. I.e. MD5 is not meant to > preclude collisions, but rather it is meant to preclude *intentional* > collisions. Similarly, if we want a guaranteed uniqueness to a GUID we > have to have some sort of unique string to the GUID prepended to it (not > merely used in a hash). > > So you could use the Mac address of the machine, I guess, if you wanted > to.... > > Best Wishes, > Chris Travers > Metatron Technology Consulting > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
Am Mittwoch, den 27.07.2005, 23:03 +0200 schrieb Magnus Hagander: > > > This is not really a viable replacement for a GUID == > > globally unique > > > identifier. Here global means that if I use the application in > > > multiple databases, I'm guaranteed that no two identifiers > > will be the > > > same. Using a sequence will only support uniqueness for a single > > > database. > > So, how can two databases, not currently talking to one > > another, guarantee that their GUIDs don't collide? using a > > large randomly generated name space only reduces the chances > > of collision, it doesn't actually guarantee it. well, then give each database a numer, name or whatever and pad sequence with it. Nothing simpler then that. The global uniqueness is just a wishfull dream. Nobody can garantie it. If you use a distingushed name or number for each of your databases, its garantied. > At least on Windows, the GUID is derived in part from the computers > primary MAC address. No, it's not a guarantee, but it's pretty unlikely > :-) Yes, thats one way.. But really you just need a domain (not related to the internet meaning of domains ;) Anything that lets you identify (or at least distinguish) _your_ databases. (unlikely you are working with every or random databases in the world ;) You dont even need a hash function (md5, sha1) if uniqueness is all you need. Otoh, the often raised clash argument with md5 does not count here because collisions dont happen just per coincidence given the very limited rule (database-identifier + serial)
Use Dblink and do a select off of a sequence on just one of the boxes? You could set up a view that uses DBlink on all the boxes that points to the master seq box. should work. Scott Marlowe wrote: >So, how can two databases, not currently talking to one another, >guarantee that their GUIDs don't collide? using a large randomly >generated name space only reduces the chances of collision, it doesn't >actually guarantee it. > >---------------------------(end of broadcast)--------------------------- >TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > > >
On Wed, Jul 27, 2005 at 05:40:11PM -0500, Tony Caduto wrote: > Use Dblink and do a select off of a sequence on just one of the boxes? > You could set up a view that uses DBlink on all the boxes that points to > the master seq box. > > should work. It'll make the whole thing painfully slow. -- Alvaro Herrera (<alvherre[a]alvh.no-ip.org>) <inflex> really, I see PHP as like a stange amalgamation of C, Perl, Shell <crab> inflex: you know that "amalgam" means "mixture with mercury", more or less, right? <crab> i.e., "deadly poison"
On Jul 27, 2005, at 5:00 PM, Scott Marlowe wrote: > Then I would think a better thought out solution would be one where > your > unique ids ARE guaranteed to be unique, where you used something like > > select 'astringuniqtothismachine'||nextval('localsequence'); > > That really would be guaranteed unique as long as you set up each > machine to have a string unique to it. I have implemented this type of approach in distributed systems. The problem is users who make a copy of their database, continue to use both copies, and then call you when they try to merge things together. I would say user opportunity to mess this up is way more likely than having a GUID collision. I'm not saying that GUIDs are the ultimate solution to this problem. The original poster brought up the need to store GUIDs in a database. There are protocols and standards that require GUIDs and I merely agree it would be nice to have a GUID data type. John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL
On Wed, Jul 27, 2005 at 07:43:08PM -0400, John DeSoi wrote: > I'm not saying that GUIDs are the ultimate solution to this problem. > The original poster brought up the need to store GUIDs in a database. > There are protocols and standards that require GUIDs and I merely > agree it would be nice to have a GUID data type. AFAIR there is one on gborg. -- Alvaro Herrera (<alvherre[a]alvh.no-ip.org>) "Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser)
I don't think it would be that slow, unless running on a modem. It would really depend on what you are doing, calling dblink 1000 times a second would be slow, but I could see it being used succesfully for other applications. Alvaro Herrera wrote: >On Wed, Jul 27, 2005 at 05:40:11PM -0500, Tony Caduto wrote: > > >>Use Dblink and do a select off of a sequence on just one of the boxes? >>You could set up a view that uses DBlink on all the boxes that points to >>the master seq box. >> >>should work. >> >> > >It'll make the whole thing painfully slow. > > >
Also sprach Scott Marlowe (smarlowe@g2switchworks.com) > On Wed, 2005-07-27 at 15:57, John DeSoi wrote: > > On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote: > > > > > So, how can two databases, not currently talking to one another, > > > guarantee that their GUIDs don't collide? using a large randomly > > > generated name space only reduces the chances of collision, it doesn't > > > actually guarantee it. > > > > > > Like MD5, there is no 100% guarantee, but the collision possibility > > supposed to be is very close to zero. > > Then I would think a better thought out solution would be one where your > unique ids ARE guaranteed to be unique, where you used something like > > select 'astringuniqtothismachine'||nextval('localsequence'); > > That really would be guaranteed unique as long as you set up each > machine to have a string unique to it. I was a little bit confused about the uniqueness of GUID (esp. using only 16 Bit [1]) and read the article about the UUID: http://en.wikipedia.org/wiki/Universally_Unique_Identifier It states: Conceptually, the original (version 1) generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar. In practice, the actual algorithm is more complicated. This scheme has been criticized in that it is not sufficiently 'opaque'; it reveals both the identity of the computer that generated the UUID and the time at which it did so. Several other generation algorithms have been developed and incorporated into the standard, including a scheme relying only on random numbers (version 4 UUIDs), and a scheme deriving a UUID from a URL via MD5 (version 3 UUIDs) or SHA-1 (version 5 UUIDs) hashing. The 5.0 release of Java provides a class that will produce 128-bit UUIDs. The API documentation for the java.util.UUID class refers to ISO/IEC 11578:1996. So as I understand the GUIDs only apply to the Microsoft Universe[TM], or are there any other serious apllications using it? I don't see any value of that, if not all real databases support one GUID standard (with more than 16 Bit) worldwide. It might also be a good idea to reserve space of the hash to identify the database (vendor), just like FCC IDs or MACs do, but that would require a central authority assigning those names and numbers. [1] Why do people want to have IPv6? -- PGP FPR: CF74 D5F2 4871 3E5C FFFE 0130 11F4 C41E B3FB AE33 -- Der Geist des Kriegers sollte mit Beginn des Neujahrstages bis zum Ende des Jahres vom Gedanken an seinen Tod beherrscht werden. Daijouji Shigesuke in "Budo Shoshin Shuu"
Attachment
chris@travelamericas.com (Chris Travers) writes: >>At least on Windows, the GUID is derived in part from the computers >>primary MAC address. No, it's not a guarantee, but it's pretty unlikely >>:-) > The danger is not that the MAC address will be duplicated, but that > other factors will lead to an MD5 collision. What factors? The 'random' portion comes _in addition to_ location information and a timestamp, so that for there to be a collision, you need to be generating thousands of GUIDs *per millisecond* on a given host. > Unless you can show me that there is a 1:1 correspondence of all > possible unique factors going into the GUID generation and the output, > then I will say it is still no guarantee. Read the RFC (4122). It shows how it works. > So you could use the Mac address of the machine, I guess, if you > wanted to.... That is one option; section 4.3 of the RFC suggests an alternative that is also likely to work. -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. "I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
stefan@net-tex.de ("Stefan 'Kaishakunin' Schumacher") writes: > So as I understand the GUIDs only apply to the Microsoft Universe[TM], or > are there any other serious apllications using it? No, the RFC (4122) was written by a combination of people including Rich Salz, and is an encoding of the DCE UUID specification previously produced as part of DCE. -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. "I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
smarlowe@g2switchworks.com (Scott Marlowe) writes: > On Wed, 2005-07-27 at 15:32, John DeSoi wrote: >> On Jul 27, 2005, at 4:49 AM, Tino Wildenhain wrote: >> >> > I'd create a sequence: >> > >> > CREATE SEQUENCE global_unique_id_seq; >> > >> > and a function: >> > >> > CREATE OR REPLACE FUNCTION newid() >> > RETURNS text AS >> > $BODY$ SELECT nextval('global_unique_id_seq')::text; $BODY$ >> > LANGUAGE 'sql' VOLATILE; >> > >> > >> > now every call to newid() returns a garantied unique id for >> > say the next 18446744073709551616 calls. >> > Of course you can obfuscate the ID even more using >> > md5, include servername and so on, but this will not improve >> > security in any way (unless you mix data with 2nd database) >> >> >> This is not really a viable replacement for a GUID == globally unique >> identifier. Here global means that if I use the application in >> multiple databases, I'm guaranteed that no two identifiers will be >> the same. Using a sequence will only support uniqueness for a single >> database. > > So, how can two databases, not currently talking to one another, > guarantee that their GUIDs don't collide? using a large randomly > generated name space only reduces the chances of collision, it doesn't > actually guarantee it. Consult RFC 4122... <http://www.ietf.org/rfc/rfc4122.txt> "Abstract This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms. This specification is derived from the DCE specification with the kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been incorporated into this document." See also the gBorg "pgUUID" project: <http://gborg.postgresql.org/project/pguuid/projdisplay.php> -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. "I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
On Wed, Jul 27, 2005 at 16:57:21 -0400, John DeSoi <desoi@pgedit.com> wrote: > > On Jul 27, 2005, at 4:46 PM, Scott Marlowe wrote: > > >So, how can two databases, not currently talking to one another, > >guarantee that their GUIDs don't collide? using a large randomly > >generated name space only reduces the chances of collision, it doesn't > >actually guarantee it. > > > Like MD5, there is no 100% guarantee, but the collision possibility > supposed to be is very close to zero. If you use a large enough space for the number you can reduce that probability of an accidental collision to much less than that of catastrophic hardware failure at which point it isn't noticably better than having no chance of collisions.
On 07/28/2005-05:55AM, Stefan 'Kaishakunin' Schumacher wrote: > > I was a little bit confused about the uniqueness of GUID (esp. using > only 16 Bit [1]) and read the article about the UUID: > http://en.wikipedia.org/wiki/Universally_Unique_Identifier > 16 BYTE not bit. From the wiki... A UUID is essentially a 16-byte number and in its canonical form a UUID may look like this: 550E8400-E29B-11D4-A716-446655440000
> If you use a large enough space for the number you can reduce that > probability of an accidental collision to much less than that of > catastrophic hardware failure at which point it isn't noticably better > than having no chance of collisions. I find the comparison unconvincing - if my hardware crashes, I know it and can decide how to recover. If two UIDs collide, my system may silently do something that may never be detected. - John Burger MITRE
"John D. Burger" <john@mitre.org> writes: > > If you use a large enough space for the number you can reduce that > > probability of an accidental collision to much less than that of > > catastrophic hardware failure at which point it isn't noticably better > > than having no chance of collisions. > > I find the comparison unconvincing - if my hardware crashes, I know it and can > decide how to recover. If two UIDs collide, my system may silently do > something that may never be detected. Alright, say the probability of a cosmic ray flipping one bit (if you're picky, flipping two or three bits perfectly arranged so ECC ram doesn't detect it). That would produce similarly subtle and unpredictable behaviour, especially if it hit in a primary key. Though I'm inclined to agree. They seem to be a fad with not much reason to exist. -- greg
On Fri, Jul 29, 2005 at 12:18:30 -0400, "John D. Burger" <john@mitre.org> wrote: > >If you use a large enough space for the number you can reduce that > >probability of an accidental collision to much less than that of > >catastrophic hardware failure at which point it isn't noticably better > >than having no chance of collisions. > > I find the comparison unconvincing - if my hardware crashes, I know it > and can decide how to recover. If two UIDs collide, my system may > silently do something that may never be detected. If it crashes yes, if a bit flips maybe not. Note that by using a larger hash and more random bits you can make this probability arbitrarily small. For 512 hashes with 512 bits of entropy, I doubt you could compare documents fast enough to have a 50-50 chance of finding a collision before the heat death of the universe.
Dann Corbit wrote: > There is a "privacy hole" from using the MAC address. (Read it in the > WIKI article someone else posted). > > Probably, it would be better to use a one way hash of the MAC address. > The chances of MAC addresses colliding (through some low-end network card vendor's sloppy manufacturing process; or through the not uncommon ability to edit your mac address) is surely much higher than the chance of big random numbers colliding. Sure, you won't hurt anything if you add the MAC address to your entropy sources when generating a GUID; but thinking a MAC address is more unique than the bits you'd get from any less human-error-prone source is almost certainly wrong.