Thread: UUID column as pimrary key?
I haven't been able to find anywhere, easily, in the documentation using google where a list of allowed data types for primary keys is. So, UUIDs can be primary keys? Any issues wtih them on sorting or paging of index tables, etc.? Also, the documentation says that UUIDs are 128 bit value, but never explicitly says that's how it's stored. Nor does it use one of the nice, blue headered tables for UUID (or ENUM) showing storage and other attributes as it does for numeric, character,boolean, date/time, binary, monetary, geometric, or network types. Dennis Gearon Signature Warning ---------------- It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
On Tue, 2011-01-04 at 11:07 -0800, Dennis Gearon wrote: > I haven't been able to find anywhere, easily, in the documentation using google > where a list of allowed data types for primary keys is. Anything that can be UNIQUE NOT NULL > > So, UUIDs can be primary keys? Yes. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
We're using UUID for primary keys in PG 8.4 without any issues. I have no real insights into the details or performance issues, but always figured it was stored as a binary 128-bit value, but with added benefits of being able to enter and view them using a standard string format. We don't sort them as they have no real meaning for us. On 1/4/2011 11:07 AM, Dennis Gearon wrote: > I haven't been able to find anywhere, easily, in the documentation using google > where a list of allowed data types for primary keys is. > > So, UUIDs can be primary keys? > Any issues wtih them on sorting or paging of index tables, etc.? > > Also, the documentation says that UUIDs are 128 bit value, but never explicitly > says that's how it's stored. Nor does it use one of the nice, blue headered > tables for UUID (or ENUM) showing storage and other attributes as it does for > numeric, character,boolean, date/time, binary, monetary, geometric, or network > types.
d.wall@computer.org (David Wall) writes: > We're using UUID for primary keys in PG 8.4 without any issues. I > have no real insights into the details or performance issues, but > always figured it was stored as a binary 128-bit value, but with added > benefits of being able to enter and view them using a standard string > format. We don't sort them as they have no real meaning for us. In principle, this might be a reason to want to do the long-outstanding work on hash indexes; with UUIDs, it mayn't be useful to sort the values, but you *do* want to be able to validate that they're unique. -- output = ("cbbrowne" "@" "gmail.com") http://www3.sympatico.ca/cbbrowne/ "Computers are like air conditioners: They stop working properly if you open windows."
On Tue, 4 Jan 2011 11:07:00 -0800 (PST), Dennis Gearon <gearond@sbcglobal.net> wrote: > I haven't been able to find anywhere, easily, in the documentation > using google > where a list of allowed data types for primary keys is. > > So, UUIDs can be primary keys? > Any issues wtih them on sorting or paging of index tables, etc.? Disadvantage * aren't ordered in way as records are added to DB. * 128bit length (PSQL stores them as 128bit value) * slower to generate you need to use random number generator * ... if you do select * on table with two uuids you will need to scroll GUI to see data :) * ... really unhandy if you want to make manual updates :) Advantage: * simple to generate, and 128bit random is almost globally unique, * you have your id, before executing query, (in contrast to all this autoincrement) so you may put it in dependant rows * almost every platform has UUID generator Advantage / disadvantage * depending on UUID generator, UUID can store some "privacy" information e.g. MAC address of your card, such UUID. Personally I prefer pooled incremental id's. Fast, unique, you have Id before query - but you need to write "code" by self. > Also, the documentation says that UUIDs are 128 bit value, but never > explicitly > says that's how it's stored. Nor does it use one of the nice, blue > headered > tables for UUID (or ENUM) showing storage and other attributes as it > does for > numeric, character,boolean, date/time, binary, monetary, geometric, > or network > types. > > > > Dennis Gearon > > > Signature Warning > ---------------- > It is always a good idea to learn from your own mistakes. It is > usually a better > idea to learn from others’ mistakes, so you do not have to make them > yourself. > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > EARTH has a Right To Life, > otherwise we all die.
On 01/05/2011 07:31 PM, Radosław Smogura wrote: > * you have your id, before executing query, (in contrast to all this > autoincrement) so you may put it in dependant rows Do you mean that with a UUID, you don't need to talk to the database at all, you can generate an ID with no interaction with / involvement with the database at all? Because other than that, there's not much difference in how you normally work with them. With a sequence, you might: CREATE SEQUENCE x_id_seq; CREATE TABLE x ( id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'), y integer ); INSERT INTO x(y) VALUES (1); With a uuid, you'd: CREATE TABLE x ( id uuid PRIMARY KEY DEFAULT uuid_generate_v4(), y integer ); INSERT INTO x(y) VALUES (1); In either case, you can explicitly call the generator function for seq/uuid - nextval(seqname) or uuid_generate_v4() respectively - or you can omit the PK column in your inserts and let the database generate it. > Personally I prefer pooled incremental id's. Fast, unique, you have Id > before query - but you need to write "code" by self. Many libraries / ORMs / etc that interact with Pg will happily take care of this for you. In fact, I had to fight to convince Hibernate that I *didn't* want it to increment all my counters in steps of 50. -- Craig Ringer
On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 01/05/2011 07:31 PM, Radosław Smogura wrote: > >> * you have your id, before executing query, (in contrast to all this >> autoincrement) so you may put it in dependant rows > > Do you mean that with a UUID, you don't need to talk to the database > at all, you can generate an ID with no interaction with / involvement > with the database at all? Because other than that, there's not much > difference in how you normally work with them. > > > With a sequence, you might: > > CREATE SEQUENCE x_id_seq; > CREATE TABLE x ( > id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'), > y integer > ); > INSERT INTO x(y) VALUES (1); > I mean situation, when You create e.g. in one transaction, book and chapters, in some way You need retrieve book's id, by returning clause of insert, or by obtaining id form sequence. It's simpler to write: book_id = new uuid(); insert into book values(book_id,....); insert into chapters values(new uuid(), book_id, ...); isn't it?
On 5 January 2011 15:28, Radosław Smogura <rsmogura@softperience.eu> wrote:
On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer <craig@postnewspapers.com.au> wrote:I mean situation, when You create e.g. in one transaction, book and chapters, in some way You need retrieve book's id, by returning clause of insert, or by obtaining id form sequence.On 01/05/2011 07:31 PM, Radosław Smogura wrote:* you have your id, before executing query, (in contrast to all this
autoincrement) so you may put it in dependant rows
Do you mean that with a UUID, you don't need to talk to the database
at all, you can generate an ID with no interaction with / involvement
with the database at all? Because other than that, there's not much
difference in how you normally work with them.
With a sequence, you might:
CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);
It's simpler to write:
book_id = new uuid();
insert into book values(book_id,....);
insert into chapters values(new uuid(), book_id, ...);
isn't it?
For me it is simpler just to write this:
bookid = insert into books(...) values(...) returning book_id;
insert into chapters(book_id, ...) values( bookid, ...);
but it's a matter of taste, I think.
regards
Szymon
On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote: > * simple to generate, and 128bit random is almost globally unique, Almost? Should be totally unique, as long as your random source is decent quality. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Jan 5, 2011, at 7:28 AM, Radosław Smogura wrote: > It's simpler to write: ... > isn't it? Depends on the situation, the libraries you're using, and so on. Now, if you're generating records in a distributed system, where your node might be disconnected when it's creating a record,it is *much* simpler in that case ;-) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote: > On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote: > >> * simple to generate, and 128bit random is almost globally unique, > > Almost? Should be totally unique, as long as your random source is decent quality. But I would never rely on that alone. You always have a strategy in place, in case there's a duplicate. -- GJ
2011/1/5 Grzegorz Jaśkiewicz <gryzman@gmail.com>: > On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote: >> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote: >> >>> * simple to generate, and 128bit random is almost globally unique, >> >> Almost? Should be totally unique, as long as your random source is decent quality. > > But I would never rely on that alone. You always have a strategy in > place, in case there's a duplicate. As long as all your UUIDs are generated with the same algorithm, they are guaranteed to be unique.
On Jan 5, 2011, at 7:55 AM, Grzegorz Jaśkiewicz wrote: > But I would never rely on that alone. You always have a strategy in > place, in case there's a duplicate. That's really unnecessary, basically a total waste of effort. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
In response to Scott Ribe <scott_ribe@elevated-dev.com>: > On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote: > > > * simple to generate, and 128bit random is almost globally unique, > > Almost? Should be totally unique, as long as your random source is decent quality. This is going off-topic, but I did some research on this because we were considering using UUIDs for various keys ... Fact is, UUIDs are not _guaranteed_ to be unique. If you use the generating system that includes a MAC address, then in theory, they are guaranteed to be unique, but in practice, MAC addresses aren't guaranteed to be unique either, so that's not 100% either. Beyond that, the namespace size for a UUID is so incomprehensibly huge that the chance of two randomly generated UUIDs having the same value is incomprehensibly unlikely ... it is, however, not a 100% guarantee. Anyway, in our case, we determined that the chance of UUID collision for the dataset in question was extremely unlikely, however, the consequences of such a collision were pretty bad. We also determined that we were able to control a "unit ID" for each independent system that would generate IDs, which could (a) be part of a unique seed for UUIDs, or (b) be a prefix to a autonumber ID that would be a lot easier to read and work with manually. In the end, we chose b for the human factor. Face it, reading, remembering, and typing UUIDs kinda sucks. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Jan 5, 2011, at 8:03 AM, Mike Christensen wrote: > As long as all your UUIDs are generated with the same algorithm, they > are guaranteed to be unique. There is no requirement that they be generated with the same algorithm in order to be unique. A MAC/time-based UUID cannotduplicate a random one, and vice versa. (Also applies to the 3rd flavor of UUID whose details I do not remember.) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: > Beyond that, the namespace size for a UUID is so incomprehensibly huge > that the chance of two randomly generated UUIDs having the same value > is incomprehensibly unlikely Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely,I mean many many many orders of magnitude...) > In the end, we chose b for the human > factor. A very good decision, in the case where you're actually able to control each independent system. > Face it, reading, remembering, and typing UUIDs kinda sucks. Lots of copy & paste, or custom GUI tools for devs & DBAs, or abuse like '...%', all of them painful in their own way. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
In response to Scott Ribe <scott_ribe@elevated-dev.com>: > On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: > > > Beyond that, the namespace size for a UUID is so incomprehensibly huge > > that the chance of two randomly generated UUIDs having the same value > > is incomprehensibly unlikely > > Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely,I mean many many many orders of magnitude...) That statement demonstrates a lack of investigation and/or consideration of the circumstances. I can't find my math or I'd reproduce it here, but consider this: If you have 50 devices, each generating 100 UUIDs per hour, and you'll keep records for 1 year, then your argument above is probably accurate. However, if there are 5000 devices generating 100 UUIDs per hour, and you'll be keeping those records for 10+ years, the chances of collisions near the end of that 10 year span get high enough to actually make developers nervous. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Scott Ribe <scott_ribe@elevated-dev.com> writes: > On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: >> Beyond that, the namespace size for a UUID is so incomprehensibly huge >> that the chance of two randomly generated UUIDs having the same value >> is incomprehensibly unlikely > Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, I see those sorts of arguments all the time, and I consider them pure BS. Yes, the namespace is theoretically large. However, the questions you really have to answer are (a) how much of the namespace is actually being used by the UUID generation methods in use in a particular application; (b) how sure can you be that there is not correlation between UUIDs generated in different places/sessions. In practical use I think the odds of a collision are *far* higher than you are suggesting, unless the UUID generation is being done with a lot more care than is likely if the user takes these sorts of claims at face value. The odds may still be low enough to be a very good risk, but you need to think about it not just bet your database on it without thinking. Being paranoid is a good thing. It's what DBAs are paid for. regards, tom lane
On 01/05/2011 08:55 AM, Bill Moran wrote: > In response to Scott Ribe <scott_ribe@elevated-dev.com>: > >> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: >> >>> Beyond that, the namespace size for a UUID is so incomprehensibly huge >>> that the chance of two randomly generated UUIDs having the same value >>> is incomprehensibly unlikely >> >> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely,I mean many many many orders of magnitude...) > > That statement demonstrates a lack of investigation and/or consideration > of the circumstances. > > I can't find my math or I'd reproduce it here, but consider this: > > If you have 50 devices, each generating 100 UUIDs per hour, and you'll > keep records for 1 year, then your argument above is probably accurate. > > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll > be keeping those records for 10+ years, the chances of collisions near > the end of that 10 year span get high enough to actually make developers > nervous. > But we're talking about a primary key. Postgres guarantees the uniqueness. 1 transaction in 10^^100 rolls back due to a second instance of an (otherwise/almost) uuid. Big deal.
On Jan 5, 2011, at 8:55 AM, Bill Moran wrote: > That statement demonstrates a lack of investigation and/or consideration > of the circumstances. No, it doesn't. > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll > be keeping those records for 10+ years, the chances of collisions near > the end of that 10 year span get high enough to actually make developers > nervous. No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or about 1/100,000,000,000,000,000,000,000,000thof the UUID space (assuming random UUIDs). Leaving you with a chance of a single collisionof about 1/18,000,000,000,000,000. Assuming of course good entropy. If the generation of random numbers is bad, then UUIDs are not so useful ;-) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Wednesday 5. January 2011 16.05.29 Bill Moran wrote: > Beyond that, the namespace size for a UUID is so incomprehensibly huge > that the chance of two randomly generated UUIDs having the same value > is incomprehensibly unlikely ... it is, however, not a 100% guarantee. I can't help thinking of the «Birthday Paradox»: http://en.wikipedia.org/wiki/Birthday_problem regards, Leif
On Jan 5, 2011, at 9:01 AM, Tom Lane wrote: > In practical use I think the odds of a collision are *far* higher than > you are suggesting, unless the UUID generation is being done with a lot > more care than is likely if the user takes these sorts of claims at face > value. Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithm providedby the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonable to assumethat the generation is done with a great deal of care. (And if we are talking about users coming up with their ownalgorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X & Linux. I would beshocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows. The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that no applicationwould need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs being uniquereally are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit ridiculous. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote: > I can't help thinking of the «Birthday Paradox»: Yes, the calculation of the probability of a collision is the same for the "birthday paradox" as for random UUID collisions. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Wed, Jan 5, 2011 at 3:03 PM, Mike Christensen <mike@kitchenpc.com> wrote: > 2011/1/5 Grzegorz Jaśkiewicz <gryzman@gmail.com>: >> On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote: >>> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote: >>> >>>> * simple to generate, and 128bit random is almost globally unique, >>> >>> Almost? Should be totally unique, as long as your random source is decent quality. >> >> But I would never rely on that alone. You always have a strategy in >> place, in case there's a duplicate. > > As long as all your UUIDs are generated with the same algorithm, they > are guaranteed to be unique. > Good luck with that .... -- GJ
On 01/05/2011 08:29 AM, Scott Ribe wrote: > On Jan 5, 2011, at 9:01 AM, Tom Lane wrote: > >> In practical use I think the odds of a collision are *far* higher than >> you are suggesting, unless the UUID generation is being done with a lot >> more care than is likely if the user takes these sorts of claims at face >> value. > > Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithmprovided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonableto assume that the generation is done with a great deal of care. (And if we are talking about users coming up withtheir own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X& Linux.I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows. > > The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that noapplication would need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs beingunique really are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit ridiculous. > Maybe or maybe not:) http://www.ietf.org/rfc/rfc4122.txt "4.3. Algorithm for Creating a Name-Based UUID The version 3 or 5 UUID is meant for generating UUIDs from "names" that are drawn from, and unique within, some "name space". The concept of name and name space should be broadly construed, and not limited to textual names. For example, some name spaces are the domain name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished Names (DNs), and reserved words in a programming language. The mechanisms or conventions used for allocating names and ensuring their uniqueness within their name spaces are beyond the scope of this specification. The requirements for these types of UUIDs are as follows: o The UUIDs generated at different times from the same name in the same namespace MUST be equal. ...... " -- Adrian Klaver adrian.klaver@gmail.com
Sorry for not citation... When I was talking about "almost unique", I was meaning that the UUID is random so there is no guarantee that you will not generate two indencital UUIDs even in subsequent calls, but it has looooow probability (you have greater chances to win in LOTTO). 128bits is huge for now, but what will happen in next 2,3 years? In 20th century, people think storing only last two digit of year will be enaugh!!! Suppose we are creating gloabl, distributed database for storing information about mobile device and base station it logged in. If we want to guarantee uniquness of UUID across calls, we could talk about much more far _pseudo_ random generator, then "normal" pseudo - randoms, and what I think we need to keep state of such random generator, and share and lock it for multiple concurrent calls. So it will not be something different then ordinal serial column... Now I think it's clear there is no "magical" algorithm for UUIDs. My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing" UUIDs. All of this decrases chance of uniqness of UUID. If we will keep n first bits of UUID constant, then we have only 128-n bits random (truly in UUIDs we have one decimal field reserved for UUID version). If we want next UUID to be greater then previous, at each call we will remove (next-prev) of possible values. Shouldn't this be enaugh for namespace UUIDs new UUID("namespece".hashCode(), "name".hashChode()) or a little joke... new UUID(1,1) meats this condition > o The UUIDs generated at different times from the same name in the > > same namespace MUST be equal.
On Jan 5, 2011, at 10:30 AM, Radosław Smogura wrote: > 128bits is huge for now, but what will happen in next 2,3 years? It will still be large. When you get up to around 100 trillion UUIDs, you'll be getting up to around a 1 in a billion chanceof a single collision. Before you claim that we'll outgrow them in a few years, you might want to consider the actualnumbers, the growth rate of storage density, the number of disks required to hold that many records. I have a feelingthe answers might surprise you ;-) > If we want to guarantee uniquness of UUID across calls, we could talk about > much more far _pseudo_ random generator, then "normal" pseudo - randoms, and > what I think we need to keep state of such random generator, and share and > lock it for multiple concurrent calls. So it will not be something different > then ordinal serial column... No, we don't really have to think about it much at all. The IETF and OS engineers already did. Random UUIDs are not generatedwith some sloppy amateurish algorithm. > My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing" > UUIDs. All of this decrases chance of uniqness of UUID. Well, a decrease from one insanely small chance to another insanely small chance is not anything to worry about. After all,you could argue that 128 bits is a "decrease" from 256 bits. It's the same argument. UUIDs were designed to avoid collisions,by people who knew what they were doing. More significant bits would lower the chance, from "already low enough"to "even more low enough". > Shouldn't this be enaugh for namespace UUIDs > new UUID("namespece".hashCode(), "name".hashChode()) > > or a little joke... > new UUID(1,1) meats this condition >> o The UUIDs generated at different times from the same name in the >> >> same namespace MUST be equal. > People, people, people, please. *Namespace* UUIDs are intended to map *unique* names to UUIDs in the case where you alreadyhave *unique* names and just need to map them to a more compact form, thus the requirement that the same namespace+ name always yields the same UUID. This is *not* a weakness in UUIDs, nor is it the kind of UUID you get from asimple uuid_gen or similar call, nor is it a possible source of collisions for database UUIDs (unless you do something enormouslystupid, like use database fields to construct a name to give to a UUID generator). -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
Scott Ribe <scott_ribe@elevated-dev.com> Wednesday 05 January 2011 17:33:51 > On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote: > > I can't help thinking of the «Birthday Paradox»: > Yes, the calculation of the probability of a collision is the same for the > "birthday paradox" as for random UUID collisions. Depends on probability measure we will choose. The true is that probability that in two coin drops we will get two reverses is 1/4, but true is, too, as Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't need to drop again.
In response to Rob Sargent <robjsargent@gmail.com>: > > > On 01/05/2011 08:55 AM, Bill Moran wrote: > > In response to Scott Ribe <scott_ribe@elevated-dev.com>: > > > >> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: > >> > >>> Beyond that, the namespace size for a UUID is so incomprehensibly huge > >>> that the chance of two randomly generated UUIDs having the same value > >>> is incomprehensibly unlikely > >> > >> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely,I mean many many many orders of magnitude...) > > > > That statement demonstrates a lack of investigation and/or consideration > > of the circumstances. > > > > I can't find my math or I'd reproduce it here, but consider this: > > > > If you have 50 devices, each generating 100 UUIDs per hour, and you'll > > keep records for 1 year, then your argument above is probably accurate. > > > > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll > > be keeping those records for 10+ years, the chances of collisions near > > the end of that 10 year span get high enough to actually make developers > > nervous. > > > > But we're talking about a primary key. Postgres guarantees the > uniqueness. 1 transaction in 10^^100 rolls back due to a second > instance of an (otherwise/almost) uuid. Big deal. That doesn't make any sense. If you're using a single PostgreSQL instance, then why not just use the built in SERIAL mechanism that guarantees that you will NEVER have a conflict? In our case (and I expect it's the case with most people considering UUIDs) we're talking about independent devices that occasionally synchronize data between themselves. These devices need to generate a unique ID of some sort without having to check with every other device. This is one of the problems that UUIDs were intended to fix. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
In response to Scott Ribe <scott_ribe@elevated-dev.com>: > On Jan 5, 2011, at 8:55 AM, Bill Moran wrote: > > > That statement demonstrates a lack of investigation and/or consideration > > of the circumstances. > > No, it doesn't. > > > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll > > be keeping those records for 10+ years, the chances of collisions near > > the end of that 10 year span get high enough to actually make developers > > nervous. > > No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or about1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random UUIDs). Leaving you with a chance of a singlecollision of about 1/18,000,000,000,000,000. For crying out loud. If you're going to pick me apart with numbers, then actually do it with some intelligence. I could easily counter your argument by upping the numbers to 500,000 mobile devices generating 10000 UUIDs per hour over 20 years ... or raise it even higher if you come back with that same argument ... But the point (that you are trying to sidestep) is that the UUID namespace is finite, so therefore you WILL hit a problem with conflicts at some point. Just because that point is larger than most people have to concern themselves with isn't an invalidation. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Jan 5, 2011, at 11:31 AM, Radosław Smogura wrote: > The true is that probability > that in two coin drops we will get two reverses is 1/4, but true is, too, as > Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't > need to drop again. Nonsense. You don't stop generating UUIDs just because you haven't yet got a collision. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
Yes, the ID is generated before the database is contacted, in my case anyway. The type of UUID that I want is a hybrid, so I would have to write a stored procedure and then a trigger upon insert to get the UUID. Not rocket science, I just have more on my plate than I can handle. So PHP is my 'main thing', so I quickly wrote it in that. Plus, it makes it more database agnostic. I don't get next/currval behavior, but I don't think I'll need it. BTW, Switching from Postgres? Not likely anytime soon. I'd have to be up in the 500M+ rows and be in the data warehousing/map reducing arena before I'd consider THAT. And there's 'flavors' of Postgres that will do that, anyway. Dennis Gearon Signature Warning ---------------- It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. ----- Original Message ---- From: Craig Ringer <craig@postnewspapers.com.au> To: Radosław Smogura <rsmogura@softperience.eu> Cc: Dennis Gearon <gearond@sbcglobal.net>; pgsql-general@postgresql.org Sent: Wed, January 5, 2011 2:50:11 AM Subject: Re: [GENERAL] UUID column as pimrary key? On 01/05/2011 07:31 PM, Radosław Smogura wrote: > * you have your id, before executing query, (in contrast to all this > autoincrement) so you may put it in dependant rows Do you mean that with a UUID, you don't need to talk to the database at all, you can generate an ID with no interaction with / involvement with the database at all? Because other than that, there's not much difference in how you normally work with them. With a sequence, you might: CREATE SEQUENCE x_id_seq; CREATE TABLE x ( id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'), y integer ); INSERT INTO x(y) VALUES (1); With a uuid, you'd: CREATE TABLE x ( id uuid PRIMARY KEY DEFAULT uuid_generate_v4(), y integer ); INSERT INTO x(y) VALUES (1); In either case, you can explicitly call the generator function for seq/uuid - nextval(seqname) or uuid_generate_v4() respectively - or you can omit the PK column in your inserts and let the database generate it. > Personally I prefer pooled incremental id's. Fast, unique, you have Id > before query - but you need to write "code" by self. Many libraries / ORMs / etc that interact with Pg will happily take care of this for you. In fact, I had to fight to convince Hibernate that I *didn't* want it to increment all my counters in steps of 50. -- Craig Ringer
On Jan 5, 2011, at 12:03 PM, Bill Moran wrote: > For crying out loud. If you're going to pick me apart with numbers, then > actually do it with some intelligence. If you're going to get nasty, at least try to be accurate. > I could easily counter your argument by upping the numbers to 500,000 > mobile devices generating 10000 UUIDs per hour over 20 years ... or raise > it even higher if you come back with that same argument ... Yeah, then you get into the realm of 1 in 10s of millions of a chance of collision. But you would need to explain to me howyou would get that many records into the database committed to disk, when the UUIDs alone without any other data representa stream of 22MB/s ;-) Or, looked at another way, inserting 1,388,889 rows/second would indeed be difficult to sustain. > But the point (that you are trying to sidestep) is that the UUID namespace > is finite, so therefore you WILL hit a problem with conflicts at some point. > Just because that point is larger than most people have to concern themselves > with isn't an invalidation. I'm not sidestepping the point at all. The point is that the finiteness of the space is a red herring. The space is largeenough that there's no chance of collision in any realistic scenario. In order to get to a point where the probabilityof collision is high enough to worry about, you have to generate (and collect) UUIDs at a rate that is simplynot realistic--as in your second example quoted above. If you just keep raising your numbers, you could go for 100,000,000,000,000devices generating 100,000,000,000,000 UUIDs an hour for 10,000 years. Collisions would be guaranteed,but that does not make it a useful scenario to consider. 2^256 is a finite space as well. Would you argue that because it "is finite, so therefore you WILL hit a problem with conflictsat some point"? How about 2^512? (Bearing in mind that even though finite that space would be large enough to assignapproximately 10^74 UUIDs to every atom in the observable universe, or 10^51 UUIDs to every atom in the total universeusing high-end estimates of the size of the non-observable universe)? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On 01/05/2011 11:57 AM, Bill Moran wrote: > In response to Rob Sargent <robjsargent@gmail.com>: > >> >> >> On 01/05/2011 08:55 AM, Bill Moran wrote: >>> In response to Scott Ribe <scott_ribe@elevated-dev.com>: >>> >>>> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote: >>>> >>>>> Beyond that, the namespace size for a UUID is so incomprehensibly huge >>>>> that the chance of two randomly generated UUIDs having the same value >>>>> is incomprehensibly unlikely >>>> >>>> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely,I mean many many many orders of magnitude...) >>> >>> That statement demonstrates a lack of investigation and/or consideration >>> of the circumstances. >>> >>> I can't find my math or I'd reproduce it here, but consider this: >>> >>> If you have 50 devices, each generating 100 UUIDs per hour, and you'll >>> keep records for 1 year, then your argument above is probably accurate. >>> >>> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll >>> be keeping those records for 10+ years, the chances of collisions near >>> the end of that 10 year span get high enough to actually make developers >>> nervous. >>> >> >> But we're talking about a primary key. Postgres guarantees the >> uniqueness. 1 transaction in 10^^100 rolls back due to a second >> instance of an (otherwise/almost) uuid. Big deal. > > That doesn't make any sense. If you're using a single PostgreSQL instance, > then why not just use the built in SERIAL mechanism that guarantees that > you will NEVER have a conflict? > > In our case (and I expect it's the case with most people considering UUIDs) > we're talking about independent devices that occasionally synchronize > data between themselves. These devices need to generate a unique ID > of some sort without having to check with every other device. This is > one of the problems that UUIDs were intended to fix. > Indeed there is a finite space. A very large, but finite space, just as sequence has I suspect. If your software cannot handle a rollback for whatever reason, you have much bigger problem on your hand than the the remote chance of a collision in uuid generation. From wikipedia, "only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs."
On Jan 5, 2011, at 1:13 PM, Raymond O'Donnell wrote: > Is that taking dark matter into account? :-) It's not clear to me ;-) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On 05/01/2011 19:41, Scott Ribe wrote: > to every atom in the observable universe, or 10^51 UUIDs to every > atom in the total universe using high-end estimates of the size of > the non-observable universe)? Is that taking dark matter into account? :-) Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
In response to Rob Sargent <robjsargent@gmail.com>: > > > > In our case (and I expect it's the case with most people considering UUIDs) > > we're talking about independent devices that occasionally synchronize > > data between themselves. These devices need to generate a unique ID > > of some sort without having to check with every other device. This is > > one of the problems that UUIDs were intended to fix. > > Indeed there is a finite space. A very large, but finite space, just as > sequence has I suspect. If your software cannot handle a rollback for > whatever reason, you have much bigger problem on your hand than the the > remote chance of a collision in uuid generation. You missed the point. Despite the fact that the chance of a collision is very, very small, there is no easy way to fix it if it happens. Zero. It can't be done without shutting the system down, recalling all the remote devices and manually reconciling the problem ... which is not an option. Also, it's hard to DETECT the collision. If a device creates a new record with a duplicate UUID, how do we tell that apart, during synchronization, from an update to the record? Now I have your SSN or private medical data included as part of my record, which violates laws, could lead to incorrect medical care that could kill someone ... etc. So, despite the chance being very small, the consequences are HUGE. I can't just detect it and roll back. If you have a single DB backend, these problem are easy to solve -- so easy, in fact, that using UUIDs is overkill. But when you have mission-critical data that must be correct and available under every circumstance, you have to consider that UUIDs are not a guarantee. And when a method that DOES have a guarantee is one of the options, why would you take ANY risk at all, no matter how small? (BTW: I hope that the people who think that the risk is acceptable aren't writing medical software. Even if it only kills one person every 10,000 years because they were given the wrong medicine, that's too often in my opinion) -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On 01/05/2011 01:43 PM, Bill Moran wrote: > In response to Rob Sargent <robjsargent@gmail.com>: >>> >>> In our case (and I expect it's the case with most people considering UUIDs) >>> we're talking about independent devices that occasionally synchronize >>> data between themselves. These devices need to generate a unique ID >>> of some sort without having to check with every other device. This is >>> one of the problems that UUIDs were intended to fix. >> >> Indeed there is a finite space. A very large, but finite space, just as >> sequence has I suspect. If your software cannot handle a rollback for >> whatever reason, you have much bigger problem on your hand than the the >> remote chance of a collision in uuid generation. > > You missed the point. > > Despite the fact that the chance of a collision is very, very small, there > is no easy way to fix it if it happens. Zero. It can't be done without > shutting the system down, recalling all the remote devices and manually > reconciling the problem ... which is not an option. > > Also, it's hard to DETECT the collision. If a device creates a new record > with a duplicate UUID, how do we tell that apart, during synchronization, > from an update to the record? Now I have your SSN or private medical > data included as part of my record, which violates laws, could lead to > incorrect medical care that could kill someone ... etc. > > So, despite the chance being very small, the consequences are HUGE. I > can't just detect it and roll back. > > If you have a single DB backend, these problem are easy to solve -- so > easy, in fact, that using UUIDs is overkill. > > But when you have mission-critical data that must be correct and available > under every circumstance, you have to consider that UUIDs are not a > guarantee. And when a method that DOES have a guarantee is one of the > options, why would you take ANY risk at all, no matter how small? > > (BTW: I hope that the people who think that the risk is acceptable aren't > writing medical software. Even if it only kills one person every 10,000 > years because they were given the wrong medicine, that's too often in > my opinion) > Seems the software would need to know whether an update or an add was begin performed: is this a new patient or one all ready in the db. New record patient record and new id (of some arbitrary nature) and any new subsidiary data associated appropriately. BTW, you're opinion of efficacy of current medical practice is also out of whack. From a JAMA artical: "Similar to previous studies, almost a quarter (22.7%) of active-care patient deaths were rated as at least possibly preventable by optimal care, with 6.0% rated as probably or definitely preventable. " That's 3 in 50. How d'ya like them odds?
On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote: > I'm not sidestepping the point at all. You may be missing it, however, because. . . > The point is that the finiteness of the space is a red herring. The > space is large enough that there's no chance of collision in any > realistic scenario. > In order to get to a point where the probability > of collision is high enough to worry about, you have to generate > (and collect) UUIDs at a rate that is simply not realistic--as in > your second example quoted above. . . .the example was not that UUIDs are being generated and collected in one place at that rate, but that they're being generated in several independent places at a time, and if the cost of the collision is extremely high, there might be reasons not to use the UUID strategy but instead to use something else that is generated algorithmically by the database. There's a trade-off in having distributed systems acting completely independently, and while I have lots of confidence in my colleagues at the IETF (and agree with you that for the overwhelming majority of cases UUIDs are guaranteed-unique enough), correctly making these trade-offs still requires thought and analysis. It's exactly the kind of of analysis that professional paranoids like DBAs are for. A -- Andrew Sullivan ajs@crankycanuck.ca
>> From wikipedia, "only after generating 1 billion UUIDs every second for > the next 100 years, the probability of creating just one duplicate would > be about 50%. The probability of one duplicate would be about 50% if > every person on earth owns 600 million UUIDs." Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could be tomorrow.It could also not happen at all in that time span. That's how chance works. You can never assume it won't happen.If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the hit,but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that. Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows forcollision chances as low as described, but are computers capable of generating those numbers sufficiently random thatthey actually achieve that low a chance? I think that's pushing it. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4d24f31d11541020617287!
On 01/05/2011 03:39 PM, Alban Hertroys wrote: >>> From wikipedia, "only after generating 1 billion UUIDs every second for >> the next 100 years, the probability of creating just one duplicate would >> be about 50%. The probability of one duplicate would be about 50% if >> every person on earth owns 600 million UUIDs." > > > Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could betomorrow. It could also not happen at all in that time span. That's how chance works. You can never assume it won't happen.If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the hit,but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that. > > Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows forcollision chances as low as described, but are computers capable of generating those numbers sufficiently random thatthey actually achieve that low a chance? I think that's pushing it. > > Alban Hertroys > > -- > If you can't see the forest for the trees, > cut the trees and you'll see there is no forest. > > > !DSPAM:1214,4d24f31711541026711723! > > I'm not talking about chance. I'm talking about robust software. You have to expect some sort of rollback for any of a number of reasons and deal with them. The potential UUID collision is just one example of a contingency one might plan for after having taken care of all the more probable failure points.
I've been enjoying this discussion, but it seems to me there are two possibilities that guarantee unique results, use less storage, and neither would use a UUID for the primary key. The first would assume that only one machine is doing the work on the database. As has been pointed out earlier, in this case the simplest solution would be to simply use a series. As that is guaranteed unique on a single machine and is incredibly simple to implement. Once multiple machines are linked to maintain the database, this has a flaw in it as a series is not guaranteed to be unique between machines (actually I think they're pretty well guaranteed *NOT* to be, but that's another issue. In that case, another field is needed as follows: Create table test ( id serial, machine_id integer, ..... primary_key(machine_id, id); ) Each machine would have a unique machine_id. This would guarantee uniqueness and be very easy to maintain. In addition - *IF* it were ever necessary - it would be easy to determine which machine generated records. OK, I'm probably going to regret jumping in here, but I couldn't resist. Sorry
On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran@potentialtech.com> wrote: > > But the point (that you are trying to sidestep) is that the UUID namespace > is finite, so therefore you WILL hit a problem with conflicts at some point. > Just because that point is larger than most people have to concern themselves > with isn't an invalidation. The UUID itself is 128 bits. Some of those bits are pre-determined. I don't recall, but I think that a "normal" UUID has 121 bits of randomness. How many would one have to store in a database before a collision would even be a concern. Such a database would be freaking huge. Probably far larger than anything that anyone has. Lets say (I'm pulling numbers out of my ass here), that you wanted to store 2^100 rows in a table. Each row would have a UUID and some other meaningful data. Maybe a short string or something. I don't recall what the postgresql row overhead is (~20 bytes?), but lets say that each row in your magic table of death required 64 bytes. A table with 2^100 rows would require nearly 10^31 bytes ( = log_10(64 * 2^100)). How on Earth would you store that much data? And why would you ever need to? I postulate that UUID collisions in Postgresql, using a "good" source for UUID generation, is unlikely to have collisions for any reasonable database. Food for thought: http://blogs.sun.com/dcb/entry/zfs_boils_the_ocean_consumes ps- If my math is off, I apologize. Its been a long day...
Pooled id.. child machine connects to main and says give 10000 of ids, main increments counter by 10000, child allocates in given pool. On Wed, 5 Jan 2011 17:11:35 -0600, Michael Satterwhite <michael@weblore.com> wrote: > Once multiple machines are linked to maintain the database, this has > a flaw in > it as a series is not guaranteed to be unique between machines > (actually I > think they're pretty well guaranteed *NOT* to be
On 6 January 2011 00:32, dennis jenkins <dennis.jenkins.75@gmail.com> wrote:
On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran@potentialtech.com> wrote:The UUID itself is 128 bits. Some of those bits are pre-determined.
>
> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.
I don't recall, but I think that a "normal" UUID has 121 bits of
randomness.
How many would one have to store in a database before a collision
would even be a concern. Such a database would be freaking huge.
Probably far larger than anything that anyone has.
Not necessarily.
Bad luck: two inserts, and one collision.
Good luck: many many inserts, without any collision.
If you have ten pairs of socks... how many do you need to choose, to have two from the same pair (to have a collision)?
Good luck: 2
Bad luck: 11
With uuid you can write a program that most of the time works correctly, but sometimes not.
If you need a program that works correctly all the time, you can use UUID, with the overhead of a procedure, that checks the collisions, and does something with that, or you could generate the PK without UUID, and be sure that the generator won't have collisions, just because it is a normal sequence.
regards
Szymon
dalroi@solfertje.student.utwente.nl (Alban Hertroys) writes: >>> From wikipedia, "only after generating 1 billion UUIDs every second for >> the next 100 years, the probability of creating just one duplicate would >> be about 50%. The probability of one duplicate would be about 50% if >> every person on earth owns 600 million UUIDs." > > > Even if the chances of one person encountering a UUID collision are > about once in say 100 billion years, that could be tomorrow. It could > also not happen at all in that time span. That's how chance works. You > can never assume it won't happen. If the costs incurred by a collision > are lower than the costs of preventing it, you can choose to just take > the hit, but that doesn't go for _every_ problem domain. The medical > systems mentioned up-thread are an example of that. > > Next to that, UUID's are generated by computers. I have no doubts that > the numeric space that makes up a UUID allows for collision chances as > low as described, but are computers capable of generating those > numbers sufficiently random that they actually achieve that low a > chance? I think that's pushing it. I think I'd rather analyze this by: a) Reading the RFC, and b) Looking at some of the common implementations likely to get used rather than to merely have "a doubt." RFC 4122 does NOT point to randomness as the only criterion to discourage collisions, and treating UUIDs as if they were merely about being "sufficiently random to achieve low chance of collision" is insulting to the drafters of the standard, because they were certainly NOT so naive as to think that was sufficient. -- "cbbrowne","@","acm.org" "What I find most amusing about com and .NET is that they are trying to solve a problem I only had when programming using MS tools." -- Max M <maxm@mxm.dk> (on comp.lang.python)
ajs@crankycanuck.ca (Andrew Sullivan) writes: > On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote: >> I'm not sidestepping the point at all. > > You may be missing it, however, because. . . > >> The point is that the finiteness of the space is a red herring. The >> space is large enough that there's no chance of collision in any >> realistic scenario. >> In order to get to a point where the probability >> of collision is high enough to worry about, you have to generate >> (and collect) UUIDs at a rate that is simply not realistic--as in >> your second example quoted above. > > . . .the example was not that UUIDs are being generated and collected > in one place at that rate, but that they're being generated in several > independent places at a time, and if the cost of the collision is > extremely high, there might be reasons not to use the UUID strategy > but instead to use something else that is generated algorithmically by > the database. There's a trade-off in having distributed systems > acting completely independently, and while I have lots of confidence > in my colleagues at the IETF (and agree with you that for the > overwhelming majority of cases UUIDs are guaranteed-unique enough), > correctly making these trade-offs still requires thought and > analysis. It's exactly the kind of of analysis that professional > paranoids like DBAs are for. But it seems to me that some of the analytics are getting a little *too* paranoid, on the "perhaps UUIDs are the wrong answer" side of the column. There's no panaceas, here; if the process that is using IDs is fragile, then things can break down whether one is using UUID or SERIAL. I prefer the "probably unique enough" side of the fence, myself. And the process that uses the IDs needs to be robust enough that things won't just fall apart in tatters if it runs into non-uniqueness. I'd expect that to not need to be a terribly big deal - if there's a UNIQUE index on a UUID-based column, then an insert will fail, and the process can pick between things like: - Responding that it had a problem, or - Retrying. And if the system isn't prepared for that sort of condition, then it's also not prepared for some seemingly more likely error conditions such as: - The DB connection timed out because something fuzzed out on the network - The DB server fell over and is restarting because (power failed, someone kicked the switch, disk ran out, ...) It seems rather silly to be *totally* paranoid about the not-infinite-uniqueness of UUIDs when there are plenty of other risks lurking around that also need erro checking. -- "cbbrowne","@","gmail.com" http://linuxdatabases.info/info/slony.html "How can you dream the impossible dream when you can't get any sleep?" -- Sam Robb
On Jan 5, 2011, at 3:03 PM, Andrew Sullivan wrote: > ...the example was not that UUIDs are being generated and collected > in one place at that rate, but that they're being generated in several > independent places at a time... In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be collectedin one place. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote: > Each machine would have a unique machine_id. This would guarantee uniqueness > and be very easy to maintain. And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address and hopethat's unique? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Wed, Jan 05, 2011 at 06:22:08PM -0500, Chris Browne wrote: > But it seems to me that some of the analytics are getting a little *too* > paranoid, on the "perhaps UUIDs are the wrong answer" side of the > column. That could be. I was simply noting that there are cases where one could legitimately decide that the UUID is a bad fit. I was just objecting to the seeming (to me anyway) claim that UUIDs can always be used. > There's no panaceas, here; if the process that is using IDs is fragile, > then things can break down whether one is using UUID or SERIAL. Not necessarily. In a distributed system where some rows will sometimes (and unpredictably) be shared, guaranteeing that two systems cannot generate the same ID could be really important. In those cases, it's probably better to have a fixed generator ID plus a serial (or, for that matter, a UUID) because then you can be sure you don't run into one another. (Of course, some UUID examples already have this built in. It sort of depends on the algorithm one is using.) > I prefer the "probably unique enough" side of the fence, myself. Me too, most of the time. > And the process that uses the IDs needs to be robust enough that things > won't just fall apart in tatters if it runs into non-uniqueness. But that robustness requirement might be impossible in a distributed system, was all I was trying to point out. > It seems rather silly to be *totally* paranoid about the > not-infinite-uniqueness of UUIDs when there are plenty of other risks > lurking around that also need erro checking. I fully agree with this. A -- Andrew Sullivan ajs@crankycanuck.ca
On Wed, Jan 05, 2011 at 07:44:29PM -0700, Scott Ribe wrote: > In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be collectedin one place. Not at the time of generation, though. They only have to end up in the same place at once. For most cases, the value of the data compared to the improbably low liklihood of collision means that you deal with this case by saying, "Yuck. Well, someone loses." But if the data is valuable enough (say, unique identifiers for nuclear warheads), that's just not an acceptable trade-off. A -- Andrew Sullivan ajs@crankycanuck.ca
On Wed, Jan 05, 2011 at 07:46:18PM -0700, Scott Ribe wrote: > And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address andhope that's unique? > Excellent questions, and exactly the sort (as I was arguing elsewhere in this thread) one has to answer in order to decide what the right strategy is. For instance, it may be that your application need is dealing with incredibly valuable identifiers, that not cloning the disk is a matter of national security, that the cost of finding and fixing a failed unique identification case runs into the billions of dollars (not to mention the millions of deaths), and that the assignment is performed in real time by distributed systems that only eventually, if at all, deliver data to a unified system. In this case, the cost of the failure of uniqueness is very high, and it would be prudent to arrange something that guaranteed that no assigner node could possibly create a duplicate identifier. There are UUID approaches that can do this; they mostly work by guaranteeing a fixed and assignable local part. UUID vs. serial for one of those columns seems to me to be nothing but an application efficiency problem. What is far more likely is that ordinary-value identifiers are being generated, and that you are using them for more or less pedestrian reasons. The cost of an undetected duplicate is still maybe millions of dollars, but the chances of a duplicate going undetected at creation time (like, say, the MAC address of the creating machine is used, the useful lifetime of the UUID is on the order of single-digit years, and there are no more than 10 database back ends involved) are fairly low. In this case, using a proven UUID generator seems like a no-brainer to me (and in fact, I work on an application where we do this). I don't understand the number of knees in this conversation that seem to be jerking against the answer, "It depends." Of course it depends. If there were one answer for everything, developing good database-backed systems wouldn't be something people would pay any of us for. A -- Andrew Sullivan ajs@crankycanuck.ca
On 6 Jan 2011, at 24:27, Chris Browne wrote: >> Next to that, UUID's are generated by computers. I have no doubts that >> the numeric space that makes up a UUID allows for collision chances as >> low as described, but are computers capable of generating those >> numbers sufficiently random that they actually achieve that low a >> chance? I think that's pushing it. > > RFC 4122 does NOT point to randomness as the only criterion to > discourage collisions, and treating UUIDs as if they were merely about > being "sufficiently random to achieve low chance of collision" is > insulting to the drafters of the standard, because they were certainly > NOT so naive as to think that was sufficient. I'm sure the designers knew what they were getting into. This comment was aimed at people claiming things like "with thisand that huge number of events a collision won't occur in 100 billion years", which - to me at least - looks like they'reonly looking at the big number of bits involved without understanding statistical analysis. Let's just say, if the developers of "Microsoft Visual Nuclear Power Plant Designer Professional" were claiming things likethat, would you trust their product? The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's unlikely toever happen, but you can't predict when it will happen if it does. The possible consequences of a collision matter a lotin designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should get a reallyhefty insurance, or maybe you need to evacuate the country. Opposed to that, a sequence isn't random and therefore you can predict when you will run into collisions - namely once thesequence wraps. Considering that even a 32-bit sequence allows for several billions of rows before collisions _can_ occur,you can be certain that your problem is pretty far into the future. It _will_ be a big problem without an obvious solution if it occurs though, as from that point on you will run into a lotof collisions and the resolution to the problem is rather dependent on what you're working on. Now that is not an argument against protecting your application against collisions, if there is a chance that you will runinto collisions (you won't in a 10-record lookup table, for example) then you need to take that into consideration inyour designs, but there are many (usually obvious) cases in which it's safe to omit it. With UUID's that's a little morecomplicated. I don't think anyone in this discussion is saying "Don't use UUID's!". Just be aware of their limitations and the problemdomains where they are sensible to use. The same goes for sequences. It would, for example, be (obviously) pretty insane to use UUID's for a 10-record lookup table. There's plenty of examplesin this thread where they shine, I don't need to repeat that. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4d25777f11541886517442!
>>> Next to that, UUID's are generated by computers. I have no doubts that >>> the numeric space that makes up a UUID allows for collision chances as >>> low as described, but are computers capable of generating those >>> numbers sufficiently random that they actually achieve that low a >>> chance? I think that's pushing it. > The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's unlikelyto ever happen, but you can't predict when it will happen if it does. The possible consequences of a collision mattera lot in designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should geta really hefty insurance, or maybe you need to evacuate the country. Given the math, I suspect the chance of a UUID collision generated by a respected library is roughly the same or less than any other method of choosing a unique id. If you are looking at these extreme improbabilities, your SERIAL isn't guaranteed unique either when you take into account cosmic rays flipping the right bits in your ECC memory or on your disk platter. So if you are worried about such things, give up now because your application is going to have to run with hardware and software in this universe with all its limitations. Maybe I should start a business in providing UUID collision insurance? -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On 2011-01-05, Scott Ribe <scott_ribe@elevated-dev.com> wrote: > On Jan 5, 2011, at 9:01 AM, Tom Lane wrote: > >> In practical use I think the odds of a collision are *far* higher than >> you are suggesting, unless the UUID generation is being done with a lot >> more care than is likely if the user takes these sorts of claims at face >> value. > > Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithmprovided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonableto assume that the generation is done with a great deal of care. (And if we are talking about users coming up withtheir own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X & Linux.I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows. > The IETF Network Working Group designed UUIDs to ensure that their > uniqueness guarantee would be strong enough that no application would > need to worry about duplicates, ever. Claims that collisions are too > likely to depend on UUIDs being unique really are claims that the IETF > Network Working Group didn't know what it was doing, which I find a > bit ridiculous. Who was it that decided on 32 bits for IP addresses? -- ⚂⚃ 100% natural
On Jan 6, 2011, at 2:51 AM, Jasen Betts wrote: > Who was it that decided on 32 bits for IP addresses? Nice try, but that was rather long before the IETF existed ;-) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote: > If you are looking at these extreme > improbabilities, your SERIAL isn't guaranteed unique either when you > take into account cosmic rays flipping the right bits in your ECC > memory or on your disk platter. Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0. Some peopleseem to have a problem wrapping their heads around relative magnitudes that extreme. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
In response to Scott Ribe <scott_ribe@elevated-dev.com>: > On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote: > > > If you are looking at these extreme > > improbabilities, your SERIAL isn't guaranteed unique either when you > > take into account cosmic rays flipping the right bits in your ECC > > memory or on your disk platter. > > Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0. Somepeople seem to have a problem wrapping their heads around relative magnitudes that extreme. My head is totally wrapped around probability -- yours is not. It's amazing the number of people on this list who are confused by a bunch of x in y #s being thrown around. If the chance of a duplicate is 1 in a hundred gazillion, then I can still hit a dupe the VERY FIRST TIME I USE IT. I'm writing software that is intended to be used to save lives in the event of an earthquake or flood or cosmic ray flipping bits or any other massive disaster. The last thing I need while a bunch of EMTs are digging bodies out of the rubble trying to save their lives is to need to resolve a key conflict with a bunch of mobile devices, most of which are not reachable because the cell network is down because of the disaster. There's going to be enough other shit going wrong ... my software is supposed to be part of the solution. I don't give a fuck how small the chance of conflict is, the only viable option for that chance is 0. Period. Any argument to the contrary is stupid, asinine and outright negligent. Now, if I were writing facebook or twitter or some other pointless entertainment app, the odd chance of losing a post or whatever to a key conflict is not something I'm going to worry about for more than the time it takes to yawn. And that's been my point all along, despite people trying to dilute it with nonsense numbers that they don't understand: UUIDs are sufficiently unique for 99.99999999999% of the applications out there. However, you should always take 5 or 10 minutes to consider whether your application is one of the .00000000001% that can't tolerate the tiny risk. And also, if your entire solution to that risk is to rollback the transaction in the event of a conflict, then your application is simple enough that UUIDs are overkill anyway. Use them anyway if you want. As far as statistics are concerned, the chance that someone as batfuck insane as Hitler would rise to power in a major country is 1 in 1,102,196,287,287,859,992,396,273,293,203 -- yet it happened. There. I Godwined the damn thing. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Wednesday, January 05, 2011 08:46:18 pm Scott Ribe wrote: > On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote: > > Each machine would have a unique machine_id. This would guarantee > > uniqueness and be very easy to maintain. > > And if somebody clones the disk to a new machine, but leaves the old one in > service? Or do you use the MAC address and hope that's unique? That would be a matter of incompetent administration. *NOTHING* can protect against that.
On Jan 6, 2011, at 8:14 AM, Bill Moran wrote: > I don't give a fuck how small the chance of conflict is, the only > viable option for that chance is 0. Period. Any argument to the > contrary is stupid, asinine and outright negligent. Do you give a fuck about the chance that bits will flip in the RAM and not be detected? Do you give a fuck about the chancethat bits will flip in whatever persistent storage is in your device and not be detected? Do you give a fuck aboutthe chance that bits will be flipped in the network and not be detected? Do you give a fuck about the chance that atransistor will go into a metastable state instead of flipping, and lock up the device and lose data not yet saved? Well,of course you do. But now what are the relative odds? All of these things can happen already (and all of them can happenthe very first time you use it), so already your system is not precisely 100% reliable. Now is the risk of UUID collisionsomewhere near the same as these risks, or orders of magnitude higher, or orders of magnitude lower? It does matter. > However, you > should always take 5 or 10 minutes to consider whether your application > is one of the .00000000001% that can't tolerate the tiny risk. If an application/device truly can't tolerate a risk of an error of 1 in 10^-16, that's a problem because you can't builda device without risk of error below some threshold (in that general neighborhood I think). You can only push risksto lower & lower probability, and it makes no sense to focus on a single risk and spend time and effort to push it toorders of magnitude lower probability than all the other risks in the system. (As long as there are risks at orders ofmagnitude higher priority, those should get the time & expense.) > And that's been my point all along, despite people trying to dilute it > with nonsense numbers that they don't understand... No, it hasn't been your point all along. Your point has shifted twice now as you've been shown to be wrong about the odds.And the numbers used are not nonsense at all. All of which somewhat contradicts your statement that your "head is totallywrapped around probability" ;-) Added to your apparent ignoring of other error sources in order to focus on one extremelyunlikely one, well... > And also, if your entire solution to that risk is to rollback the > transaction in the event of a conflict, then your application is simple > enough that UUIDs are overkill anyway. I kind of doubt that the person who posted that intended it as the entire solution. It seemed to me that was intended asjust the event that triggers conflict resolution and the next step would be to inform the device that the conflicting recordis getting a new UUID, update appropriately, and so on. Just so you know, I'm done talking to you. Your arrogance, rudeness, insults, condescension and personal attacks are notsomething that I will deal with anymore. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Thursday 06 January 2011 7:14:00 am Bill Moran wrote: > In response to Scott Ribe <scott_ribe@elevated-dev.com>: > > On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote: > > > If you are looking at these extreme > > > improbabilities, your SERIAL isn't guaranteed unique either when you > > > take into account cosmic rays flipping the right bits in your ECC > > > memory or on your disk platter. > > > > Yes, that's rather the point, the probability is so extremely low that it > > in most cases it should be treated as 0. Some people seem to have a > > problem wrapping their heads around relative magnitudes that extreme. > > > There. I Godwined the damn thing. > > -- > Bill Moran Maybe a wrap up is in order:) As I said earlier this is one of those arguments that could go forever because everyone is right, so to summarize: 1) UUIDs can have a very to extremely large namespace but less than infinite. 2) There are other alternatives i.e SERIAL 3) Managing the above is based on the interaction of three components - software,hardware,wetware(people). Any one of which can have a weakness and in combination their are many permutations. 4) DBAs need to plan for the worse. Worse being somewhat contextual. Real time control of a nuclear plant versus Web social media. Choosing a unique number generator and dealing with possible collisions is contingent on this context. -- Adrian Klaver adrian.klaver@gmail.com
On Jan 6, 2011, at 8:19 AM, Michael Satterwhite wrote: > That would be a matter of incompetent administration. *NOTHING* can protect > against that. Well, no, not necessarily. It might well be a goal (in fact, is a goal with some software that I'm developing), that users/adminsdon't have to worry about data caches moving across machines. My primary point, which I stated incompletely,was that in order to depend on node ids as part of unique ids, requires a degree of control over the administrationof nodes, and for a given application this might or might not be practical. For instance, if your app runson cell phones, and the OSs you deploy on give you access to the device id, and you don't mind using a rather long prefixto form your unique ids, then you have an obvious solution that, as far as I know, is guaranteed to be unique. (Ignoringthe possibility of hacking of the device id, because no matter what you choose as a prefix, if an adversary managesto deliberately change the prefix, you can get duplicates.) My secondary point was that this is rather difficult todetect in time to prevent conflicts. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Wed, Jan 5, 2011 at 3:43 PM, Bill Moran <wmoran@potentialtech.com> wrote: > Despite the fact that the chance of a collision is very, very small, there > is no easy way to fix it if it happens. Zero. It can't be done without > shutting the system down, recalling all the remote devices and manually > reconciling the problem ... which is not an option. Sounds like a security problem - an intentional collision is much more likely than a random one.
dennis.jenkins.75@gmail.com (dennis jenkins) writes: > The UUID itself is 128 bits. Some of those bits are pre-determined. > I don't recall, but I think that a "normal" UUID has 121 bits of > randomness. That doesn't match RFC 4122 very well... It indicates 5 forms of UUIDs: 1) Time-based, where about 1/2 the data comes from local timestamp data, and 48 bits come from MAC address (or similar) 2) "DCE Security" (about which it says little) 3) Name-based, using MD5 hashing 4) Randomly generated UUIDs (which are quite likely what you're thinking about) have 122 bits of random data 5) Name-based, using SHA-1 hashing The reasonable choices for a would-be artificial primary key seem to be 1 and 3; in a distributed system, I'd expect to prefer 1, as the time + host data are likely to eliminate the "oh, it might just randomly match" problem. Note the set of functions in the uuid-ossp contrib module fit this, down to omitting Version 2 :-). test@localhost-> \df+ public.uuid_generate* List of functions Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Language | Source code | Description --------+--------------------+------------------+---------------------------+--------+------------+----------+----------+--------------------+------------- public | uuid_generate_v1 | uuid | | normal | volatile | postgres | c |uuid_generate_v1 | public | uuid_generate_v1mc | uuid | | normal | volatile | postgres | c |uuid_generate_v1mc | public | uuid_generate_v3 | uuid | namespace uuid, name text | normal | immutable | postgres | c |uuid_generate_v3 | public | uuid_generate_v4 | uuid | | normal | volatile | postgres | c |uuid_generate_v4 | public | uuid_generate_v5 | uuid | namespace uuid, name text | normal | immutable | postgres | c |uuid_generate_v5 | (5 rows) -- "I'm all for advancement. However rich text on an ephemeral media is a totally brain-dead idea. Nobody in their right mind would take the effort to prettyfy documents that are going to be gone in a few days." -- Jay Denebeim <denebeim@deepthot.ml.org>
wmoran@potentialtech.com (Bill Moran) writes: > If the chance of a duplicate is 1 in a hundred gazillion, then I can > still hit a dupe the VERY FIRST TIME I USE IT. > > I'm writing software that is intended to be used to save lives in the > event of an earthquake or flood or cosmic ray flipping bits or any > other massive disaster. The last thing I need while a bunch of EMTs > are digging bodies out of the rubble trying to save their lives is to > need to resolve a key conflict with a bunch of mobile devices, most of > which are not reachable because the cell network is down because of > the disaster. There's going to be enough other shit going wrong ... > my software is supposed to be part of the solution. > > I don't give a fuck how small the chance of conflict is, the only > viable option for that chance is 0. Period. Any argument to the > contrary is stupid, asinine and outright negligent. If your system is sufficiently negligently designed that this particular conflict causes it to kill people, then I wouldn't be too inclined to point at this issue with UUIDs being the Real Problem with the system. This is NOT the only risk that the system faces; you can't get *nearly* as low probabilities attached to hardware and network issues such as: - Disks failing - Cosmic rays twiddling bits in memory - Network connections failing part way through the work - Dumb techs blindly cloning the same "host key" onto every one of the EMTs' data collection devices That last item is an argument in *FAVOR* of using UUIDs for the "EMTs digging bodies out of rubble" scenario... The typical alternative to UUIDs would be to have (Sequence + Host Data), and I'd expect there to be a considerable risk of Dumb Techs making mistakes there. It wouldn't be difficult for such a scenario to induce systematic key collisions. It's never just about one risk; it's about *all the risks.* > And that's been my point all along, despite people trying to dilute it > with nonsense numbers that they don't understand: UUIDs are sufficiently > unique for 99.99999999999% of the applications out there. However, you > should always take 5 or 10 minutes to consider whether your application > is one of the .00000000001% that can't tolerate the tiny risk. If your application is so fragile that it cannot tolerate *that* tiny risk, then I have to wonder if your system isn't Just Plain Too Fragile, because there are other unavoidable risks likely to be of much greater probability. -- "Have you noticed that, when we were young, we were told that `everybody else is doing it' was a really stupid reason to do something, but now it's the standard reason for picking a particular software package?" -- Barry Gehm
On Jan 6, 2011, at 9:31 AM, Chris Browne wrote: > The reasonable choices for a would-be artificial primary key seem to be > 1 and 3; in a distributed system, I'd expect to prefer 1, as the time + > host data are likely to eliminate the "oh, it might just randomly match" > problem. In some contexts, 1 is considered a security weakness, as it reveals information about which machine generated it and when,which is why most OS-supplied uuid generators now default to 4 (random). This tends to be more of a concern with encryption/securityuses, and if it's not a concern for your db[*], then your are correct that 1 is likely the best choice. [*] After all, in many dbs we log all sorts of explicit where/who/when for auditing purposes. In that case, having ids thatprovide a clue of where/when most certainly does not add any legitimate security concern. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
It was recently brought to my attention that in my enthusiasm to have my point of view understood, that I may have been crude, or "self-important" or something else that people find offensive. My apologies to anyone who was offended, and to anyone who considers this thread a bikeshed or flamewar that I have been contributing to. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On 6 Jan 2011, at 17:51, Chris Browne wrote: > wmoran@potentialtech.com (Bill Moran) writes: > If your system is sufficiently negligently designed that this particular > conflict causes it to kill people, then I wouldn't be too inclined to > point at this issue with UUIDs being the Real Problem with the system. > > This is NOT the only risk that the system faces; you can't get *nearly* > as low probabilities attached to hardware and network issues such as: > - Disks failing > - Cosmic rays twiddling bits in memory > - Network connections failing part way through the work > - Dumb techs blindly cloning the same "host key" onto every one of the > EMTs' data collection devices Let's say that you actually build a mission critical system for which you'd need to evacuate the country if it fails. Youpick the best ECC RAM you can find, the most reliable type of disk storage available, your fallback network has a fallbacknetwork of it's own, etc. Basically you have done everything you could to ensure that the chances of the system failingare as small as technically possible. All those little failure chances add up to a certain number. Using UUID's for your ID's is not required for the design ofthe system, yet you chose to do so. You added a nearly infinite chance of UUID collisions to the accumulated chance ofthe system failing. Now the system miraculously fails and the country needs evacuating. A committee is going to investigate why it failed. Ifthe dumb techy above is responsible, they just found themselves a scape-goat. If they didn't, but stumble upon your unnecessaryusage of UUID's instead... Let's just say I don't want to be that person. I have to agree with Bill here, if lives depend on your system then anything that adds to the failure chances is very hardto defend. In the end it often boils down to responsibility in case of failure, not to mention what it does to your ownpeace of mind. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4d260af211541129314545!
dalroi@solfertje.student.utwente.nl (Alban Hertroys) writes: > On 6 Jan 2011, at 17:51, Chris Browne wrote: > >> wmoran@potentialtech.com (Bill Moran) writes: >> If your system is sufficiently negligently designed that this particular >> conflict causes it to kill people, then I wouldn't be too inclined to >> point at this issue with UUIDs being the Real Problem with the system. >> >> This is NOT the only risk that the system faces; you can't get *nearly* >> as low probabilities attached to hardware and network issues such as: >> - Disks failing >> - Cosmic rays twiddling bits in memory >> - Network connections failing part way through the work >> - Dumb techs blindly cloning the same "host key" onto every one of the >> EMTs' data collection devices > > Let's say that you actually build a mission critical system for which > you'd need to evacuate the country if it fails. You pick the best ECC > RAM you can find, the most reliable type of disk storage available, > your fallback network has a fallback network of it's own, > etc. Basically you have done everything you could to ensure that the > chances of the system failing are as small as technically possible. > > All those little failure chances add up to a certain number. Using > UUID's for your ID's is not required for the design of the system, yet > you chose to do so. You added a nearly infinite chance of UUID > collisions to the accumulated chance of the system failing. Infinite? The probability can't conceivably exceed 1. It's scarcely likely to exceed "infinitesimal." I've built clustered systems, and frequently, the resulting Rube Goldberg apparatus that tries to protect against failures of the other apparatus trying to protect against failures of further apparatus that tries to protect against failures introduces a tall and unwieldy stack of intricately interwoven components such that operators need to be *mighty* careful not to tip anything over lest the protective apparatus collapse, knocking over the system it was supposed to protect. > Now the system miraculously fails and the country needs evacuating. A > committee is going to investigate why it failed. If the dumb techy > above is responsible, they just found themselves a scape-goat. If they > didn't, but stumble upon your unnecessary usage of UUID's > instead... Let's just say I don't want to be that person. If the system is that mission critical, then it well and truly warrants doing enough proper analysis of the risks to *know* the risks of the various expectable failure conditions, and to do so in rather more detail than the oversimplification of characterizing them as "infinitesimal" or "infinite." > I have to agree with Bill here, if lives depend on your system then > anything that adds to the failure chances is very hard to defend. In > the end it often boils down to responsibility in case of failure, not > to mention what it does to your own peace of mind. It seems to me that using serially assigned values, along with manually assigned server IDs, to construct a would-be-unique value, is likely to introduce quite a lot *more risk* of system failure than would the use of UUIDs. So someone that rules out UUIDs based on some fallacious imagined "infinite chance of collisions" is jumping away from a small risk, and accepting one much more likely to take lives. We haven't seen any indication that would distinguish between "infinite" and "infinitesimal," beyond the fact that "infinite" is infinitely larger than the largest probability that one can find for an event, which is 1. -- (format nil "~S@~S" "cbbrowne" "gmail.com") "But life wasn't yes-no, on-off. Life was shades of gray, and rainbows not in the order of the spectrum." -- L. E. Modesitt, Jr., _Adiamante_
In response to Chris Browne <cbbrowne@acm.org>: > > It seems to me that using serially assigned values, along with manually > assigned server IDs, to construct a would-be-unique value, is likely to > introduce quite a lot *more risk* of system failure than would the use > of UUIDs. First off, server IDs are not randomly assigned. They are assigned automatically when the software is activated and tracked by the master server. Second, there is a very simple mechanism to prevent attempted "device cloning" from causing a problem. (it's not a particularly difficult problem to solve, actually) Third, and very important, is the following conversation we had during planning: "Can we get good quality random data on mobile devices?" "We'll have to research it, we don't know yet." "Wait, before we do the research -- if we find that some mobile devices have good quality random data and others don't -- can we tell the client which devices they're allowed to use?" "No, the client will dictate which devices it must work on." So basically, we realized that even if our concerns about UUID collisions we're unreasonably paranoid, we still couldn't guarantee that the devices would be up to the task of generating UUIDs with sufficient randomness. As I tried to point out earlier in the thread, the collision issue was one of MANY things we considered in this design. It is, however, the ONLY one that's directly relevant to the original conversation. Now that my description of our design seems to be coming under fire, I feel the need to at least point out that we thought it through more than that. I suppose that's mostly my own fault for trying to bring in an example that it's impractical to discuss in full detail. As other people have pointed out, there are concerns about malicious impersonation, device failure, data corruption ... each of these may be more or less likely or dangerous than UUID collision, but not all of them are _intended_ to be solved by not using UUIDs, so claiming that abandoning UUIDs does not fix these problems is completely correct, and also a straw man. > So someone that rules out UUIDs based on some fallacious imagined > "infinite chance of collisions" is jumping away from a small risk, and > accepting one much more likely to take lives. The possibility of collisions is not fallacious, however, the use of "infinite" (I don't remember who wrote that) is obviously not correct. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Jan 6, 2011, at 3:52 AM, Stuart Bishop wrote: > Maybe I should start a business in providing UUID collision insurance? Your ideas are intriguing to me and I wish to subscribe to your newsletter. -M
Hey guys and gals, As the originator of this topic, I've received a lot of good answers, opinions, and advice. Thank you. I'm not sure that more conversation on this will go anywhere but down. It seems that UUID vs. Integer is one of those 'values' subjects, like: Sexual practices and preferences. The right way to raise children. How to handle money in a family. What are the defintions of a woman and a man? What religion is best and correct? Which political party really has the best interests of the country as a goal, and is the better party? What's an appropriate lifestyle? Uh . . . . my mother never taught me this(deprived childhood), but I learned the hardway. NEVER discuss these things in public, (especially drunk in a bar in a part of town at a different socio economic level than you), but learn what any possible mate things of these ;-) And UUID vs. Integer? Seems to be something not to discuss too long in a newsgroup. Seriously, there's some emotions here on this subject that as the originator of this topic, I'd like to see calm down and get back to productive work. Is it a deal, guys and gals? This newsgroup I always brag about to my other geek friends. 'Those people on the postgres news group are the steadiest, most helpful, most logical people I know on the Internet. The don't jump on fancy new trends nor hide technology in market speak. Their work revolves around the best tested, theory based practices. They just make the tech world a better place with some of the most reliable, best supported OSS out there.' And you guys and gals ARE that description. Dennis Gearon Signature Warning ---------------- It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
On 6 Jan 2011, at 20:36, Chris Browne wrote: > Infinite? The probability can't conceivably exceed 1. Don't start picking om words please, "infinitely small" or "infinitesimal" is obviously what I meant to write there. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4d263d0511543498117537!
On 2011-01-06, Chris Browne <cbbrowne@acm.org> wrote: > (Sequence + Host Data), and I'd expect there to > be a considerable risk of Dumb Techs making mistakes there. It wouldn't > be difficult for such a scenario to induce systematic key collisions. I've seen that happen. cleanup of the effected records was tricky, a few customers were inconvenienced -- ⚂⚃ 100% natural