Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table |
Date | |
Msg-id | 20140425184607.GI16465@fetter.org |
In response to | Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-hackers |
On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
> On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >> A pseudo-random UUID is frankly pretty
> >> useless to me because (a) it's not really unique
> >
> > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> > long as you can trust the random number generator, the chances of a
> > value occurring twice can be estimated using the birthday paradox:
> > there's a 50% chance of having *one* collision in a set of 2^61 items.
> > Storing this amount of UUIDs alone requires 32 exabytes of storage.
> > Factor in the tuple and indexing overheads and you'd be needing close
> > to all the hard disk space ever manufactured in the world.
>
> Well, I've already had collisions with UUID-OSSP, in production, with
> only around 20 billion values. So clearly there aren't 122 bits of true
> randomness in OSSP. I can't speak for other implementations because I
> haven't tried them.
>
> >> (b) it doesn't help me route data at all.
> >
> > That's really out of scope for UUIDs. They're about generating
> > identifiers, not describing what the identifier means. UUIDs also
> > don't happen to cure cancer.
>
> http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327
>
> On the contrary, I would argue that an object identifier which is
> completely random is possibly the worst way to form an ID of all
> possible concepts; there's no relationship whatsoever between the ID,
> the application stack, and the application data; you don't even get the
> pseudo-time indexing you get with Serials. The only reason to do it is
> because you're too lazy to implement a better way.
>
> Or to put it another way: a value which is truly random is no identifier
> at all.

Not exactly. It's at least potentially hiding information an attacker
could use, with all the caveats that carries.

> Compare this with a composite identifier which carries information about
> the node, table, and schema of origin for the tuple. Not only does this
> help ensure uniqueness, but it also supports intelligent sharding and
> multi-master replication systems. I don't speak hypothetically; we've
> done this in the past and will do it again in the future.

This is an excellent idea, but I don't think it's in scope for UUIDs.

> I would love to have some machinery inside PostgreSQL to make this
> easier (for example, a useful unique database ID), but I suspect that
> actual implementation will always remain application-specific.
>
> You may say "oh, that's not the job of the identifier", but if it's not,
> WTF is the identifier for, then?

Frequently, it's to provide some kind of opacity in the sense of not
having an obvious predecessor or successor.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
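
The birthday-paradox estimate quoted in this message can be checked with a few lines of arithmetic. What follows is a minimal sketch and not part of the original message: the function names are made up for illustration, and it assumes 122 uniformly random bits per UUID (as stated in the thread) and 16 bytes of storage per UUID. It uses the standard approximation p = 1 - exp(-n(n-1)/(2N)) for n draws from a space of size N.

```python
import math

RANDOM_BITS = 122        # random bits per UUID, as stated in the thread
UUID_SIZE_BYTES = 16     # a UUID is 128 bits wide


def collision_probability(n_items: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one collision among n_items draws.

    Hypothetical helper: birthday approximation 1 - exp(-n(n-1) / (2N)),
    valid only if every value is drawn uniformly from 2**bits choices.
    """
    space = 2 ** bits
    exponent = n_items * (n_items - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


def items_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of draws at which the collision chance reaches p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


def storage_exbibytes(n_items: int) -> float:
    """Raw storage for n_items UUIDs, in units of 2**60 bytes."""
    return n_items * UUID_SIZE_BYTES / 2 ** 60


if __name__ == "__main__":
    print(collision_probability(2 ** 61))           # set size quoted in the thread
    print(items_for_probability(0.5))               # set size for a 50% chance
    print(collision_probability(20_000_000_000))    # the 20 billion value case
    print(storage_exbibytes(2 ** 61))               # raw UUID storage only
```

Under these assumptions, 2^61 values give a collision chance of roughly 39%, and a 50% chance needs about 2.7e18 values (around 1.18 × 2^61), so the "50% at 2^61" figure is a round-number estimate of the right order. The raw storage for 2^61 UUIDs comes out at 32 units of 2^60 bytes, matching the 32 exabytes quoted. For 20 billion values the same formula gives a chance on the order of 4e-17, which is consistent with the inference in the thread that collisions seen at that scale point to the generator supplying fewer than 122 bits of true randomness.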