Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table - Mailing list pgsql-hackers

From David Fetter
Subject Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
Date
Msg-id 20140425184607.GI16465@fetter.org
In response to Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
> On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >> A pseudo-random UUID is frankly pretty
> >> useless to me because (a) it's not really unique
> > 
> > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> > long as you can trust the random number generator, the chances of a
> > value occurring twice can be estimated using the birthday paradox:
> > there's a 50% chance of having *one* collision in a set of 2^61 items.
> > Storing this amount of UUIDs alone requires 32 exabytes of storage.
> > Factor in the tuple and indexing overheads and you'd be needing close
> > to all the hard disk space ever manufactured in the world.
> 
> Well, I've already had collisions with UUID-OSSP, in production, with
> only around 20 billion values.  So clearly there aren't 122 bits of true
> randomness in OSSP.  I can't speak for other implementations because I
> haven't tried them.
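
As a rough check on the figures quoted above, the birthday approximation
p ~= 1 - exp(-n(n-1) / 2N), with N = 2^122, can be evaluated directly.
This is a minimal sketch that assumes an ideal, uniform random source;
the function names are illustrative and are not part of uuid-ossp or
PostgreSQL.

```python
import math

UUID_BYTES = 16  # size of one stored UUID value, excluding tuple overhead


def collision_probability(n: int, bits: int = 122) -> float:
    """Approximate chance of at least one collision among n random values.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes every value is drawn uniformly and independently.
    """
    space = 2 ** bits
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))


def items_for_probability(p: float, bits: int = 122) -> float:
    """Approximate number of values needed to reach collision chance p."""
    return math.sqrt(2 * 2 ** bits * math.log(1 / (1 - p)))


if __name__ == "__main__":
    n_half = items_for_probability(0.5)
    print(f"values for a 50% collision chance: {n_half:.3e}")
    print(f"raw storage for 2^61 UUIDs (EiB):  {2 ** 61 * UUID_BYTES / 2 ** 60:.0f}")
    print(f"p(collision) at 20 billion values: {collision_probability(20_000_000_000):.3e}")
```

Under that assumption, 20 billion values give a collision probability on
the order of 1e-17, which is consistent with the conclusion that the
generator in question is not delivering 122 bits of randomness.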
> 
> >> (b) it doesn't help me route data at all.
> > 
> > That's really out of scope for UUIDs. They're about generating
> > identifiers, not describing what the identifier means. UUIDs also
> > don't happen to cure cancer.
> 
> http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327
> 
> On the contrary, I would argue that an object identifier which is
> completely random is possibly the worst way to form an ID of all
> possible concepts; there's no relationship whatsoever between the ID,
> the application stack, and the application data; you don't even get the
> pseudo-time indexing you get with Serials.   The only reason to do it is
> because you're too lazy to implement a better way.
> 
> Or to put it another way: a value which is truly random is no identifier
> at all.

Not exactly.  It's at least potentially hiding information an attacker
could use, with all the caveats that carries.

> Compare this with a composite identifier which carries information about
> the node, table, and schema of origin for the tuple.  Not only does this
> help ensure uniqueness, but it also supports intelligent sharding and
> multi-master replication systems.  I don't speak hypothetically; we've
> done this in the past and will do it again in the future.

This is an excellent idea, but I don't think it's in scope for UUIDs.
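
For illustration only, one way a composite identifier of that kind could
be laid out is to pack a node number, a schema tag, a table tag, and a
per-node sequence value into 128 bits. The field names and widths below
are hypothetical; they are not taken from the scheme described above or
from anything PostgreSQL provides. The sketch reuses the 16-byte UUID
container purely as storage, so the result is not an RFC 4122 random
(version 4) UUID.

```python
import uuid
from typing import NamedTuple

# Hypothetical field widths, most significant field first (128 bits total).
FIELD_BITS = (16, 16, 32, 64)  # node, schema, table, seq


class Origin(NamedTuple):
    node: int    # which server generated the row
    schema: int  # application-assigned schema tag
    table: int   # application-assigned table tag
    seq: int     # per-node sequence value


def pack(origin: Origin) -> uuid.UUID:
    """Pack the origin fields into one 128-bit value."""
    n = 0
    for value, bits in zip(origin, FIELD_BITS):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        n = (n << bits) | value
    return uuid.UUID(int=n)


def unpack(u: uuid.UUID) -> Origin:
    """Recover the origin fields, e.g. to route a row to its home node."""
    n = u.int
    fields = []
    for bits in reversed(FIELD_BITS):
        fields.append(n & ((1 << bits) - 1))
        n >>= bits
    return Origin(*reversed(fields))


if __name__ == "__main__":
    ident = pack(Origin(node=3, schema=1, table=42, seq=1000))
    print(ident, unpack(ident))
```

Uniqueness here comes from the node number plus the local sequence
rather than from randomness, which is also why such values are
predictable and reveal where a row came from.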

> I would love to have some machinery inside PostgreSQL to make this
> easier (for example, a useful unique database ID), but I suspect that
> actual implementation will always remain application-specific.
> 
> You may say "oh, that's not the job of the identifier", but if it's not,
> WTF is the identifier for, then?

Frequently, it's to provide some kind of opacity in the sense of not
having an obvious predecessor or successor.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


