Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
Msg-id 535AA245.2070005@agliodbs.com
In response to Re: 9.4 Proposal: Initdb creates a single table  (David Fetter <david@fetter.org>)
List pgsql-hackers
On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> A pseudo-random UUID is frankly pretty
>> useless to me because (a) it's not really unique
> 
> This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> long as you can trust the random number generator, the chances of a
> value occurring twice can be estimated using the birthday paradox:
> there's a 50% chance of having *one* collision in a set of 2^61 items.
> Storing this amount of UUIDs alone requires 32 exabytes of storage.
> Factor in the tuple and indexing overheads and you'd be needing close
> to all the hard disk space ever manufactured in the world.

Well, I've already had collisions with UUID-OSSP, in production, with
only around 20 billion values.  So clearly there aren't 122 bits of true
randomness in OSSP.  I can't speak for other implementations because I
haven't tried them.
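
To put numbers on that inference, here is a minimal sketch of the
birthday-bound estimate, assuming an ideal generator with 122 uniformly
random bits. The function name and the choice of approximation are
illustrative only; nothing here comes from uuid-ossp itself.

```python
import math


def collision_probability(n_values: int, random_bits: int) -> float:
    """Approximate chance of at least one collision among n_values IDs.

    Uses the standard birthday approximation
        p ~= 1 - exp(-n * (n - 1) / (2 * 2**random_bits)),
    which assumes every ID is drawn uniformly and independently.
    """
    space = 2 ** random_bits
    exponent = -n_values * (n_values - 1) / (2 * space)
    # expm1 keeps precision when the resulting probability is tiny.
    return -math.expm1(exponent)


# Around 20 billion values, the volume mentioned above.
p_20_billion = collision_probability(20_000_000_000, 122)
print(f"{p_20_billion:.3e}")
```

Under those assumptions the estimate is on the order of 1e-17, so a
collision observed at that volume points at the generator rather than
at bad luck.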

>> (b) it doesn't help me route data at all.
> 
> That's really out of scope for UUIDs. They're about generating
> identifiers, not describing what the identifier means. UUIDs also
> don't happen to cure cancer.

http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327

On the contrary, I would argue that an object identifier which is
completely random is possibly the worst way to form an ID of all
possible concepts; there's no relationship whatsoever between the ID,
the application stack, and the application data; you don't even get the
pseudo-time indexing you get with Serials.  The only reason to do it is
because you're too lazy to implement a better way.

Or to put it another way: a value which is truly random is no identifier
at all.

Compare this with a composite identifier which carries information about
the node, table, and schema of origin for the tuple.  Not only does this
help ensure uniqueness, but it also supports intelligent sharding and
multi-master replication systems.  I don't speak hypothetically; we've
done this in the past and will do it again in the future.
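
As a rough illustration of what such an identifier could look like,
here is a hypothetical sketch that packs node, schema, table, and a
per-table sequence value into one 64-bit integer.  The field widths,
names, and layout are assumptions made for the example, not a
description of any system we actually built.

```python
from dataclasses import dataclass

# Hypothetical field widths; they sum to 64 bits.
NODE_BITS, SCHEMA_BITS, TABLE_BITS, SEQ_BITS = 10, 6, 12, 36


@dataclass(frozen=True)
class CompositeId:
    node: int    # originating node
    schema: int  # schema of origin
    table: int   # table of origin
    seq: int     # per-table sequence value on that node

    def pack(self) -> int:
        """Pack the fields into one integer, most significant first."""
        fields = (
            (self.node, NODE_BITS),
            (self.schema, SCHEMA_BITS),
            (self.table, TABLE_BITS),
            (self.seq, SEQ_BITS),
        )
        packed = 0
        for value, bits in fields:
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{value} does not fit in {bits} bits")
            packed = (packed << bits) | value
        return packed

    @classmethod
    def unpack(cls, packed: int) -> "CompositeId":
        """Recover the fields, reading from the least significant end."""
        seq = packed & ((1 << SEQ_BITS) - 1)
        packed >>= SEQ_BITS
        table = packed & ((1 << TABLE_BITS) - 1)
        packed >>= TABLE_BITS
        schema = packed & ((1 << SCHEMA_BITS) - 1)
        packed >>= SCHEMA_BITS
        return cls(node=packed, schema=schema, table=table, seq=seq)


original = CompositeId(node=3, schema=1, table=42, seq=1001)
restored = CompositeId.unpack(original.pack())
```

Because the node sits in the high bits, a router can recover the origin
from the ID alone, and IDs from the same node and table sort by
sequence value.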

I would love to have some machinery inside PostgreSQL to make this
easier (for example, a useful unique database ID), but I suspect that
the actual implementation will always remain application-specific.

You may say "oh, that's not the job of the identifier", but if it's not,
WTF is the identifier for, then?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com