Re: serial + db key, or guid? - Mailing list pgsql-general
From | Christopher Browne |
---|---|
Subject | Re: serial + db key, or guid? |
Date | |
Msg-id | CAFNqd5X70eVqKqUKoXKV6eBLD1FY69wp7L0mHfsN3a4XvdqBrg@mail.gmail.com Whole thread Raw |
In response to | Re: serial + db key, or guid? (Christophe Pettus <xof@thebuild.com>) |
Responses |
Re: serial + db key, or guid?
|
List | pgsql-general |
On Tue, 11 Aug 2020 at 12:40, Christophe Pettus <xof@thebuild.com> wrote:
> On Aug 11, 2020, at 09:37, Mark Phillips <mark.phillips@mophilly.com> wrote:
>
> I posed the question on the chance things had evolved since 2012, specifically as it relates to postgres.
The essentials haven't changed. Keys (such as UUIDs, especially UUID v4) that have most of their randomness in the most significant bits can cause significant cache hit problems on large indexes. 128 bit keys are usually overkill for most applications, unless you need actual *global* uniqueness across more than a single database or installation; 64 bit keys are usually sufficient.
Thus, if performance is highly significant (e.g. - there's lots of data in the table, and it is heavily read/written) then it may be tempting to use a sequential value instead because that can be smaller, faster to compare, and won't trash caches (e.g. - with UUIDs, seemingly adjacent data will be spread wider across indexes and will need more cache accesses to get to the data).
If the table is small, or data is infrequently queried/updated, these costs may be irrelevant.
UUIDs (and similar very large random keys) do have the advantage that they are somewhat self-secure: You can expose them to outsiders without having to worry about other keys being guessable.
Not overly obvious from this; the "not guessable" part comes in that chronologically adjacent records won't have any apparent similarity.
With serially assigned transaction IDs, if you, as a user, buy something, and discover that your transaction ID was 1460795, you might well guess that other recent purchases were on transactions 1460794, 1460793, and such, and maybe get at someone else's data by messing with a web URL or such. Whereas, here's 5 uuids I just generated (dbus-uuidgen isn't generating RFC 4122 compliant values, but in context of a little illustration, who cares?)
$ for i in 1 2 3 4 5; do
\ dbus-uuidgen
\ done
0ff745301515c646498cd1165f32cc6e
a9ca459ab6330f24d24af5095f32cc6e
b1cff235d77b1f4d8504920a5f32cc6e
58773af20b34b3c550f4eebf5f32cc6e
f9a13ce961b28751b102c5545f32cc6e
\ dbus-uuidgen
\ done
0ff745301515c646498cd1165f32cc6e
a9ca459ab6330f24d24af5095f32cc6e
b1cff235d77b1f4d8504920a5f32cc6e
58773af20b34b3c550f4eebf5f32cc6e
f9a13ce961b28751b102c5545f32cc6e
There are some identical low-order bits, but they are pretty well hidden by the high-order stuff.
It's somewhat cache-destroying, but not especially well guessable.
There is something which has evolved since 2012; see <https://www.postgresql.org/docs/9.5/uuid-ossp.html>, notably
F.43.2. Building uuid-ossp
F.43.2. Building uuid-ossp
It used to be that adding UUID generator support required the OSSP library, which was sufficiently inconvenient that this would often not be built-in. Since 9.4 (released in 2014), uuid-ossp can use common built-in libraries on Linux, OSX, BSD systems, so it's more likely that it will be included "out of the box" on package-managed deployments of PostgreSQL.
-- When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
question, "How would the Lone Ranger handle this?"
pgsql-general by date: