Re: uuid-ossp: Performance considerations for different UUID approaches? - Mailing list pgsql-general

From Bill Moran
Subject Re: uuid-ossp: Performance considerations for different UUID approaches?
Date
Msg-id 20151222124018.bee10b60b3d9b58d7b3a1839@potentialtech.com
Whole thread Raw
In response to uuid-ossp: Performance considerations for different UUID approaches?  (Brendan McCollam <bmccollam@uchicago.edu>)
List pgsql-general
On Tue, 22 Dec 2015 11:07:30 -0600
Brendan McCollam <bmccollam@uchicago.edu> wrote:

> (previously posted to the pgsql-performance list)
>
> Hello,
>
> We're in the process of designing the database for a new service, and
> some of our tables are going to be using UUID primary key columns.
>
> We're trying to decide between:
>
> * UUIDv1 (timestamp/MAC uuid) and
>
> * UUIDv4 (random uuid)
>
> And the separate but related choice between:
>
> * Generating the UUIDs client-side with the Python uuid library
> (https://docs.python.org/2/library/uuid.html) or
>
> * Letting PostgreSQL handle uuid creation with the uuid-ossp extension
> (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)
>
> In terms of insert and indexing/retrieval performance, is there one
> clearly superior approach? If not, could somebody speak to the
> performance tradeoffs of different approaches?
>
> There seem to be sources online (e.g.
> https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/
> http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/)
> that claim that UUIDv4 (random) will lead to damaging keyspace
> fragmentation and using UUIDv1 will avoid this.

There's no substance to these claims. Chasing the links around we finally
find this article:
http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
which makes the reasonable argument that random primary keys can cause
performance robbing fragmentation on clustered indexes.
But Postgres doesn't _have_ clustered indexes, so that article doesn't
apply at all. The other authors appear to have missed this important
point.

One could make the argument that the index itself becomming fragmented
could cause some performance degredation, but I've yet to see any
convincing evidence that index fragmentation produces any measurable
performance issues (my own experiments have been inconclusive).

Looking at it another way, a quick experiment shows that PG can fit
about 180 UUID primary keys per database page, which means a million
row table will use about 5600 pages to the tune of about 46m. On
modern hardware, that index is likely to be wholly in memory all the
time.

If your performance requirements are really so dire, then you should
probably consider ditching UUIDs as keys. Taking the same million row
table I postulated in the previous paragraph, but using ints insted
of UUIDs for the primary key, the primary key index would be about
3200 pages (~26m) ... or almost 1/2 the size -- making it more likely
to all be in memory at any point in time.

I seriously doubt that trying to make your UUIDs generate in a
predictable fashon will produce any measurable improvement, and I
see no evidence in the articles you cited that claims otherwise
have any real basis or were made by anyone knowledgeable enough
to know.

--
Bill Moran


pgsql-general by date:

Previous
From: Brendan McCollam
Date:
Subject: uuid-ossp: Performance considerations for different UUID approaches?
Next
From: Aleksander Łukasz
Date:
Subject: Re: Table with seemingly duplicated primary key values