Home > mailing lists

Re: uuid-ossp: Performance considerations for different UUID approaches? - Mailing list pgsql-general

From	Bill Moran
Subject	Re: uuid-ossp: Performance considerations for different UUID approaches?
Date	December 22, 2015 17:40:24
Msg-id	20151222124018.bee10b60b3d9b58d7b3a1839@potentialtech.com Whole thread Raw
In response to	uuid-ossp: Performance considerations for different UUID approaches? (Brendan McCollam <bmccollam@uchicago.edu>)
List	pgsql-general

Tree view

On Tue, 22 Dec 2015 11:07:30 -0600
Brendan McCollam <bmccollam@uchicago.edu> wrote:

> (previously posted to the pgsql-performance list)
>
> Hello,
>
> We're in the process of designing the database for a new service, and
> some of our tables are going to be using UUID primary key columns.
>
> We're trying to decide between:
>
> * UUIDv1 (timestamp/MAC uuid) and
>
> * UUIDv4 (random uuid)
>
> And the separate but related choice between:
>
> * Generating the UUIDs client-side with the Python uuid library
> (https://docs.python.org/2/library/uuid.html) or
>
> * Letting PostgreSQL handle uuid creation with the uuid-ossp extension
> (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)
>
> In terms of insert and indexing/retrieval performance, is there one
> clearly superior approach? If not, could somebody speak to the
> performance tradeoffs of different approaches?
>
> There seem to be sources online (e.g.
> https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/
> http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/)
> that claim that UUIDv4 (random) will lead to damaging keyspace
> fragmentation and using UUIDv1 will avoid this.

There's no substance to these claims. Chasing the links around we finally
find this article:
http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
which makes the reasonable argument that random primary keys can cause
performance robbing fragmentation on clustered indexes.
But Postgres doesn't _have_ clustered indexes, so that article doesn't
apply at all. The other authors appear to have missed this important
point.

One could make the argument that the index itself becomming fragmented
could cause some performance degredation, but I've yet to see any
convincing evidence that index fragmentation produces any measurable
performance issues (my own experiments have been inconclusive).

Looking at it another way, a quick experiment shows that PG can fit
about 180 UUID primary keys per database page, which means a million
row table will use about 5600 pages to the tune of about 46m. On
modern hardware, that index is likely to be wholly in memory all the
time.

If your performance requirements are really so dire, then you should
probably consider ditching UUIDs as keys. Taking the same million row
table I postulated in the previous paragraph, but using ints insted
of UUIDs for the primary key, the primary key index would be about
3200 pages (~26m) ... or almost 1/2 the size -- making it more likely
to all be in memory at any point in time.

I seriously doubt that trying to make your UUIDs generate in a
predictable fashon will produce any measurable improvement, and I
see no evidence in the articles you cited that claims otherwise
have any real basis or were made by anyone knowledgeable enough
to know.

--
Bill Moran

pgsql-general by date:

From: Brendan McCollam
Date: 22 December 2015, 17:07:40
Subject: uuid-ossp: Performance considerations for different UUID approaches?

From: Aleksander Łukasz
Date: 22 December 2015, 18:32:10
Subject: Re: Table with seemingly duplicated primary key values

Re: uuid-ossp: Performance considerations for different UUID approaches? - Mailing list pgsql-general

Previous

Next