Re: UUID v1 optimizations... - Mailing list pgsql-performance

From Tom Lane
Subject Re: UUID v1 optimizations...
Date
Msg-id 20635.1558796270@sss.pgh.pa.us
Whole thread Raw
In response to UUID v1 optimizations...  (Ancoron Luciferis <ancoron.luciferis@googlemail.com>)
Responses Re: UUID v1 optimizations...
List pgsql-performance
Ancoron Luciferis <ancoron.luciferis@googlemail.com> writes:
> So I investigated the PostgreSQL code to see how it is handling UUID's
> with respect to storage, sorting, aso. but all I could find was that it
> basically falls back to the 16-byte.

Yup, they're just blobs to us.

> After struggling to find a way to optimize things inside the database, I
> reverted to introduce a hack into the application by not shuffling the
> timestamp bytes for the UUID's, which makes it look quite serial in
> terms of byte order.

> So, my question now is: Would it make sense for you to handle these
> time-based UUID's differently internally? Specifically un-shuffling the
> timestamp before they are going to storage?

No, because

(1) UUID layout is standardized;

(2) such a change would break on-disk compatibility for existing
databases;

(3) particularly for the case of application-generated UUIDs, we do
not have enough information to know that this would actually do anything
useful;

(4) it in fact *wouldn't* do anything useful, because we'd still have
to sort UUIDs in the same order as today, meaning that btree index behavior
would remain the same as before.  Plus UUID comparison would get a lot
more complicated and slower than it is now.

(5) even if we ignored all that and did it anyway, it would only help
for version-1 UUIDs.  The index performance issue would still remain for
version-4 (random) UUIDs, which are if anything more common than v1.


FWIW, I don't know what tool you're using to get those "bloat" numbers,
but just because somebody calls it bloat doesn't mean that it is.
The normal, steady-state load factor for a btree index is generally
understood to be about 2/3rds, and that looks to be about what
you're getting for the regular-UUID-format index.  The fact that the
serially-loaded index has nearly no dead space is because we hack the
page split logic to make that happen --- but that is a hack, and it's
not without downsides.  It should *not* be taken to be an indication
of what you can expect for any other insertion pattern.

The insertion-speed aspect is a real problem, but the core of that problem
is that use of any sort of standard-format UUID turns applications that
might have had considerable locality of reference into applications that
have none.  If you can manage to keep your whole index in RAM that would
not hurt too much, but as soon as it doesn't fit you have a problem.
When your app has more or less predictable reference patterns it's best
to design a unique key that matches that, instead of expecting that
essentially-random keys will work well.

Or in short, hacking up the way your app generates UUIDs is exactly
the right way to proceed here.

            regards, tom lane



pgsql-performance by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: UUID v1 optimizations...
Next
From: Ancoron Luciferis
Date:
Subject: Re: UUID v1 optimizations...