Re: UUID v7 - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: UUID v7
Date
Msg-id 672CA2F0-9277-4EFE-808D-4B7D97937739@yandex-team.ru
Whole thread Raw
In response to Re: UUID v7  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

> On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>>
>>
>>> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>>
>>> <v30-0001-Implement-UUID-v7.patch>
>>
>> Some more thoughts on this patch version:
>>
>> 0. Comment mentioning nanoseconds, while we do not need to carry anything
>> /* Convert TimestampTz back and carry nanoseconds. */
>>
>> 1. There's unnecessary &3 in
>> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>>
>> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for
overflow.That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on
onebackend. However, given current performance of our CSPRNG I do not think this feature worth code complexity. 
>>
>
> While using only 10 bits microseconds makes the implementation simple,
> I'm not sure if 10 bits is enough to generate UUIDs at microsecond
> granularity without losing monotonicity. Since 10-bit microseconds are
> used as is in rand_a space, 1000 UUIDs can be generated per
> millisecond without losing monotonicity.

We won’t loose monotonicity on one backend. We will just accumulate time shift.
See
+    us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+    if (previous_us >= us)
+        us = previous_us + 1;


>
> For example, in my environment, it took 1808 milliseconds to generate
> 1 million UUIDs. This is about 533 UUIDs generated per millisecond.
BTW we can furether improve this performance by buffering CSPRNG. See
v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patchin this thread. 


> As
> UUID generation performance improves, I think 10 bits will not be
> enough.
>
> =# select count(uuidv7()) from generate_series(1, 1_000_000);
> count
> ---------
> 1000000
> (1 row)
>
> Time: 1808.734 ms
>
> I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:
>
>> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the
counter,almost all bits are initialized with a random number, which reduces the risk of collision. 
>
> I feel that it's better to switch to Method 1 or 2 with 12 bits or
> larger counter space.
>

12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter.
Seeversion v24-0001-Implement-UUID-v7.patch 
This version is more resilent to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will loose some sortability.

Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we
justcontinue to use nanoseconds patch, zero bits will act exactly as counters. 


Best regards, Andrey Borodin.


pgsql-hackers by date:

Previous
From: Dave Page
Date:
Subject: Re: doc: pgevent.dll location
Next
From: Bertrand Drouvot
Date:
Subject: Re: define pg_structiszero(addr, s, r)