Re: UUID v7 - Mailing list pgsql-hackers
From | Sergey Prokhorenko |
---|---|
Subject | Re: UUID v7 |
Date | |
Msg-id | 1945125834.2044089.1706574441164@mail.yahoo.com |
In response to | Re: UUID v7 ("Andrey M. Borodin" <x4mmm@yandex-team.ru>) |
List | pgsql-hackers |
Andrey,
I understand and agree with your goals. But instead of dangerous universal functions, it is better to develop safe, highly specialized functions that implement only these goals.
There should not be a function uuidv7(T) from an arbitrary timestamp, but there should be a special function that implements your algorithm: uuidv8(now() + '1 century' * random(0,10)).
I replaced 1 day with 1 century because the spread of 1 day is too small. Over time, records will be inserted between existing records, which is undesirable.
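As a rough illustration of such a specialized function, here is a minimal Python sketch. It takes no caller-supplied business timestamp; the shift is chosen internally. The names shifted_uuid and CENTURY_MS are hypothetical, the century is approximated as 36525 days, and the version nibble is set to 8 only because the expression above uses uuidv8. This is not a PostgreSQL API.

```python
import os
import random
import time
import uuid

# Assumption: one century approximated as 36525 days, in milliseconds.
CENTURY_MS = 36525 * 86400 * 1000


def shifted_uuid(now_ms=None, max_shift=10, unit_ms=CENTURY_MS):
    """Return a version-8 UUID whose leading 48 bits hold now + k * unit_ms.

    k is drawn uniformly from 0..max_shift inclusive, so concurrent writers
    are spread over max_shift + 1 distant key ranges.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    shift = random.randint(0, max_shift)              # inclusive on both ends
    ts = (now_ms + shift * unit_ms) & ((1 << 48) - 1)  # keep 48 bits
    rand = int.from_bytes(os.urandom(10), "big")       # 80 random bits
    custom_b = rand >> 68                              # top 12 bits
    custom_c = rand & ((1 << 62) - 1)                  # low 62 bits
    value = (ts << 80) | (0x8 << 76) | (custom_b << 64) | (0b10 << 62) | custom_c
    return uuid.UUID(int=value)
```

The caller can only ask for an identifier; it cannot pass a document date or a constant, which is the point of keeping the function narrow.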
Similarly, if we need to calculate the partition id, we should not use uuid_extract_time() to hand out the extracted timestamp, whose accuracy cannot be guaranteed. Instead, we should return exactly the partition id, calculated from the UUIDv7 timestamp. For example, partitions may be spaced approximately a month apart.
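A minimal Python sketch of the partition-id idea, under stated assumptions: the function name uuidv7_partition_id is hypothetical, and "approximately a month" is modelled as a fixed 30-day bucket. The function returns only the bucket number and never exposes the raw timestamp.

```python
import uuid

# Assumption: a partition spans a fixed 30 days, expressed in milliseconds.
PARTITION_WIDTH_MS = 30 * 24 * 3600 * 1000


def uuidv7_partition_id(u: uuid.UUID) -> int:
    """Map a UUIDv7 to a coarse partition number derived from its timestamp."""
    if u.version != 7:
        raise ValueError("expected a UUIDv7")
    unix_ms = u.int >> 80          # leading 48 bits: Unix time in milliseconds
    return unix_ms // PARTITION_WIDTH_MS
```

Because the result is only as fine as the partition width, small inaccuracies in the embedded timestamp matter only near partition boundaries.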
As for the documentation, it must state that the UUIDv7 structure is not timestamp + random, but timestamp + randomly seeded counter + random, as in all advanced implementations.
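To make that structure concrete, here is a minimal Python sketch of one possible layout: a 48-bit millisecond timestamp, a 12-bit counter that is reseeded randomly on every new millisecond, and 62 random bits. The class name UUIDv7Generator, the 12-bit counter width, and the 11-bit seed are assumptions made for illustration; this is not the code of the PostgreSQL patch.

```python
import os
import time
import uuid


class UUIDv7Generator:
    """Sketch: timestamp + randomly seeded counter + random (not thread-safe)."""

    def __init__(self):
        self._last_ms = -1
        self._counter = 0

    @staticmethod
    def _seed():
        # 11 random bits; the top counter bit starts at 0 to leave headroom.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def generate(self, now_ms=None):
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms > self._last_ms:
            # New millisecond: take the new timestamp and reseed the counter.
            self._last_ms = now_ms
            self._counter = self._seed()
        else:
            # Same millisecond (or clock moved back): increment the counter.
            self._counter += 1
            if self._counter > 0xFFF:
                # Counter exhausted: step the timestamp forward and reseed.
                self._last_ms += 1
                self._counter = self._seed()
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = ((self._last_ms & ((1 << 48) - 1)) << 80) | (0x7 << 76)
        value |= (self._counter << 64) | (0b10 << 62) | rand_b
        return uuid.UUID(int=value)
```

Two identifiers generated within the same millisecond by one generator differ in the counter field, so they sort in generation order even though the timestamps are equal.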
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au
______________________________________________________________
On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.
Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.
>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.
Even if the developer passes a constant time to uuidv7(T), they will get what they asked for: a unique identifier. Moreover, it will still preserve locality. There will be no negative consequences at all.
On the contrary, an experienced developer can leverage the parameter when data locality should be reduced. If you have several streams of data, you might want to introduce some shift to reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point into 10 and increase ingestion performance 10-fold.
> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.
Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.
Jelte, your documentation corrections look good to me; I'll include them in the next version.
Thanks!
Best regards, Andrey Borodin.