Re: UUID v7 - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: UUID v7
Date
Msg-id CAGECzQSdRW1PxqRM9F=DZ9daCc8D32i7HO7nHE1_ep3mOcv_1g@mail.gmail.com
In response to Re: UUID v7  (Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>)
Responses Re: UUID v7
List pgsql-hackers
tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> In this case the documentation must state that the functions
> uuid_extract_time() and uuidv7(T) are against the RFC requirements,
> and that developers may use these functions with caution at their own
> risk, and these functions are not recommended for production
> environment.
>
> The function uuidv7(T) is not better than uuid_extract_time().
> Careless developers may well pass any business date into this
> function: document date, registration date, payment date, reporting
> date, start date of the current month, data download date, and even a
> constant. This would be a profanation of UUIDv7 with very negative
> consequences.

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

> Implementations acquire the current timestamp from a reliable
> source to provide values that are time-ordered and continually
> increasing.  Care must be taken to ensure that timestamp changes
> from the environment or operating system are handled in a way that
> is consistent with implementation requirements.  For example, if
> it is possible for the system clock to move backward due to either
> manual adjustment or corrections from a time synchronization
> protocol, implementations need to determine how to handle such
> cases.  (See Altering, Fuzzing, or Smearing below.)
>
> ...
>
> UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> sources or a custom timestamp epoch are required, UUIDv8 MUST be
> used.
>
> ...
>
> Monotonicity (each subsequent value being greater than the last) is
> the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.
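
To make the "continually increasing" requirement concrete, here is a
minimal Python sketch of a generator that owns its clock and never
lets the embedded timestamp go backward. It is not the patch's
implementation: the class name MonotonicUuid7 and the injectable
clock_ms parameter are made up for illustration, and stepping the
timestamp forward by one millisecond is only one of several possible
ways to handle a stalled or rewound clock.

```python
import os
import time
import uuid


class MonotonicUuid7:
    """Hypothetical UUIDv7 generator; an illustration, not PostgreSQL code."""

    def __init__(self, clock_ms=lambda: time.time_ns() // 1_000_000):
        # clock_ms is an assumed hook so a test can inject a fake clock.
        self._clock_ms = clock_ms
        self._last_ms = -1

    def next(self) -> uuid.UUID:
        now_ms = self._clock_ms()
        # If the clock stalled or moved backward, step past the last value
        # so the embedded timestamp strictly increases on every call.
        if now_ms <= self._last_ms:
            now_ms = self._last_ms + 1
        self._last_ms = now_ms

        rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
        rand_a = rand >> 68                 # top 12 bits
        rand_b = rand & ((1 << 62) - 1)     # low 62 bits
        value = (
            ((now_ms & ((1 << 48) - 1)) << 80)  # unix_ts_ms, 48 bits
            | (0x7 << 76)                       # version 7
            | (rand_a << 64)                    # rand_a, 12 bits
            | (0b10 << 62)                      # RFC variant
            | rand_b                            # rand_b, 62 bits
        )
        return uuid.UUID(int=value)
```

The generator can only keep this guarantee because it is the sole
source of the timestamp. With a caller-supplied timestamp there is
nothing to compare against, which is the problem described above.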

I do agree with others, however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example, you could partition by the timestamp in the UUID and then
later load data for an older timestamp and have it be added to the
older partition. But it's possible to do that while still following
the spec, by using a UUIDv8 instead of UUIDv7. So for this use case we
could make a helper function that generates a UUIDv8 using the same
format as a UUIDv7, but allows storing arbitrary timestamps. You might
say, why not slightly change UUIDv7 then? Well, mainly because of this
critical sentence in the RFC:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.
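
As a rough illustration of such a helper, the Python sketch below
builds a UUIDv8 that reuses the UUIDv7 field layout (48-bit Unix
millisecond timestamp, then random bits) but sets the version nibble
to 8. The name uuidv8_with_timestamp is hypothetical and nothing like
it exists in the patchset. The choice to fill the remaining bits with
random data is also an assumption, since the RFC leaves the UUIDv8
layout to the implementation.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuidv8_with_timestamp(ts: datetime) -> uuid.UUID:
    """Hypothetical helper: UUIDv7-shaped layout, but tagged as version 8."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Integer arithmetic avoids float rounding of the millisecond value.
    ts_ms = (ts - _EPOCH) // timedelta(milliseconds=1)
    if not 0 <= ts_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (
        (ts_ms << 80)                      # caller-supplied timestamp
        | (0x8 << 76)                      # version 8, not 7
        | ((rand >> 68) << 64)             # 12 random bits
        | (0b10 << 62)                     # RFC variant
        | (rand & ((1 << 62) - 1))         # 62 random bits
    )
    return uuid.UUID(int=value)
```

Values for an old timestamp sort into the matching range, so they
would land in the older partition, and the version field tells every
reader that the usual UUIDv7 guarantees do not apply.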

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

> Implementations MAY alter the actual timestamp.  Some examples
> include security considerations around providing a real clock
> value within a UUID, to correct inaccurate clocks, to handle leap
> seconds, or instead of dividing a number of microseconds by 1000
> to obtain a millisecond value; dividing by 1024 (or some other
> value) for performance reasons.  This specification makes no
> requirement or guarantee about how close the clock value needs to
> be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

> This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> versions and variants this function returns NULL. The extracted timestamp
> does not necessarily equate to the time of UUID generation. How close it is
> to the actual time depends on the implementation that generated the UUID.
> The uuidv7() function provided by PostgreSQL will normally store the actual
> time of generation in the UUID, but if large batches of UUIDs are generated
> at the same time it's possible that some UUIDs will store a time that is
> slightly later than their actual generation time.
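
For reference, extraction for the version 7 case is little more than
reading the top 48 bits. The Python sketch below is only an
illustration of that behaviour, not the patch's C code. The function
name extract_time_v7 is made up, and it deliberately skips versions 1
and 6, which the proposed documentation also covers. None stands in
for SQL NULL.

```python
import uuid
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extract_time_v7(u: uuid.UUID):
    """Return the embedded timestamp of a UUIDv7, or None for anything else."""
    if u.variant != uuid.RFC_4122 or u.version != 7:
        return None
    ts_ms = u.int >> 80  # top 48 bits: Unix time in milliseconds
    try:
        return _EPOCH + timedelta(milliseconds=ts_ms)
    except OverflowError:
        # 48 bits of milliseconds can exceed Python's datetime range.
        return None
```

The result says what the generator stored, not when the UUID was
actually created, which is why the documentation wording above matters.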


