Re: UUID v7 - Mailing list pgsql-hackers
From | Jelte Fennema-Nio |
---|---|
Subject | Re: UUID v7 |
Date | |
Msg-id | CAGECzQSdRW1PxqRM9F=DZ9daCc8D32i7HO7nHE1_ep3mOcv_1g@mail.gmail.com |
In response to | Re: UUID v7 (Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>) |
Responses | Re: UUID v7 |
List | pgsql-hackers |
tl;dr I believe we should remove the uuidv7(timestamp) function from this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.
>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

After re-reading the RFC more diligently, I'm inclined to agree with Sergey that uuidv7(timestamp) is quite problematic. And I would even say that we should not provide uuidv7(timestamp) at all, and instead should only provide uuidv7(). Providing an explicit timestamp for UUIDv7 is explicitly against the spec (in my reading):

> Implementations acquire the current timestamp from a reliable
> source to provide values that are time-ordered and continually
> increasing. Care must be taken to ensure that timestamp changes
> from the environment or operating system are handled in a way that
> is consistent with implementation requirements. For example, if
> it is possible for the system clock to move backward due to either
> manual adjustment or corrections from a time synchronization
> protocol, implementations need to determine how to handle such
> cases. (See Altering, Fuzzing, or Smearing below.)
>
> ...
>
> UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
> sources or a custom timestamp epoch are required, UUIDv8 MUST be
> used.
>
> ...
>
> Monotonicity (each subsequent value being greater than the last) is
> the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually increasing timestamp for our UUIDv7 generation, and thus it would not be a valid UUIDv7 implementation.

I do agree with others, however, that being able to pass in an arbitrary timestamp for UUID generation would be very useful. For example, you could partition by the timestamp in the UUID, and later load data for an older timestamp and have it added to the older partition. But it's possible to do that while still following the spec, by using a UUIDv8 instead of UUIDv7. So for this use case we could make a helper function that generates a UUIDv8 using the same format as a UUIDv7, but allows storing arbitrary timestamps. You might say, why not slightly change UUIDv7 then? Well, mainly because of this critical sentence in the RFC:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict the only thing the RFC says about extracting timestamps is that the RFC does not give a requirement or guarantee about how close the stored timestamp is to the actual time:

> Implementations MAY alter the actual timestamp. Some examples
> include security considerations around providing a real clock
> value within a UUID, to correct inaccurate clocks, to handle leap
> seconds, or instead of dividing a number of microseconds by 1000
> to obtain a millisecond value; dividing by 1024 (or some other
> value) for performance reasons. This specification makes no
> requirement or guarantee about how close the clock value needs to
> be to the actual time.
I see no reason why we cannot make stronger guarantees about the timestamps that we use to generate UUIDs with our uuidv7() function. And then we can update the documentation for uuid_extract_time to something like this:

> This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> versions and variants this function returns NULL. The extracted timestamp
> does not necessarily equate to the time of UUID generation. How close it is
> to the actual time depends on the implementation that generated the UUID.
> The uuidv7() function provided by PostgreSQL will normally store the actual time of
> generation in the UUID, but if large batches of UUIDs are generated at the
> same time it's possible that some UUIDs will store a time that is slightly later
> than their actual generation time.
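
To make the UUIDv8 idea above concrete, here is a minimal sketch in Python (not the C a real patch would use). It assumes the helper reuses the UUIDv7 bit layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) and only changes the version nibble to 8. The names uuidv8_from_timestamp and extract_v7_layout_time are hypothetical and are not part of the patchset or of any library.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuidv8_from_timestamp(ts: datetime) -> uuid.UUID:
    """Hypothetical helper: a UUIDv8 laid out like a UUIDv7, but with a
    caller-supplied (timezone-aware) timestamp instead of the current time."""
    # Exact integer milliseconds since the Unix epoch (no float rounding).
    unix_ms = (ts - EPOCH) // timedelta(milliseconds=1)
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = (rand >> 62) & 0xFFF                 # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant

    value = (
        (unix_ms << 80)    # bits 127..80: unix_ts_ms
        | (0x8 << 76)      # bits 79..76: version 8 (not 7)
        | (rand_a << 64)   # bits 75..64
        | (0b10 << 62)     # bits 63..62: RFC variant
        | rand_b           # bits 61..0
    )
    return uuid.UUID(int=value)


def extract_v7_layout_time(u: uuid.UUID, allow_v8: bool = False) -> Optional[datetime]:
    """Hypothetical extractor: return the embedded millisecond timestamp for
    UUIDv7, or None for other versions and variants. With allow_v8=True the
    caller asserts that its UUIDv8 values came from the helper above; nothing
    in a UUIDv8 itself says which layout it uses."""
    if u.variant != uuid.RFC_4122:
        return None
    if u.version == 7 or (allow_v8 and u.version == 8):
        return EPOCH + timedelta(milliseconds=u.int >> 80)
    return None
```

Because the timestamp is chosen by the caller, nothing here is monotonic, and uniqueness rests only on the 74 random bits. That matches the RFC's warning that UUIDv8 uniqueness must not be assumed. The sketch covers only the version 7 layout; versions 1 and 6, which the proposed documentation also mentions, use a Gregorian epoch and are left out.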