Re: UUID v7 - Mailing list pgsql-hackers
From | Junwang Zhao |
---|---|
Subject | Re: UUID v7 |
Date | |
Msg-id | CAEG8a3+jKVT=wRbeT=dC25Tm_NyXZ-XQTPrDoOnJnGj12A5K4g@mail.gmail.com |
In response to | Re: UUID v7 (Jelte Fennema-Nio <postgres@jeltef.nl>) |
List | pgsql-hackers |
On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> > In this case the documentation must state that the functions
> > uuid_extract_time() and uuidv7(T) are against the RFC requirements,
> > and that developers may use these functions with caution at their
> > own risk, and these functions are not recommended for production
> > environments.
> >
> > The function uuidv7(T) is not better than uuid_extract_time().
> > Careless developers may well pass any business date into this
> > function: document date, registration date, payment date, reporting
> > date, start date of the current month, data download date, and even
> > a constant. This would be a profanation of UUIDv7 with very negative
> > consequences.
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing. Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements. For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases. (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
> I do agree with others however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example to be able to partition by the timestamp in the UUID and then
> being able to later load data for an older timestamp and have it be
> added to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this use case we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not slightly change UUIDv7 then? Well
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
> So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed, the RFC section 6.1[1] has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources
or a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a custom
timestamp, so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.

[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. Afaict
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp. Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons. This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> > versions and variants this function returns NULL. The extracted timestamp
> > does not necessarily equate to the time of UUID generation. How close it is
> > to the actual time depends on the implementation that generated the UUID.
> > The uuidv7() function provided by PostgreSQL will normally store the actual time of
> > generation in the UUID, but if large batches of UUIDs are generated at the
> > same time it's possible that some UUIDs will store a time that is slightly later
> > than their actual generation time.
>

--
Regards
Junwang Zhao
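For readers following the thread, the split being discussed can be sketched in Python. This is only an illustration under stated assumptions, not the patch's C implementation: the names `uuidv7_now`, `uuidv8_from_unix_ms` and `extract_unix_ms` are hypothetical, the bit layout is the UUIDv7 one from the RFC draft (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), and no monotonic counter is kept for UUIDs generated within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def _build(unix_ms: int, version: int) -> uuid.UUID:
    """Pack a timestamp into the UUIDv7 bit layout with the given version nibble."""
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                 # 12 bits that follow the version nibble
    rand_b = rand & ((1 << 62) - 1)     # 62 bits that follow the variant bits
    value = (
        (unix_ms << 80)      # bits 127..80: Unix epoch milliseconds
        | (version << 76)    # bits 79..76: version
        | (rand_a << 64)     # bits 75..64: random
        | (0b10 << 62)       # bits 63..62: RFC 4122 variant
        | rand_b             # bits 61..0: random
    )
    return uuid.UUID(int=value)


def uuidv7_now() -> uuid.UUID:
    """Hypothetical uuidv7(): always reads the clock, takes no timestamp argument."""
    return _build(time.time_ns() // 1_000_000, version=7)


def uuidv8_from_unix_ms(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: UUIDv7 layout, caller-supplied timestamp, tagged as v8.

    Uniqueness and ordering are the caller's responsibility here.
    """
    return _build(unix_ms, version=8)


def extract_unix_ms(u: uuid.UUID) -> Optional[int]:
    """Return the embedded Unix milliseconds for a v7 UUID, else None.

    v8 is deliberately not handled: its layout is implementation-specific.
    """
    if u.variant != uuid.RFC_4122 or u.version != 7:
        return None
    return u.int >> 80
```

With this split, `extract_unix_ms(uuidv7_now())` returns a value close to the current clock, while a UUID from `uuidv8_from_unix_ms(...)` still carries its timestamp in the top 48 bits but is not reported by the generic extractor, since nothing in the UUID itself says which v8 layout produced it.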