Re: UUID v7 - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: UUID v7
Msg-id CAEG8a3+jKVT=wRbeT=dC25Tm_NyXZ-XQTPrDoOnJnGj12A5K4g@mail.gmail.com
In response to Re: UUID v7  (Jelte Fennema-Nio <postgres@jeltef.nl>)
List pgsql-hackers
On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> > In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
> > requirements, and that developers may use these functions with caution at their own risk, and these functions are not
> > recommended for production environments.
> >
> > The function uuidv7(T) is no better than uuid_extract_time(). Careless developers may well pass any business date
> > into this function: document date, registration date, payment date, reporting date, start date of the current month,
> > data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing.  Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements.  For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases.  (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
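For illustration only, here is a minimal Python sketch (not the patch's C implementation) of a generator that keeps the UUIDv7 timestamp continually increasing even when the system clock steps backward. It assumes the 12-bit rand_a field is used as a counter, which is one of the counter-based approaches the RFC draft describes; the class name UUIDv7Generator and the injectable clock parameter are hypothetical.

```python
import os
import threading
import time
import uuid


class UUIDv7Generator:
    """Sketch: UUIDv7 values whose (timestamp, counter) prefix never decreases."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock      # callable returning ns since the Unix epoch
        self._last_ms = -1       # timestamp used for the previous UUID
        self._counter = 0        # 12-bit counter stored in rand_a
        self._lock = threading.Lock()

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = self._clock() // 1_000_000
            if now_ms > self._last_ms:
                # Clock moved forward: adopt it and restart the counter.
                self._last_ms = now_ms
                self._counter = 0
            else:
                # Same millisecond, or the clock stepped backward: keep the
                # previous timestamp and bump the counter to stay ordered.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter exhausted: borrow the next millisecond, so the
                    # stored time runs slightly ahead of the real clock.
                    self._last_ms += 1
                    self._counter = 0
            ms, counter = self._last_ms, self._counter

        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (ms & ((1 << 48) - 1)) << 80  # unix_ts_ms (48 bits)
        value |= 0x7 << 76                     # version 7
        value |= counter << 64                 # rand_a used as a counter
        value |= 0b10 << 62                    # RFC variant bits
        value |= rand_b                        # 62 random bits
        return uuid.UUID(int=value)
```

When the counter overflows within one millisecond, the sketch borrows the next millisecond, so a large batch can carry a timestamp slightly later than its real generation time. A caller-supplied timestamp could not give this ordering guarantee.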
> I do agree with others however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example to be able to partition by the timestamp in the UUID and then
> being able to later load data for an older timestamp and have it be
> added to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this use case we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not slightly change UUIDv7 then? Well
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
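A minimal sketch of the kind of UUIDv8 helper described above, assuming it reuses the UUIDv7 bit layout (48-bit Unix milliseconds followed by random bits) but sets the version nibble to 8. The function name uuidv8_with_timestamp is hypothetical; nothing like it exists in the patch.

```python
import datetime as dt
import os
import uuid

UNIX_EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def uuidv8_with_timestamp(ts: dt.datetime) -> uuid.UUID:
    """Hypothetical helper: UUIDv7-style layout, version 8, caller-chosen time.

    Per the RFC draft, uniqueness of UUIDv8 values MUST NOT be assumed.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Exact integer milliseconds since the Unix epoch.
    unix_ms = (ts - UNIX_EPOCH) // dt.timedelta(milliseconds=1)
    if not 0 <= unix_ms < 1 << 48:
        raise ValueError("timestamp does not fit in 48 bits of milliseconds")

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = unix_ms << 80                          # custom_a: timestamp
    value |= 0x8 << 76                             # version 8
    value |= ((rand >> 62) & 0xFFF) << 64          # 12 random bits
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand & ((1 << 62) - 1)                # 62 random bits
    return uuid.UUID(int=value)
```

Because any timestamp is accepted, two values for the same timestamp differ only in their 74 random bits, which is why the caller would need to enforce uniqueness (for example with a unique index) rather than assume it.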
> So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed, the RFC section 6.1[1] has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```
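
To make the epoch difference concrete: UUIDv1 and UUIDv6 store a 60-bit count of 100-nanosecond intervals since 1582-10-15 00:00 UTC, whereas UUIDv7 stores milliseconds since 1970-01-01 UTC. A small Python sketch of the conversion (the function names are hypothetical):

```python
import datetime as dt

# UUIDv1/v6 count 100-ns intervals from the Gregorian epoch (1582-10-15 UTC);
# UUIDv7 counts milliseconds from the Unix epoch (1970-01-01 UTC).
EPOCH_GAP_DAYS = (dt.date(1970, 1, 1) - dt.date(1582, 10, 15)).days
GREGORIAN_TO_UNIX_100NS = EPOCH_GAP_DAYS * 86_400 * 10_000_000  # 0x01B21DD213814000


def gregorian_ticks_to_unix_ms(ticks: int) -> int:
    """v1/v6-style 100-ns ticks -> v7-style Unix milliseconds (floored)."""
    return (ticks - GREGORIAN_TO_UNIX_100NS) // 10_000


def unix_ms_to_gregorian_ticks(unix_ms: int) -> int:
    """v7-style Unix milliseconds -> v1/v6-style 100-ns ticks."""
    return unix_ms * 10_000 + GREGORIAN_TO_UNIX_100NS
```

Both the epoch and the unit differ, so a v1 or v6 timestamp cannot simply be copied into the v7 timestamp field.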

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp, so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.


[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. AFAICT
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp.  Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons.  This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> > versions and variants this function returns NULL. The extracted timestamp
> > does not necessarily equate to the time of UUID generation. How close it is
> > to the actual time depends on the implementation that generated the UUID.
> > The uuidv7() function provided by PostgreSQL will normally store the actual time of
> > generation in the UUID, but if large batches of UUIDs are generated at the
> > same time it's possible that some UUIDs will store a time that is slightly later
> > than their actual generation time.
>
>
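A rough Python sketch of what the proposed documentation describes, assuming the v1, v6 and v7 bit layouts from the RFC draft. extract_time is a hypothetical stand-in for uuid_extract_time(), with None playing the role of NULL; the result has microsecond precision, like timestamptz.

```python
import datetime as dt
import uuid
from typing import Optional

UNIX_EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000  # 1582-10-15 -> 1970-01-01


def extract_time(u: uuid.UUID) -> Optional[dt.datetime]:
    """Return the embedded timestamp of a v1, v6 or v7 UUID, else None."""
    if u.variant != uuid.RFC_4122:
        return None
    n = u.int
    version = (n >> 76) & 0xF
    if version == 7:
        # 48-bit Unix milliseconds in the top bits.
        return UNIX_EPOCH + dt.timedelta(milliseconds=n >> 80)
    if version == 1:
        # time_low | time_mid | ver | time_hi: low-order bits come first.
        time_low, time_mid, time_hi = n >> 96, (n >> 80) & 0xFFFF, (n >> 64) & 0xFFF
        ticks = (time_hi << 48) | (time_mid << 32) | time_low
    elif version == 6:
        # time_high | time_mid | ver | time_low: high-order bits come first.
        time_high, time_mid, time_low = n >> 96, (n >> 80) & 0xFFFF, (n >> 64) & 0xFFF
        ticks = (time_high << 28) | (time_mid << 12) | time_low
    else:
        return None
    # 100-ns Gregorian ticks -> microseconds since the Unix epoch.
    return UNIX_EPOCH + dt.timedelta(microseconds=(ticks - GREGORIAN_TO_UNIX_100NS) // 10)
```

As the quoted RFC text says, this only recovers whatever the generator chose to store, so the result is a hint about generation time rather than a guarantee.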


--
Regards
Junwang Zhao


