Re: UUID v7 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: UUID v7
Date
Msg-id CAD21AoAVU-VnuqvdafqUCHCAo7Msb6_2k7nSzd_fijMbbtKrCg@mail.gmail.com
Whole thread Raw
In response to Re: UUID v7  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: UUID v7
List pgsql-hackers
On Sun, Feb 9, 2025 at 9:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> I've took into account note from Sergey that "offset" is better name for uuidv7() argument than "shift".
>
> > On 5 Feb 2025, at 03:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >>
> >> I was thinking about incorporating test like this.
> >>
> >>>> With this patch we can generate correct UUIDs in a very distant future.
> >>>> postgres=# select x,
                   uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)), 
> >>>> (x::text || ' year'::text)::interval
> >>>> from generate_series(1,9000,1000) x;
> >>>> x   |   uuid_extract_timestamp    |  interval
> >>>> ------+-----------------------------+------------
> >>>>   1 | 2026-01-31 12:00:53.084+05  | 1 year
> >>>> 1001 | 3026-01-31 12:00:53.084+05  | 1001 years
> >>>> 2001 | 4026-01-31 12:00:53.084+05  | 2001 years
> >>>> 3001 | 5026-01-31 12:00:53.084+05  | 3001 years
> >>>> 4001 | 6026-01-31 12:00:53.084+05  | 4001 years
> >>>> 5001 | 7026-01-31 12:00:53.085+05  | 5001 years
> >>>> 6001 | 8026-01-31 12:00:53.085+05  | 6001 years
> >>>> 7001 | 9026-01-31 12:00:53.085+05  | 7001 years
> >>>> 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
> >>>> (9 rows)
> >>
> >
> > Something like following queries might be workable for example?
> >
> > create table test (c serial, d uuid, t timestamptz generated always as
> > (uuid_extract_timestamp(d)) stored);
> > insert into test (d) select uuidv7((n || 'years')::interval) from
> > generate_series(1, 2000) n;
> > select count(*) from (select t - lag(t) over (order by c) as diff from
> > test) where diff > '10 year' ;
>
> Yeah, makes sense. I reduced tolerance to 366+1 day. Must be stable if we've done all the time offset business right.
>
> > Here are some review comments:
> >
> > #define NS_PER_S       INT64CONST(1000000000)
> > #define NS_PER_MS      INT64CONST(1000000)
> > +#define US_PER_MS      INT64CONST(1000)
> > #define NS_PER_US      INT64CONST(1000)
> >
> > I think it's clear if we put US_PER_MS below NS_PER_US.
>
> OK.
>
> > ---
> >  *
> > - * ns is a number of nanoseconds since start of the UNIX epoch. This value is
> > + * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
> > + * ns_in_ms is a number of nanoseconds within millisecond. These values are
> >  * used for time-dependent bits of UUID.
> >
> > I think we can mention that the RFC describes that stored unix
> > timestamp as an unsigned integer.
>
> Done. Feel free to adjust my wordings, I've no sense of idiomatic English.
>
> >
> > ---
> > static pg_uuid_t *
> > -generate_uuidv7(int64 ns)
> > +generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)
> >
> > How about renaming ns_in_ms with sub_ms?
>
> OK.
>
> >
> > ---
> > +        /* 64 bits is enough for real time, but not for a time range of UUID */
> >
> > I could not understand the point of this comment. It seems to say that
> > 64-bits is not enough for a time range of UUID, but doesn't the time
> > range of UUIDv7 use only 48 bits? It seems to need more comments.
>
> I've tried to say that acquiring current time as an int64 ns since UNIX epoch is still viable for the code (until
year2262). 
>
>
> > ---
> > -        ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> > SECS_PER_DAY * USECS_PER_SEC)
> > -                * NS_PER_US + ns % NS_PER_US;
> > +        us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> > SECS_PER_DAY * USECS_PER_SEC);
> >
> >         /* Generate an UUIDv7 */
> > -        uuid = generate_uuidv7(ns);
> > +        uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
> > NS_PER_US + ns % NS_PER_US);
> >
> > Need to update comments in uuidv7_internval() such as:
> >
> >    /*
> >     * Shift the current timestamp by the given interval. To calculate time
> >     * shift correctly, we convert the UNIX epoch to TimestampTz and use
> >     * timestamptz_pl_interval(). Since this calculation is done with
> >     * microsecond precision, we carry nanoseconds from original ns value to
> >     * shifted ns value.
> >     */
> >
> > and
> >
> >    /*
> >     * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
> >     */
>
> I've tried. I'm not very satisfied with comments, but could not come up with easier description.
>

Thank you for updating the patch. I had missed to track this patch.

I've updated the patch from your v4 patch. In this version, I excluded
the argument name change (from 'shift' to 'offset') as it's not
related to the bug fix and simplified the regression test case.

Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Next
From: Amit Kapila
Date:
Subject: Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.