Re: UUID v7 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: UUID v7
Msg-id CAD21AoC4iAr7M_OgtHA0HZMezot68_0vwUCQjjXKk2iW89w0Jg@mail.gmail.com
In response to Re: UUID v7  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Fri, Nov 1, 2024 at 10:33 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 31 Oct 2024, at 23:04, Stepan Neretin <sndcppg@gmail.com> wrote:
> >
> >
> > Firstly, I'd like to discuss the increased_clock_precision variable, which
> > currently divides the timestamp into milliseconds and nanoseconds. However,
> > this approach only approximates the extra bits for sub-millisecond
> > precision, leading to imprecise timestamps in high-frequency UUID
> > generation.
> No, the timestamp is taken in nanoseconds; we keep a precision of 1/4096 of a ms. If you observe precision loss anywhere, let me know.
>
> >
> > To address this issue, we could consider using a more accurate method for
> > calculating the timestamp. For instance, we could utilize a higher
> > resolution clock or implement a more precise algorithm to ensure accurate
> > timestamps.
>
> That's what we do.
>
> >
> > Additionally, it would be beneficial to add validation checks for the
> > interval argument. These checks could verify that the input interval is
> > within reasonable bounds and that the calculated timestamp is accurate.
> > Examples of checks could include verifying if the interval is too small,
> > too large, or exceeds the maximum possible number of milliseconds and
> > nanoseconds in a timestamp.
>
> timestamptz_pl_interval() is already doing this.
>
> > What do you think about these suggestions? Let me know your thoughts!
>
> Thanks a lot for reviewing the patch!
>
>
> > On 1 Nov 2024, at 10:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>
> >>
> >>
> >>> On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>>
> >>> Therefore, if the
> >>> system clock moves backward due to NTP, we cannot guarantee
> >>> monotonicity and sortability. Is that right?
> >>
> >> Not exactly. Monotonicity is ensured for a given backend. We make sure that the timestamp is advanced forward by at least ~250 ns on each UUID generation. 60 bits of time are unique and ascending for a given backend.
> >>
> >
> > Thank you for your explanation. I now understand this code guarantees
> > the monotonicity:
> >
> > +/* minimum amount of ns that guarantees step of increased_clock_precision */
> > +#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
> > +       ns = get_real_time_ns();
> > +       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
> > +               ns = previous_ns + SUB_MILLISECOND_STEP;
> > +       previous_ns = ns;
> >
> >
> > I think that one of the most important parts in UUIDv7 implementation
> > is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
> > the monotonicity. The current patch employs method 3 with the
> > assumption that 12 bits of sub-millisecond information is available on
> > most of the systems we support. However, as far as I tested, on macOS,
> > values returned by clock_gettime(CLOCK_REALTIME) have only microsecond
> > precision, meaning that we could waste some randomness. Has this point
> > been considered?
> >
>
> There was a thread "What is a typical precision of gettimeofday()?" [0]
> There we found out that the routines of instr_time.h are precise enough. On my machine (MacBook Air M3) I do not observe significant differences between CLOCK_MONOTONIC_RAW and CLOCK_REALTIME in pg_test_timing results.
>
> CLOCK_MONOTONIC_RAW
> x4mmm@x4mmm-osx bin % ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 15.30 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1     98.47856  193113929
>      2      1.52039    2981452
>      4      0.00025        485
>      8      0.00062       1211
>     16      0.00012        237
>     32      0.00004         79
>     64      0.00002         30
>    128      0.00000          8
>    256      0.00000          5
>    512      0.00000          3
>   1024      0.00000          1
>   2048      0.00000          2
>
> CLOCK_REALTIME
> x4mmm@x4mmm-osx bin % ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 15.04 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1     98.49709  196477842
>      2      1.50268    2997479
>      4      0.00007        130
>      8      0.00012        238
>     16      0.00005         91
>     32      0.00000          4
>     64      0.00000          1

I applied the patch shared on that thread[1] to measure nanoseconds
and changed instr_time.h to use CLOCK_REALTIME even on macOS. Here are
the results on my machine (macOS 14.7, M1 Pro):

Testing timing overhead for 3 seconds.
Per loop time including overhead: 18.61 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0      98.1433    98.1433  158212921
       1       0.0000    98.1433          0
       3       0.0000    98.1433          0
       7       0.0000    98.1433          0
      15       0.0000    98.1433          0
      31       0.0000    98.1433          0
      63       0.0000    98.1433          0
     127       0.0000    98.1433          0
     255       0.0000    98.1433          0
     511       0.0000    98.1433          0
    1023       1.8560    99.9994    2992054
    2047       0.0000    99.9994         51
    4095       0.0001    99.9995        110
    8191       0.0003    99.9998        463
   16383       0.0002   100.0000        313
   32767       0.0000   100.0000         49
   65535       0.0000   100.0000          4

Timing durations less than 128 ns:
      ns   % of total  running %      count
       0      98.1433    98.1433  158212921

Almost all timing durations (98.14%) fell into the 0 ns bin, and
nearly all of the rest fell into the 1023 ns bin or above; no
durations at all were between 1 and 511 ns.

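I've also done a simple test on my Mac and saw that the time
returned by clock_gettime(CLOCK_REALTIME) doesn't have nanosecond
precision: every tv_nsec value it returns (and every CLOCK_MONOTONIC
value) is a multiple of 1000, whereas CLOCK_MONOTONIC_RAW returns
non-zero sub-microsecond digits: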

% cat test.c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int
main(void)
{
        struct timespec real;
        struct timespec mono;
        struct timespec mono_raw;

        /* Read each clock once; tv_nsec holds the sub-second part */
        clock_gettime(CLOCK_REALTIME, &real);
        clock_gettime(CLOCK_MONOTONIC, &mono);
        clock_gettime(CLOCK_MONOTONIC_RAW, &mono_raw);

        /* Cast tv_sec for portability: time_t need not be long */
        printf("real:     %ld\t%ld\n", (long) real.tv_sec, real.tv_nsec);
        printf("mono:     %ld\t%ld\n", (long) mono.tv_sec, mono.tv_nsec);
        printf("mono_raw: %ld\t%ld\n", (long) mono_raw.tv_sec, mono_raw.tv_nsec);

        return 0;
}
% gcc -o test test.c
% ./test
real:     1730495955    515018000
mono:     3212977       834578000
mono_raw: 3212982       962799958
% ./test
real:     1730495956    78927000
mono:     3212978       398488000
mono_raw: 3212983       526718333
% ./test
real:     1730495956    652751000
mono:     3212978       972312000
mono_raw: 3212984       100552333

Regards,

[1] https://www.postgresql.org/message-id/3110108.1719939353%40sss.pgh.pa.us

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


