Re: UUID v7 - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: UUID v7
Date
Msg-id 844E6A65-4A36-4A36-94C2-0CDF7615B53F@yandex-team.ru
Whole thread Raw
In response to Re: UUID v7  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Responses Re: UUID v7
Re: UUID v7
List pgsql-hackers

> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
>
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values
generatedat frequency less than 1KHz. 
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a collision of two UUIDs generated at frequency
lessthan 1KHz is p = 2^-(c2+d) 
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as decreases counter capacity. There is no
differencebetween using bits in d directly, or in c3. There is no point in non-zero c3. Every bit that could be given
toc3 can equally be given to d. 

Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2
alsocontribute to unguessability! So, technically, we should consider using all available bits as c2 bits. 

How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise
betweencapacity of one backend per ms and prevention of global collision. 
This compromise is mostly dictated by maximum frequency of UUID generation by one backend, I've chosen 200MHz as a sane
value.


This compromise is much easier when you do not have 74 spare bits, this crazy amount of information forgives almost any
mistake.Imagine you have to distribute 10 bits between c2 and d. And you try to prevent collision between 10
independentdevices which need capacity to generate IDs with frequency of 10KHz each and keep sortability. You would
havesomething like c1=1, c2=3,d=6. 

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I
don'tthink this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us
toomuch freedom of choice. 

Thanks!


Best regards, Andrey Borodin.


pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export
Next
From: Matthias van de Meent
Date:
Subject: Re: btree: downlink right separator/HIKEY optimization