Re: UUID v7 - Mailing list pgsql-hackers

From Sergey Prokhorenko
Subject Re: UUID v7
Date
Msg-id 305478845.5279532.1712440778735@mail.yahoo.com
In response to Re: UUID v7  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
For every complex problem there is an answer that is clear, simple, and wrong. Since the RFC allows microsecond timestamp granularity, the first thing that comes to everyone's mind is to insert microsecond granularity into UUIDv7. And if the RFC allowed nanosecond timestamp granularity, then they would try to insert nanosecond granularity into UUIDv7.

But I am categorically against abandoning the counter under pressure from the unfounded proposal to replace the counter with microsecond granularity.

1) The RFC specifies millisecond timestamp granularity by default.

2) All advanced UUIDv7 implementations include a counter:
• for JavaScript https://www.npmjs.com/package/uuidv7
• for Rust https://crates.io/crates/uuid7
• for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7
• for Python https://github.com/oittaa/uuid6-python

3) With microsecond granularity, the theoretical maximum rate of generating UUIDv7 without loss of monotonicity is only 1000 UUIDv7 per millisecond, one per microsecond. This is very low and insufficient. The actual rate is even worse, because demand is unevenly distributed within a millisecond: requests arrive in bursts, so some microseconds receive several requests while others receive none, and not every microsecond slot gets used.

For a counter 18 bits long, with the most significant bit initialized to zero and the remaining 17 bits initialized to a random number, the number of UUIDv7 that can be generated per millisecond without loss of monotonicity is between 2^17 = 131072 (if the random number happens to be all ones) and 2^18 = 262144 (if the random number happens to be all zeros). This is more than enough.
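
To make the counter scheme concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the proposed PostgreSQL implementation: the class name UUIDv7Generator, the injectable clock parameter, the placement of the counter (12 bits in rand_a followed by the top 6 bits of rand_b), and the overflow policy (advancing the timestamp by one millisecond) are all hypothetical choices made for this example.

```python
import os
import time
import uuid

COUNTER_BITS = 18  # counter length discussed above


def _random_counter_seed() -> int:
    # Most significant counter bit is zero, the remaining 17 bits are random.
    return int.from_bytes(os.urandom(3), "big") & ((1 << (COUNTER_BITS - 1)) - 1)


class UUIDv7Generator:
    """Hypothetical single-threaded generator: 48-bit millisecond timestamp,
    18-bit counter, 56 random bits."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self._clock = clock      # returns Unix time in milliseconds
        self._last_ms = -1
        self._counter = 0

    def generate(self) -> uuid.UUID:
        now_ms = self._clock()
        if now_ms > self._last_ms:
            # New millisecond: restart the counter from a random seed.
            self._last_ms = now_ms
            self._counter = _random_counter_seed()
        else:
            # Same millisecond (or clock went backwards): bump the counter.
            self._counter += 1
            if self._counter >> COUNTER_BITS:
                # Assumed overflow policy: move the timestamp one ms ahead.
                self._last_ms += 1
                self._counter = _random_counter_seed()

        ts = self._last_ms & ((1 << 48) - 1)
        counter_hi = self._counter >> 6      # 12 bits, stored in rand_a
        counter_lo = self._counter & 0x3F    # 6 bits, top of rand_b
        tail = int.from_bytes(os.urandom(7), "big")  # 56 random bits

        value = (
            (ts << 80)
            | (0x7 << 76)          # version 7
            | (counter_hi << 64)
            | (0b10 << 62)         # RFC variant
            | (counter_lo << 56)
            | tail
        )
        return uuid.UUID(int=value)
```

Because the counter sits in more significant bits than the random tail, every UUID produced within one millisecond compares greater than the previous one, regardless of the random bits.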

4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.
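
The bit budget behind this point can be checked with a few lines of Python. This is only arithmetic over the UUIDv7 field widths (48-bit timestamp, 4-bit version, 2-bit variant); the constant and variable names are made up for this example, and the 10-bit and 18-bit figures are the ones used in this message.

```python
TOTAL_BITS = 128
TIMESTAMP_BITS = 48   # unix_ts_ms
VERSION_BITS = 4
VARIANT_BITS = 2

# Bits left after the fixed fields: rand_a (12) + rand_b (62).
FREE_BITS = TOTAL_BITS - TIMESTAMP_BITS - VERSION_BITS - VARIANT_BITS

MICROSECOND_BITS = 10  # sub-millisecond fraction, as stated above
COUNTER_BITS = 18      # counter length, as stated above

# Microsecond variant: the fraction bits are not random at all.
random_bits_with_microseconds = FREE_BITS - MICROSECOND_BITS

# Counter variant: only the counter's most significant bit is fixed to zero
# when a new millisecond starts; the other 17 counter bits are random.
randomly_initialized_bits_with_counter = FREE_BITS - 1

print(FREE_BITS)                               # 74
print(random_bits_with_microseconds)           # 64
print(randomly_initialized_bits_with_counter)  # 73
```

Note that the counter bits are random only for the first UUID of a millisecond; later values in the same millisecond are increments of that seed, while the trailing 56 bits stay fully random.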



The only reasonable use of microsecond granularity is when writing to a database table in parallel. However, monotonicity in this case can be ensured in another way, namely a single UUIDv7 generator per database table, similar to SERIAL (https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) in PostgreSQL.

Best regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:



...


At this point we can skip the counter/microseconds entirely and just fill everything after unix_ts_ms with randomness. It's still a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust these sortability measures later.
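
For comparison, the variant described in the quoted message (millisecond timestamp followed only by random bits) could look roughly like the following Python sketch. The function name uuidv7_random and its now_ms parameter are hypothetical and exist only for this illustration; this is not the patch under discussion.

```python
import os
import time
import uuid


def uuidv7_random(now_ms=None) -> uuid.UUID:
    """Hypothetical helper: 48-bit unix_ts_ms, then version, variant and
    74 random bits, with no counter and no sub-millisecond fraction."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((now_ms & ((1 << 48) - 1)) << 80) | rand

    # Overwrite the version nibble with 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the two variant bits with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)
```

Values generated in different milliseconds sort by time, but two values generated within the same millisecond are ordered only by their random bits, so monotonicity inside a millisecond is not guaranteed.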


Best regards, Andrey Borodin.
