RE: UUID v7 - Mailing list pgsql-hackers

From Kyzer Davis (kydavis)
Subject RE: UUID v7
Msg-id PH0PR11MB502930D3166E6E1372C68144BB2DA@PH0PR11MB5029.namprd11.prod.outlook.com
In response to Re: UUID v7  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
Great discussions, group.

> I think it would be reasonable to review this patch now.
I am happy to review the format and logic for any proposed v7 and/or v8
UUID. Just point me to a PR or some code review location.

> "Distributed UUID Generation" states that "node ID" is there to decrease
> the likelihood of a collision.
Correct, node identifiers provide some bit space that prevents a collision in
the event the stars align and two nodes would otherwise create the exact same
UUID.

From what I have seen, UUIDv7 should meet the requirements outlined thus far
in this thread.

Also to add, there are two UUID prototypes for postgres from my checks.
(Although they are outdated relative to the latest draft sent up for official
publication, so review them from an academic perspective.)
- https://github.com/uuid6/prototypes
- pg_uuid_next (see this thread which nicely summarizes some UUIDv7
"checkboxes" https://github.com/x4m/pg_uuid_next/issues/1)
- UUID_v7_for_Postgres.sql

Don't forget, if we have UUIDv1 already implemented in the postgres code you
may want to examine UUIDv6.
UUIDv6 is simply a fork of that code and a swap of the timestamp bits.
In terms of effort, UUIDv6 is easy to implement and gives you a time-ordered
UUID re-using 99% of the code you may already have.
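
To illustrate the bit swap, here is a minimal Python sketch, not postgres code.
The helper name uuid1_to_uuid6 is made up for this example. It assumes the
v6 layout of the 60-bit timestamp stored most-significant bits first, with the
clock sequence, variant and node carried over unchanged from v1.

```python
import uuid


def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp bits of a v1 UUID into the v6 layout (sketch)."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")

    # 60-bit count of 100 ns intervals since 1582-10-15, reassembled by
    # the standard library from the three v1 timestamp fields.
    ts = u1.time

    time_high = (ts >> 28) & 0xFFFFFFFF  # most significant 32 bits
    time_mid = (ts >> 12) & 0xFFFF       # next 16 bits
    time_low = ts & 0x0FFF               # least significant 12 bits

    # Version nibble (6) sits in front of the low 12 timestamp bits.
    time_low_and_version = 0x6000 | time_low

    # Variant, clock sequence and node: low 64 bits, copied as-is.
    low64 = u1.int & 0xFFFFFFFFFFFFFFFF

    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (time_low_and_version << 64)
        | low64
    )
    return uuid.UUID(int=value)


if __name__ == "__main__":
    v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
    print(uuid1_to_uuid6(v1))  # 1ec9414c-232a-6b00-b3c8-9f6bdeced846
```

Everything after the third group is untouched, which is why most of an existing
v1 generator can be reused.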

Lastly, my advice on v8 is that I would examine/implement v6 or v7 first
before jumping to v8, because whatever you do for implementing v6 or v7 will
help you implement a better v8.
There are also a number of v8 prototype implementations (at the previous
link) if somebody wants to give them a scroll.

Happy to answer any other questions where I can provide input.

Thanks,

-----Original Message-----
From: Andrey M. Borodin <x4mmm@yandex-team.ru> 
Sent: Friday, July 7, 2023 8:06 AM
To: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Daniel Gustafsson <daniel@yesql.se>;
Matthias van de Meent <boekewurm+postgres@gmail.com>; Nikolay Samokhvalov
<samokhvalov@gmail.com>; Kyzer Davis (kydavis) <kydavis@cisco.com>; Andres
Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>;
PostgreSQL Hackers <pgsql-hackers@postgresql.org>; brad@peabody.io;
wolakk@gmail.com
Subject: Re: UUID v7



> On 6 Jul 2023, at 21:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> 
> I think it would be reasonable to review this patch now.
+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an
RFC-compatible format for experimental or vendor-specific use cases.
Revision 1 of the IETF draft contained interesting code for v8: very similar
to v7, but with fields for "node ID" and "rolling sequence number".
I think this is a reasonable approach, thus I attach an implementation of UUID
v8 per [0]. But from my point of view this implementation has some flaws.
These two new fields "node ID" and "sequence" are there not for uniqueness,
but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated
numbers.

I think that "sequence" is there to help generate local ascending
identifiers when the real time clock does not provide enough resolution. So
the "sequence" field must be placed after the 6 bytes of time-generated
identifier.

On the contrary, "node ID" must differentiate identifiers generated on
different nodes. So it makes sense to place "node ID" before timing. So
identifiers generated on different nodes will tend to be in different
ranges.
Although, section "6.4. Distributed UUID Generation" states that "node ID"
is there to decrease the likelihood of a collision. So my intuition might be
wrong here.
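
A rough Python sketch of the "sequence right after the timestamp" idea follows.
It is neither the attached patch nor a layout taken from the draft: the
function name make_uuid8, the 12-bit sequence width, the 8-bit node ID width
and the position of the node ID after the variant bits are all assumptions made
for illustration.

```python
import os
import uuid


def make_uuid8(ts_ms: int, sequence: int, node_id: int) -> uuid.UUID:
    """Hypothetical v8 layout:
    48-bit ms timestamp | version 8 | 12-bit sequence |
    variant 0b10 | 8-bit node ID | 54 random bits
    """
    rand = int.from_bytes(os.urandom(7), "big") & ((1 << 54) - 1)
    value = (
        ((ts_ms & ((1 << 48) - 1)) << 80)  # bytes 0..5: time
        | (0x8 << 76)                      # version nibble
        | ((sequence & 0xFFF) << 64)       # sequence directly after time
        | (0b10 << 62)                     # variant bits
        | ((node_id & 0xFF) << 54)         # node ID (placement is arbitrary)
        | rand                             # random tail
    )
    return uuid.UUID(int=value)


if __name__ == "__main__":
    now_ms = 1688731560000
    first = make_uuid8(now_ms, 1, node_id=7)
    second = make_uuid8(now_ms, 2, node_id=7)
    # Same millisecond, but the sequence field keeps them ordered.
    print(first < second)
```

With the sequence ahead of the random bits, identifiers generated within the
same millisecond on one node still sort in generation order; with the sequence
in the last bytes, the random bits would decide the order instead.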


Do we want to provide this "vendor-specific" UUID with tweaks for databases?
Or should we limit the scope to the well-defined UUID v7?


Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01

