Thread: UUID v7

UUID v7

From
Andrey Borodin
Date:
Hello pgsql-hackers!

As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance. It keeps data locality for time-ordered values.
From my POV v7 is especially needed for users. Current standard status
is "draft". And I'm not sure it will be accepted before our feature
freeze for PG16. Maybe we could provide a draft implementation in 16
and adjust it to the accepted version if the standard is changed? PFA
patch with implementation.

What do you think?

cc Brad Peabody and Kyzer R. Davis, authors of the standard
cc Kirk Wolak and Nik Samokhvalov who requested the feature

Thanks!

Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04

Attachment

Re: UUID v7

From
Andres Freund
Date:
Hi,

On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
> As you may know there's a new version of UUID being standardized [0].
> These new algorithms of UUID generation are very promising for
> database performance.

I agree it's very useful to have.


> [0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04

That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis


> It keeps data locality for time-ordered values.
> From my POV v7 is especially needed for users. Current standard status
> is "draft". And I'm not sure it will be accepted before our feature
> freeze for PG16. Maybe we could provide a draft implementation in 16
> and adjust it to the accepted version if the standard is changed? PFA
> patch with implementation.

Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.


Perhaps we should name the function something like
gen_time_ordered_random_uuid() instead?  That gives us a bit more flexibility
about what uuid version we generate. And it might be easier for users, anyway.

Still not sure what version we'd best use for now. Perhaps v8?

Greetings,

Andres Freund



Re: UUID v7

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> From my POV v7 is especially needed for users. Current standard status
>> is "draft". And I'm not sure it will be accepted before our feature
>> freeze for PG16. Maybe we could provide a draft implementation in 16
>> and adjust it to the accepted version if the standard is changed? PFA
>> patch with implementation.

> Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
> turn out to not be one.

I think there is no need to rush this into v16.  Let's wait for the
standardization process to play out.

            regards, tom lane



Re: UUID v7

From
Andrey Borodin
Date:
On Fri, Feb 10, 2023 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
>
> Perhaps we should name the function something like
> gen_time_ordered_random_uuid() instead?  That gives us a bit more flexibility
> about what uuid version we generate. And it might be easier for users, anyway.
I think users would be happy with any name.

> Still not sure what version we'd best use for now. Perhaps v8?
V8 is just a "custom data" format. Like "place whatever you want".
Though I agree that its sample implementation looks to be better.



On Fri, Feb 10, 2023 at 5:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
> > turn out to not be one.
>
> I think there is no need to rush this into v16.  Let's wait for the
> standardization process to play out.
>

Standardization per se does not bring value to users. However, I agree
that eager users can just have it today as an extension and be happy
with it [0].
Maybe it's fine to wait a year for others...


Best regards, Andrey Borodin.


[0] https://github.com/x4m/pg_uuid_next



Re: UUID v7

From
Peter Eisentraut
Date:
On 11.02.23 02:14, Andres Freund wrote:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> As you may know there's a new version of UUID being standardized [0].
>> These new algorithms of UUID generation are very promising for
>> database performance.
> 
> I agree it's very useful to have.
> 
> 
>> [0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
> 
> That looks to not be the current version anymore, it's superseded by:
> https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Yes, this means that the draft that an individual had uploaded has now 
been taken on by a working group for formal review.  If there is a 
prototype implementation, this is a good time to provide feedback.  But 
it's too early to ship a production version.




RE: UUID v7

From
"Kyzer Davis (kydavis)"
Date:
Hello Group,

I am happy to see others interested in the improvements provided by UUIDv7!

I caught up on the thread and you all are correct.

Work has moved on GitHub from uuid6/uuid6-ietf-draft to 
ietf-wg-uuidrev/rfc4122bis
- Draft 00 merged RFC4122 with Draft 04 and fixed as many problems as possible 
with RFC4122.
- Draft 01 continued to iterate on RFC4122 problems: 
https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-01
- Draft 02 items being changed are summarized in the latest PR for review in 
the upcoming interim meeting (Feb 16th): 
https://github.com/ietf-wg-uuidrev/rfc4122bis/pull/60
Note: Draft 02 should be published by the end of the week and long term we 
have one more meeting at IETF 116 to iron out the replacement of RFC4122, 
perform last call and submit to the IESG for official review and consideration 
for replacement of RFC4122 (actual timeline for that varies based on what IESG 
wants me to fix.)

That all being said:
The point is 99% of the work since adoption by the IETF has been ironing out 
RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a 
very good state.

If anybody has any feedback found during draft reviewing or prototyping; 
please either email uuidrev@ietf.org or drop an issue on the tracker:
https://github.com/ietf-wg-uuidrev/rfc4122bis/issues

Lastly, I have added the C/SQL implementation to the prototypes page below:
https://github.com/uuid6/prototypes

Thanks!

-----Original Message-----
From: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Sent: Saturday, February 11, 2023 10:51 AM
To: Andres Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; brad@peabody.io; 
wolakk@gmail.com; Kyzer Davis (kydavis) <kydavis@cisco.com>; Nikolay 
Samokhvalov <samokhvalov@gmail.com>
Subject: Re: UUID v7

On 11.02.23 02:14, Andres Freund wrote:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> As you may know there's a new version of UUID being standardized [0].
>> These new algorithms of UUID generation are very promising for
>> database performance.
>
> I agree it's very useful to have.
>
>
>> [0]
>> https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid
>> -format-04
>
> That looks to not be the current version anymore, it's superseded by:
> https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Yes, this means that the draft that an individual had uploaded has now been 
taken on by a working group for formal review.  If there is a prototype 
implementation, this is a good time to provide feedback.  But it's too early 
to ship a production version.


Attachment

Re: UUID v7

From
Nikolay Samokhvalov
Date:
On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:
> I am happy to see others interested in the improvements provided by UUIDv7!

Thank you for providing the details!

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986



Re: UUID v7

From
Daniel Gustafsson
Date:
> On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

> Some small updates as I see them:
> - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
> - noticing that there is no commitfest record, I created one:

I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway.  Comitting anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.

--
Daniel Gustafsson




Re: UUID v7

From
Matthias van de Meent
Date:
On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>
> > Some small updates as I see them:
> > - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
> > - noticing that there is no commitfest record, I created one:
>
> I will actually go ahead and close this entry in the current CF, not because we
> don't want the feature but because it's unlikely that it will go in now given
> that standardization is still underway.  Comitting anything right now seems
> premature, we might as well wait for standardization given that we have lots of
> time before the v17 freeze.

I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month. I don't
know how long this publishing process usually takes, but it seems like
the WG considers the text final, so unless this would take months I
wouldn't mind keeping this patch around as "waiting for external
process to complete". Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: UUID v7

From
Daniel Gustafsson
Date:
> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>>
>>> Some small updates as I see them:
>>> - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
>>> - noticing that there is no commitfest record, I created one:
>>
>> I will actually go ahead and close this entry in the current CF, not because we
>> don't want the feature but because it's unlikely that it will go in now given
>> that standardization is still underway.  Comitting anything right now seems
>> premature, we might as well wait for standardization given that we have lots of
>> time before the v17 freeze.
>
> I'd like to note that this draft has recently had its last call
> period, and has been proposed for publishing early last month.

Sure, but this document is in AD Evaluation and there are many stages left in
the IESG process, it may still take a fair bit of time before this is done.

> Sure, it's earlier than the actual release of
> the standard, but that wasn't a blocker for SQL features that were
> considered finalized either.

I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.  I'm only commenting
on this particular Internet standard which we have plenty of time to commit
before v17 without rushing to beat a standards committee.

Also, if you look you can see that I moved it to the next CF in a vague hope
that standardization will be swift (which is admittedly never is).

--
Daniel Gustafsson




Re: UUID v7

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>> Sure, it's earlier than the actual release of
>> the standard, but that wasn't a blocker for SQL features that were
>> considered finalized either.

> I can't speak for any SQL standard features we've committed before being
> standardized, it's for sure not the norm for the project.

We have done a couple of things that way recently.  An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version.  I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.

            regards, tom lane



Re: UUID v7

From
Peter Eisentraut
Date:
On 06.07.23 16:02, Tom Lane wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>> Sure, it's earlier than the actual release of
>>> the standard, but that wasn't a blocker for SQL features that were
>>> considered finalized either.
> 
>> I can't speak for any SQL standard features we've committed before being
>> standardized, it's for sure not the norm for the project.
> 
> We have done a couple of things that way recently.  An important
> reason why we felt we could get away with that is that nowadays
> we have people who actually sit on the SQL committee and have
> reliable information on what's likely to make it into the final text
> of the next version.  I don't think we have equivalent visibility or
> should have equivalent confidence about how UUID v7 standardization
> will play out.

(I have been attending some meetings and I'm on the mailing list.)

Anyway, I think it would be reasonable to review this patch now.  We 
might leave it hanging in "Ready for Committer" for a while when we get 
there.  But surely review can start now.




Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 6 Jul 2023, at 21:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> I think it would be reasonable to review this patch now.
+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or
vendor-specificuse cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with
fieldsfor "node ID" and "rolling sequence number". 
I think this is reasonable approach, thus I attach implementation of UUID v8 per [0]. But from my point of view this
implementationhas some flaws. 
These two new fields "node ID" and "sequence" are there not for uniqueness, but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated numbers.

I think that "sequence" is there to help generate local ascending identifiers when the real time clock do not provide
enoughresolution. So "sequence" field must be placed after 6 bytes of time-generated identifier. 

On a contrary "node ID" must differentiate identifiers generated on different nodes. So it makes sense to place "node
ID"before timing. So identifiers generated on different nodes will tend to be in different ranges. 
Although, section "6.4. Distributed UUID Generation" states that "node ID" is there to decrease the likelihood of a
collision.So my intuition might be wrong here. 


Do we want to provide this "vendor-specific" UUID with tweaks for databases? Or should we limit the scope with well
definedUUID v7? 


Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01


Attachment

RE: UUID v7

From
"Kyzer Davis (kydavis)"
Date:
Great discussions group,

> I think it would be reasonable to review this patch now.
I am happy to review the format and logic for any proposed v7 and/or v8
UUID. Just point me to a PR or some code review location.

> Distributed UUID Generation" states that "node ID" is there to decrease
> the likelihood of a collision.
Correct, node identifiers help provide some bit space that ensures no
collision in the event the stars align where two nodes create the exact
UUID.

From what I have seen UUIDv7 should meet the requirements outlined thus far
In this thread.

Also to add, there are two UUID prototypes for postgres from my checks.
Although they are outdated from the latest draft sent up for official
Publication so review them from an academic perspective.)
- https://github.com/uuid6/prototypes
- pg_uuid_next (see this thread which nicely summarizes some UUIDv7
"checkboxes" https://github.com/x4m/pg_uuid_next/issues/1)
- UUID_v7_for_Postgres.sql

Don't forget, if we have UUIDv1 already implemented in the postgres code you
may want to examine UUIDv6.
UUIDv6 is simply a fork of that code and swap of the timestamp bits.
In terms of effort UUIDv6 easy to implement and gives you a time ordered
UUID re-using 99% of the code you may already have.

Lastly, my advice on v8 is that I would examine/implement v6 or v7 first
before jumping to v8
because whatever you do for implementing v6 or v7 will help you implement a
better v8.
There are also a number of v8 prototype implementations (at the previous
link) if somebody wants to give them a scroll.

Happy to answer any other questions where I can provide input.

Thanks,

-----Original Message-----
From: Andrey M. Borodin <x4mmm@yandex-team.ru> 
Sent: Friday, July 7, 2023 8:06 AM
To: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Daniel Gustafsson <daniel@yesql.se>;
Matthias van de Meent <boekewurm+postgres@gmail.com>; Nikolay Samokhvalov
<samokhvalov@gmail.com>; Kyzer Davis (kydavis) <kydavis@cisco.com>; Andres
Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>;
PostgreSQL Hackers <pgsql-hackers@postgresql.org>; brad@peabody.io;
wolakk@gmail.com
Subject: Re: UUID v7



> On 6 Jul 2023, at 21:38, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> 
> I think it would be reasonable to review this patch now.
+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an
RFC-compatible format for experimental or vendor-specific use cases.
Revision 1 of IETF draft contained interesting code for v8: almost similar
to v7, but with fields for "node ID" and "rolling sequence number".
I think this is reasonable approach, thus I attach implementation of UUID v8
per [0]. But from my point of view this implementation has some flaws.
These two new fields "node ID" and "sequence" are there not for uniqueness,
but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated
numbers.

I think that "sequence" is there to help generate local ascending
identifiers when the real time clock do not provide enough resolution. So
"sequence" field must be placed after 6 bytes of time-generated identifier.

On a contrary "node ID" must differentiate identifiers generated on
different nodes. So it makes sense to place "node ID" before timing. So
identifiers generated on different nodes will tend to be in different
ranges.
Although, section "6.4. Distributed UUID Generation" states that "node ID"
is there to decrease the likelihood of a collision. So my intuition might be
wrong here.


Do we want to provide this "vendor-specific" UUID with tweaks for databases?
Or should we limit the scope with well defined UUID v7?


Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01


Attachment

Re: UUID v7

From
Peter Eisentraut
Date:
On 07.07.23 14:06, Andrey M. Borodin wrote:
> Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or
vendor-specificuse cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with
fieldsfor "node ID" and "rolling sequence number".
 
> I think this is reasonable approach, thus I attach implementation of UUID v8 per [0].

I suggest we keep this thread to v7, which has pretty straightforward 
semantics for PostgreSQL.  v8 by definition has many possible 
implementations, so you're going to have to make pretty strong arguments 
that yours is the best and only one, if you are going to claim the 
gen_uuid_v8 function name.




Re: UUID v7

From
"Andrey M. Borodin"
Date:


On 10 Jul 2023, at 21:50, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

I suggest we keep this thread to v7, which has pretty straightforward semantics for PostgreSQL.  v8 by definition has many possible implementations, so you're going to have to make pretty strong arguments that yours is the best and only one, if you are going to claim the gen_uuid_v8 function name.

Thanks Peter, I'll follow this course of action.

After discussion on GitHub with Sergey Prokhorenko [0] I understood that counter is optional, but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit counter - it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.


Best regards, Andrey Borodin.


Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 30 Jul 2023, at 13:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> After discussion on GitHub with Sergey Prokhorenko [0] I understood that counter is optional, but useful part of UUID
v7.It actually promotes sortability of data generated at high speed. 
> The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit
counter- it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte. 
>

Here's a new patch version. Now counter is initialised with strong random on every time change (each ms). However, one
firstbit of the counter is preserved to zero. This is done to extend counter capacity (I left comments with reference
toRFC with explanations). 

Thanks!

Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 20 Aug 2023, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch>

I've observed, that pre-generating and buffering random numbers makes UUID generation 10 times faster.

Without buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 5286.572 ms (00:05.287)

With buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 390.091 ms

This can speed up gen_random_uuid() on the same scale too. PFA implementation of this technique.


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

FPA attached next version.
Changes:
- implemented protection from time leap backwards when series is generated on the same backend
- counter overflow is now translated into ms step forward


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Mat Arye
Date:
Andrey,

Thanks for all your work on this. I think this will be really useful. 

From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for defining constraints if partitioning by the uuid-embedded timestamps, for instance).
- Can we add an optional timestamptz argument to gen_uuid_v7 so that you can explicitly specify a time instead of always generating for the current time? If the argument is NULL, then use current time. This could be useful for backfilling and other applications.

Thanks,
Matvey Arye
Timescale software developer.


On Wed, Aug 30, 2023 at 3:05 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


> On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

FPA attached next version.
Changes:
- implemented protection from time leap backwards when series is generated on the same backend
- counter overflow is now translated into ms step forward


Best regards, Andrey Borodin.

Re: UUID v7

From
"Andrey M. Borodin"
Date:
Thanks for interesting ideas, Mat!

> On 31 Aug 2023, at 20:32, Mat Arye <mat@timescaledb.com> wrote:
>
> From a user perspective, it would be great to add 2 things:
> - A function to extract the timestamp from a V7 UUID (very useful for defining constraints if partitioning by the
uuid-embeddedtimestamps, for instance). 

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe
asan extension? 

> - Can we add an optional timestamptz argument to gen_uuid_v7 so that you can explicitly specify a time instead of
alwaysgenerating for the current time? If the argument is NULL, then use current time. This could be useful for
backfillingand other applications. 

I think this makes sense. We could also have a counter as an argument. I'll try to implement that.
However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Chris Travers
Date:
So I am in the process of reviewing the patch and hopefully can provide something there soon.

However I want to address in the mean time the question of timestamp functions.  I know that is outside the scope of
thispatch but I would be in favor of adding them generally, not just as an extension but eventually into core.  I
understand(and generally agree with) the logic of not generally extracting timestamps from UUIDs or other such field,s
butthere are cases where it is really, really helpful to be able to do.  In particular when you are troubleshooting
misbehavior,all information you can get is helpful.  And so extracting all of the subfields can be helpful.
 

The problem with putting this in an extension is that this is mostly useful when debugging systems (particularly larger
distributedsystems) and so the chances of it hitting a critical mass enough to be supported by all major cloud vendors
iseffectively zero.
 

So I am not asking for this to be included in this patch but I am saying I would love to see these sort of things
contributedat some point to core. 

Re: UUID v7

From
Nick Babadzhanian
Date:
On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such
functions...maybeas an extension?
 

Do you know of any reason for that?

> However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere.
 

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

Also, a shameless plug with my extension for UUID v1 that implements
extract and create from (and an opclass):
https://github.com/pgnickb/uuid_v1_ops



Re: UUID v7

From
Jelte Fennema
Date:
On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:
>
> On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such
functions...maybeas an extension? 
>
> Do you know of any reason for that?

No reasons are given but the RFC states this:

> UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent
ispossible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID
versionand variant. 

> > However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere. 
>
> I'd argue that the time argument shouldn't be optional. Asking the
> user to supply time would force them to think whether they want to go
> with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

> Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and
continuallyincreasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system
arehandled in a way that is consistent with implementation requirements. For example, if it is possible for the system
clockto move backward due to either manual adjustment or corrections from a time synchronization protocol,
implementationsmust decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.) 



Re: UUID v7

From
Brad Peabody
Date:
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?
> Do you know of any reason for that?

I guess some of the detail may have been edited out over time with all of the changes, but it’s basically this: https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/main/draft-ietf-uuidrev-rfc4122bis.md#opacity-opacity. The rationale is that when you introspect a UUID you essentially add interoperability concerns.  E.g. if we say that applications can rely on being able to parse the timestamp from the UUID then it means that other implementations must provide guarantees about what that timestamp is.  And since the point of a UUID is to provide a unique value, not to transmit additional metadata, the decision was made early on that it’s more realistic and representative of the reality of the situation to say that applications should generate values, try not to parse them if they don’t have to, but if they do it’s only going to be as accurate as the original data put into it.  So systems with no NTP enabled, or that fuzz part of the time so as not to leak the exact moment in time something was done, etc - those are things that are going to happen and so buyer beware when parsing.

If the question is whether or not a function should exist to parse a timestamp from a UUID, I would say sure go ahead, just mention that the timestamp is only accurate as the input, and the spec doesn’t guarantee anything if your UUID came from another source.  I imagine a common case would be UUIDs generated in within the same database, and someone wants to extract the timestamp, which would be as reliable as the timestamp on the database machine - seems like a perfectly good case where supporting timestamp extraction as practical value.


On Oct 9, 2023, at 11:11 AM, Jelte Fennema <postgres@jeltef.nl> wrote:

On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:

On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

No reasons are given but the RFC states this:

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)

Re: UUID v7

From
Andrey Borodin
Date:
On Mon, Oct 9, 2023 at 11:11 PM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> I think using `now()` is quite prone to sequence rollover. With the
> current patch inserting more than 2^18~=0.26M rows into a table with
> `gen_uuid_v7()` as the default in a single transaction would already
> cause sequence rollover.

Well, the current patch will just use now()+1ms when 2^18 is
exhausted. Even if now() would be passed as an argument (however
current patch does not support an argument).

Best regards, Andrey Borodin.



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 9 Oct 2023, at 23:46, Andrey Borodin <amborodin86@gmail.com> wrote:

Here's next iteration of the patch. I've added get_uuid_v7_time().
This function extracts timestamp from uuid, iff it is v7. Timestamp correctness only guaranteed if the timestamp was
generatedby the same implementation (6 bytes for milliseconds obtained by gettimeofday()). 
Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more
tolerantvalues for slow test machines. 


Best regards, Andrey Borodin.



Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:


On 2 Jan 2024, at 14:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more tolerant values for slow test machines.

Indeed, CFbot complained about flaky tests. I've increased test tolerance to 100ms. (this does not affect test time)


Best regards, Andrey Borodin.
Attachment

Re: Re: UUID v7

From
Przemysław Sztoch
Date:
Dear Andrey,

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.

3. Wouldn't it be worth including UUID version 6 as well?

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Nevertheless, the need for uuid v6/7/8 is very high and I'm glad it's coming to PostgreSQL. It should be a PG17 version.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
"Andrey M. Borodin"
Date:
Hello Przemysław,

thanks for your interest in this patch!

> On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
>
> 1. Is it possible to add a function that returns the version of the generated uuid?
> It will be very useful.
> I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according
to[0] ? Or, perhaps, this should be one function with two return parameters? 
It's not in a patch yet, I'm just considering how this functionality should look like.

> 2. If there is any doubt about adding the function to the main sources (standard development in progress), in my
opinionyou can definitely add this function to the uuid-ossp extension. 
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash:
04-Jul-2008:Released OSSP uuid 1.6.2". Or do I look into wrong place? 

> 3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's
apoint in developing v6 - I'm OK to do so. 

> 4. Sometimes you will need to generate a uuid for historical time. There should be an additional function
gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user
whoknows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient? 

Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

Attachment

Re: UUID v7

From
Jelte Fennema-Nio
Date:
First of all, I'm a huge fan of UUID v7. So I'm very excited that this
is progressing. I'm definitely going to look closer at this patch
soon. Some tiny initial feedback:

(bikeshed) I'd prefer renaming `get_uuid_v7_time` to the shorter
`uuid_v7_time`, the `get_` prefix seems rarely used in Postgres
functions (e.g. `date_part` is not called `get_date_part`). Also it's
visually very similar to the gen_ prefix.

On Thu, 4 Jan 2024 at 19:20, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
> > 1. Is it possible to add a function that returns the version of the generated uuid?
> > It will be very useful.
> > I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
> What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields
accordingto [0] ? Or, perhaps, this should be one function with two return parameters? 
> It's not in a patch yet, I'm just considering how this functionality should look like.

I do agree that those functions would be useful, especially now that
we're introducing a function that errors when it's passed a UUID
that's not of version 7. With the version extraction function you
could return something else for other uuids if you have many and not
all of them are version 7.

I do think though that these functions should not have v7 in their
name, since they would apply to all uuids of all versions (so if also
removing the get_ prefix they would be called uuid_ver and uuid_var)

> > 4. Sometimes you will need to generate a uuid for historical time. There should be an additional function
gen_uuid_v7(timestamp).
> Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user
whoknows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient? 

I think timestamp would be quite useful. timestamp would encode the
time in the same way as gen_uuid_v7() would, but based on the given
time instead of the current time.



Re: UUID v7

From
Przemysław Sztoch
Date:
Andrey M. Borodin wrote on 1/4/2024 7:20 PM:
Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful. 
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according to [0] ? Or, perhaps, this should be one function with two return parameters?
It's not in a patch yet, I'm just considering how this functionality should look like.
uuid_ver(uuid) -> smallint/integer 1/3/4/5/6/7/8

Of course there is RFC 4122 Variant "bits: 10x". If it is other variant then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.
2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or do I look into wrong place?
After two days of thinking about UUID v7, I consider it a very important functionality that should be included in PG17.
3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.
IETF standard should provide information about possibility of conversion from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth implementing this version as well.
4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?
I talked to my colleagues and everyone chooses the timestamp version.
If timestamp is outside the allowed range, the function must return an error.

We also talked about uuid-ossp. Still, v5 is a great solution in some applications.
It is worth moving this function from extension to PG17. Many people don't use it because they don't know it and this uuid schema.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

The naming of this family of functions needs to be rethought.
Do we adopt the naming standard from Postgres and the uuid-ossp extension?
Or should we continue with a slightly less accurate name for PG: get_random_uuid (get_random_uuid, get_uuid_v7)?

5. Please add in docs reference to RFC4122 (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)
Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Sergey Prokhorenko
Date:
Hello Przemysław and Andrey,

When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead of uuid_v7. When people search Google for new versions of UUIDs, they enter the abbreviation uuidv7 into the search bar. The name generate_uuidv7() looks good, as well as uuidv1_to_uuidv6() and timestamp_to_uuidv7().

Best regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 5 January 2024 at 11:53:04 am GMT+3, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:


Andrey M. Borodin wrote on 1/4/2024 7:20 PM:
Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful. 
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according to [0] ? Or, perhaps, this should be one function with two return parameters?
It's not in a patch yet, I'm just considering how this functionality should look like.
uuid_ver(uuid) -> smallint/integer 1/3/4/5/6/7/8

Of course there is RFC 4122 Variant "bits: 10x". If it is other variant then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.
2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or do I look into wrong place?
After two days of thinking about UUID v7, I consider it a very important functionality that should be included in PG17.
3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.
IETF standard should provide information about possibility of conversion from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth implementing this version as well.
4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?
I talked to my colleagues and everyone chooses the timestamp version.
If timestamp is outside the allowed range, the function must return an error.

We also talked about uuid-ossp. Still, v5 is a great solution in some applications.
It is worth moving this function from extension to PG17. Many people don't use it because they don't know it and this uuid schema.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

The naming of this family of functions needs to be rethought.
Do we adopt the naming standard from Postgres and the uuid-ossp extension?
Or should we continue with a slightly less accurate name for PG: get_random_uuid (get_random_uuid, get_uuid_v7)?

5. Please add in docs reference to RFC4122 (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)
Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

--
Przemysław Sztoch
| Mobile +48 509 99 00 66

Re: UUID v7

From
Andrey Borodin
Date:

> On 5 Jan 2024, at 15:57, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
2. Renamed get_uuid_v7_time() to uuid_v7_time()
3. Added  uuid_ver() and uuid_var().

What do you think?

Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi Andrey,

> Sergey, Przemysław, Jelte, thanks for your feedback.
> Here's v9. Changes:
> 1. Swapped type of the argument to timestamptz in gen_uuid_v7()
> 2. Renamed get_uuid_v7_time() to uuid_v7_time()
> 3. Added  uuid_ver() and uuid_var().
>
> What do you think?

Many thanks for the updated patch. It's an important work and I very
much hope we will see this in the upcoming PG release.

```
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+    if (!IsBootstrapProcessingMode())
+        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
+    return textin(fcinfo);
+}
```

Not 100% sure what this is for. Any chance this could be part of another patch?

One thing I don't particularly like about the tests is the fact that
they don't check if a correct UUID was actually generated. I realize
that's not quite trivial due to the random nature of the function, but
maybe we could use some substring/regex magic here? Something like:

```
select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
 ?column?
----------
 t

select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
'[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
            regexp_replace
--------------------------------------
 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
```


```
+  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
```

I don't think we conventionally specify IMMUTABLE volatility, it's the
default. Other values also are worth checking.

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

Also I would like to point out that part of the documentation is
missing, but I guess at this stage of the game it's OK.

Last but not least: maybe we should support casting Timestamp[Tz] to
UUIDv7 and vice versa? Shouldn't be difficult to implement and I
suspect somebody will request this eventually. During the cast to UUID
we will always get the same value for the given Timestamp[Tz], which
probably can be useful in certain applications. It can't be done with
gen_uuid_v7() and its volatility doesn't permit it.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:
Thanks for your review, Aleksander!

> On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
>
> ```
> +Datum
> +pg_node_tree_in(PG_FUNCTION_ARGS)
> +{
> +    if (!IsBootstrapProcessingMode())
> +        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
> +    return textin(fcinfo);
> +}
> ```
>
> Not 100% sure what this is for. Any chance this could be part of another patch?
Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These defaults
havetype pg_node_tree which has no-op in function. 

> One thing I don't particularly like about the tests is the fact that
> they don't check if a correct UUID was actually generated. I realize
> that's not quite trivial due to the random nature of the function, but
> maybe we could use some substring/regex magic here? Something like:
>
> ```
> select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
> ?column?
> ----------
> t
>
> select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
> '[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
>            regexp_replace
> --------------------------------------
> 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
> ```
Any 8 bytes which have ver and var bits (6 bits total) are correct UUID.
This is checked by tests when uuid_var() and uuid_ver() functions are exercised.

> ```
> +  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
> ```
>
> I don't think we conventionally specify IMMUTABLE volatility, it's the
> default. Other values also are worth checking.
Makes sense, I’ll drop this values in next version.
BTW I’m in doubt if provided functions are leakproof. They ERROR-out with messages that can give a clue about several
bitsof UUID. Does this break leakproofness? I think yest, but I’m not sure. 
gen_uuid_v7() seems leakproof to me.

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.

I mean, this argument is expected to be used to implement K-way sorted identifiers. In this context, it seems to me,
it’sgood to remember to developer that time shift also depend on timezones. 
But this is too vague.
Do you have any reasons that apply to UUID generation?

> Also I would like to point out that part of the documentation is
> missing, but I guess at this stage of the game it's OK.
>
> Last but not least: maybe we should support casting Timestamp[Tz] to
> UUIDv7 and vice versa? Shouldn't be difficult to implement and I
> suspect somebody will request this eventually. During the cast to UUID
> we will always get the same value for the given Timestamp[Tz], which
> probably can be useful in certain applications. It can't be done with
> gen_uuid_v7() and its volatility doesn't permit it.
I’m strongly opposed to doing this cast. I was not adding this function to extract timestamp from UUID, because
standarddoes not recommend it. But a lot of people asked for this. 
But supporting easy way to do unrecommended thing seem bad.


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
Andrey,

It is not clear how to interpret uuid_v7_time():
  • uuid_v7 to time() (extracting the timestamp)
  • time() to uuid_v7  (generation of the uuid_v7)
It is worth improving the naming, for example, adding prepositions.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Tuesday, 16 January 2024 at 05:44:51 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


Thanks for your review, Aleksander!

> On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
>
> ```
> +Datum
> +pg_node_tree_in(PG_FUNCTION_ARGS)
> +{
> +    if (!IsBootstrapProcessingMode())
> +        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
> +    return textin(fcinfo);
> +}
> ```
>
> Not 100% sure what this is for. Any chance this could be part of another patch?
Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These defaults have type pg_node_tree which has no-op in function.

> One thing I don't particularly like about the tests is the fact that
> they don't check if a correct UUID was actually generated. I realize
> that's not quite trivial due to the random nature of the function, but
> maybe we could use some substring/regex magic here? Something like:
>
> ```
> select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
> ?column?
> ----------
> t
>
> select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
> '[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
>            regexp_replace
> --------------------------------------
> 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
> ```
Any 8 bytes which have ver and var bits (6 bits total) are correct UUID.
This is checked by tests when uuid_var() and uuid_ver() functions are exercised.

> ```
> +  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
> ```
>
> I don't think we conventionally specify IMMUTABLE volatility, it's the
> default. Other values also are worth checking.
Makes sense, I’ll drop this values in next version.
BTW I’m in doubt if provided functions are leakproof. They ERROR-out with messages that can give a clue about several bits of UUID. Does this break leakproofness? I think yest, but I’m not sure.
gen_uuid_v7() seems leakproof to me.

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.

I mean, this argument is expected to be used to implement K-way sorted identifiers. In this context, it seems to me, it’s good to remember to developer that time shift also depend on timezones.
But this is too vague.
Do you have any reasons that apply to UUID generation?


> Also I would like to point out that part of the documentation is
> missing, but I guess at this stage of the game it's OK.
>
> Last but not least: maybe we should support casting Timestamp[Tz] to
> UUIDv7 and vice versa? Shouldn't be difficult to implement and I
> suspect somebody will request this eventually. During the cast to UUID
> we will always get the same value for the given Timestamp[Tz], which
> probably can be useful in certain applications. It can't be done with
> gen_uuid_v7() and its volatility doesn't permit it.

I’m strongly opposed to doing this cast. I was not adding this function to extract timestamp from UUID, because standard does not recommend it. But a lot of people asked for this.
But supporting easy way to do unrecommended thing seem bad.


Best regards, Andrey Borodin.

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 16 Jan 2024, at 21:49, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> It is not clear how to interpret uuid_v7_time():
>     • uuid_v7 to time() (extracting the timestamp)
>     • time() to uuid_v7  (generation of the uuid_v7)
> It is worth improving the naming, for example, adding prepositions.

Previously, Jelte had some thoughts on idiomatic function names.

Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?
Of cause, it would be great to hear opinion from anyone else.


Best regards, Andrey Borodin.


Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 16 Jan 2024 at 15:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
> > Not 100% sure what this is for. Any chance this could be part of another patch?
> Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These
defaultshave type pg_node_tree which has no-op in function. 

That seems like the wrong way to make that work then. How about
instead we define the same function name twice, once with and once
without a timestamp argument. That's how this is done for other
functions that are overloaded in pg_catalog.



Re: UUID v7

From
Przemysław Sztoch
Date:
Andrey Borodin wrote on 1/16/2024 1:15 PM:
Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
Please update docs part about optional timestamp argument.
2. Renamed get_uuid_v7_time() to uuid_v7_time()
Please rename uuid_v7_time to uuid_time() and add support for v1 and v6.
If version is incompatible then return NULL.
3. Added  uuid_ver() and uuid_var().
Looks good.
But for me, throwing an error is problematic. Wouldn't it be better to return -1.
What do you think?
Best regards, Andrey Borodin.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?

I looked at a few more datatypes: json, jsonb & hstore. The get_
prefix is not used there at all, so I'm still opposed to that. But
they seem to use either an _to_ or an _extract_ infix. _to_ is then
used for conversion of the whole object, and _extract_ is used to
extract a subset. So I think _extract_ would fit well here.

On Fri, 5 Jan 2024 at 11:57, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead
ofuuid_v7.
 

I also agree with that, uuid_v7 looks weird to my eyes. The RFC also
abbreviates them as UUIDv7 (without a space).

The more I look at it the more I also think the gen_ prefix is quite
strange, and I already thought the gen_random_uuid name was quite
weird. But now that we will also have a uuidv7 I think it's even
stranger that one uses the name from the RFC.

The name of gen_random_uuid was taken verbatim from pgcrypto, without
any discussion on the list[0]:

> Here is a proposed patch for this. I did a fair bit of looking around
> in other systems for a naming pattern but didn't find anything
> consistent. So I ended up just taking the function name and code from
> pgcrypto.


So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

[0]:
https://www.postgresql.org/message-id/flat/6a65610c-46fc-2323-6b78-e8086340a325%402ndquadrant.com#76e40e950a44aa8b6844297e8d2efe2c



Re: UUID v7

From
Przemysław Sztoch
Date:
Jelte Fennema-Nio wrote on 1/16/2024 9:25 PM:
On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)
+1
But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - function should be able extract timestamp from v1/v6/v7

I would highly recommend to add:
uuidv5(namespace uuid, name text) -> uuid
using uuid_generate_v5 from uuid-ossp extension (https://www.postgresql.org/docs/current/uuid-ossp.html)
There is an important version and it should be included into the main PG code.

Jelte: Please propose the name of the function that will convert uuid from version 1 to 6.
v6 is almost as good as v7 for indexes. And v6 allows you to convert from v1 which some people use.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Przemysław Sztoch
Date:
Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.
timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 16 Jan 2024 at 22:02, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
> But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - function should be able extract timestamp from
v1/v6/v7

I'm fine with this.

> I would highly recommend to add:
> uuidv5(namespace uuid, name text) -> uuid
> using uuid_generate_v5 from uuid-ossp extension (https://www.postgresql.org/docs/current/uuid-ossp.html)
> There is an important version and it should be included into the main PG code.

I think adding more uuid versions would probably be desirable. But I
don't think it makes sense to clutter this patchset with that. I feel
like on this uuidv7 patchset we've had enough discussion that it could
reasonably get into PG17, but I think adding even more uuid versions
to this patchset would severely reduce the chances of that happening.



Re: UUID v7

From
Andrey Borodin
Date:

> On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6
andv7. 

Do I understand correctly, that UUIDs contain local time, not UTC time? For examples in [0] I see that "A.6. Example of
aUUIDv7 Value" I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is
notUTC, but local time. 
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions
insteadof optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown. 
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

Attachment

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.
>
> timstamptz internally always store UTC.
> I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
> timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect
use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Andrey Borodin
Date:

> On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Timestamp and TimestampTz are absolutely the same thing.
My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022
2:22:22.00PM GMT-05:00". 
It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in
UTC-05.


Best regards, Andrey Borodin.


Re: UUID v7

From
Aleksander Alekseev
Date:
Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22,
20222:22:22.00 PM GMT-05:00".
 
> It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22
inUTC-05.
 

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
       ?column?
----------------------
 1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Sergey Prokhorenko
Date:
Hi Andrey,

Aleksander Alekseev wrote: "If this is the case, I think the example is indeed wrong". 

This is one of the reasons why I was categorically against any examples of implementation in the new RFC. The examples have been very poorly studied and discussed, and therefore it is better not to use them at all. But the text of the RFC itself clearly refers to UTC, and not at all about local time: "UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded". The main reason for using UTC is so that UUIDv7's, generated approximately simultaneously in different time zones, are correctly ordered in time when they get into one database.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 18 January 2024 at 07:22:05 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
> It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
      ?column?
----------------------
1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

--
Best regards,

Aleksander Alekseev

Re: UUID v7

From
Andrey Borodin
Date:

> On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22
inUTC-05. 


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example UUIDs. It's not '2022-02-23 00:22:22 in
UTC-05'as I thought. 
I got confused by "at timezone" changes which in fact removes timezone information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22,
20222:22:22.00 PM GMT-05:00'; 
 ?column?
----------
 10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
      timestamptz
------------------------
 2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test that verifies timestamp constant (+ I
understandwhat is going on). 


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
Hi Andrey,

You'd better generate a test UUIDv7 for midnight 1 Jan 1970 UTC. In this case, the timestamp in UUIDv7 according to the new RFC must be filled with zeros. By extracting the timestamp from this test UUIDv7, you should get exactly midnight 1 Jan 1970 UTC.

I also recommend this article: https://habr.com/ru/articles/772954/



Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 18 January 2024 at 09:31:16 pm GMT+3, Andrey Borodin <x4mmm@yandex-team.ru> wrote:




> On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example UUIDs. It's not '2022-02-23 00:22:22 in UTC-05' as I thought.
I got confused by "at timezone" changes which in fact removes timezone information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
?column?
----------
10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
      timestamptz     
------------------------
2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test that verifies timestamp constant (+ I understand what is going on).



Best regards, Andrey Borodin.

Re: UUID v7

From
Przemysław Sztoch
Date:
Using localtime would be absurd. Especially since time goes back during summer time change.
I believe our implementation should use UTC. No one forbids us from assuming that our local time for generating uuid is UTC.

Andrey Borodin wrote on 1/18/2024 2:17 PM:

On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6 and v7.

Do I understand correctly, that UUIDs contain local time, not UTC time? For examples in [0] I see that "A.6. Example of a UUIDv7 Value" I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is not UTC, but local time.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Przemysław Sztoch
Date:
Aleksander Alekseev wrote on 1/18/2024 3:20 PM:
Hi,

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.
No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.


Theoretically, you're right. But look at this example:

SET timezone TO 'Europe/Warsaw';
SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch from '2024-01-18 9:27:30'::timestamptz);

 date_part  | date_part
------------+------------
 1705570050 | 1705566450
(1 row)

In my opinion, timestamptz gives greater guarantees that the time internally is in UTC and the user gets the time in his/her time zone.

In the case of timestamp, it is never certain whether it keeps time in UTC or in the local zone.

In the case of argument's type, there would be no problem because we could create two functions.
Of course timestamp would be treated the same as timestamptz.
But here we have a problem with the function return type, which can only be one. And since the time returned is in UTC, it should be timestamptz.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Przemysław Sztoch
Date:
We are not allowed to consider any time other than UTC.

You need to write to the authors of the standard. I suppose this is a mistake.

I know from experience that errors in such standards most often appear in examples.
Nobody detects them at first.
Everyone reads and checks ideas, not calculations.
Then developers during implementation tears out their hair.

Andrey Borodin wrote on 1/18/2024 4:39 PM:

On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:

Timestamp and TimestampTz are absolutely the same thing.
My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.


Best regards, Andrey Borodin.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Lukas Fittl
Date:
On Thu, Jan 18, 2024 at 5:18 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.

I've done a test of the v10 patch, and ran into an interesting behavior when passing in a timestamp to the function (which, as a side note, is actually very useful to have as a feature, to support creating time-based range partitions on UUIDv7 fields):

postgres=# SELECT uuid_extract_time(uuidv7());
     uuid_extract_time    
---------------------------
 2024-01-18 18:49:00.01-08
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7('2024-04-01'));
   uuid_extract_time    
------------------------
 2024-04-01 00:00:00-07
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7());
   uuid_extract_time    
------------------------
 2024-04-01 00:00:00-07
(1 row)

Note how calling the uuidv7 function again after having called it with a fixed future timestamp, returns the future timestamp, even though it should return the current time.

I believe this is caused by incorrectly re-using the cached previous_timestamp. In the second call here (with a fixed future timestamp), we end up setting ts and tms to 2024-04-01, with increment_counter = false, which leads us to set previous_timestamp to the passed in timestamp (else branch of the second if in uuidv7). When we then call the function again without an argument, we end up getting a new timestamp from gettimeofday, but because we try to detect backwards leaps, we set increment_counter to true, and thus end up reusing the previous (future) timestamp here:

/* protection from leap backward */
tms = previous_timestamp;

Not sure how to fix this, but clearly something is amiss here.

Thanks,
Lukas

--
Lukas Fittl

Re: UUID v7

From
"David G. Johnston"
Date:
On Thu, Jan 18, 2024 at 11:31 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
 ?column?
----------
 10:00:00
(1 row)


You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, they disagree on the direction that is positive.

 
The offset fields specify the hours, and optionally minutes and seconds, difference from UTC. They have the format hh[:mm[:ss]] optionally with a leading sign (+ or -). The positive sign is used for zones west of Greenwich. (Note that this is the opposite of the ISO-8601 sign convention used elsewhere in PostgreSQL.)

David J.


Re: UUID v7

From
Andrey Borodin
Date:

> On 19 Jan 2024, at 08:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
>
> You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, they disagree on the direction that is
positive.

Thanks! Now everything seems on its place.

I want to include in the patch following tests:
-- extract UUID v1, v6 and v7 timestamp
SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';
SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';
SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';

How do you think, will it be stable all across buildfarm? Or should we change anything to avoid false positives
inferredfrom different timestamp parsing? 


> On 19 Jan 2024, at 07:58, Lukas Fittl <lukas@fittl.com> wrote:
>
> Note how calling the uuidv7 function again after having called it with a fixed future timestamp, returns the future
timestamp,even though it should return the current time. 

Thanks for the review.
Well, that was intentional. But now I see it's kind of confusing behaviour. I've changed it to more expected version.

Also, I've added some documentation on all functions.


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> No.
>
> Timestamp and TimestampTz are absolutely the same thing. The only
> difference is how they are shown to the user. TimestampTz uses session
> context in order to be displayed in the TZ chosen by the user. Thus
> typically it is somewhat more confusing to the users and thus I asked
> whether there was a good reason to choose TimestampTz over Timestamp.
>
>
> Theoretically, you're right. But look at this example:
>
> SET timezone TO 'Europe/Warsaw';
> SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch from '2024-01-18 9:27:30'::timestamptz);
>
>  date_part  | date_part
> ------------+------------
>  1705570050 | 1705566450
> (1 row)
>
> In my opinion, timestamptz gives greater guarantees that the time internally is in UTC and the user gets the time in
his/hertime zone.
 

I believe you didn't notice, but this example just proves my point.

In this case you have two timestamps that are different _internally_,
but the way they are _shown_ is the same because the first one is in
UTC and the second one in your local session timezone, Europe/Warsaw.
extract(epoch ...) extract UNIX epoch, i.e. relies on the _internal_
representation. This is why you got different results.

This demonstrates that TimestampTz is a permanent source of confusion
for the users and the reason why personally I would prefer if UUIDv7
always used Timestamp (no Tz). TimestampTz can be converted to
TimestampTz by users who need them and have experience using them.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Andrey Borodin
Date:

> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Best regards, Andrey Borodin.


Attachment

Re: UUID v7

From
Nikolay Samokhvalov
Date:
On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

From
Nikolay Samokhvalov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Manually tested uuidv7(), uuid_extract_time() – they work as expected. The basic docs provided look clear.

I haven't checked the tests though and possible edge cases, so leaving it as "needs review" waiting for more reviewers

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> But now (after big timeseries project with multiple time zones and DST problems) I think differently.
> Even though timestamp and timestamptz are practically the same, timestamptz should be used to store the time in UTC.
> Choosing timestamp is more likely to lead to problems and misunderstandings than timestamptz.

As somebody who contributed TZ support to TimescaleDB I'm more or less
aware about the pros and cons of Timestamp and TimestampTz :)
Engineering is all about compromises. I can imagine a project where it
makes sense to use only TimestampTz for the entire database, and the
opposite - when it's better to use only UTC and Timestamp. In this
particular case I was merely concerned that the particular choice
could be confusing for the users but I think I changed my mind by now,
see below.

>> Here's v12. Changes:
>> 1. Documentation improvements
>> 2. Code comments
>> 3. Better commit message and reviews list
>
>
> Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that
factthat we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many
caseswe will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes. 
>
> The docs and comments look great too.
>
> Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized
yet,but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples: 
>
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

Thanks!

After playing with v12 I'm inclined to agree that it's RfC.

I only have a couple of silly nitpicks:

- It could make sense to decompose the C implementation of uuidv7() in
two functions, for readability.
- It could make sense to get rid of curly braces in SQL tests when
calling uuid_extract_ver() and uuid_extract_ver(), for consistency.

I'm not going to insist on these changes though and prefer leaving it
to the author and the committer to decide.

Also I take back what I said above about using Timestamp instead of
TimestampTz. I forgot that Timestamps are implicitly casted to
TimestampTz's, so users preferring Timestamps can do this:

```
=# select uuidv7('2024-01-22 12:34:56' :: timestamp);
                uuidv7
--------------------------------------
 018d3085-de00-77c1-9e7b-7b04ddb9ebb9
```

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Cfbot also seems to be happy with the patch so I'm changing the CF
> entry status to RfC.

I've found a bug:

```
=# select now() - interval '5000 years';
                ?column?
----------------------------------------
 2977-01-24 15:29:01.779462+02:30:17 BC

Time: 0.957 ms

=# select uuidv7(now() - interval '5000 years');
                uuidv7
--------------------------------------
 720c1868-0764-7677-99cd-265b84ea08b9

=# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
     uuid_extract_time
----------------------------
 5943-08-26 21:30:44.836+03
```

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 24 Jan 2024, at 17:31, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> Cfbot also seems to be happy with the patch so I'm changing the CF
>> entry status to RfC.
>
> I've found a bug:
>
> ```
> =# select now() - interval '5000 years';
>                ?column?
> ----------------------------------------
> 2977-01-24 15:29:01.779462+02:30:17 BC
>
> Time: 0.957 ms
>
> =# select uuidv7(now() - interval '5000 years');
>                uuidv7
> --------------------------------------
> 720c1868-0764-7677-99cd-265b84ea08b9
>
> =# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
>     uuid_extract_time
> ----------------------------
> 5943-08-26 21:30:44.836+03
> ```

UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being unique
identifier.So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract some
nanoseconds- you will not get back timestamp you put into UUID too. 
UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but with
limitedprecision, limited range and only if UUID was generated precisely by the specification in standard (and standard
allowsdeviation! Most of implementation try to tradeoff something). 


Best regards, Andrey Borodin.


Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being
uniqueidentifier. So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract
somenanoseconds - you will not get back timestamp you put into UUID too. 
> UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but with
limitedprecision, limited range and only if UUID was generated precisely by the specification in standard (and standard
allowsdeviation! Most of implementation try to tradeoff something). 

I don't claim that UUIDv7 purpose is storing timestamps, but I think
the invariant:

```
uuid_extract_time(uidv7(X)) == X
```

and (!) even more importantly:

```
if X > Y then uuidv7(X) > uuidv7(Y)
```

... should hold. Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate an unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 24 Jan 2024, at 18:02, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being
uniqueidentifier. So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract
somenanoseconds - you will not get back timestamp you put into UUID too. 
>> UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but
withlimited precision, limited range and only if UUID was generated precisely by the specification in standard (and
standardallows deviation! Most of implementation try to tradeoff something). 
>
> I don't claim that UUIDv7 purpose is storing timestamps, but I think
> the invariant:
>
> ```
> uuid_extract_time(uidv7(X)) == X
> ```
>
> and (!) even more importantly:
>
> ```
> if X > Y then uuidv7(X) > uuidv7(Y)
> ```
>
> ... should hold.
Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers upthread.
Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard is
doinga lot to make this happen, but does not guaranty that. 
All what is guaranteed is the uniqueness at certain conditions.

> Otherwise you can calculate crc64(X) or sha256(X)
> internally in order to generate an unique ID and claim that it's fine.
>
> Values that violate named invariants should be rejected with an error.

Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 


Best regards, Andrey Borodin.


Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Values that violate named invariants should be rejected with an error.

To clarify, I don't think we should bother about the precision part.
"Equals" in the example above means "equal within UUIDv7 precision",
same for "more" and "less". However, years 2977 BC and 5943 AC are
clearly not equal, thus 2977 BC should be rejected as an invalid value
for UUIDv7.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers
upthread.
> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard
isdoing a lot to make this happen, but does not guaranty that. 
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 24 Jan 2024, at 18:29, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers
upthread.
>> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard
isdoing a lot to make this happen, but does not guaranty that. 
>> All what is guaranteed is the uniqueness at certain conditions.
>>
>>> Otherwise you can calculate crc64(X) or sha256(X)
>>> internally in order to generate an unique ID and claim that it's fine.
>>>
>>> Values that violate named invariants should be rejected with an error.
>>
>> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 
>
> If the standard doesn't guarantee something it doesn't mean it forbids
> us to give stronger guarantees.
No, the standard makes these guarantees impossible.
If we insist that uuid_extract_time(uuidv7(time))==time, we won't be able to generate uuidv7 most of the time.
uuidv7(now())will always ERROR-out. 
Standard implies more coarse-grained timestamp that we have.

Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us >
time,still second uuid will be greater. 

Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID
v7.


Best regards, Andrey Borodin.


Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us >
time,still second uuid will be greater.
 
>
> Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID
v7.

Again, personally I don't insist on the 1us precision [1]. Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

[1]: https://postgr.es/m/CAJ7c6TPCSprWwVNdOB%3D%3DpgKZPqO5q%3DHRgmU7zmYqz9Dz5ffVYw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Only the
> fact that timestamp from the far past generates UUID from the future
> bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures in ID generation unexpected to the
user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we should give this information in error
message...
Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Marcos Pegoraro
Date:
Is enough from 1970 ?
How about if user wants to have an UUID of his birth date ?

regards
Marcos

Em qua., 24 de jan. de 2024 às 13:54, Andrey M. Borodin <x4mmm@yandex-team.ru> escreveu:


> On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Only the
> fact that timestamp from the far past generates UUID from the future
> bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures in ID generation unexpected to the user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we should give this information in error message...
Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

From
Andrey Borodin
Date:

> On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Is enough from 1970 ?
Per standard unix_ts_ms field is a number of milliseconds from UNIX start date 1970-01-01.

> How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and v7 promote data locality via time-ordering.


Best regards, Andrey Borodin.


Re: UUID v7

From
Marcos Pegoraro
Date:
I understand your point, but 
'2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid timestamps.

So looks strange if user can do
select uuidv7(TIMESTAMP '2000-01-01')
but cannot do
select uuidv7(TIMESTAMP '1900-01-01')

Regards
Marcos


Em qua., 24 de jan. de 2024 às 14:51, Andrey Borodin <x4mmm@yandex-team.ru> escreveu:


> On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Is enough from 1970 ?
Per standard unix_ts_ms field is a number of milliseconds from UNIX start date 1970-01-01.

> How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and v7 promote data locality via time-ordering.


Best regards, Andrey Borodin.

Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Wed, 24 Jan 2024 at 21:47, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> I understand your point, but
> '2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid timestamps.
>
> So looks strange if user can do
> select uuidv7(TIMESTAMP '2000-01-01')
> but cannot do
> select uuidv7(TIMESTAMP '1900-01-01')



I think that would be okay honestly. I don't think there's any
reasonable value for the uuid when a timestamp is given outside of the
date range that the uuid7 "algorithm" supports.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).



Re: UUID v7

From
Sergey Prokhorenko
Date:
"Other people" think that extracting the timestamp from UUIDv7 in violation of the new RFC, and generating UUIDv7 from the timestamp were both terrible and poorly thought out ideas. The authors of the new RFC had very good reasons to prohibit this. And the problems you face are the best confirmation of the correctness of the new RFC. It’s better to throw all this gag out of the official patch. Don't tempt developers to break the new RFC with these error-producing functions.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Wednesday, 24 January 2024 at 04:30:02 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi,

> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers upthread.
> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard is doing a lot to make this happen, but does not guaranty that.
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,

Aleksander Alekseev

Re: UUID v7

From
Sergey Prokhorenko
Date:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:


On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

From
Nikolay Samokhvalov
Date:
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Without them, this functionality will be of a very limited use in databases. We need to think about large tables – which means partitioning.

Nik

Re: UUID v7

From
Nikolay Samokhvalov
Date:

On Wed, Jan 24, 2024 at 8:40 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Without them, this functionality will be of a very limited use in databases. We need to think about large tables – which means partitioning.

apologies -- this was a response to another email from you:

> "Other people" think that extracting the timestamp from UUIDv7 in violation of the new RFC, and generating UUIDv7 from the timestamp were both terrible and poorly thought out ideas. The authors of the new RFC had very good reasons to prohibit this. And the problems you face are the best confirmation of the correctness of the new RFC. It’s better to throw all this gag out of the official patch. Don't tempt developers to break the new RFC with these error-producing functions.
 

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> From a practical point of view, these two things are extremely important to have to support partitioning. It is
betterto implement limitations than throw them away. 

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that
isnot really immutable, turn off autovacuum or fsync. Why bother with safety guards here? 
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for
impreciseRFC implementation. 


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
I am against turning the DBMS into another C++, in which they do not so much design something new as fix bugs in production after a crash.

As for partitioning, I already wrote to Andrey Borodin that we need a special function to generate a partition id using the UUIDv7 timestamp or even simultaneously with the generation of the timestamp. For example, every month (or so, since precision is not needed here) a new partition is created. Here's a good example: https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916

But without a separate function for extracting the entire timestamp from the UUID! Let's solve this specific problem, and not give the developers a grenade with the safety removed. Many developers have already decided to store the timestamp in UUIDv7, so as not to create a separate created_at field. Then they will delete table records with the old timestamp, etc. Horrible mistakes are simply guaranteed.


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?


Best regards, Andrey Borodin.

Re: UUID v7

From
Przemysław Sztoch
Date:
Andrey M. Borodin wrote on 25.01.2024 07:51:

On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:

>From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.
Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.


On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
+1 for erroring when ts is outside range.

v13 looks good for me. I think we have reached a optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that
isnot really immutable, turn off autovacuum or fsync. Why bother with safety guards here? 
> My opinion is that we should have this function to extract timestamp. Even if it can return strange values for
impreciseRFC implementation. 

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Andrey, many thanks for the updated patch.
>
> LGTM, cfbot is happy and I don't think we have any open items left. So
> changing CF entry status back to RfC.

PFA v14. I changed:

```
elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
```

... to:

```
ereport(ERROR,
    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    errmsg("Time argument of UUID v7 is outside of the valid range")));
```

Which IMO tells a bit more to the average user and is translatable.

> At a quick glance, the patch needs improving English, IMO.

Agree. We could use some help from a native English speaker for this.

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: UUID v7

From
Sergey Prokhorenko
Date:
Aleksander,

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.


> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.


Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev

Re: UUID v7

From
Sergey Prokhorenko
Date:
By the way, the Go language has also already implemented a function for UUIDv7: https://pkg.go.dev/github.com/gofrs/uuid#NewV7




Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 12:49:46 am GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:


On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

From
Jelte Fennema-Nio
Date:
tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 
>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

> Implementations acquire the current timestamp from a reliable
> source to provide values that are time-ordered and continually
> increasing.  Care must be taken to ensure that timestamp changes
> from the environment or operating system are handled in a way that
> is consistent with implementation requirements.  For example, if
> it is possible for the system clock to move backward due to either
> manual adjustment or corrections from a time synchronization
> protocol, implementations need to determine how to handle such
> cases.  (See Altering, Fuzzing, or Smearing below.)
>
> ...
>
> UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> sources or a custom timestamp epoch are required, UUIDv8 MUST be
> used.
>
> ...
>
> Monotonicity (each subsequent value being greater than the last) is
> the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.

I do agree with others however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example to be able to partition by the timestamp in the UUID and then
being able to later load data for an older timestamp and have it be
added to to the older partition. But it's possible to do that while
still following the spec, by using a UUIDv8 instead of UUIDv7. So for
this usecase we could make a helper function that generates a UUIDv8
using the same format as a UUIDv7, but allows storing arbitrary
timestamps. You might say, why not sligthly change UUIDv7 then? Well
mainly because of this critical sentence in the RFC:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

> Implementations MAY alter the actual timestamp.  Some examples
> include security considerations around providing a real clock
> value within a UUID, to correct inaccurate clocks, to handle leap
> seconds, or instead of dividing a number of microseconds by 1000
> to obtain a millisecond value; dividing by 1024 (or some other
> value) for performance reasons.  This specification makes no
> requirement or guarantee about how close the clock value needs to
> be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

> This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> versions and variants this function returns NULL. The extracted timestamp
> does not necessarily equate to the time of UUID generation. How close it is
> to the actual time depends on the implementation that generated to UUID.
> The uuidv7() function provided PostgreSQL will normally store the actual time of
> generation to in the UUID, but if large batches of UUIDs are generated at the
> same time it's possible that some UUIDs will store a time that is slightly later
> than their actual generation time.



Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Thu, 25 Jan 2024 at 13:31, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> PFA v14.

+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>

I think it would be good to explain the tradeoffs between uuidv4 and
uuidv7 a bit better. How about changing the docs to something like
this:

<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
</synopsis>
Both functions return a version 4 (random) UUID. UUIDv4 is one of the
most commonly used types of UUID. It is appropriate when random
distribution of keys does not affect performance of an application or
when exposing the generation time of a UUID has unacceptable security
or business intelligence implications.
<synopsis>
<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
</synopsis>
This function returns a version 7 (time-ordered + random) UUID. It
provides much better data locality than UUIDv4, which can greatly
improve performance when UUID is used in a BTREE index (the default
index type in PostgreSQL). To achieve this data locality, UUIDv7
embeds its own generation time into the UUID. If exposing such a
timestamp has unacceptable security or business intelligence
implications, then uuidv4() should be used instead.
<synopsis>



Re: UUID v7

From
Junwang Zhao
Date:
On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> > In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 
> >
> > The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing.  Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements.  For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases.  (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
> I do agree with others however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example to be able to partition by the timestamp in the UUID and then
> being able to later load data for an older timestamp and have it be
> added to to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this usecase we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not sligthly change UUIDv7 then? Well
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
> So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed, the RFC section 6.1[1] has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp,
so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.


[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. Afaict
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp.  Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons.  This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> > versions and variants this function returns NULL. The extracted timestamp
> > does not necessarily equate to the time of UUID generation. How close it is
> > to the actual time depends on the implementation that generated to UUID.
> > The uuidv7() function provided PostgreSQL will normally store the actual time of
> > generation to in the UUID, but if large batches of UUIDs are generated at the
> > same time it's possible that some UUIDs will store a time that is slightly later
> > than their actual generation time.
>
>


--
Regards
Junwang Zhao



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 

Refining documentation is good. However, saying that these functions are not recommended for production must be based
onsome real threats. 

>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 

Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover -
itstill will be keeping locality. There will be no negative consequences at all. 
On the contrary, experienced developer can leverage parameter when data locality should be reduced. If you have
serveralstreams of data, you might want to introduce some shift in reduce contention. 
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point to 10 and
increaseingestion performance 10x-fold. 

> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable
identifiersare not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement
overprevious versions. 


Jelte, your documentation corrections looks good to me, I'll include them in next version.

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Mon, 29 Jan 2024 at 19:32, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover
-it still will be keeping locality. There will be no negative consequences at all.
 

It will be significantly "less unique" than if they wouldn't pass a
constant time. Basically it would become a UUIDv4, but with 74 bits of
random data instead of 122. That might not be enough anymore to
"guarantee" uniqueness. I guess that's why it is required to use
UUIDv8 in these cases, because correct usage is now a requirement for
assuming uniqueness. And for UUIDv8 the spec says this:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

> > On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > If other timestamp sources or
> > a custom timestamp epoch are required, UUIDv8 MUST be used.
>
> Well, yeah. RFC says this... in 4 capital letters :)

As an FYI, there is an RFC that defines these keywords that's why they
are capital letters: https://www.ietf.org/rfc/rfc2119.txt

> I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well,
let'sgo without this function. UUIDv7 is still an improvement over previous versions.
 

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.



Re: UUID v7

From
Sergey Prokhorenko
Date:
Andrey,

I understand and agree with your goals. But instead of dangerous universal functions, it is better to develop safe highly specialized functions that implement only these goals.

There should not be a function uuidv7(T) from an arbitrary timestamp, but there should be a special function that implements your algorithm: uuidv8(now() + '1 century' * random(0,10)).

I replaced 1 day with 1 century because the spread of 1 day is too small. Over time, records will be inserted between existing records, which is undesirable.

Similarly, if we need to calculate the partition id, then we do not need to use the uuid_extract_time() function to provide the extracted timestamp, the accuracy of which cannot be guaranteed. Instead, we need to give exactly the partition id, calculated using the uuidv7 timestamp. For example, partitions may have approximately a month interval between each other.

As for the documentation, it must be indicated that the UUIDv7 structure is not timestamp + random, but timestamp + randomly seeded counter + random, like in all advanced implementations.


Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au

______________________________________________________________


On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.

>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover - it still will be keeping locality. There will be no negative consequences at all.
On the contrary, experienced developer can leverage parameter when data locality should be reduced. If you have serveral streams of data, you might want to introduce some shift in reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point to 10 and increase ingestion performance 10x-fold.

> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.


Jelte, your documentation corrections looks good to me, I'll include them in next version.

Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Sergey Prokhorenko
Date:
Andrey,

I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sotable identifiers."

This phrase is wrong: "Both functions return a version 4 (random) UUID."

For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the
+   same time it's possible that some UUIDs will store a time that is slightly later
+   than their actual generation time


Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au


On Tuesday, 30 January 2024 at 09:55:04 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.


PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
>
> I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
> This allows implementation of k-way sotable identifiers.”
Fixed.

> This phrase is wrong: "Both functions return a version 4 (random) UUID.”
This applies to functions gen_random_uuid() and uuidv4().
>
> For this phrase the reason is unclear and the phrase is most likely incorrect:
> if large batches of UUIDs are generated at the
> +   same time it's possible that some UUIDs will store a time that is slightly later
> +   than their actual generation time

I’ve rewritten this phrase, hope it’s more clear now.


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Junwang Zhao
Date:
Hi Andrey,

On Tue, Jan 30, 2024 at 5:56 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> >
> > I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
> > This allows implementation of k-way sotable identifiers.”
> Fixed.
>
> > This phrase is wrong: "Both functions return a version 4 (random) UUID.”
> This applies to functions gen_random_uuid() and uuidv4().
> >
> > For this phrase the reason is unclear and the phrase is most likely incorrect:
> > if large batches of UUIDs are generated at the
> > +   same time it's possible that some UUIDs will store a time that is slightly later
> > +   than their actual generation time
>
> I’ve rewritten this phrase, hope it’s more clear now.
>
>
> Best regards, Andrey Borodin.

+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+ uint16_t result;
+ result = uuid->data[8] >> 6;
+
+ PG_RETURN_UINT16(result);
+}
\ No newline at end of file

It's always good to add a newline at the end of a  source file, though
this might be nitpicky.

--
Regards
Junwang Zhao



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
> 
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Sergey Prokhorenko
Date:
typo:
being carried to time step

should be:
being carried to timestemp

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Tuesday, 30 January 2024 at 04:35:45 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.



Best regards, Andrey Borodin.

Re: UUID v7

From
Peter Eisentraut
Date:
On 30.01.24 14:35, Andrey M. Borodin wrote:
>> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
>>
>> It's always good to add a newline at the end of a  source file, though
>> this might be nitpicky.
> 
> Thanks, also fixed warning found by CFBot.

I have various comments on this patch:


- doc/src/sgml/func.sgml

The documentation of the new functions should be broken up a bit.
It's all one paragraph now.  At least make it several paragraphs, or
possibly tables or something else.

Avoid listing the functions twice: Once before the description and
then again in the description.  That's just going to get out of date.
The first listing is not necessary, I think.

The return values in the documentation should use the public-facing
type names, like "timestamp with time zone" and "smallint".

The descriptions of the UUID generation functions use handwavy
language in their descriptions, like "It provides much better data
locality" or "unacceptable security or business intelligence
implications", which isn't useful.  Either we cut that all out and
just say, it creates a UUIDv7, done, look elsewhere for more
information, or we provide some more concretely useful details.

We shouldn't label a link as "IETF standard" when it's actually a
draft.


- src/include/catalog/pg_proc.dat

The description of uuidv4 should be "generate UUID version 4", so that
it parallels uuidv7.

The description of uuid_extract_time says 'extract timestamp from UUID
version 7', the implementation is not limited to version 7.

I think uuid_extract_time should be named uuid_extract_timestamp,
because it extracts a timestamp, not a time.

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
to tell them apart, with only one letter different.


- src/test/regress/sql/uuid.sql

Why are the tests using the input format '{...}', which is not the
standard one?


- src/backend/utils/adt/uuid.c

All this new code should have more comments.  There is a lot of bit
twiddling going on, and I suppose one is expected to follow along in
the RFC?  At least each function should have a header comment, so one
doesn't have to check in pg_proc.dat what it's supposed to do.

I'm suspicious that these functions all appear to return null for
erroneous input, rather than raising errors.  I think at least some
explanation for this should be recorded somewhere.

I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

The uuidv7 function could really use a header comment that explains
the choices that were made.  The RFC draft provides various options
that implementations could use; we should describe which ones we
chose.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 
method 3.

You use some kind of counter, but could you explain which method that
counter implements?

I don't see any acknowledgment of issues relating to concurrency or
restarts.  Like, how do we prevent duplicates being generated by
concurrent sessions or between restarts?  Maybe the counter or random
stuff does that, but it's not explained.




Re: UUID v7

From
"Andrey M. Borodin"
Date:
Hi Peter,

thank you for so thoughtful review.

> On 6 Mar 2024, at 12:13, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have various comments on this patch:
>
>
> - doc/src/sgml/func.sgml
>
> The documentation of the new functions should be broken up a bit.
> It's all one paragraph now.  At least make it several paragraphs, or
> possibly tables or something else.
I've split functions to generate UUIDs from functions to extract stuff.

>
> Avoid listing the functions twice: Once before the description and
> then again in the description.  That's just going to get out of date.
> The first listing is not necessary, I think.

Fixed.

> The return values in the documentation should use the public-facing
> type names, like "timestamp with time zone" and "smallint".

Fixed.

> The descriptions of the UUID generation functions use handwavy
> language in their descriptions, like "It provides much better data
> locality" or "unacceptable security or business intelligence
> implications", which isn't useful.  Either we cut that all out and
> just say, it creates a UUIDv7, done, look elsewhere for more
> information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

> We shouldn't label a link as "IETF standard" when it's actually a
> draft.

Fixed.

Well, all my modifications of documentation are kind of blind... I tried to "make docs", but it gives me gazilion of
errors...Is there an easy way to see resulting HTML? 


> - src/include/catalog/pg_proc.dat
>
> The description of uuidv4 should be "generate UUID version 4", so that
> it parallels uuidv7.

Fixed.

> The description of uuid_extract_time says 'extract timestamp from UUID
> version 7', the implementation is not limited to version 7.

Fixed.

> I think uuid_extract_time should be named uuid_extract_timestamp,
> because it extracts a timestamp, not a time.

Renamed.

> The functions uuid_extract_ver and uuid_extract_var could be named
> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
> to tell them apart, with only one letter different.

Renamed.

> - src/test/regress/sql/uuid.sql
>
> Why are the tests using the input format '{...}', which is not the
> standard one?

Fixed.

> - src/backend/utils/adt/uuid.c
>
> All this new code should have more comments.  There is a lot of bit
> twiddling going on, and I suppose one is expected to follow along in
> the RFC?  At least each function should have a header comment, so one
> doesn't have to check in pg_proc.dat what it's supposed to do.

I've added some header comment. One big comment is attached to v7, I tried to take parts mostly from RFC. Yet there are
alot of my additions that now need review... 

> I'm suspicious that these functions all appear to return null for
> erroneous input, rather than raising errors.  I think at least some
> explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR:  division by zero
makes me consider throwing an error. There was some argumentation upthread for not throwing error though, but now I
cannotfind it... maybe I accepted this behaviour as more user-friendly. 

> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

> The uuidv7 function could really use a header comment that explains
> the choices that were made.  The RFC draft provides various options
> that implementations could use; we should describe which ones we
> chose.

Done.

>
> I would have expected that, since gettimeofday() provides microsecond
> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes? Even if we put extra 10 bits of timestamp, we cannot extract safely them. 
These bits could promote inter-backend stortability. E.i. when many backends generate data fast - this data is still
somewhatordered even within 1ms. But I think benefits of this sortability are outweighed by portability(unknown real
resolution),simplicity(we don't store microseconds, thus do not try to extract them). 
All this arguments are weak, but if one method would be strictly better than another - there would be only one method.

>
> You use some kind of counter, but could you explain which method that
> counter implements?
I described counter in uuidv7() header.

>
> I don't see any acknowledgment of issues relating to concurrency or
> restarts.  Like, how do we prevent duplicates being generated by
> concurrent sessions or between restarts?  Maybe the counter or random
> stuff does that, but it's not explained.

I think restart takes more than 1ms, so this is covered with time tick.
I've added paragraph about frequency of generation in uuidv7() header.


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 10 Mar 2024, at 17:59, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> I tried to "make docs", but it gives me gazilion of errors... Is there an easy way to see resulting HTML?

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> Oops, CFbot expectedly found a problem...
> Sorry for the noise, this version, I hope, will pass all the tests.
> Thanks!
>
> Best regards, Andrey Borodin.

I had some issues applying v19 against the current `master` branch.
PFA the rebased and minorly tweaked v20.

The patch LGTM. I think it could be merged unless there are any open
issues left. I don't think so, but maybe I missed something.

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: UUID v7

From
Jelte Fennema-Nio
Date:
Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Now with the added comments, one thing pops out to me: The comments
mention that we use "Monotonic Random", but when I read the spec that
explicitly recommends against using an increment of 1 when using
monotonic random. I feel like if we use an increment of 1, we're
better off going for the "Fixed-Length Dedicated Counter Bits" method
(i.e. change the code to start the counter at 0). See patch 0005 for
an example of that change.

I'm also wondering if we really want to use the extra rand_b bits for
this. The spec says we MAY, but it does remove the amount of
randomness in our UUIDs.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
>
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values
generatedat frequency less than 1KHz. 
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a collision of two UUIDs generated at frequency
lessthan 1KHz is p = 2^-(c2+d) 
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as decreases counter capacity. There is no
differencebetween using bits in d directly, or in c3. There is no point in non-zero c3. Every bit that could be given
toc3 can equally be given to d. 

Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2
alsocontribute to unguessability! So, technically, we should consider using all available bits as c2 bits. 

How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise
betweencapacity of one backend per ms and prevention of global collision. 
This compromise is mostly dictated by maximum frequency of UUID generation by one backend, I've chosen 200MHz as a sane
value.


This compromise is much easier when you do not have 74 spare bits, this crazy amount of information forgives almost any
mistake.Imagine you have to distribute 10 bits between c2 and d. And you try to prevent collision between 10
independentdevices which need capacity to generate IDs with frequency of 10KHz each and keep sortability. You would
havesomething like c1=1, c2=3,d=6. 

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I
don'tthink this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us
toomuch freedom of choice. 

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

From
Michael Paquier
Date:
On Mon, Mar 11, 2024 at 11:27:43PM +0500, Andrey M. Borodin wrote:
> Sorry for this long and vague explanation, if it still seems too
> uncertain we can have a chat or something like that. I don't think
> this number picking stuff deserve to be commented, because it still
> is quite close to random. RFC gives us too much freedom of choice.

Speaking about the RFC, I can see that there is a draft but nothing
formal yet.  The last one I can see is v14 from last November:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14

It does not strike me as a good idea to rush an implementation without
a specification officially approved because there is always a risk of
shipping something that's non-compliant into core.  But perhaps I am
missing something on the RFC side?
--
Michael

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:
>
> It does not strike me as a good idea to rush an implementation without
> a specification officially approved because there is always a risk of
> shipping something that's non-compliant into core.  But perhaps I am
> missing something on the RFC side?

Upthread one of document’s authors commented:

> On 14 Feb 2023, at 19:13, Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:
>
> The point is 99% of the work since adoption by the IETF has been ironing out
> RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a
> very good state.

And also


> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> And many libraries are already including implementation of UUIDv7 – here are some examples:
>
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

So at least reviewing patch and agreeing on chosen methods and constants makes sense.


Best regards, Andrey Borodin.


Re: UUID v7

From
Michael Paquier
Date:
On Tue, Mar 12, 2024 at 11:10:37AM +0500, Andrey M. Borodin wrote:
> On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:
>> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>>
>> And many libraries are already including implementation of UUIDv7 – here are some examples:
>>
>> - https://www.npmjs.com/package/uuidv7
>> - https://crates.io/crates/uuidv7
>> - https://github.com/google/uuid/pull/139
>
> So at least reviewing patch and agreeing on chosen methods and constants makes sense.

Sure, there is no problem in discussing a patch to implement a
behavior.  But I disagree about taking a risk in merging something
that could become non-compliant with the approved RFC, if the draft is
approved at the end, of course.  This just strikes me as a bad idea.
--
Michael

Attachment

Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that.
Idon't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us
toomuch freedom of choice. 

I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.



Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 12 Mar 2024 at 07:32, Michael Paquier <michael@paquier.xyz> wrote:
> Sure, there is no problem in discussing a patch to implement a
> behavior.  But I disagree about taking a risk in merging something
> that could become non-compliant with the approved RFC, if the draft is
> approved at the end, of course.  This just strikes me as a bad idea.

I agree that we shouldn't release UUIDv7 support if the RFC describing
that is not yet approved. But I do think it would be a shame if e.g.
the RFC got approved 2 weeks after Postgres its feature freeze. Which
would then mean we'd have to wait another 1.5 years before actually
using uuidv7. Would it be a reasonable compromise to still merge the
patch for PG17 (assuming the code is good to merge with regards to the
current draft RFC), but revert the commit if the RFC is not approved
before some deadline before the release date (e.g. before the first
release candidate)?



Re: UUID v7

From
Sergey Prokhorenko
Date:
Hi Jelte,


Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits (Method 1).

Andrey and you simply did not read the RFC a little further down in the text:
__________________________________________________________________

The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters:

Fixed-Length Dedicated Counter Seeding:

Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick. However, when the timestamp has not increased, the counter is instead incremented by the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bit is initialized as zero for the sole purpose of guarding against counter rollovers.

Fixed-Length Dedicated Counter Length:
Select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision generally requires a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care must be taken to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
The following sub-topics cover rollover handling with either type of counter method:
...
Counter Rollover Handling:

Counter rollovers MUST be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:


On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

> Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice.


I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.

Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> Andrey and you simply did not read the RFC a little further down in the text:

You're totally right, sorry about that. Maybe it would be good to move
those subsections around a bit in the RFC though, so that anything
related to only one method is included in the section for that method.



Re: UUID v7

From
Peter Eisentraut
Date:
On 10.03.24 13:59, Andrey M. Borodin wrote:
>> The functions uuid_extract_ver and uuid_extract_var could be named
>> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
>> to tell them apart, with only one letter different.
> 
> Renamed.

Another related comment: Throughout your patch, swap the order of 
uuid_extract_variant and uuid_extract_version.  First, this makes more 
sense because version is subordinate to variant, and also it makes it 
alphabetical.

>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>> takes just two bits to return, but the draft document is quite clear
>> that the variant is 4 bits (see Table 1).
> 
> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently.  I think 
uuid_extract_variant should just return whatever is in those four bits. 
Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", 
which I don't think it is correct.  It should return 0 through 15.

>> I would have expected that, since gettimeofday() provides microsecond
>> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.
> 
> I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes?
 

I think this should have been researched.  If we don't know how many 
bits we have, how do we know we have enough for milliseconds?  I think 
we should at least have some kind of idea, if we are going to have this 
conversation.




Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 14 Mar 2024, at 16:07, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 10.03.24 13:59, Andrey M. Borodin wrote:
>>> The functions uuid_extract_ver and uuid_extract_var could be named
>>> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
>>> to tell them apart, with only one letter different.
>> Renamed.
>
> Another related comment: Throughout your patch, swap the order of uuid_extract_variant and uuid_extract_version.
First,this makes more sense because version is subordinate to variant, and also it makes it alphabetical. 
I will do it soon.
>
>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>> takes just two bits to return, but the draft document is quite clear
>>> that the variant is 4 bits (see Table 1).
>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.
>
> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15. 
We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and 11
randomly.Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15. 

>
>>> I would have expected that, since gettimeofday() provides microsecond
>>> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.
>> I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes? 
>
> I think this should have been researched.  If we don't know how many bits we have, how do we know we have enough for
milliseconds? I think we should at least have some kind of idea, if we are going to have this conversation. 

Bits for milliseconds are strictly defined by the document: there are always 48 bits, independently from clock
resolution.
But I don't think it's main problem for Method 3. Method 1 actually guarantees strictly increasing order of UUIDs
generatedby single backend. Method 3 can generate a lot of unsorted data in case of time leaping backward. 

BTW Kyzer (in an off-list discussion) and Sergey confirmed that implemented method from the patch actually is Method 1.


Best regards, Andrey Borodin.


Re: UUID v7

From
Peter Eisentraut
Date:
On 14.03.24 12:25, Andrey M. Borodin wrote:
>>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>>> takes just two bits to return, but the draft document is quite clear
>>>> that the variant is 4 bits (see Table 1).
>>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section
4.1.
>> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15.
 
> We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and 11
randomly.Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15.
 

I agree it's confusing.  Before I studied the RFC 4122bis project, I 
didn't even know about variant vs. version.  I think overall people will 
find this more confusing than useful.  If you just want to know, "is 
this UUID of the kind specified in RFC 4122", you can query it with 
uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the 
_extract_variant function?




Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 14 Mar 2024, at 20:10, Peter Eisentraut <peter@eisentraut.org> wrote:
>
>>>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>>>> takes just two bits to return, but the draft document is quite clear
>>>>> that the variant is 4 bits (see Table 1).
>>>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section
4.1.
>>> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15. 
>> We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and
11randomly. Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15. 
>
> I agree it's confusing.  Before I studied the RFC 4122bis project, I didn't even know about variant vs. version.  I
thinkoverall people will find this more confusing than useful.  If you just want to know, "is this UUID of the kind
specifiedin RFC 4122", you can query it with uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the
_extract_variantfunction? 

I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.


Best regards, Andrey Borodin.


Re: UUID v7

From
Aleksander Alekseev
Date:
Hi,

> > So maybe we don't need the _extract_variant function?
>
> I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.

+1 to the idea. I doubt that anyone will miss it.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> +1 to the idea. I doubt that anyone will miss it.

PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Peter Eisentraut
Date:
On 16.03.24 18:43, Andrey M. Borodin wrote:
>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>
>> +1 to the idea. I doubt that anyone will miss it.
> 
> PFA v22.
> 
> Changes:
> 1. Squashed all editorialisation by Jelte
> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
> 3. Remove all traces of uuid_extract_variant()

I have committed a subset of this for now, namely the additions of 
uuid_extract_timestamp() and uuid_extract_version().  These seemed 
mature and agreed upon.  You can rebase the rest of your patch on top of 
that.

I have started a separate discussion to learn about the precision we can 
expect from gettimeofday().




Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 16.03.24 18:43, Andrey M. Borodin wrote:
>>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>>
>>> +1 to the idea. I doubt that anyone will miss it.
>> PFA v22.
>> Changes:
>> 1. Squashed all editorialisation by Jelte
>> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
>> 3. Remove all traces of uuid_extract_variant()
>
> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and
uuid_extract_version(). These seemed mature and agreed upon.  You can rebase the rest of your patch on top of that. 

Great! Thank you! PFA v23 with rebase on HEAD.

> I have started a separate discussion to learn about the precision we can expect from gettimeofday().

Even in presence of real microsecond-enabled and portable timer using microseconds does not seem to me an optimal way
ofutilising UUID bits. 

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks. 

Time-based bits contribute to global uniqueness, but certainly they are not that effective as counter bits.

Time-based bits do not provide local sortability guarantees: some UUIDs might get same microseconds or be affected by
leapbackwards. 

I think that microseconds are good only for hardware-specific solutions, not for something that runs on variety of
platforms,OSes, devices. 


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks. 
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one backend, while microseconds are practical
sortabilitybetween backends. 

> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using
extrabits at all (fill them with random numbers) would work for 0.999 cases. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
I think it's better to leave Andrey's patch as is, and add another function in the future with a customizable UUIDv7 structure for special use cases. The structure description can be in JSON format. See this discussion.



Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 09:54:07 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one backend, while microseconds are practical sortability between backends.


> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.


Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using extra bits at all (fill them with random numbers) would work for 0.999 cases.


Best regards, Andrey Borodin.

Re: UUID v7

From
Peter Eisentraut
Date:
On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks.
 
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.




Re: UUID v7

From
Sergey Prokhorenko
Date:
Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

From
Sergey Prokhorenko
Date:
BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance






Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

From
Sergey Prokhorenko
Date:
On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance









Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

From
Peter Eisentraut
Date:
On 20.03.24 19:08, Andrey M. Borodin wrote:
>> On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 16.03.24 18:43, Andrey M. Borodin wrote:
>>>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>>>
>>>> +1 to the idea. I doubt that anyone will miss it.
>>> PFA v22.
>>> Changes:
>>> 1. Squashed all editorialisation by Jelte
>>> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
>>> 3. Remove all traces of uuid_extract_variant()
>>
>> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and
uuid_extract_version(). These seemed mature and agreed upon.  You can rebase the rest of your patch on top of that.
 
> 
> Great! Thank you! PFA v23 with rebase on HEAD.

I have been studying the uuidv() function.

I find this code extremely hard to follow.

We don't need to copy all that documentation from the RFC 4122bis 
document.  People can read that themselves.  What I would like to see is 
easy to find information what from there we are implementing.  Like,

- UUID version 7
- fixed-length dedicated counter
- counter is 18 bits
- 4 bits are initialized as zero

That's more or less all I would need to know what is going on.

That said, I don't understand why you say it's an 18 bit counter, when 
you overwrite 6 bits with variant and version.  Then it's just a 12 bit 
counter?  Which is the size of the rand_a field, so that kind of makes 
sense.  But 12 bits is the recommended minimum, and (in this patch) we 
don't use sub-millisecond timestamp precision, so we should probably use 
more than the minimum?

Also, you are initializing 4 bits (I think?) to zero to guard against 
counter rollovers (so it's really just an 8 bit counter?).  But nothing 
checks against such rollovers, so I don't understand the use of that.

The code code be organized better.  In the not-increment_counter case, 
you could use two separate pg_strong_random calls: One to initialize 
rand_b, starting at &uuid->data[8], and one to initialize the counter. 
Then the former could be shared between the two branches, and the code 
to assign the sequence_counter to the uuid fields could also be shared.

I would also prefer if the normal case (not-increment_counter) were the 
first if branch.


Some other notes on your patch:

- Your rebase duplicated the documentation of uuid_extract_timestamp and 
uuid_extract_version.

- PostgreSQL code uses uint64 etc. instead of uint64_t etc.

- It seems the added includes

#include "access/xlog.h"
#include "utils/builtins.h"
#include "utils/datetime.h"

are not needed.

- The static variables sequence_counter and previous_timestamp could be 
kept inside the uuidv7() function.




Re: UUID v7

From
"Andrey M. Borodin"
Date:
Sorry for this long reply. I was looking on refactoring around pg_strong_random() and could not decide what to do.
Finally,I decided to post at least something. 

> On 22 Mar 2024, at 19:15, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have been studying the uuidv() function.
>
> I find this code extremely hard to follow.
>
> We don't need to copy all that documentation from the RFC 4122bis document.  People can read that themselves.  What I
wouldlike to see is easy to find information what from there we are implementing.  Like, 
>
> - UUID version 7
> - fixed-length dedicated counter
> - counter is 18 bits
> - 4 bits are initialized as zero

I've removed table taken from RFC.

> That's more or less all I would need to know what is going on.
>
> That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version.
Then it's just a 12 bit counter?  Which is the size of the rand_a field, so that kind of makes sense.  But 12 bits is
therecommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use
morethan the minimum? 


No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits data[8]. It's 18 total. Essentially, we use both partial
bytesand one whole byte between. 
There was a bug - we used 1 extra byte of random numbers that was not necessary, I think that's what lead you to think
thatwe use 12-bit counter. 

> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit
counter?). But nothing checks against such rollovers, so I don't understand the use of that. 

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
Rollover is carried into time tick here:
    ++sequence_counter;
        if (sequence_counter > 0x3ffff)
        {
            /* We only have 18-bit counter */
            sequence_counter = 0;
            previous_timestamp++;
        }

I think we might use 10 bits of microseconds and have 8 bits of a counter. Effect of a counter won't change much. But
I'mnot sure if this is allowed per RFC. 
If time source is coarse-grained it still acts like a random initializer. And when it is precise - time is "natural"
sourceof entropy. 


> The code code be organized better.  In the not-increment_counter case, you could use two separate pg_strong_random
calls:One to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be
sharedbetween the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared. 

Call to pg_strong_random() is very expensive in builds without ssl (and even with ssl too). If we could ammortize
randomnumbers in small buffers - that would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). Or,
perhaps,we can ignore cost of two pg_string_random() calls. 

>
> I would also prefer if the normal case (not-increment_counter) were the first if branch.

Done.

> Some other notes on your patch:
>
> - Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version.
>
> - PostgreSQL code uses uint64 etc. instead of uint64_t etc.
>
> - It seems the added includes
>
> #include "access/xlog.h"
> #include "utils/builtins.h"
> #include "utils/datetime.h"
>
> are not needed.
>
> - The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.

Fixed.

Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Peter Eisentraut
Date:
On 26.03.24 18:26, Andrey M. Borodin wrote:
>> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8
bitcounter?).  But nothing checks against such rollovers, so I don't understand the use of that.
 
> No, there's only one guard rollover bit.
> Here: uuid->data[6] = (uuid->data[6] & 0xf7);
> Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.

Uh, I guess I don't understand this at all.  I tried to dig up some 
information about this, but didn't find anything.  What exactly is the 
mechanism of these "counter rollover guards"?  If they don't guard 
anything, what are they supposed to accomplish?




Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 4 Apr 2024, at 18:45, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 26.03.24 18:26, Andrey M. Borodin wrote:
>>> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8
bitcounter?).  But nothing checks against such rollovers, so I don't understand the use of that. 
>> No, there's only one guard rollover bit.
>> Here: uuid->data[6] = (uuid->data[6] & 0xf7);
>> Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
>
> Uh, I guess I don't understand this at all.  I tried to dig up some information about this, but didn't find anything.
What exactly is the mechanism of these "counter rollover guards"?  If they don't guard anything, what are they supposed
toaccomplish? 
>

My understanding of guard bits is the following: on every UUID generation, when time is advancing, counter bits are
initializedwith random numbers, except guard bits. Guard bits are always initialized with zeroes. 

Let's consider we have a 1-byte counter with 4 guard bits and 4 normal bits.
If we generate some UUIDs at the very same millisecond we might have following counter values:

0C <--- lower nibble is initialized with random 4 bits C.
0D
0E
0F
10
11
12

If we have no these guard bits we might get random numbers that are immifiately at the end of a range of allowed
values:

FE <--- first UUID at given millisecond
FF
00 <--- rollover to next millisecond
01


If we have 1 guard bit and 7 normal bits we get at worst 128 values before rollover to next millisecond.
If we have 2 guard bits and 6 normal bits this guaranty is extended to 192.
3/5 will guaranty capacity of 224.
But usefulness of every next guard bits decreases, so I think there is a point in only one.

That's my understanding of guard bits in the counter. Please correct me if I'm wrong.


At this point we can skip the counter\microseconds entirely, just fill everything after unix_ts_ms with randomness.
It'sstill a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust this sortability measures
later.


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
For every complex problem there is an answer that is clear, simple, and wrong. Since the RFC allows microsecond timestamp granularity, the first thing that comes to everyone's mind is to insert microsecond granularity into UUIDv7. And if the RFC allowed nanosecond timestamp granularity, then they would try to insert nanosecond granularity into UUIDv7.

But I am categorically against abandoning the counter under pressure from the unfounded proposal to replace the counter with microsecond granularity.

1) The RFC specifies millisecond timestamp granularity by default.

2) All advanced UUIDv7 implementations include a counter:
• for JavaScript https://www.npmjs.com/package/uuidv7
• for Rust https://crates.io/crates/uuid7
• for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7
• for Python https://github.com/oittaa/uuid6-python

3) The theoretical performance of generating UUIDv7 without loss of monotonicity for microsecond granularity is only 1000 UUIDv7 per millisecond. This is very low and insufficient generation performance! But the actual generation performance is even worse, since the generation demand is unevenly distributed within a millisecond. Therefore, a UUIDv7 will not be generated every microsecond.

For a counter 18 bits long, with the most significant bit initialized to zero and the remaining bits initialized to a random number, the actual performance of generating a UUIDv7 without loss of monotonicity is between 2 to the power of 17 = 131072 UUIDv7 per millisecond (if the random number happens to be all ones) to 2 to the power of 18 = 262144 UUIDv7 per millisecond (if the random number happens to be all zeros). This is more than enough.

4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.



The only reasonable use of microsecond granularity is when writing to a database table in parallel. However, monotonicity in this case can be ensured in another way, namely a single UUIDv7 generator per database table, similar to SERIAL (https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) in PostgreSQL.

Best regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:



...


At this point we can skip the counter\microseconds entirely, just fill everything after unix_ts_ms with randomness. It's still a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust this sortability measures later.


Best regards, Andrey Borodin.

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state. This means after final
approvalby authors RFC will be imminently publicised. Most probably, this will happen circa 2 weeks after feature
freeze:) 


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562


Re: UUID v7

From
Sergey Prokhorenko
Date:
I think that for the sake of such an epoch-making thing as UUIDv7 it would be worth slightly unfreezing this feature freeze.

Best regards, 


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Saturday, 13 April 2024 at 09:58:29 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state. This means after final approval by authors RFC will be imminently publicised. Most probably, this will happen circa 2 weeks after feature freeze :)


Best regards, Andrey Borodin.


[0]
https://www.rfc-editor.org/auth48/rfc9562

Re: UUID v7

From
Michael Paquier
Date:
On Sat, Apr 13, 2024 at 07:07:34PM +0000, Sergey Prokhorenko wrote:
> I think that for the sake of such an epoch-making thing as UUIDv7 it
> would be worth slightly unfreezing this feature freeze.

A feature freeze is here to freeze things in place.  This comes up
every year, and that won't happen.

> New UUID is assigned RFC number 9562, it was aproved by RFC editors
> and is now in AUTH48 state. This means after final approval by
> authors RFC will be imminently publicised. Most probably, this will
> happen circa 2 weeks after feature freeze :)
>
> [0] https://www.rfc-editor.org/auth48/rfc9562

Well, that's life.  It looks like this is waiting for some final
approval, which may take some more time.  I have no idea how long this
usually takes.
--
Michael

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:
> On 13 Apr 2024, at 11:58, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state.

RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, and now should be published.


Best regards, Andrey Borodin.


Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>

Here's the documentation from ClickHouse [0] for their implementation. It's identical to provided patch in this thread,
withfew notable exceptions: 

1. Counter is 42 bits, not 18. The counter have no guard bits, every bit is initialized with random number on time
ticks.
2. By default counter is shared between threads. Alternative function generateUUIDv7ThreadMonotonic() provides
thread-localcounter. 

Thanks!


Best regards, Andrey Borodin.

[0] https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7


Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, and now should be published.

It's RFC now.
https://datatracker.ietf.org/doc/rfc9562/


Best regards, Andrey Borodin.


Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 8 May 2024, at 18:37, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> It's RFC now.

PFA version with references to RFC instead of drafts.
In nearby thread [0] we found out that most systems have enough presicion to fill additional 12 bits of sub-millisecond
information.So I switched implementation to this method. 
We have a portable gettimeofday(), but unfortunately it gives only 10 bits of sub-millisecond information. So I created
portableget_real_time_ns() for this purpose: it reads clock_gettime() on non-Windows platforms and
GetSystemTimePreciseAsFileTime()on Windows. 

Thanks!


Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/be0339cc-1ae1-4892-9445-8e6d8995a44d%40eisentraut.org


Attachment

Re: UUID v7

From
Sergey Prokhorenko
Date:

Dear Colleagues,

Although the uuidv7(timestamp) function clearly contradicts RFC 9562, but the uuidv7(timestamp_offset) function is fully compliant with RFC 9562 and is absolutely necessary.

"Altering, Fuzzing, or Smearing:

Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real-clock value within a UUID to 1) correct inaccurate clocks, 2) handle leap seconds, or 3) obtain a millisecond value by dividing by 1024 (or some other value) for performance reasons (instead of dividing a number of microseconds by 1000). This specification makes no requirement or guarantee about how close the clock value needs to be to the actual time. "

It’s written clumsily, of course, but the intention of the authors of RFC 9562 is completely clear: the current timestamp can be changed by any amount and for any reason, including security or performance reasons. The wording provides only a few examples, the list of which is certainly not exhaustive.

The motives of the authors of RFC 9562 are also clear. The timestamp is needed only to generate monotonically increasing UUIDv7. The timestamp should not be used as a source of data about the time the record was created (this is explicitly stated in section 6.12. Opacity). Therefore, the actual timestamp can and should be changed if necessary.

Why then does RFC 9562 contain wording about the need to use "Unix Epoch timestamp"? First, the authors of RFC 9562 wanted to get away from using the Gregorian calendar, which required a timestamp that was too long. Second, the RFC 9562 prohibits inserting into UUIDv7 a completely arbitrary date and time value that does not increase with the passage of real time. And this is correct, since in this case the generated UUIDv7 would not be monotonically increasing. Thirdly, on almost all computing platforms there is a convenient source of "Unix Epoch timestamp".

Why does the uuidv7() function need the optional formal parameter timestamp_offset? This question is best answered by a quote from https://lu.sagebl.eu/notes/maybe-we-dont-need-uuidv7 :

"Leaking information

UUIDv4 does not leak information assuming a proper implementation. But, UUIDv7 in fact does: the timestamp of the server is embeded into the ID. From a business point of view it discloses information about resource creation time. It may not be a problem depending on the context. Current RFC draft allows implementation to tweak timestamps a little to enforce a strict increasing order between two generations and to alleviate some security concerns."

There is a lot of hate on the internet about "UUIDv7 should not be used because it discloses the date and time the record was created." If there was a ban on changing the actual timestamp, this would prevent the use of UUIDv7 in mission-critical databases, and would generally lead to a decrease in the popularity of UUIDv7.

The implementation details of timestamp_offset are, of course, up to the developer. But I would suggest two features:

1. If the result of applying timestamp_offset the timestamp goes beyond the permissible interval, the timestamp_offset value must be reset to zero

2. The data type for timestamp_offset should be developer-friendly interval type, (https://postgrespro.ru/docs/postgresql/16/datatype-datetime?lang=en#DATATYPE-INTERVAL-INPUT), which allows you to enter the argument value using words microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium.

I really hope that timestamp_offset will be used in the uuidv7() function for PostgreSQL.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 24 Jul 2024, at 04:09, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Implementations MAY alter the actual timestamp.

Hmm… looks like we slightly misinterpreted words about clock source.
Well, that’s great, let’s get offset back.
PFA version accepting offset interval.
It works like this:
postgres=# select uuidv7(interval '-2 months’);
 018fc02f-0996-7136-aeb4-8936b5a516a1


postgres=# select uuid_extract_timestamp(uuidv7(interval '-2 months'));
 2024-05-28 22:11:15.71+05

What do you think?


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> PFA version accepting offset interval.

There was a bug: when time was not moving on, I was updating used time by a nanosecond, instead of 1/4096 of
millisecond.
V27 fixes that.

Thanks!


Best regards, Andrey Borodin.

Attachment

Re: UUID v7

From
Masahiko Sawada
Date:
On Sun, Aug 4, 2024 at 3:51 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > PFA version accepting offset interval.
>
> There was a bug: when time was not moving on, I was updating used time by a nanosecond, instead of 1/4096 of
millisecond.
> V27 fixes that.
>
> Thanks!

I've reviewed the v27 patch and have some comments:

---
in datatype.sgml:

    The data type <type>uuid</type> stores Universally Unique Identifiers
    (UUID) as defined by <ulink
url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
    ISO/IEC 9834-8:2005, and related standards.

In funcs.sgml:
   This function extracts the version from a UUID of the variant described by
   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC
4122</ulink>.  For

Maybe these references of RFC4122 need to be updated as well.

---
'git show --check' raises a warning:

src/backend/utils/adt/uuid.c:520: trailing whitespace.
+

---
+
+   if (PG_NARGS() > 0)
+   {
+       Interval *span;
+       TimestampTz ts = (TimestampTz) (ns / 1000) -
+           (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY *
USECS_PER_SEC;
+       span = PG_GETARG_INTERVAL_P(0);
+       ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+                                                    TimestampTzGetDatum(ts),
+                                                    IntervalPGetDatum(span)));
+       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
+           * 1000 + ns % 1000;
+   }

We need to add a comment to describe what/why we're doing here.

---
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"

Need a period at the end of this sentence.

---
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },

Both functions have the same description but work differently. I think
it's better to clarify the description of uuidv7() that takes an
interval.

---
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)

I think that we need to change these functions so that this check
query doesn't return anything, no?

---
+   if (version == 6)
+   {
+       tms = ((uint64) uuid->data[0]) << 52;
+       tms += ((uint64) uuid->data[1]) << 44;
+       tms += ((uint64) uuid->data[2]) << 36;
+       tms += ((uint64) uuid->data[3]) << 28;
+       tms += ((uint64) uuid->data[4]) << 20;
+       tms += ((uint64) uuid->data[5]) << 12;
+       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+       tms += ((uint64) uuid->data[7]);
+
+       /* convert 100-ns intervals to us, then adjust */
+       ts = (TimestampTz) (tms / 10) -
+           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC;
+
+       PG_RETURN_TIMESTAMPTZ(ts);
+   }

It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 16 Oct 2024, at 11:05, Michael Paquier <michael@paquier.xyz> wrote:
>
> This part of the patch looks structurally wrong to me because we've
> already spent some time refactoring the clock APIs into instr_time.h
> that deals about cross-platform requirements for monotonic times.
> Particularly, on MacOS, we have CLOCK_MONOTONIC_RAW, and your patch
> does not use it.  So you should avoid calling these routines, and
> build something using the interface unified across the board, like
> anywhere else.  And you know, duplication.

Thanks for looking!
Actually, CLOCK_MONOTONIC_RAW on MacOS was exactly a problem: this clocks have nothing to do with astronomic clock. And
wemust put real UTC time into UUID. 
I’d be happy to reuse instr_time.h infrastructure, but it just does not fit for the purpose. It’s optimized to measure
timespans. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Sat, Oct 26, 2024 at 10:05 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> >
> > ---
> > - oid | proname | oid | proname
> > ------+---------+-----+---------
> > -(0 rows)
> > + oid  | proname | oid  | proname
> > +------+---------+------+---------
> > + 9896 | uuidv7  | 9897 | uuidv7
> > +(1 row)
> >
> > I think that we need to change these functions so that this check
> > query doesn't return anything, no?
>
> We have 4 options:
> 0. Remove uuidv7(interval). But it brings imporatne functionality to the table: we can avoid contention points while
massivelyinsert data. 
> 1. Give different names to uuidv7() and uuidv7(interval).
> 2. Allow importing pg_node_tree  (see v7 of the patch)
> 3. Change this query. Comment to this query suggest that it checks for exactly this case: same function is declared
withdifferent number of arguments. 
>
> IMO approach number 3 is best. However, I do not understand why this query check was introduced in the first place.
Maybe,there are string arguments why we should not do same-named functions with different number of arguments. 
>

I think we typically avoid this kind of check failure by assigning
uuidv7() and uuidv7(interval) different C functions that call the
common function. That is, we have pg_proc entries like:

{ oid => '9896', descr => 'generate UUID version 7',
  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
{ oid => '9897', descr => 'generate UUID version 7 with a timestamp
shifted on specific interval',
  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
  prorettype => 'uuid', proargtypes => 'interval', prosrc =>
'uuidv7_interval' },

And then have C codes like:

static Datum
generate_uuidv7(FunctionCallInfo fcinfo)
{
    static uint64 previous_ns = 0;:
:
    PG_RETURN_UUID_P(uuid);
}

Datum
uuidv7(PG_FUNCTION_ARGS)
{
    return generate_uuidv7(fcinfo);
}

Datum
uuidv7_interval(PG_FUNCTION_ARGS)
{
    return generate_uuidv7(fcinfo);
}

> >
> > ---
> > +   if (version == 6)
> > +   {
> > +       tms = ((uint64) uuid->data[0]) << 52;
> > +       tms += ((uint64) uuid->data[1]) << 44;
> > +       tms += ((uint64) uuid->data[2]) << 36;
> > +       tms += ((uint64) uuid->data[3]) << 28;
> > +       tms += ((uint64) uuid->data[4]) << 20;
> > +       tms += ((uint64) uuid->data[5]) << 12;
> > +       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
> > +       tms += ((uint64) uuid->data[7]);
> > +
> > +       /* convert 100-ns intervals to us, then adjust */
> > +       ts = (TimestampTz) (tms / 10) -
> > +           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) *
> > SECS_PER_DAY * USECS_PER_SEC;
> > +
> > +       PG_RETURN_TIMESTAMPTZ(ts);
> > +   }
> >
> > It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
> > spite of not supporting UUID v6 generation. I think it makes more
> > sense to support UUID v6 generation as well, if the need for it is
> > high.
>
> RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA
patchwith implementation. 
>

My point is that we should either support full functionality for
UUIDv6 (such as generation and extraction) or none of them. I'm not
really sure we want UUIDv6 as well, but if we want it, it should be
implemented in a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Stepan Neretin
Date:
> > I think we typically avoid this kind of check failure by assigning
> > uuidv7() and uuidv7(interval) different C functions that call the
> > common function. That is, we have pg_proc entries like:
> >
>
> Done.
>
> >>>
> >>> It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
> >>> spite of not supporting UUID v6 generation. I think it makes more
> >>> sense to support UUID v6 generation as well, if the need for it is
> >>> high.
> >>
> >> RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with
> providing implementation, it's trivial. PFA patch with implementation.
> >>
> >
> > My point is that we should either support full functionality for
> > UUIDv6 (such as generation and extraction) or none of them. I'm not
> > really sure we want UUIDv6 as well, but if we want it, it should be
> > implemented in a separate patch.
>
> Make sense. I've removed all traces of v6.
>


Hi there,

Firstly, I'd like to discuss the increased_clock_precision variable, which
currently divides the timestamp into milliseconds and nanoseconds. However,
this approach only approximates the extra bits for sub-millisecond
precision, leading to imprecise timestamps in high-frequency UUID
generation.

To address this issue, we could consider using a more accurate method for
calculating the timestamp. For instance, we could utilize a higher
resolution clock or implement a more precise algorithm to ensure accurate
timestamps.

Additionally, it would be beneficial to add validation checks for the
interval argument. These checks could verify that the input interval is
within reasonable bounds and that the calculated timestamp is accurate.
Examples of checks could include verifying if the interval is too small,
too large, or exceeds the maximum possible number of milliseconds and
nanoseconds in a timestamp.

By implementing these changes, we can improve the accuracy and reliability
of UUID generation, making it more suitable for high-frequency usage
scenarios.

What do you think about these suggestions? Let me know your thoughts!

Best Regards, Stepan Neretin!

Re: UUID v7

From
Masahiko Sawada
Date:
On Thu, Oct 31, 2024 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 31 Oct 2024, at 22:15, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Oct 26, 2024 at 10:05 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > I think we typically avoid this kind of check failure by assigning
> > uuidv7() and uuidv7(interval) different C functions that call the
> > common function. That is, we have pg_proc entries like:
> >
>
> Done.
>
> >>>
> >>> It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
> >>> spite of not supporting UUID v6 generation. I think it makes more
> >>> sense to support UUID v6 generation as well, if the need for it is
> >>> high.
> >>
> >> RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA
patchwith implementation. 
> >>
> >
> > My point is that we should either support full functionality for
> > UUIDv6 (such as generation and extraction) or none of them. I'm not
> > really sure we want UUIDv6 as well, but if we want it, it should be
> > implemented in a separate patch.
>
> Make sense. I've removed all traces of v6.

Thank you for updating the patch.

I've been studying UUID v7 and have a question about the current (v29)
implementation. IIUC the current implementation uses
nanosecond-precision timestamps for both the first 48 bit space and 12
bits of pseudorandom data space (referred as 'rand_a' space in RFC
9562). IOW, all data except for random, version, and variant parts
consist of a timestamp. The nanosecond-precision timestamp is
generated using clock_gettime() with CLOCK_REALTIME on Linux, which
however could be affected by time adjustment by NTP. Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Therefore, if the
> system clock moves backward due to NTP, we cannot guarantee
> monotonicity and sortability. Is that right?

Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for ~250ns
forwardon each UUID generation. 60 bits of time are unique and ascending for a given backend. 

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Therefore, if the
> > system clock moves backward due to NTP, we cannot guarantee
> > monotonicity and sortability. Is that right?
>
> Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for ~250ns
forwardon each UUID generation. 60 bits of time are unique and ascending for a given backend. 
>

Thank you for your explanation. I now understand this code guarantees
the monotonicity:

+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+       ns = get_real_time_ns();
+       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+               ns = previous_ns + SUB_MILLISECOND_STEP;
+       previous_ns = ns;


I think that one of the most important parts in UUIDv7 implementation
is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
the monotonicity. The current patch employs method 3 with the
assumption that 12 bits of sub-millisecond information is available on
most of the systems we support. However, as far as I tested, on MacOS,
values returned by  clock_gettime(CLOCK_REALTIME) are only microsecond
precision, meaning that we could waste some randomness. Has this point
been considered?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 31 Oct 2024, at 23:04, Stepan Neretin <sndcppg@gmail.com> wrote:
>
>
> Firstly, I'd like to discuss the increased_clock_precision variable, which
> currently divides the timestamp into milliseconds and nanoseconds. However,
> this approach only approximates the extra bits for sub-millisecond
> precision, leading to imprecise timestamps in high-frequency UUID
> generation.
No, timestamp is taken in nanoseconds, we keep precision of 1/4096 of ms. If you observe precision loss anywhere let me
know.

>
> To address this issue, we could consider using a more accurate method for
> calculating the timestamp. For instance, we could utilize a higher
> resolution clock or implement a more precise algorithm to ensure accurate
> timestamps.

That's what we do.

>
> Additionally, it would be beneficial to add validation checks for the
> interval argument. These checks could verify that the input interval is
> within reasonable bounds and that the calculated timestamp is accurate.
> Examples of checks could include verifying if the interval is too small,
> too large, or exceeds the maximum possible number of milliseconds and
> nanoseconds in a timestamp.

timestamptz_pl_interval() is already doing this.

> What do you think about these suggestions? Let me know your thoughts!

Thanks a lot for reviewing the patch!


> On 1 Nov 2024, at 10:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>>
>>
>>> On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>
>>> Therefore, if the
>>> system clock moves backward due to NTP, we cannot guarantee
>>> monotonicity and sortability. Is that right?
>>
>> Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for
~250nsforward on each UUID generation. 60 bits of time are unique and ascending for a given backend. 
>>
>
> Thank you for your explanation. I now understand this code guarantees
> the monotonicity:
>
> +/* minimum amount of ns that guarantees step of increased_clock_precision */
> +#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
> +       ns = get_real_time_ns();
> +       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
> +               ns = previous_ns + SUB_MILLISECOND_STEP;
> +       previous_ns = ns;
>
>
> I think that one of the most important parts in UUIDv7 implementation
> is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
> the monotonicity. The current patch employs method 3 with the
> assumption that 12 bits of sub-millisecond information is available on
> most of the systems we support. However, as far as I tested, on MacOS,
> values returned by  clock_gettime(CLOCK_REALTIME) are only microsecond
> precision, meaning that we could waste some randomness. Has this point
> been considered?
>

There was a thread "What is a typical precision of gettimeofday()?" [0]
There we found out that routines of instr_time.h are precise enough. On my machine (MacBook Air M3) I do not observe
significantdifferences between CLOCK_MONOTONIC_RAW and CLOCK_REALTIME in pg_test_timing results. 

CLOCK_MONOTONIC_RAW
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.30 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.47856  193113929
     2      1.52039    2981452
     4      0.00025        485
     8      0.00062       1211
    16      0.00012        237
    32      0.00004         79
    64      0.00002         30
   128      0.00000          8
   256      0.00000          5
   512      0.00000          3
  1024      0.00000          1
  2048      0.00000          2

CLOCK_REALTIME
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.04 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.49709  196477842
     2      1.50268    2997479
     4      0.00007        130
     8      0.00012        238
    16      0.00005         91
    32      0.00000          4
    64      0.00000          1

Thanks!


[0]
https://www.postgresql.org/message-id/flat/212C2E24-32CF-400E-982E-A446AB21E8CC%40yandex-team.ru#c89fa36829b2003147b6ce72170b5342





Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 1, 2024 at 10:33 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 31 Oct 2024, at 23:04, Stepan Neretin <sndcppg@gmail.com> wrote:
> >
> >
> > Firstly, I'd like to discuss the increased_clock_precision variable, which
> > currently divides the timestamp into milliseconds and nanoseconds. However,
> > this approach only approximates the extra bits for sub-millisecond
> > precision, leading to imprecise timestamps in high-frequency UUID
> > generation.
> No, timestamp is taken in nanoseconds, we keep precision of 1/4096 of ms. If you observe precision loss anywhere let
meknow. 
>
> >
> > To address this issue, we could consider using a more accurate method for
> > calculating the timestamp. For instance, we could utilize a higher
> > resolution clock or implement a more precise algorithm to ensure accurate
> > timestamps.
>
> That's what we do.
>
> >
> > Additionally, it would be beneficial to add validation checks for the
> > interval argument. These checks could verify that the input interval is
> > within reasonable bounds and that the calculated timestamp is accurate.
> > Examples of checks could include verifying if the interval is too small,
> > too large, or exceeds the maximum possible number of milliseconds and
> > nanoseconds in a timestamp.
>
> timestamptz_pl_interval() is already doing this.
>
> > What do you think about these suggestions? Let me know your thoughts!
>
> Thanks a lot for reviewing the patch!
>
>
> > On 1 Nov 2024, at 10:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>
> >>
> >>
> >>> On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>>
> >>> Therefore, if the
> >>> system clock moves backward due to NTP, we cannot guarantee
> >>> monotonicity and sortability. Is that right?
> >>
> >> Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for
~250nsforward on each UUID generation. 60 bits of time are unique and ascending for a given backend. 
> >>
> >
> > Thank you for your explanation. I now understand this code guarantees
> > the monotonicity:
> >
> > +/* minimum amount of ns that guarantees step of increased_clock_precision */
> > +#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
> > +       ns = get_real_time_ns();
> > +       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
> > +               ns = previous_ns + SUB_MILLISECOND_STEP;
> > +       previous_ns = ns;
> >
> >
> > I think that one of the most important parts in UUIDv7 implementation
> > is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
> > the monotonicity. The current patch employs method 3 with the
> > assumption that 12 bits of sub-millisecond information is available on
> > most of the systems we support. However, as far as I tested, on MacOS,
> > values returned by  clock_gettime(CLOCK_REALTIME) are only microsecond
> > precision, meaning that we could waste some randomness. Has this point
> > been considered?
> >
>
> There was a thread "What is a typical precision of gettimeofday()?" [0]
> There we found out that routines of instr_time.h are precise enough. On my machine (MacBook Air M3) I do not observe
significantdifferences between CLOCK_MONOTONIC_RAW and CLOCK_REALTIME in pg_test_timing results. 
>
> CLOCK_MONOTONIC_RAW
> x4mmm@x4mmm-osx bin % ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 15.30 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1     98.47856  193113929
>      2      1.52039    2981452
>      4      0.00025        485
>      8      0.00062       1211
>     16      0.00012        237
>     32      0.00004         79
>     64      0.00002         30
>    128      0.00000          8
>    256      0.00000          5
>    512      0.00000          3
>   1024      0.00000          1
>   2048      0.00000          2
>
> CLOCK_REALTIME
> x4mmm@x4mmm-osx bin % ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 15.04 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1     98.49709  196477842
>      2      1.50268    2997479
>      4      0.00007        130
>      8      0.00012        238
>     16      0.00005         91
>     32      0.00000          4
>     64      0.00000          1

I applied the patch shared on that thread[1] to measure nanoseconds
and changed instr_time.h to use CLOCK_REALTIME even on macOS. Here is
the results on my machine (macOS 14.7, M1 Pro):

Testing timing overhead for 3 seconds.
Per loop time including overhead: 18.61 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0      98.1433    98.1433  158212921
       1       0.0000    98.1433          0
       3       0.0000    98.1433          0
       7       0.0000    98.1433          0
      15       0.0000    98.1433          0
      31       0.0000    98.1433          0
      63       0.0000    98.1433          0
     127       0.0000    98.1433          0
     255       0.0000    98.1433          0
     511       0.0000    98.1433          0
    1023       1.8560    99.9994    2992054
    2047       0.0000    99.9994         51
    4095       0.0001    99.9995        110
    8191       0.0003    99.9998        463
   16383       0.0002   100.0000        313
   32767       0.0000   100.0000         49
   65535       0.0000   100.0000          4

Timing durations less than 128 ns:
      ns   % of total  running %      count
       0      98.1433    98.1433  158212921

Most of the timing durations were nanoseconds and fell into either 0
ns. Others fell into >1023 bins.

I've done a simple test as well on my Mac and saw that the time
returned by clock_gettime(CLOCK_REALTIME) doesn't have nanosecond
precision:

% cat test.c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int
main(void)
{
        struct timespec real;
        struct timespec mono;
        struct timespec mono_raw;

        clock_gettime(CLOCK_REALTIME, &real);
       clock_gettime(CLOCK_MONOTONIC, &mono);
        clock_gettime(CLOCK_MONOTONIC_RAW, &mono_raw);

        printf("real:     %ld\t%ld\n", real.tv_sec, real.tv_nsec);
        printf("mono:     %ld\t%ld\n", mono.tv_sec, mono.tv_nsec);
        printf("mono_raw: %ld\t%ld\n", mono_raw.tv_sec, mono_raw.tv_nsec);

        return 0;
}
% gcc -o test test.c
% ./test
real:     1730495955    515018000
mono:     3212977       834578000
mono_raw: 3212982       962799958
% ./test
real:     1730495956    78927000
mono:     3212978       398488000
mono_raw: 3212983       526718333
% ./test
real:     1730495956    652751000
mono:     3212978       972312000
mono_raw: 3212984       100552333

Regards,

[1] https://www.postgresql.org/message-id/3110108.1719939353%40sss.pgh.pa.us

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v30-0001-Implement-UUID-v7.patch>

Some more thoughts on this patch version:

0. Comment mentioning nanoseconds, while we do not need to carry anything
/* Convert TimestampTz back and carry nanoseconds. */

1. There's unnecessary &3 in
uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);

2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow.
Thatwould slightly increase non-overflowing capacity when generating more than million UUIDs per second on one backend.
However,given current performance of our CSPRNG I do not think this feature worth code complexity. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > <v30-0001-Implement-UUID-v7.patch>
>
> Some more thoughts on this patch version:
>
> 0. Comment mentioning nanoseconds, while we do not need to carry anything
> /* Convert TimestampTz back and carry nanoseconds. */
>
> 1. There's unnecessary &3 in
> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>
> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for
overflow.That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on
onebackend. However, given current performance of our CSPRNG I do not think this feature worth code complexity. 
>

While using only 10 bits microseconds makes the implementation simple,
I'm not sure if 10 bits is enough to generate UUIDs at microsecond
granularity without losing monotonicity. Since 10-bit microseconds are
used as is in rand_a space, 1000 UUIDs can be generated per
millisecond without losing monotonicity.

For example, in my environment, it took 1808 milliseconds to generate
1 million UUIDs. This is about 533 UUIDs generated per millisecond. As
UUID generation performance improves, I think 10 bits will not be
enough.

=# select count(uuidv7()) from generate_series(1, 1_000_000);
count
---------
1000000
(1 row)

Time: 1808.734 ms

I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:

> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the
counter,almost all bits are initialized with a random number, which reduces the risk of collision. 

I feel that it's better to switch to Method 1 or 2 with 12 bits or
larger counter space.

Regards,

[1] https://www.postgresql.org/message-id/305478845.5279532.1712440778735%40mail.yahoo.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>>
>>
>>> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>>
>>> <v30-0001-Implement-UUID-v7.patch>
>>
>> Some more thoughts on this patch version:
>>
>> 0. Comment mentioning nanoseconds, while we do not need to carry anything
>> /* Convert TimestampTz back and carry nanoseconds. */
>>
>> 1. There's unnecessary &3 in
>> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>>
>> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for
overflow.That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on
onebackend. However, given current performance of our CSPRNG I do not think this feature worth code complexity. 
>>
>
> While using only 10 bits microseconds makes the implementation simple,
> I'm not sure if 10 bits is enough to generate UUIDs at microsecond
> granularity without losing monotonicity. Since 10-bit microseconds are
> used as is in rand_a space, 1000 UUIDs can be generated per
> millisecond without losing monotonicity.

We won’t loose monotonicity on one backend. We will just accumulate time shift.
See
+    us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+    if (previous_us >= us)
+        us = previous_us + 1;


>
> For example, in my environment, it took 1808 milliseconds to generate
> 1 million UUIDs. This is about 533 UUIDs generated per millisecond.
BTW we can furether improve this performance by buffering CSPRNG. See
v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patchin this thread. 


> As
> UUID generation performance improves, I think 10 bits will not be
> enough.
>
> =# select count(uuidv7()) from generate_series(1, 1_000_000);
> count
> ---------
> 1000000
> (1 row)
>
> Time: 1808.734 ms
>
> I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:
>
>> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the
counter,almost all bits are initialized with a random number, which reduces the risk of collision. 
>
> I feel that it's better to switch to Method 1 or 2 with 12 bits or
> larger counter space.
>

12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter.
Seeversion v24-0001-Implement-UUID-v7.patch 
This version is more resilent to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will loose some sortability.

Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we
justcontinue to use nanoseconds patch, zero bits will act exactly as counters. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Sergey Prokhorenko
Date:
On Thursday 7 November 2024 at 11:34:31 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


> 12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter. See version v24-0001-Implement-UUID-v7.patch
> This version is more resilent to generating a lot of UUIDs on one backend while still not accumulating time shift.
> Yet, UUIDs generated on parallel workers will loose some sortability.

> Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.


> Best regards, Andrey Borodin.
------------------------------------------------------------------------


In fact, the RFC does allow combining methods 3 and 1:
5.7. UUID Version 7




"Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

1. An OPTIONAL sub-millisecond timestamp fraction (12 bits at maximum) as per Section 6.2 (Method 3).
2. An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1 or 2).
3. Random data for each new UUIDv7 generated for any remaining space."


This clearly refers to a "combination of the following subfields". COMBINATION!!!


However, with the current performance of computers, method 3 is quite sufficient without the addition of method 1.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 8, 2024 at 1:25 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
> On Thursday 7 November 2024 at 11:34:31 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > 12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits
counter.See version v24-0001-Implement-UUID-v7.patch 
> > This version is more resilent to generating a lot of UUIDs on one backend while still not accumulating time shift.
> > Yet, UUIDs generated on parallel workers will loose some sortability.
>
> > Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW
ifwe just continue to use nanoseconds patch, zero bits will act exactly as counters. 
>
>
> > Best regards, Andrey Borodin.
> ------------------------------------------------------------------------
>
>
> In fact, the RFC does allow combining methods 3 and 1:
> https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-7
> 5.7. UUID Version 7
>
>
>
>
> "Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this
orderfrom the most significant bits to the least, to guarantee additional monotonicity within a millisecond: 
>
> 1. An OPTIONAL sub-millisecond timestamp fraction (12 bits at maximum) as per Section 6.2 (Method 3).
> 2. An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1 or 2).
> 3. Random data for each new UUIDv7 generated for any remaining space."
>
>
> This clearly refers to a "combination of the following subfields". COMBINATION!!!
>
>
> However, with the current performance of computers, method 3 is quite sufficient without the addition of method 1.

Do you think method 3 is sufficient even with microsecond precision
(i.e. storing only 10 bits microseconds in rand_a space)?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Thu, Nov 7, 2024 at 12:34 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>
> >>
> >>
> >>> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>>
> >>> <v30-0001-Implement-UUID-v7.patch>
> >>
> >> Some more thoughts on this patch version:
> >>
> >> 0. Comment mentioning nanoseconds, while we do not need to carry anything
> >> /* Convert TimestampTz back and carry nanoseconds. */
> >>
> >> 1. There's unnecessary &3 in
> >> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
> >>
> >> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for
overflow.That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on
onebackend. However, given current performance of our CSPRNG I do not think this feature worth code complexity. 
> >>
> >
> > While using only 10 bits microseconds makes the implementation simple,
> > I'm not sure if 10 bits is enough to generate UUIDs at microsecond
> > granularity without losing monotonicity. Since 10-bit microseconds are
> > used as is in rand_a space, 1000 UUIDs can be generated per
> > millisecond without losing monotonicity.
>
> We won’t loose monotonicity on one backend. We will just accumulate time shift.
> See
> +       us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
> +       if (previous_us >= us)
> +               us = previous_us + 1;

IIUC the microsecond part is working also as a counter in a sense. It
seems fine to me but I'm slightly concerned that there is no guidance
of such implementation in RFC 9562.

> BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.

Yes, but we will lose some randomness on macOS as the nanosecond part
is 0 in most cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Sergey Prokhorenko
Date:
On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> the microsecond part is working also as a counter in a sense. IT seems fine to me but I'm slightly concerned that there is no guidance of such implementation in RFC 9562.

In fact, there is guidance of similar implementation in RFC 9562:
https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
"Counter Rollover Handling:"
"Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."



> Do you think method 3 is sufficient even with microsecond precision (i.e. storing only 10 bits microseconds in rand_a space)?

The maximum write performance in PostgreSQL is approximately 500 rows per millisecond, but under normal conditions 50 rows per millisecond. This corresponds to a precision of 2 microseconds and 20 microseconds respectively.

Andrey Borodin's implementation of method 3 provides a precision of approximately 0.25 microseconds.

You offer a precision of approximately 0.98 microseconds. This is about twice as good as what is needed for 500 rows per millisecond write performance. But in the near future, this may not be enough for the highest-performance systems.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

Re: UUID v7

From
Masahiko Sawada
Date:
On Sat, Nov 9, 2024 at 9:07 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
> On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > the microsecond part is working also as a counter in a sense. IT seems fine to me but I'm slightly concerned that
thereis no guidance of such implementation in RFC 9562. 
>
> In fact, there is guidance of similar implementation in RFC 9562:
> https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
> "Counter Rollover Handling:"
> "Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."
>

Indeed, thank you.

> But in the near future, this may not be enough for the highest-performance systems.

Yeah, I'm concerned about this. That time might gradually come. That
being said, as long as rand_a part works also as a counter, it's fine.
Also, 12 bits does not differ much as Andrey Borodin mentioned. I
think in the first version it's better to start with a simple
implementation rather than over-engineering it.

Regarding the implementation, the v30 patch uses only microseconds
precision time even on platforms where nanoseconds precision is
available such as Linux. I think it's better to store the value of
(sub-milliseconds * 4096) into 12-bits of rand_a space instead of
directly storing microseconds into 10 bits space. That way, we can use
nanoseconds precision timestamps where available. On some platforms
such as macOS, the sub-milliseconds precision timestamp is restricted
to microseconds, we can consider it as a kind of special case. If
12-bits of rand_a space is not enough to guarantee monotonically in
the future, it is also possible to improve it by putting a (random)
counter into rand_b.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Mon, Nov 11, 2024 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Nov 9, 2024 at 9:07 AM Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> > On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > the microsecond part is working also as a counter in a sense. IT seems fine to me but I'm slightly concerned that
thereis no guidance of such implementation in RFC 9562. 
> >
> > In fact, there is guidance of similar implementation in RFC 9562:
> > https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
> > "Counter Rollover Handling:"
> > "Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."
> >
>
> Indeed, thank you.
>
> > But in the near future, this may not be enough for the highest-performance systems.
>
> Yeah, I'm concerned about this. That time might gradually come. That
> being said, as long as rand_a part works also as a counter, it's fine.
> Also, 12 bits does not differ much as Andrey Borodin mentioned. I
> think in the first version it's better to start with a simple
> implementation rather than over-engineering it.
>
> Regarding the implementation, the v30 patch uses only microseconds
> precision time even on platforms where nanoseconds precision is
> available such as Linux. I think it's better to store the value of
> (sub-milliseconds * 4096) into 12-bits of rand_a space instead of
> directly storing microseconds into 10 bits space.

IIUC v29 patch implements UUIDv7 generation in this way. So I've
reviewed v29 patch and here are some review comments:

---
     * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-    * http://tools.ietf.org/html/rfc4122#section-4.4
+    * http://tools.ietf.org/html/rfc9562#section-4.4
     */

The new RFC doesn't have section 4.4.

---
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.

I'm concerned that "version and variant 0b10 bits" is not very clear
to readers. I think we can just mention "... except version and
variant bits".

---
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+       struct timespec tmp;
+
+       clock_gettime(CLOCK_REALTIME, &tmp);
+       return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+       FILETIME        file_time;
+       ULARGE_INTEGER ularge;
+
+       GetSystemTimePreciseAsFileTime(&file_time);
+       ularge.LowPart = file_time.dwLowDateTime;
+       ularge.HighPart = file_time.dwHighDateTime;
+
+       return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif

I think that it's better to implement these functions in instr_time.h
or another file.

---
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)

I think we can rewrite it to:

#define NS_PER_MS INT64CONST(1000000)
#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)

Which improves the readability.

Also, I think "#define NS_PER_US INT64CONST(1000)" can also be used in
many places.

---
+       /* set version field, top four bits are 0, 1, 1, 1 */
+       uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+       /* set variant field, top two bits are 1, 0 */
+       uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;

I think we can make an inline function to set both variant and version
so we can use it for generating UUIDv4 and UUIDv7.

--
+               tms = uuid->data[5];
+               tms += ((uint64) uuid->data[4]) << 8;
+               tms += ((uint64) uuid->data[3]) << 16;
+               tms += ((uint64) uuid->data[2]) << 24;
+               tms += ((uint64) uuid->data[1]) << 32;
+               tms += ((uint64) uuid->data[0]) << 40;

How about rewriting these to the following for consistency with UUIDv1 codes?

        tms = uuid->data[5]
        + ((uint64) uuid->data[4] << 8)
        + ((uint64) uuid->data[3] << 16)
        + ((uint64) uuid->data[2] << 24)
        + ((uint64) uuid->data[1] << 32)
        + ((uint64) uuid->data[0] << 40);

---
Thinking about the function structures more, I think we can refactor
generate_uuidv7(), uuidv7() and uuidv7_interval():

- create a function, get_clock_timestamp_ns(), that provides a
nanosecond-precision timestamp
    - the returned timestamp is guaranteed to be greater than the
previous returned value.
    - this function can be inlined.
- create a function, generate_uuidv7(), that takes a
nanosecond-precision timestamp as a function argument, and generate
UUIDv7 based on it.
- this function can be inlined too.
- uuidv7() gets the timestamp from get_clock_timestamp_ns() and passes
it to generate_uuidv7().
- uuidv7() gets the timestamp from get_clock_timestamp_ns(), adjusts
it based on the given interval, and passes it to generate_uuidv7().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Sun, Nov 17, 2024 at 10:39 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 17 Nov 2024, at 00:06, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > v31
>
> There was a problem with MingWG build. I've considered all options and decided to include all necessary stuff into
instr_time.h.So much fuss for these 2 bits about nanoseconds :) 
>

I realized that what we do in get_real_time_ns() on Windows is
essentially the same as what we do in gettimeofday(). Probably we can
just do either clock_gettime() with CLOCK_REALTIME on unix-like
systems and gettimeofday() on Windows, and then don't change anything
in instr_time.h? We need to explain why we don't use gettimeofday() on
unix-like systems in get_real_time_ns() function.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 20 Nov 2024, at 00:06, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>>
>>
>>> On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>>
>>> Done.
>>
>> Here's v33 intact + one more patch to add 2 bits of entropy on MacOS (to compensate lack of nanoseconds).
>> What do you think?
>>
>
> Thank you for updating the patch!
>
> I've reviewed the v33 patch and made some changes mostly for cosmetic
> things. Please review it to see if we accept these changes.

Your changes look good to me. I particularly like sortability test.
I see that you removed implementation of clock_gettime() for Windows. Well, this makes sense.


>
> I have one question about the additional patch:
>
> +#if defined(__darwin__)
> + /*
> + * On MacOS real time is truncted to microseconds. Thus, 2 least
> + * significant bits of increased_clock_precision are neither random
> + * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
> + * are dependent on other time-specific bits, thus they do not contribute
> + * to uniqueness. To make these bit random we mix in two bits from CSPRNG.
> + */
> + uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> +#endif
>
> I thought that the whole 12 bits in "rand_a" is actually
> time-dependent since we store 1/4096 fraction of sub-milliseconds. Am
> I missing something?

We have 12 bits in increaesd_clock_precission but only 1000 possible values of these bits. 2 least significant bits are
definedby other 10 bits. 
These bits are not equal to 0, they are changing.
True, these bits are time-dependent in a sense that these bits are be computed from a full timestamp. I wanted to
expressthe fact that timestamp cannot be altered in a way so only these 2 bits are changed. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Tue, Nov 19, 2024 at 7:54 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 20 Nov 2024, at 00:06, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>
> >>
> >>
> >>> On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >>>
> >>> Done.
> >>
> >> Here's v33 intact + one more patch to add 2 bits of entropy on MacOS (to compensate lack of nanoseconds).
> >> What do you think?
> >>
> >
> > Thank you for updating the patch!
> >
> > I've reviewed the v33 patch and made some changes mostly for cosmetic
> > things. Please review it to see if we accept these changes.
>
> Your changes look good to me. I particularly like sortability test.
> I see that you removed implementation of clock_gettime() for Windows. Well, this makes sense.
>
>
> >
> > I have one question about the additional patch:
> >
> > +#if defined(__darwin__)
> > + /*
> > + * On MacOS real time is truncted to microseconds. Thus, 2 least
> > + * significant bits of increased_clock_precision are neither random
> > + * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
> > + * are dependent on other time-specific bits, thus they do not contribute
> > + * to uniqueness. To make these bit random we mix in two bits from CSPRNG.
> > + */
> > + uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> > +#endif
> >
> > I thought that the whole 12 bits in "rand_a" is actually
> > time-dependent since we store 1/4096 fraction of sub-milliseconds. Am
> > I missing something?
>
> We have 12 bits in increaesd_clock_precission but only 1000 possible values of these bits. 2 least significant bits
aredefined by other 10 bits. 
> These bits are not equal to 0, they are changing.
> True, these bits are time-dependent in a sense that these bits are be computed from a full timestamp. I wanted to
expressthe fact that timestamp cannot be altered in a way so only these 2 bits are changed. 

Understood the idea. But does replacing the least significant 2 bits
with random 2 bits really not affect monotonicity? The ensured minimal
timestamp step is 245, ((NS_PER_MS / (1 << 12)) + 1), meaning that if
two UUIDs are generated within a microsecond on macOS, the two
timestamps differ by 245 ns. After calculating the increased clock
precision with these two timestamps, they differ only by 1, which
seems to be problematic to me.

Suppose the two timestamps are:

ns1: 1732142033754429000 (Nov 20, 2024 10:33:53.754429000)
ns2: 1732142033754429245 (Nov 20, 2024 10:33:53.754429245)

Their sub-milliseconds are calculated (by multiplying by 4096) to:

subms1: 1757 (0b011011011101)
subms2: 1758 (0b011011011110)

If we replace the least significant bits '01' of subms1 with random
bits '11' and replace '10' of subms2 with '00', we cannot guarantee
the monotonicity.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Thu, Nov 21, 2024 at 1:22 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 21 Nov 2024, at 02:24, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >  But does replacing the least significant 2 bits
> > with random 2 bits really not affect monotonicity?
>
> You are right. We have to take into account this when calculating monotonicity. PFA another version.
>

While it works fine, I think we need a comment for this change:

 -#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
 +#if defined(__darwin__) || _MSC_VER
 +#define SUB_MILLISECOND_BITS 10
 +#else
 +#define SUB_MILLISECOND_BITS 12
 +#endif
 +#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << SUB_MILLISECOND_BITS)) + 1)

because the reader might think we should use SUB_MILLISECOND_BITS
here too at a glance:

+       /* sub-millisecond timestamp fraction (12 bits) */
+       increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
wenhui qiu
Date:
HI Andrey M. Borodin
     It's not just mariadb, percona also implements the uuid plugin.


Thanks 

Andrey M. Borodin <x4mmm@yandex-team.ru> 于2024年11月23日周六 16:21写道:


> On 23 Nov 2024, at 10:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached an updated patch that squashed changes I made for v33.
> We're still discussing increasing entropy on Windows and macOS, but
> the patch seems to be in good shape.

+1, thanks!

PFA version with improved comment.

Sergey Prokhorenko just draw my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].


Best regards, Andrey Borodin.


[0] https://mariadb.com/resources/blog/announcing-mariadb-community-server-11-7-rc-with-vector-search-and-11-6-ga/
[1] https://github.com/mariadb/server/blob/main/plugin/type_uuid/sql_type_uuid_v7.h#L32


Re: UUID v7

From
Masahiko Sawada
Date:
On Sat, Nov 23, 2024 at 12:20 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 23 Nov 2024, at 10:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached an updated patch that squashed changes I made for v33.
> > We're still discussing increasing entropy on Windows and macOS, but
> > the patch seems to be in good shape.
>
> +1, thanks!
>
> PFA version with improved comment.

Thank you for updating the patch!

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":

+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
 #define SUBMS_BITS     12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
SUBMS_MINIMAL_STEP_BITS)) + 1)

on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:

+#if defined(__darwin__) || defined(WIN32)
+       /*
+        * On MacOS real time is truncted to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus
they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif

Is there a reason for using different macros?

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.

>
> Sergey Prokhorenko just draw my attention to the new release of MariaDB [0]. They are doing very similar UUID v7
generationas we do [1]. 
>

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
>
> +#if defined(__darwin__) || defined(_MSC_VER)
> +#define SUBMS_MINIMAL_STEP_BITS 10
> +#else
> +#define SUBMS_MINIMAL_STEP_BITS 12
> +#endif
> #define SUBMS_BITS     12
> -#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
> +#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
> SUBMS_MINIMAL_STEP_BITS)) + 1)
>
> on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
>
> +#if defined(__darwin__) || defined(WIN32)
> +       /*
> +        * On MacOS real time is truncted to microseconds. Thus, 2 least
> +        * significant are dependent on other time-specific bits, thus
> they do not
> +        * contribute to uniqueness. To make these bit random we mix in two bits
> +        * from CSPRNG.
> +        *
> +        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
> +        * despite altering these bits.
> +        */
> +       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> +#endif
>
> Is there a reason for using different macros?

No, that's an oversight. We should mix these 2 bits if an only if SUBMS_MINIMAL_STEP_BITS=10.

<tldr>
In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
So on Saturday I squashed v33+change_v33.patch, and composed a message that I think we still should switch to _MSC_VER.
Andjust before sending I received your message with v36 where you used _MSC_VER :) 

I think this way:
_MSC_VER - native Windows without clock_gettime, we used gettimeofday() and 10 bits of sub-ms.
MinGW - we use clock_gettime() and 12 bits.
Darwin - we use clock_gettime() and 10 bits.
Anything else -  clock_gettime() and 12 bits.
</tldr>

>
> In get_real_time_ns_ascending(), we use _MSC_VER so we use
> clock_gettime() on MinGW.
>
>>
>> Sergey Prokhorenko just draw my attention to the new release of MariaDB [0]. They are doing very similar UUID v7
generationas we do [1]. 
>>
>
> Thank you for the references. It made me think that we can use the
> function name uuid_v7() rather than uuidv7().

I think it's a good idea if we will be kind of SQL-compatible.


Best regards, Andrey Borodin.


Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 26 Nov 2024, at 01:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've merged patches and renamed functions (also updated the commit
> message). Please find the attachment.

This comment
* On MacOS real time is truncated to microseconds.
should also note that on Windows we use ported version of gettimeofday(). Interface of this functions limits us with
only10 bits just like MacOS. 

Besides this patch looks good to me. Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

From
Japin Li
Date:
On Mon, 25 Nov 2024 at 12:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Nov 25, 2024 at 10:15 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>>
>>
>> > On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >
>> > In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
>> >
>> > +#if defined(__darwin__) || defined(_MSC_VER)
>> > +#define SUBMS_MINIMAL_STEP_BITS 10
>> > +#else
>> > +#define SUBMS_MINIMAL_STEP_BITS 12
>> > +#endif
>> > #define SUBMS_BITS     12
>> > -#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
>> > +#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
>> > SUBMS_MINIMAL_STEP_BITS)) + 1)
>> >
>> > on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
>> >
>> > +#if defined(__darwin__) || defined(WIN32)
>> > +       /*
>> > +        * On MacOS real time is truncted to microseconds. Thus, 2 least
>> > +        * significant are dependent on other time-specific bits, thus
>> > they do not
>> > +        * contribute to uniqueness. To make these bit random we mix in two bits
>> > +        * from CSPRNG.
>> > +        *
>> > +        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
>> > +        * despite altering these bits.
>> > +        */
>> > +       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
>> > +#endif
>> >
>> > Is there a reason for using different macros?
>>
>> No, that's an oversight. We should mix these 2 bits if an only if SUBMS_MINIMAL_STEP_BITS=10.
>>
>> <tldr>
>> In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
>> So on Saturday I squashed v33+change_v33.patch, and composed a
>> message that I think we still should switch to _MSC_VER. And just
>> before sending I received your message with v36 where you used
>> _MSC_VER :)
>>
>> I think this way:
>> _MSC_VER - native Windows without clock_gettime, we used gettimeofday() and 10 bits of sub-ms.
>> MinGW - we use clock_gettime() and 12 bits.
>> Darwin - we use clock_gettime() and 10 bits.
>> Anything else -  clock_gettime() and 12 bits.
>> </tldr>
>
> Thank you for the summary.
>
> On MinGW, IIUC we can get 100-ns precision timestamps[1], so using 12
> bits for calculating the minimal step would make sense.
>
> Also, if we implement the Windows port of clock_gettime() in the
> future, we can remove the part of using gettimeofday() in
> get_real_time_ns_ascending(). It seems to me that it's
> over-engineering to implement that part only for the UUID v7. So the
> current implementation of get_real_time_ns_ascending() makes sense to
> me.
>
>>
>> >
>> > In get_real_time_ns_ascending(), we use _MSC_VER so we use
>> > clock_gettime() on MinGW.
>> >
>> >>
>> >> Sergey Prokhorenko just draw my attention to the new release of MariaDB [0]. They are doing very similar UUID v7
generationas we do [1]. 
>> >>
>> >
>> > Thank you for the references. It made me think that we can use the
>> > function name uuid_v7() rather than uuidv7().
>>
>> I think it's a good idea if we will be kind of SQL-compatible.
>>
>
> Okay, let"s rename it.
>
> I've merged patches and renamed functions (also updated the commit
> message). Please find the attachment.
>

It seems a typo about uuid_v{4,7}.

+<function>uuidv_4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv_7</function> () <returnvalue>uuid</returnvalue>

--
Regrads,
Japin Li



Re: UUID v7

From
Sergey Prokhorenko
Date:
Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Monday 25 November 2024 at 11:12:35 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


On Mon, Nov 25, 2024 at 10:15 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
> >
> > +#if defined(__darwin__) || defined(_MSC_VER)
> > +#define SUBMS_MINIMAL_STEP_BITS 10
> > +#else
> > +#define SUBMS_MINIMAL_STEP_BITS 12
> > +#endif
> > #define SUBMS_BITS    12
> > -#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
> > +#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
> > SUBMS_MINIMAL_STEP_BITS)) + 1)
> >
> > on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
> >
> > +#if defined(__darwin__) || defined(WIN32)
> > +      /*
> > +        * On MacOS real time is truncted to microseconds. Thus, 2 least
> > +        * significant are dependent on other time-specific bits, thus
> > they do not
> > +        * contribute to uniqueness. To make these bit random we mix in two bits
> > +        * from CSPRNG.
> > +        *
> > +        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
> > +        * despite altering these bits.
> > +        */
> > +      uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> > +#endif
> >
> > Is there a reason for using different macros?
>
> No, that's an oversight. We should mix these 2 bits if an only if SUBMS_MINIMAL_STEP_BITS=10.
>
> <tldr>
> In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
> So on Saturday I squashed v33+change_v33.patch, and composed a message that I think we still should switch to _MSC_VER. And just before sending I received your message with v36 where you used _MSC_VER :)
>
> I think this way:
> _MSC_VER - native Windows without clock_gettime, we used gettimeofday() and 10 bits of sub-ms.
> MinGW - we use clock_gettime() and 12 bits.
> Darwin - we use clock_gettime() and 10 bits.
> Anything else -  clock_gettime() and 12 bits.
> </tldr>

Thank you for the summary.

On MinGW, IIUC we can get 100-ns precision timestamps[1], so using 12
bits for calculating the minimal step would make sense.

Also, if we implement the Windows port of clock_gettime() in the
future, we can remove the part of using gettimeofday() in
get_real_time_ns_ascending(). It seems to me that it's
over-engineering to implement that part only for the UUID v7. So the
current implementation of get_real_time_ns_ascending() makes sense to
me.

>
> >
> > In get_real_time_ns_ascending(), we use _MSC_VER so we use
> > clock_gettime() on MinGW.
> >
> >>
> >> Sergey Prokhorenko just draw my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].
> >>
> >
> > Thank you for the references. It made me think that we can use the
> > function name uuid_v7() rather than uuidv7().
>
> I think it's a good idea if we will be kind of SQL-compatible.
>

Okay, let"s rename it.

I've merged patches and renamed functions (also updated the commit
message). Please find the attachment.

Regards,

[1]  https://github.com/Alexpux/mingw-w64/blob/d0d7f784833bbb0b2d279310ddc6afb52fe47a46/mingw-w64-libraries/winpthreads/src/clock.c#L119


--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

Re: UUID v7

From
Masahiko Sawada
Date:
On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
> Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code
containinguuid_v7() will not be found by searching the web in most cases. 
>
> It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other
unnecessaryfunctions related to UUIDv7. 

Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Przemysław Sztoch
Date:

A lot of people use https://www.postgresql.org/docs/current/uuid-ossp.html.

And `uuid_generate_v7()` will be the continuation...

From my point of view, absorbing uuid_generate_v5 into mainline would be a great move too.

On 26.11.2024 20:30, Masahiko Sawada wrote:
On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.
Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

From
Sergey Prokhorenko
Date:
gen_uuidv7() is OK

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Tuesday 26 November 2024 at 10:35:20 pm GMT+3, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:


A lot of people use https://www.postgresql.org/docs/current/uuid-ossp.html.

And `uuid_generate_v7()` will be the continuation...

From my point of view, absorbing uuid_generate_v5 into mainline would be a great move too.

On 26.11.2024 20:30, Masahiko Sawada wrote:
On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.
Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Przemysław Sztoch | Mobile +48 509 99 00 66



Re: UUID v7

From
Jelte Fennema-Nio
Date:
On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not
dependon legacy technology names
 

agreed

[1]: https://www.postgresql.org/message-id/CAGECzQQ%3D38bVUR%3DLZ6vmBCEjaDfOOoQa%2BygFJ1mCG_H2jsC90Q%40mail.gmail.com
[2]:
https://www.postgresql.org/message-id/flat/CAGECzQS%3DEjfLxdX89N95tHFGXS4m1aj2V_%2BxrJppBohgaKQhtQ%40mail.gmail.com#a3f72c5e74537046f58049f58f4fffe4



Re: UUID v7

From
Andrey Borodin
Date:

> On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>>
>> On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
>> <sergeyprokhorenko@yahoo.com.au> wrote:
>>> gen_uuidv7() is OK
>>
>> I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].
>>
>> My vote is still for simply uuidv7() and uuidv4()
>>
>>> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not
dependon legacy technology names 
>>
>> agreed
>>
>
> It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
> is discussion whether we should add 'gen_' or 'get_' but let's go back
> to the previously-agreed function name 'uuidv7()' for now. We can
> rename it later if we find a better name.

I think uuidv7() is kind of consensual.

> I've attached the new version patch that incorporated all comments and
> renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> defined(_MSC_VER)' twice.

Good, I think now it's a bit easier to understand those 2 bits.

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Tue, Nov 26, 2024 at 7:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >>
> >> On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
> >> <sergeyprokhorenko@yahoo.com.au> wrote:
> >>> gen_uuidv7() is OK
> >>
> >> I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].
> >>
> >> My vote is still for simply uuidv7() and uuidv4()
> >>
> >>> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not
dependon legacy technology names 
> >>
> >> agreed
> >>
> >
> > It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
> > is discussion whether we should add 'gen_' or 'get_' but let's go back
> > to the previously-agreed function name 'uuidv7()' for now. We can
> > rename it later if we find a better name.
>
> I think uuidv7() is kind of consensual.
>
> > I've attached the new version patch that incorporated all comments and
> > renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> > defined(_MSC_VER)' twice.
>
> Good, I think now it's a bit easier to understand those 2 bits.
>

Thanks.

I'm going to push the v39 patch (after self review again), barring any
objections and further comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Sergey Prokhorenko
Date:
It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.

I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Wednesday 27 November 2024 at 09:24:40 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


On Tue, Nov 26, 2024 at 7:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >>
> >> On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
> >> <sergeyprokhorenko@yahoo.com.au> wrote:
> >>> gen_uuidv7() is OK
> >>
> >> I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].
> >>
> >> My vote is still for simply uuidv7() and uuidv4()
> >>
> >>> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names
> >>
> >> agreed
> >>
> >
> > It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
> > is discussion whether we should add 'gen_' or 'get_' but let's go back
> > to the previously-agreed function name 'uuidv7()' for now. We can
> > rename it later if we find a better name.
>
> I think uuidv7() is kind of consensual.
>
> > I've attached the new version patch that incorporated all comments and
> > renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> > defined(_MSC_VER)' twice.
>
> Good, I think now it's a bit easier to understand those 2 bits.
>

Thanks.

I'm going to push the v39 patch (after self review again), barring any
objections and further comments.

Regards,


--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

Отв.: Re: UUID v7

From
Sergey Prokhorenko
Date:
I forgot to mention the incremental download



Пользователь четверг, ноября 28, 2024, 2:07 AM написал Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>:

It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.

I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Wednesday 27 November 2024 at 09:24:40 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


On Tue, Nov 26, 2024 at 7:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >>
> >> On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
> >> <sergeyprokhorenko@yahoo.com.au> wrote:
> >>> gen_uuidv7() is OK
> >>
> >> I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].
> >>
> >> My vote is still for simply uuidv7() and uuidv4()
> >>
> >>> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names
> >>
> >> agreed
> >>
> >
> > It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
> > is discussion whether we should add 'gen_' or 'get_' but let's go back
> > to the previously-agreed function name 'uuidv7()' for now. We can
> > rename it later if we find a better name.
>
> I think uuidv7() is kind of consensual.
>
> > I've attached the new version patch that incorporated all comments and
> > renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> > defined(_MSC_VER)' twice.
>
> Good, I think now it's a bit easier to understand those 2 bits.
>

Thanks.

I'm going to push the v39 patch (after self review again), barring any
objections and further comments.

Regards,


--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 28 Nov 2024, at 04:07, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so
thatIT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions. 
>
> I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate,
whichis meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and
multi-threadedinsertion of a large number of records (with and without partitioning), as well as the rate of execution
ofqueries to join big tables, to update or delete a large number of records. It is important to measure memory usage,
processorload, etc. 

Publishing benchmarks seems to be far beyond what our documentation go for. Mostly, because benchmarks are tricky. You
canprove anything with benchmarks. 

Everyone is welcome to publish benchmark results in their blogs, but IMO docs have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from generate_series(1,3e7);
INSERT 0 30000000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from generate_series(1,3e7);
INSERT 0 30000000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)


Best regards, Andrey Borodin.


Re: UUID v7

From
Peter Eisentraut
Date:
On 27.11.24 00:11, Masahiko Sawada wrote:
> On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>>
>> On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
>> <sergeyprokhorenko@yahoo.com.au> wrote:
>>> gen_uuidv7() is OK
>>
>> I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].
>>
>> My vote is still for simply uuidv7() and uuidv4()
>>
>>> uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not
dependon legacy technology names
 
>>
>> agreed
>>
> 
> It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
> is discussion whether we should add 'gen_' or 'get_' but let's go back
> to the previously-agreed function name 'uuidv7()' for now. We can
> rename it later if we find a better name.
> 
> I've attached the new version patch that incorporated all comments and
> renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> defined(_MSC_VER)' twice.

* doc/src/sgml/func.sgml

The function variant uuidv7(interval) is not documented.


* src/backend/utils/adt/uuid.c

+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{

This should be a block comment, like

/*
  * Set the ...
  */

+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
...
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)

Is "ns" the timestamp argument?  What format is it?  Explain.

+   /*
+    * Shift the current timestamp by the given interval. To make correct
+    * calculating the time shift, we convert the UNIX epoch to TimestampTz
+    * and use timestamptz_pl_interval(). Since this calculation is done 
with
+    * microsecond precision, we carry back the nanoseconds.
+    */

This needs a bit of grammar tweaking, I think: "To make correct calculating"

I don't know what the meaning of "carry back" is.

+   Interval   *span = PG_GETARG_INTERVAL_P(0);

Not sure why this is named "span"?  Maybe "shift" would be better?


* src/include/catalog/pg_proc.dat

+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp 
shifted on specific interval',

better "shifted by"?


* src/test/regress/expected/opr_sanity.out

+uuidv4()
+uuidv7()
+uuidv7(interval)

Functions without arguments don't need to be marked leakproof.

uuidv7(interval) internally calls timestamptz_pl_interval(), which is
not leakproof, so I don't think that classification is sound.


* src/test/regress/sql/uuid.sql

+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7

Make the whitespace of the comment consistent with the rest of the file.

-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 
'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis 
test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 
'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test 
vector for v1

Here as well.




Отв.: Re: UUID v7

From
Sergey Prokhorenko
Date:
I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.



Пользователь четверг, ноября 28, 2024, 11:09 AM написал Andrey M. Borodin <x4mmm@yandex-team.ru>:



> On 28 Nov 2024, at 04:07, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.
>
> I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.


Publishing benchmarks seems to be far beyond what our documentation go for. Mostly, because benchmarks are tricky. You can prove anything with benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO docs have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from generate_series(1,3e7);
INSERT 0 30000000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from generate_series(1,3e7);
INSERT 0 30000000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)


Best regards, Andrey Borodin.

Re: Отв.: Re: UUID v7

From
Kirill Reshke
Date:


On Fri, 29 Nov 2024, 09:14 Sergey Prokhorenko, <sergeyprokhorenko@yahoo.com.au> wrote:
I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.



Пользователь четверг, ноября 28, 2024, 11:09 AM написал Andrey M. Borodin <x4mmm@yandex-team.ru>:



> On 28 Nov 2024, at 04:07, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.
>
> I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.


Publishing benchmarks seems to be far beyond what our documentation go for. Mostly, because benchmarks are tricky. You can prove anything with benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO docs have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from generate_series(1,3e7);
INSERT 0 30000000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from generate_series(1,3e7);
INSERT 0 30000000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)


Best regards, Andrey Borodin.
Hi!
Do not top-post on this list

Re: Отв.: Re: UUID v7

From
Masahiko Sawada
Date:
On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
> I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their
equipment.The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer
key.And I have described many use cases, and in your benchmark there is only one, the simplest. 

I don't think we should add such benchmark functions at least to this
patch. If there already is a well-established workload using UUIDv7
and UUIDv4 etc, users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload. Which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Отв.: Re: UUID v7

From
Sergey Prokhorenko
Date:


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko

<sergeyprokhorenko@yahoo.com.au> wrote:
>
> I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.


I don't think we should add such benchmark functions at least to this
patch. If there already is a well-established workload using UUIDv7
and UUIDv4 etc, users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload. Which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com






Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.


Re: Отв.: Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 29 Nov 2024, at 18:57, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000
records,etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company
hasto duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to
incompetentemployees who are not very good at understanding typical key usage scenarios. As a rule, these are
programmers,not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they
takeup less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a
half.This is completely wrong. 

I think we have pretty decent documentation in the patch. It only points to RFC and that's it.
There were patch versions with opinionated novels in docs. Giving advises, comparing possibilities and all that stuff.
I'mso happy we passed through this stage and moved forward :) 


Best regards, Andrey Borodin.


Re: Отв.: Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 29, 2024 at 5:59 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
>
>
> Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au
>
>
> On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
>
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> > I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their
equipment.The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer
key.And I have described many use cases, and in your benchmark there is only one, the simplest. 
>
>
> I don't think we should add such benchmark functions at least to this
> patch. If there already is a well-established workload using UUIDv7
> and UUIDv4 etc, users can use pgbench with custom scripts, or it might
> make sense to add it to pgbench as a built-in workload. Which however
> should be a separate patch. Having said that, I think users should use
> benchmarks that fit their workloads, and it would not be easy to
> establish workloads that are reasonable for most systems.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
>
>
>
>
>
> Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000
records,etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company
hasto duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to
incompetentemployees who are not very good at understanding typical key usage scenarios. As a rule, these are
programmers,not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they
takeup less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a
half.This is completely wrong. 

There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 29, 2024 at 10:39 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 29 Nov 2024, at 00:46, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > Here as well.
>
> Peter, many thanks for the next round of review. I agree with all corrections.
> I'm sending amendments addressing your review as a separate step in patch set. Step 1 of this patch set is identical
tov39. 
>

Thank you for updating the patch! Here are two comments:

 <function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
     This function returns a version 7 UUID (UNIX timestamp with millisecond
-    precision + sub-millisecond timestamp + random).
+    precision + sub-millisecond timestamp + random). This function can accept
+    optional <parameter>shift</parameter> parameter of type
<type>interval</type>
+    which shift internal timestamp by the given interval.
   </para>

There is no "shift" parameter in the function synopsis.

Also, while reviewing the changes for func.sgml, I find that now that
we have 5 UUID functions, it might make sense to create a table for
UUID functions instead of describing functions separately. Which seems
to be more readable and consistent with other functions in docs.

---
+/*
+ * Wrapper for gen_random_uuid()
+ */
+Datum
+uuidv4(PG_FUNCTION_ARGS)
+{
+ return gen_random_uuid(fcinfo);
+}

Why do we need this? IIUC we marked uuidv4() (and uuidv7()) leafproof
because gen_random_uuid() is marked too. Otherwise, the following test
in opr_sanity would fail:

-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields).  It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of.  (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

Given that these functions don't need to be marked leakproof, does it
make sense to remove the leakproof mark from gen_random_uuid() too?
That way, we don't need the wrapper function.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Marcos Pegoraro
Date:
Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
 <function>uuidv7</function> () <returnvalue>uuid</returnvalue>

Wouldn't it be better to change this to
 <function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue 
and explain what that param is ?

regards
Marcos

Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 29, 2024 at 11:47 AM Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
>>
>>  <function>uuidv7</function> () <returnvalue>uuid</returnvalue>
>
>
> Wouldn't it be better to change this to
>  <function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue
> and explain what that param is ?

Yes, the function synopsis in the doc should be either:

uuidv7([interval]) -> uuid

or

uuidv7([shift interval]) -> uuid

Since this function has only one function argument it doesn't
necessarily need an argument name 'shift'. So the proposed description
might be okay but we need to change at least the function synopsis.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
Masahiko Sawada
Date:
On Fri, Nov 29, 2024 at 12:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Nov 29, 2024 at 11:47 AM Marcos Pegoraro <marcos@f10.com.br> wrote:
> >
> > Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
> >>
> >>  <function>uuidv7</function> () <returnvalue>uuid</returnvalue>
> >
> >
> > Wouldn't it be better to change this to
> >  <function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue
> > and explain what that param is ?
>
> Yes, the function synopsis in the doc should be either:
>
> uuidv7([interval]) -> uuid
>
> or
>
> uuidv7([shift interval]) -> uuid
>
> Since this function has only one function argument it doesn't
> necessarily need an argument name 'shift'. So the proposed description
> might be okay but we need to change at least the function synopsis.
>

I realized that the description of uuid_extract_timestamp() needs to
be updated as well since it now supports version 7 too:

<synopsis>
<function>uuid_extract_timestamp</function> (uuid)
<returnvalue>timestamp with time zone</returnvalue>
</synopsis>
   This function extracts a <type>timestamp with time zone</type> from UUID
   version 1.  For other versions, this function returns null.  Note that the
   extracted timestamp is not necessarily exactly equal to the time the UUID
   was generated; this depends on the implementation that generated the UUID.
  </para>

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Daniel Verite"
Date:
    Andrey M. Borodin wrote:


> I'm sending amendments addressing your review as a separate step in patch
> set. Step 1 of this patch set is identical to v39.

Some comments about the implementation of monotonicity:

+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP
greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+    static int64 previous_ns = 0;

[...]

+    /* Guarantee the minimal step advancement of the timestamp */
+    if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+        ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+    previous_ns = ns;

In the case of parallel execution (uuidv7() being parallel-safe), if
there have been previous calls to uuidv7() in that backend,
previous_ns will be set in the backend process,
but zero in a newly spawned worker process.
If (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns) ever happens
to be true in the main process, it will start at false in the workers,
leading to non-monotonic results within the same query.


Also in the case of a backward clock change, we can end up with some
backends sticking to the "old time" plus increment per invocation
until they die, while some other backends spawned after the clock
change are on the "new time". These backends may produce series of
UUIDv7 that would be completely out of sync with each others.
A backward clock change is an abnormality, but if it occurs, what's
the best choice? Take the bullet and switch to the new time , or
stick to a time that is permanently decorrelated from the OS
clock? I would think that the latter is worse.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



Re: UUID v7

From
Masahiko Sawada
Date:
On Sat, Nov 30, 2024 at 7:01 AM Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         Andrey M. Borodin wrote:
>
>
> > I'm sending amendments addressing your review as a separate step in patch
> > set. Step 1 of this patch set is identical to v39.
>
> Some comments about the implementation of monotonicity:
>
> +/*
> + * Get the current timestamp with nanosecond precision for UUID generation.
> + * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP
> greater
> + * than the previous returned timestamp (on this backend).
> + */
> +static inline int64
> +get_real_time_ns_ascending()
> +{
> +       static int64 previous_ns = 0;
>
> [...]
>
> +       /* Guarantee the minimal step advancement of the timestamp */
> +       if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
> +               ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
> +       previous_ns = ns;
>
> In the case of parallel execution (uuidv7() being parallel-safe), if
> there have been previous calls to uuidv7() in that backend,
> previous_ns will be set in the backend process,
> but zero in a newly spawned worker process.
> If (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns) ever happens
> to be true in the main process, it will start at false in the workers,
> leading to non-monotonic results within the same query.

The monotonicity of generated UUIDv7 is guaranteed only within a
single backend. I think your point is that UUIDs in parallel query
results might not be ordered. But is it guaranteed that without ORDER
BY clause, the results returned by parallel queries are in the same
order as the results from non-parallel queries in the first place?

>
> Also in the case of a backward clock change, we can end up with some
> backends sticking to the "old time" plus increment per invocation
> until they die, while some other backends spawned after the clock
> change are on the "new time". These backends may produce series of
> UUIDv7 that would be completely out of sync with each others.
> A backward clock change is an abnormality, but if it occurs, what's
> the best choice? Take the bullet and switch to the new time , or
> stick to a time that is permanently decorrelated from the OS
> clock? I would think that the latter is worse.

IIUC after generating a UUIDv7 with the correct time, even if the
system time goes back, the time in the next UUIDv7 will be
SUBMS_MINIMAL_STEP_NS nanoseconds ahead of the last correct time.
Also, in case where the backend generates its first UUIDv7 with an
incorrect (e.g. an old) time, it generates UUIDv7 based on the
incorrect timestamp. However, it starts generating UUIDv7 with the
correct timestamp as soon as the system time goes back to the correct
time. So I think that it doesn't happen that one backend is sticking
to an old time while another backend is using the correct timestamp to
generate UUIDv7. Note that we use (the previous timestamp +
SUBMS_MINIMAL_STEP_NS) only if the system clock didn't move forward by
SUBMS_MINIMAL_STEP_NS.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Benchmark function for uuidv7()

From
Sergey Prokhorenko
Date:


On Friday 29 November 2024 at 08:55:09 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


On Fri, Nov 29, 2024 at 5:59 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
>
>
> Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au
>
>
> On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
>
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> > I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.
>
>
> I don't think we should add such benchmark functions at least to this
> patch. If there already is a well-established workload using UUIDv7
> and UUIDv4 etc, users can use pgbench with custom scripts, or it might
> make sense to add it to pgbench as a built-in workload. Which however
> should be a separate patch. Having said that, I think users should use
> benchmarks that fit their workloads, and it would not be easy to
> establish workloads that are reasonable for most systems.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
>
>
>
>
>
> Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.

There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


_________________________________________________________________________________________
_________________________________________________________________________________________


I am not a programmer, but a systems analyst. Therefore, I cannot develop a function for the benchmark myself, but I can describe the requirements for this function.

The function for the benchmark could be implemented as a separate patch (not UUIDv7 implementation patch) to avoid blocking the UUIDv7 implementation patch.

Requirements for the uuidv7() benchmark function

1. Benchmark function

The function for uuidv7() benchmark should be called uuidv7_benchmark().

This function should have parameters:
• optional parameter mock_table_record_count. If the parameter is not passed, then the value of 1 million records is taken
• optional parameter payload_size_b. If the parameter is not passed, then the value of 2048 bytes is taken

2. Benchmark results table

The result of each benchmark step for a certain surrogate key type should be dumped into a separate row of the uuidv7_benchmark_results table. This table should contain the following columns:
• benchmark_start_datetime
• mock_table_record_count
step_name
surrogate_key_type
• rate_per_ms (nullable)
• cpu_usage_ percent (nullable)
• memory_usage_mb (nullable)
• drive_usage_mb_per_s (nullable)

The table rows should be sequentially sorted by the columns benchmark_start_datetime, step_name, surrogate_key_type.

Sample benchmark results table:

| benchmark_start_datetime | mock_table_record_count | step_name         | surrogate_key_type | rate_per_ms | cpu_usage \_ percent | memory_usage_mb | drive_usage_mb_per_s |
| ------------------------ | ----------------------- | ----------------- | ------------------ | ----------- | -------------------- | --------------- | -------------------- |
|                          |                         | 1_insert          | UUIDv7             |             |                      |                 |                      |
|                          |                         | 1_insert          | BIGSERIAL          |             |                      |                 |                      |
|                          |                         | 1_insert          | UUIDv4             |             |                      |                 |                      |
|                          |                         | 2_parallel_insert | UUIDv7             |             |                      |                 |                      |
|                          |                         | 2_parallel_insert | BIGSERIAL          |             |                      |                 |                      |
|                          |                         | 2_parallel_insert | UUIDv4             |             |                      |                 |                      |
|                          |                         | 3_left_join       | UUIDv7             |             |                      |                 |                      |
|                          |                         | 3_left_join       | BIGSERIAL          |             |                      |                 |                      |
|                          |                         | 3_left_join       | UUIDv4             |             |                      |                 |                      |
|                          |                         | …                 | …                  |             |                      |                 |                      |

3. Mock table
Each record in the mock_table table must contain the following columns:
id with the UUID data type, PRIMARY KEY (indexed)
payload with the bytea data type

4. Compared surrogate key types
Each benchmark step is run sequentially with the following surrogate key types in the mock_table.id column:
• UUIDv7
• BIGSERIAL
• The name of the function for generating surrogate keys (for example, one of the formats: UUIDv4, ULID or Snowflake ID), if the user specifies such a function and makes it available

5. Benchmark steps

Before running the benchmark steps, a mock table is created.
Pseudocode:
CREATE TABLE mock_table (
id UUID PRIMARY KEY DEFAULT uuidv7(),
payload BYTEA
);

The benchmark must have the following steps:

step_name = '1_insert'
Pseudocode:
INSERT INTO mock_table (payload)
SELECT filled_payload(payload_size_b)
FROM generate_series(1, mock_table_record_count);

step_name = '2_parallel_insert'
The algorithm is at the discretion of the developer.

step_name = '3_left_join'
Pseudocode:
SELECT COUNT(*)
FROM mock_table a
LEFT JOIN mock_table b ON b.id = a.id
WHERE b.id IS NULL;

step_name = '4_inner_join'
Pseudocode:
SELECT COUNT(*)
FROM mock_table a
INNER JOIN mock_table b ON b.id = a.id
WHERE b.id IS NULL;

step_name = '5_group_by'
Pseudocode:
SELECT id, COUNT(*)
FROM mock_table
GROUP BY id
HAVING COUNT(*) > 1;

step_name = '6_delete'
Pseudocode:
DELETE FROM mock_table a
USING mock_table b
WHERE b.id = a.id;


Regards,


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au
 





Re: UUID v7

From
"Andrey M. Borodin"
Date:

> On 10 Dec 2024, at 03:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached the updated patches.

Both patches look good to me.
I'm not sure, but, perhaps, commit message of unleakproofing a function should mention that the problem was reported in
PeterE. review. 


Best regards, Andrey Borodin.


Re: UUID v7

From
Masahiko Sawada
Date:
On Mon, Dec 9, 2024 at 7:42 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 10 Dec 2024, at 03:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached the updated patches.
>
> Both patches look good to me.
> I'm not sure, but, perhaps, commit message of unleakproofing a function should mention that the problem was reported
inPeter E. review. 

Pushed both patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: UUID v7

From
"Daniel Verite"
Date:
    Andrey M. Borodin wrote:

> I've addressed all items, except formatting a table...

Sorry for not following up sooner.

To illustrate my point upthread that was left unaddressed, let's say
I have a server with an incorrect date in the future.

A session generates an uuid

postgres=# select pg_backend_pid(), uuidv7();
 pg_backend_pid |         uuidv7
----------------+--------------------------------------
      13545 | 019ad701-c798-7000-a0e4-7119e2c82446


Now somebody sets the clock backward to the correct date.

Then if that backend continues to generate uuids, here's
what it outputs:

postgres=# select pg_backend_pid(), uuidv7() from generate_series(1,10);
 pg_backend_pid |         uuidv7
----------------+--------------------------------------
      13545 | 019ad701-c798-7001-8df7-d296dafd98fd
      13545 | 019ad701-c798-7002-9995-bf103bbb56d7
      13545 | 019ad701-c798-7003-88b3-5ea58c738ade
      13545 | 019ad701-c798-7004-ba5e-e675fe103060
      13545 | 019ad701-c798-7005-8608-59b9c852b4ce
      13545 | 019ad701-c798-7006-832c-d06c15e2865a
      13545 | 019ad701-c798-7007-8f45-360c0825c671
      13545 | 019ad701-c798-7008-bb47-bcb7915503b2
      13545 | 019ad701-c798-7009-9124-e6873b0265f6
      13545 | 019ad701-c798-700a-8422-8d75c5ade9f7
(10 rows)

The timestamps are now just a sequence incrementing by 1
on each call, independently of the server's clock and
the actual time span between calls. It has become a counter
and will remain so until the backend terminates.

It does not have to be that way. In get_real_time_ns_ascending(),
it could switch immediately to the new time:

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 2e32592f57..8df194daea 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -505,8 +505,11 @@ get_real_time_ns_ascending()
    ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
 #endif

-    /* Guarantee the minimal step advancement of the timestamp */
-    if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+    /*
+     * Guarantee the minimal step advancement of the timestamp,
+     * unless the clock has moved backward.
+     */
+    if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns && previous_ns <= ns)
        ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
    previous_ns = ns;

> Also PFA a prototype of making uuidv7() ordered across all backends via
> keeping previous_ns in shared memory. IMO it's overcomplicating and RFC
> does not require such guarantees

It does not have to be in core, but an extension might want to provide
a generator that guarantees monotonicity across backends.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



Re: UUID v7

From
Andrey Borodin
Date:
Hi Daniel!

> On 16 Dec 2024, at 19:08, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> The timestamps are now just a sequence incrementing by 1
> on each call, independently of the server's clock and
> the actual time span between calls. It has become a counter
> and will remain so until the backend terminates.

This is exactly what RFC suggest us to do. It’s a feature, not a bug.

>
> It does not have to be that way. In get_real_time_ns_ascending(),
> it could switch immediately to the new time:
>
> diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
> index 2e32592f57..8df194daea 100644
> --- a/src/backend/utils/adt/uuid.c
> +++ b/src/backend/utils/adt/uuid.c
> @@ -505,8 +505,11 @@ get_real_time_ns_ascending()
> ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
> #endif
>
> - /* Guarantee the minimal step advancement of the timestamp */
> - if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
> + /*
> + * Guarantee the minimal step advancement of the timestamp,
> + * unless the clock has moved backward.
> + */
> + if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns && previous_ns <= ns)
> ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
> previous_ns = ns;

We have that previous_ns to protect us from clocks moving backwards. And you suggest us to disable this protection.
To achieve this we would rather delete previous_ns at all. It was there not to guarantee minimal step, but to ensure
clocksalways move forward only. 

>
>> Also PFA a prototype of making uuidv7() ordered across all backends via
>> keeping previous_ns in shared memory. IMO it's overcomplicating and RFC
>> does not require such guarantees
>
> It does not have to be in core, but an extension might want to provide
> a generator that guarantees monotonicity across backends.

AFAIK extension pg_uuidv7 does not have this protection right now. But Florian might add it in future.


Best regards, Andrey Borodin.