Re: UUID v7 - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: UUID v7
Date
Msg-id BFDA918D-1CB0-4980-A31F-C1AC95BE68F6@yandex-team.ru
Whole thread Raw
In response to Re: UUID v7  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: UUID v7
Re: UUID v7
List pgsql-hackers
Hi Peter,

thank you for so thoughtful review.

> On 6 Mar 2024, at 12:13, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have various comments on this patch:
>
>
> - doc/src/sgml/func.sgml
>
> The documentation of the new functions should be broken up a bit.
> It's all one paragraph now.  At least make it several paragraphs, or
> possibly tables or something else.
I've split functions to generate UUIDs from functions to extract stuff.

>
> Avoid listing the functions twice: Once before the description and
> then again in the description.  That's just going to get out of date.
> The first listing is not necessary, I think.

Fixed.

> The return values in the documentation should use the public-facing
> type names, like "timestamp with time zone" and "smallint".

Fixed.

> The descriptions of the UUID generation functions use handwavy
> language in their descriptions, like "It provides much better data
> locality" or "unacceptable security or business intelligence
> implications", which isn't useful.  Either we cut that all out and
> just say, it creates a UUIDv7, done, look elsewhere for more
> information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

> We shouldn't label a link as "IETF standard" when it's actually a
> draft.

Fixed.

Well, all my modifications of documentation are kind of blind... I tried to "make docs", but it gives me gazilion of
errors...Is there an easy way to see resulting HTML? 


> - src/include/catalog/pg_proc.dat
>
> The description of uuidv4 should be "generate UUID version 4", so that
> it parallels uuidv7.

Fixed.

> The description of uuid_extract_time says 'extract timestamp from UUID
> version 7', the implementation is not limited to version 7.

Fixed.

> I think uuid_extract_time should be named uuid_extract_timestamp,
> because it extracts a timestamp, not a time.

Renamed.

> The functions uuid_extract_ver and uuid_extract_var could be named
> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
> to tell them apart, with only one letter different.

Renamed.

> - src/test/regress/sql/uuid.sql
>
> Why are the tests using the input format '{...}', which is not the
> standard one?

Fixed.

> - src/backend/utils/adt/uuid.c
>
> All this new code should have more comments.  There is a lot of bit
> twiddling going on, and I suppose one is expected to follow along in
> the RFC?  At least each function should have a header comment, so one
> doesn't have to check in pg_proc.dat what it's supposed to do.

I've added some header comment. One big comment is attached to v7, I tried to take parts mostly from RFC. Yet there are
alot of my additions that now need review... 

> I'm suspicious that these functions all appear to return null for
> erroneous input, rather than raising errors.  I think at least some
> explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR:  division by zero
makes me consider throwing an error. There was some argumentation upthread for not throwing error though, but now I
cannotfind it... maybe I accepted this behaviour as more user-friendly. 

> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

> The uuidv7 function could really use a header comment that explains
> the choices that were made.  The RFC draft provides various options
> that implementations could use; we should describe which ones we
> chose.

Done.

>
> I would have expected that, since gettimeofday() provides microsecond
> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes? Even if we put extra 10 bits of timestamp, we cannot extract safely them. 
These bits could promote inter-backend stortability. E.i. when many backends generate data fast - this data is still
somewhatordered even within 1ms. But I think benefits of this sortability are outweighed by portability(unknown real
resolution),simplicity(we don't store microseconds, thus do not try to extract them). 
All this arguments are weak, but if one method would be strictly better than another - there would be only one method.

>
> You use some kind of counter, but could you explain which method that
> counter implements?
I described counter in uuidv7() header.

>
> I don't see any acknowledgment of issues relating to concurrency or
> restarts.  Like, how do we prevent duplicates being generated by
> concurrent sessions or between restarts?  Maybe the counter or random
> stuff does that, but it's not explained.

I think restart takes more than 1ms, so this is covered with time tick.
I've added paragraph about frequency of generation in uuidv7() header.


Best regards, Andrey Borodin.

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Next
From: Robert Treat
Date:
Subject: Re: [DOC] Add detail regarding resource consumption wrt max_connections