Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers

From Sergey Prokhorenko
Subject Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date
Msg-id 574624399.175025.1761290201491@mail.yahoo.com
In response to Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
List pgsql-hackers
On Thu, Oct 23, 2025 at 3:46 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
> > Given that what uuid_to_base32hex() actually does is encoding the
> > input UUID, I find that it could be confusing if we have a similar
> > function other than encode() function. Also, we could end up
> > introducing as many encoding and decoding functions dedicated for UUID
> > as we want to support encoding methods, bloating the functions.
> >
> > So as the first step, +1 for supporting base32hex for encode() and
> > decode() functions and supporting the UUID <-> bytea conversion. I
> > believe it would cover most use cases and the cost of UUID <-> bytea
> > conversion is negligible.
>
> > Regards,
>
> > --
> > Masahiko Sawada
> > Amazon Web Services: https://aws.amazon.com
>
>
> Masahiko,
>
> I see you're in favor of base32hex encoding. That's great!
>
> Your arguments make sense, and I generally support enhancing the standard encode() and decode() functions to handle base32hex. It seems like the right approach from a developer experience standpoint.
>
> However, I'm unclear about some implementation aspects. Why add conversions between UUID and bytea data types? Wouldn't that require creating dedicated UUID <-> bytea conversion functions? Instead, could we implement encode() as polymorphic to handle UUID type inputs directly? For decode(), we'd need some way (a parameter?) to specify the UUID output type instead of bytea. Another option would be automatic type casting when inserting bytea data into UUID columns. Neither an extra parameter nor additional type casting seems ideal to me, though I don't have better alternatives.

While we can implement something like decode(uuid, text), I don't
think we can implement decode() in the way you proposed unless I'm
missing something.

I think the conversion support between UUID and bytea is useful in
general, not limited to encode()/decode() support. And users would be
able to create wrapper functions if they don't want to add a cast to
every encode() and decode() call. For example,

create function uuid_to_base32(uuid) returns text language sql immutable strict
begin atomic
    select encode($1::bytea, 'base32hex');
end;

Since such functions are inlineable, the difference between executing
encode(uuid_data::bytea, 'base32hex') and encode(uuid_data,
'base32hex') would only be the conversion: one palloc and one memcpy.

> But actually, for a short UUID text encoding to succeed, it's more important that it becomes the single, de facto standard. We should avoid supporting multiple encodings, just as the authors and contributors of RFC 9562 did: https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817 Therefore, whenever possible, encode() and decode() should support just one UUID text encoding, namely base32hex.

I guess it's ultimately the developer's choice, no? For example, if
they are using multiple databases (or data processing platforms) in
their system and 'hex' is the only encoding that all components can
encode and decode, they might choose 'hex' encoding.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

____________________________________________


Masahiko,

Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format is not a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use Crockford's Base32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too, have every reason not to want to increase the number of built-in functions in PostgreSQL.
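
The sorting concern can be shown with a minimal Python sketch. It assumes the RFC 4648 "Extended Hex" alphabet for base32hex and Python's standard base64 alphabet. The helper base32hex_encode() and the three sample values are hypothetical and exist only for this illustration; they are not PostgreSQL code.

```python
import base64

# RFC 4648 "Extended Hex" alphabet; its characters are in ascending ASCII order.
BASE32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def base32hex_encode(data: bytes) -> str:
    """Encode bytes as unpadded base32hex (hypothetical helper, 5 bits per character)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # fill the last 5-bit group with zero bits
    return "".join(
        BASE32HEX_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )


def base64_encode(data: bytes) -> str:
    """Encode bytes with the standard (non-URL-safe) base64 alphabet."""
    return base64.b64encode(data).decode("ascii")


# Three illustrative 16-byte values, listed in ascending byte order.
values = [bytes(16), bytes([0x80] * 16), bytes([0xFF] * 16)]

# Sort the values by the text form each encoding produces.
base32hex_sorted = sorted(values, key=base32hex_encode)
base64_sorted = sorted(values, key=base64_encode)

print(base32hex_sorted == values)  # True: text order matches byte order
print(base64_sorted == values)     # False: '/' sorts before 'A' and 'g'
```

The base64 order breaks because its alphabet (A-Z, a-z, 0-9, +, /) is not in ascending ASCII order, while the base32hex alphabet is.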

But here is a solution that I hope will satisfy everyone:

encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
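
The round trip in this example can be checked outside the database with a small Python sketch. It assumes unpadded base32hex as defined in RFC 4648: the 128 UUID bits are followed by two zero bits, giving 26 characters. The Python functions below only borrow the names from the thread subject; they are hypothetical helpers, not the proposed PostgreSQL functions.

```python
import uuid

# RFC 4648 "Extended Hex" alphabet.
BASE32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def uuid_to_base32hex(value: uuid.UUID) -> str:
    """Encode a UUID as 26 unpadded base32hex characters (hypothetical helper)."""
    number = value.int << 2  # 128 bits plus 2 zero bits = 130 bits = 26 * 5
    return "".join(
        BASE32HEX_ALPHABET[(number >> shift) & 0x1F] for shift in range(125, -1, -5)
    )


def base32hex_to_uuid(text: str) -> uuid.UUID:
    """Decode 26 base32hex characters back into a UUID (hypothetical helper)."""
    if len(text) != 26:
        raise ValueError("expected exactly 26 base32hex characters")
    number = 0
    for char in text.upper():
        # str.index raises ValueError for characters outside the alphabet.
        number = (number << 5) | BASE32HEX_ALPHABET.index(char)
    if number & 0b11:
        raise ValueError("the two trailing padding bits must be zero")
    return uuid.UUID(int=number >> 2)


original = uuid.UUID("019535d9-3df7-79fb-b466-fa907fa17f9e")
encoded = uuid_to_base32hex(original)
decoded = base32hex_to_uuid(encoded)

print(encoded)  # 06AJBM9TUTSVND36VA87V8BVJO
print(decoded)  # 019535d9-3df7-79fb-b466-fa907fa17f9e
```
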

I don't see any real business need for UUID <-> bytea conversions.

Best regards,
Sergey Prokhorenko









