Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date
Msg-id CAD21AoCzEDdwpyPwA0d-QmCRe5rMz3m160SJgxMwKke85e8n0w@mail.gmail.com
Whole thread Raw
In response to Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions  (Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>)
Responses Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
List pgsql-hackers
On Fri, Oct 24, 2025 at 3:42 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
>
>
> On Friday 24 October 2025 at 09:24:15 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> On Fri, Oct 24, 2025 at 12:17 AM Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> >
> > Masahiko,
> >
> > Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format
isnot a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use
Crocksford'sBase32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too,
haveevery reason not to want to increase the number of built-in functions in PostgreSQL. 
> >
> > But here is a solution that I hope will satisfy everyone:
> >
> > encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
>
> > Does it mean the first argument is uuid type data and when
> > 'uuid_to_base32hex' is specified as the format the function requires a
> > uuid data at the first argument?
>
> Yes, that's right.
> PostgreSQL will automatically cast the string '019535d9-3df7-79fb-b466-fa907fa17f9e' to the uuid type, since the
formatis correct. 
>
>
> > I could not understand the difference
> > between specifying 'based32hex' and 'uuid_to_base32hex' when encoding
> > UUID data with base32hex encoding.
>
>
> 1. Specifying 'based32hex' in encode function means the first parameter is of bytea type as usual.
>
>
> But specifying 'uuid_to_base32hex' means the first parameter is of uuid type.
>
>
> 2. The encode function does not yet support format based32hex. Therefore, it is not known whether padding =====
shouldbe used. 
> But padding ===== is not used when specifying 'uuid_to_base32hex'.
>
>
> > decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
>
> > Suppose that the decode() takes text data at the first argument and
> > returns UUID data, the function signature would be decode(text, text)
> > -> uuid. But we cannot create two functions that have the same name
> > and the same argument types.
>
> Yes, you're right. This is a problem that can't be solved without composite return values. We clearly took the wrong
approachby coupling UUID conversion with encode/decode functions, which only apply to bytea. UUID and bytea are
fundamentallydifferent data types. Meanwhile, PostgreSQL has over 30 other type conversion functions that deal with
otherdata types. For example, array_to_string, string_to_array, jsonb_to_record, to_char, to_timestamp, and to_hex. In
thissituation, the best solution would be to revert to the original uuid_to_base32hex() and base32hex_to_uuid()
functionsrather than deal with type incompatibility issues. 
>

I think that type conversions and data encodings serve different
purposes. Type conversions express semantic transformations between
data types (e.g., text -> timestamp, jsonb -> record), while encodings
are simply representations of binary data as text. For the latter,
PostgreSQL already provides a well-defined abstraction through
encode()/decode(). Mixing encoding logic with type-specific
conversions would blur that boundary.

Also, if we start adding dedicated functions for each supported
encoding (uuid_to_base32hex, uuid_to_hex etc.), the number of
functions could easily multiply. That’s exactly what encode() and
decode() were designed to avoid.

While I agree that base32hex should be the recommended, I'm really not
sure it's a good design that PostgreSQL core should enforce it as the
only built-in method. It seems better to me to provide flexible
primitives, encode()/decode() plus UUID <-> bytea casts, and document
base32hex as the canonical convention (if necessary). Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure. I'd like to
hear opinions from other hackers too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Sergey Prokhorenko
Date:
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Next
From: Sami Imseih
Date:
Subject: Re: Bug in pg_stat_statements