Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers

From Chao Li
Subject Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date
Msg-id 1F13AF7E-B90A-40FB-B47C-DC38FBB08353@gmail.com
Whole thread Raw
In response to Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
List pgsql-hackers

> On Mar 24, 2026, at 09:17, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 20, 2026 at 7:24 AM Aleksander Alekseev
> <aleksander@tigerdata.com> wrote:
>>
>> Hi,
>>
>>>> Also, a small nitpick is that we can use uint32 instead of uint64 for
>>>> 'bits_buffer'. I've attached the updated patch as well as the
>>>> difference from the previous version.
>>>
>>> Then I suggest using uint32 for the bits_buffer variable in
>>> base32hex_encode() too. Also we should use 1U instead of 1ULL with
>>> uint32.
>>
>> CI is not happy with the new test:
>>
>> ```
>> SELECT decode('あ', 'base32hex'); -- error
>> -ERROR:  invalid symbol "あ" found while decoding base32hex sequence
>> +ERROR:  invalid symbol "ã" found while decoding base32hex sequence
>> ```
>>
>> Although it passes locally. My best guess is that something is off
>> with the database encoding on CI and that we shouldn't use this test.
>> We have a similar test which uses ASCII symbols only.
>
> Good catch. Yes, we should not use this test depending on the database
> encoding and it seems we can omit this test in the first place.
>
> The patch looks basically good to me. I've made some changes to the
> regression test part as I want to have round-trip tests. I've merged
> the tests checking the sortability to the existing tests and added
> round-trip tests. With this change, we can test round-trip tests and
> sortability tests with random UUID value in every test run while
> minimizing the test time. Feedback is very welcome.
>
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v11-0001-Add-base32hex-support-to-encode-and-decode-funct.patch>

V11 overall looks good to me. A couple of comments wrt the tests.

1
```
+        else
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+                            pg_mblen_range(s - 1, srcend), s - 1)));
```

I noticed the use of pg_mblen_range(). Since base32hex itself is single-ASCII only, my first thought was that we might
notneed a multibyte-aware function here. But then I realized that a multibyte string could still be passed to decode(),
andpg_mblen_range() helps report the invalid multibyte character correctly in the error message. 

I tested it like this:
```
evantest=# SELECT decode('你哈', 'base32hex');
ERROR:  invalid symbol "你" found while decoding base32hex sequence
```

The error message looks good to me. So maybe it would be worth adding a test case for decode() with multibyte
characters.

2
```
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
```

The doc says whitespace characters are silently ignored, so I tested this:
```
evantest=# SELECT decode('a a==', 'base32hex');
 decode
--------
 \x52
(1 row)

evantest=# SELECT decode(' a a==', 'base32hex');
 decode
--------
 \x52
(1 row)

evantest=# SELECT decode(' a a== ', 'base32hex');
 decode
--------
 \x52
(1 row)
```

It looks like leading, trailing, and embedded whitespace are all ignored. But I don’t see a test case covering this
behavior,so maybe it would be good to add one. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Proposal to allow setting cursor options on Portals
Next
From: Mahendra Singh Thalor
Date:
Subject: Re: [BUG] CRASH: ECPGprepared_statement() and ECPGdeallocate_all() when connection is NULL