Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions |
| Date | |
| Msg-id | 1F13AF7E-B90A-40FB-B47C-DC38FBB08353@gmail.com Whole thread Raw |
| In response to | Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses |
Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
|
| List | pgsql-hackers |
> On Mar 24, 2026, at 09:17, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 20, 2026 at 7:24 AM Aleksander Alekseev
> <aleksander@tigerdata.com> wrote:
>>
>> Hi,
>>
>>>> Also, a small nitpick is that we can use uint32 instead of uint64 for
>>>> 'bits_buffer'. I've attached the updated patch as well as the
>>>> difference from the previous version.
>>>
>>> Then I suggest using uint32 for the bits_buffer variable in
>>> base32hex_encode() too. Also we should use 1U instead of 1ULL with
>>> uint32.
>>
>> CI is not happy with the new test:
>>
>> ```
>> SELECT decode('あ', 'base32hex'); -- error
>> -ERROR: invalid symbol "あ" found while decoding base32hex sequence
>> +ERROR: invalid symbol "ã" found while decoding base32hex sequence
>> ```
>>
>> Although it passes locally. My best guess is that something is off
>> with the database encoding on CI and that we shouldn't use this test.
>> We have a similar test which uses ASCII symbols only.
>
> Good catch. Yes, we should not use this test depending on the database
> encoding and it seems we can omit this test in the first place.
>
> The patch looks basically good to me. I've made some changes to the
> regression test part as I want to have round-trip tests. I've merged
> the tests checking the sortability to the existing tests and added
> round-trip tests. With this change, we can test round-trip tests and
> sortability tests with random UUID value in every test run while
> minimizing the test time. Feedback is very welcome.
>
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v11-0001-Add-base32hex-support-to-encode-and-decode-funct.patch>
V11 overall looks good to me. A couple of comments wrt the tests.
1
```
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+ pg_mblen_range(s - 1, srcend), s - 1)));
```
I noticed the use of pg_mblen_range(). Since base32hex itself is single-ASCII only, my first thought was that we might
notneed a multibyte-aware function here. But then I realized that a multibyte string could still be passed to decode(),
andpg_mblen_range() helps report the invalid multibyte character correctly in the error message.
I tested it like this:
```
evantest=# SELECT decode('你哈', 'base32hex');
ERROR: invalid symbol "你" found while decoding base32hex sequence
```
The error message looks good to me. So maybe it would be worth adding a test case for decode() with multibyte
characters.
2
```
+ accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+ whitespace characters.
```
The doc says whitespace characters are silently ignored, so I tested this:
```
evantest=# SELECT decode('a a==', 'base32hex');
decode
--------
\x52
(1 row)
evantest=# SELECT decode(' a a==', 'base32hex');
decode
--------
\x52
(1 row)
evantest=# SELECT decode(' a a== ', 'base32hex');
decode
--------
\x52
(1 row)
```
It looks like leading, trailing, and embedded whitespace are all ignored. But I don’t see a test case covering this
behavior,so maybe it would be good to add one.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
pgsql-hackers by date: