Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Msg-id CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
In response to Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
List pgsql-hackers
On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas@vondra.me> writes:
> > On 3/26/26 00:40, Tom Lane wrote:
> >> I believe what's happening there is that in cs_CZ locale,
> >> "V" doesn't follow simple ASCII sort ordering.
>
> > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > other way around. V is not special in any way.
>
> Ah, sorry, I should have researched a bit instead of relying on
> fading memory.  The quirk I was thinking of is that in cs_CZ,
> "ch" sorts after "h":
>
> u8=# select 'h' < 'ch'::text collate "en_US";
>  ?column?
> ----------
>  f
> (1 row)
>
> u8=# select 'h' < 'ch'::text collate "cs_CZ";
>  ?column?
> ----------
>  t
> (1 row)
>
> Regular hex encoding isn't bitten by that because it doesn't
> use 'h' in the text form ... but this base32hex thingie does.
>
> However, your point is also correct:
>
> u8=# select '0' < 'C'::text ;
>  ?column?
> ----------
>  t
> (1 row)
>
> u8=# select '0' < 'C'::text collate "cs_CZ";
>  ?column?
> ----------
>  f
> (1 row)
>
> and that breaks "text ordering matches numeric ordering"
> for both traditional hex and base32hex.  So maybe this
> is not as big a deal as I first thought.  We need a fix
> for the new test though.  Probably adding COLLATE "C"
> would be enough.

Thank you for the report and the analysis.

I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
"C" to the query resolves it. It also seems like a good idea to add a
note to the documentation, as users might face the same issue. For
example:

To preserve the sort order of the original values in their encoded
form, ensure that the encoded text is sorted using the C collation
(e.g., by adding COLLATE "C"). Natural-language collations may order
letters and digits differently and break the ordering.
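
A minimal sketch of what such a note is getting at, using made-up sample
strings drawn from the base32hex alphabet (0-9, A-V) rather than real
output of the proposed functions. The alias t(v) and the values are
illustrative assumptions; only the COLLATE clause matters here.

```sql
-- Hypothetical sample values; not actual uuid_to_base32hex() output.
-- With COLLATE "C", comparison is byte-wise, so digits sort before
-- uppercase letters, matching the base32hex alphabet order (0-9, A-V).
SELECT v
FROM (VALUES ('C0'), ('0V'), ('H1'), ('9H')) AS t(v)
ORDER BY v COLLATE "C";
```

Without the COLLATE clause, the result follows the database's default
collation, so under a collation such as cs_CZ the values starting with a
letter could come before those starting with a digit.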

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


