Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Msg-id CAD21AoC=92zAEmaGt+3U1jSD77jgdQT_HXUwPPtZEze2sJ=vWA@mail.gmail.com
In response to Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
List pgsql-hackers
On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Tomas Vondra <tomas@vondra.me> writes:
> > > On 3/26/26 00:40, Tom Lane wrote:
> > >> I believe what's happening there is that in cs_CZ locale,
> > >> "V" doesn't follow simple ASCII sort ordering.
> >
> > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > other way around. V is not special in any way.
> >
> > Ah, sorry, I should have researched a bit instead of relying on
> > fading memory.  The quirk I was thinking of is that in cs_CZ,
> > "ch" sorts after "h":
> >
> > u8=# select 'h' < 'ch'::text collate "en_US";
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > Regular hex encoding isn't bitten by that because it doesn't
> > use 'h' in the text form ... but this base32hex thingie does.
> >
> > However, your point is also correct:
> >
> > u8=# select '0' < 'C'::text ;
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > u8=# select '0' < 'C'::text collate "cs_CZ";
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > and that breaks "text ordering matches numeric ordering"
> > for both traditional hex and base32hex.  So maybe this
> > is not as big a deal as I first thought.  We need a fix
> > for the new test though.  Probably adding COLLATE "C"
> > would be enough.
>
> Thank you for the report and the analysis.
>
> I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
> "C" to the query resolves it. It also seems like a good idea to add a
> note to the documentation, as users might face the same issue. For
> example:
>
> To keep the sort order of the encoded text consistent with that of the
> original data, ensure that the text is sorted using the C collation
> (e.g., using COLLATE "C"). Natural language collations may sort
> characters differently and break this ordering.
>

Attached is a patch implementing the above idea.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
