Re: Losing my latin on Ordering... - Mailing list pgsql-general

From Laurenz Albe
Subject Re: Losing my latin on Ordering...
Msg-id 2d3c66e7c075ae9efe691eaa3b1040c6ce393ed7.camel@cybertec.at
In response to Re: Losing my latin on Ordering...  (Dominique Devienne <ddevienne@gmail.com>)
List pgsql-general
On Tue, 2023-02-14 at 12:17 +0100, Dominique Devienne wrote:
> On Tue, Feb 14, 2023 at 11:23 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2023-02-14 at 10:31 +0100, Dominique Devienne wrote:
> > > Surely sorting should be "constant left-to-right", no? What are we missing?
> >
> > No, it isn't.  That's not how natural language collations work.
>
> Honestly, who expects the same prefix to sort differently based on what comes
> after, in left-to-right languages?
> How does one even find out what the (capricious?) rules for sorting in a given
> collation are?

Look at the documentation / implementation.

As far as ICU is concerned, here: https://unicode.org/reports/tr10/

> > > I'm already surprised (star) comes before (space), when the latter "comes
> > > before" the former in both ASCII and UTF-8, but that the two "Foo*" and "Foo "
> > > prefixed pairs are not clustered after sorting is just mystifying to me. So how come?
> >
> > Because they compare identical on the first three levels.  Any difference in
> > letters, accents or case carries more weight, even if it occurs to the right
> > of these substrings.
>
> That's completely unintuitive...

Well, you can complain to GNU and the Unicode Consortium, but that's pretty
much the way it is.
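
To make the multi-level comparison quoted above more concrete, here is a toy
sketch in Python.  It is not the ICU or glibc implementation: the function
toy_sort_key and the sample strings are made up for illustration, and it simply
assumes that space and star are ignored on the first three levels (letters,
accents, case) and only looked at on a fourth level.  On that fourth level the
sketch falls back to code point order, so it puts space before star, unlike the
result reported in this thread.

```python
import unicodedata


def toy_sort_key(s):
    """Hypothetical four-level sort key; a rough imitation, not real UCA."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    letters = [c for c in decomposed if c.isalnum()]
    # Level 1: base letters only, case folded, punctuation ignored.
    primary = "".join(c.casefold() for c in letters)
    # Level 2: accents (combining marks).
    secondary = "".join(c for c in decomposed if unicodedata.combining(c))
    # Level 3: case pattern of the letters.
    tertiary = "".join("1" if c.isupper() else "0" for c in letters)
    # Level 4: everything ignored so far (space, star, ...).
    quaternary = "".join(
        c for c in decomposed
        if not c.isalnum() and not unicodedata.combining(c)
    )
    return (primary, secondary, tertiary, quaternary)


words = ["Foo*baz", "Foo bar", "Foo*bar", "Foo baz"]

# Multi-level order: the letters decide first, so the "Foo " and "Foo*"
# prefixes end up interleaved instead of clustered.
multi_level = sorted(words, key=toy_sort_key)

# Plain code point order keeps each prefix together.
code_point = sorted(words)

print(multi_level)
print(code_point)
```

With these sample strings, the difference between "bar" and "baz" is decided
on the first level, before the space or star is ever considered, which is why
the common prefixes do not stay together.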

> > Yes, it sounds like the "C" collation may be best for you.  That is, if you don't
> > mind that "Z" < "a".
>
> I would mind if I asked for case-insensitive comparisons.
>
> So the "C" collation is fine with general UTF-8 encoding?
> I.e. it will be codepoint ordered OK?

Yes, exactly.
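
A small Python sketch of why that holds: the "C" collation compares the raw
bytes of the strings, and in UTF-8 the byte order of encoded strings matches
their code point order.  The sample strings are arbitrary, and the sketch only
imitates the byte comparison; it does not call PostgreSQL.

```python
# Sample strings covering 1-, 2-, 3- and 4-byte UTF-8 sequences.
words = ["a", "Z", "\u00e9", "z", "\u20ac", "\U0001d11e"]

# Order by the UTF-8 encoded bytes, as a bytewise comparison would.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Order by the sequence of Unicode code points.
by_code_point = sorted(words, key=lambda s: [ord(c) for c in s])

print(by_bytes == by_code_point)
print(by_bytes.index("Z") < by_bytes.index("a"))
```

Both orders agree, and "Z" sorts before "a", as mentioned above.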

Yours,
Laurenz Albe


