Losing my latin on Ordering... - Mailing list pgsql-general

From Dominique Devienne
Subject Losing my latin on Ordering...
Date
Msg-id CAFCRh--xt-J8awOavhB216kom6TQnaP35TTVEQQS5bHH7gMemQ@mail.gmail.com
Whole thread Raw
Responses Re: Losing my latin on Ordering...  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-general
Hi. Porting a unit test to PostgreSQL, we got a failure related to ordering.

We've distilled it to the below. The DB is en_US.UTF-8, and the sorting we get
does not make sense to me. The same prefix can be sorted differently based on the suffix apprently, which doesn't make any sense to me.

Surely sorting should be "constant left-to-right", no? What are we missing?

I'm already surprised (star) comes before (space), when the latter "comes before" the former in both ASCII and UTF-8, but that the two "Foo*" and "Foo " prefixed pairs are not clustered after sorting is just mistifying to me. So how come?

For now we can work-around this by explicitly adding the `collate "C"` on the queries underlying that particular test, but that would be wrong in the general case of international strings to sort, so I'd really like to understand what's going on.

Thanks, --DD

PS: if I try "en_US.UTF-8" or "en_US"."UTF-8" for the collate, it fails.
  How come what pg_database.datcollate displays is not a valid value for collate?

PPS: We tried on v12 and v13 I believe. Somehow my v14.2 on Windows doesn't have en_US as a collation...

ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'), ('Foo*All')) select '<'||v||'>', v::bytea from t order by v collate "en_US";
  ?column?   |          v
-------------+----------------------
 <Foo*>      | \x466f6f2a
 <Foo All>   | \x466f6f20416c6c
 <Foo*All>   | \x466f6f2a416c6c
 <Foo Brief> | \x466f6f204272696566
(4 rows)


ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'), ('Foo*All')) select '<'||v||'>', v::bytea from t order by v collate "C";
  ?column?   |          v
-------------+----------------------
 <Foo All>   | \x466f6f20416c6c
 <Foo Brief> | \x466f6f204272696566
 <Foo*>      | \x466f6f2a
 <Foo*All>   | \x466f6f2a416c6c
(4 rows)

pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: HOWTO? Permissions for user to access a single db
Next
From: Laurenz Albe
Date:
Subject: Re: Losing my latin on Ordering...