Re: sort order for UTF-8 char column with Japanese UTF-8 - Mailing list pgsql-general

From Thomas Munro
Subject Re: sort order for UTF-8 char column with Japanese UTF-8
Msg-id CA+hUKGLR86ZK8dq0onE4ExMvtVU9w41ZpUsBjVxoddWzO1b0NA@mail.gmail.com
In response to Re: sort order for UTF-8 char column with Japanese UTF-8  (Matthias Apitz <guru@unixarea.de>)
List pgsql-general
On Fri, Feb 4, 2022 at 8:11 AM Matthias Apitz <guru@unixarea.de> wrote:
> On my FreeBSD laptop the same file sorts as
>
> guru@c720-r368166:~ $ LANG=de_DE.UTF-8 sort swd
> A
> ゲアハルト・A・リッター
> ゲルハルト・A・リッター
> チャールズ・A・ビアード
> A010STRUKTUR
> A010STRUKTUR
> A010STRUKTUR
> A0150SUPRALEITER

Wow. It's one thing to have a different default "script order" than
glibc and ICU (which is something you can customise, IIRC), but isn't
something broken here if the Japanese text sorts between "A" and
"A0..."?  Hmm, it's almost as if the comparison completely ignored the
Japanese characters.  From my FreeBSD box:

tmunro=> select * from t order by x collate "de_DE.UTF-8";
            x
--------------------------
 ゲアハルト
 A
 ゲアハルト・A・リッター
 A0
 A010STRUKTUR
 AA
 ゲアハルト・AA・リッター
 ゲアハルト・B・リッター
(8 rows)

tmunro=> select * from t order by x collate "ja_JP.UTF-8";
            x
--------------------------
 A
 A0
 A010STRUKTUR
 AA
 ゲアハルト
 ゲアハルト・AA・リッター
 ゲアハルト・A・リッター
 ゲアハルト・B・リッター
(8 rows)
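
To make the "ignored the Japanese text" guess concrete, here is a rough
Python sketch of that hypothesis. It is not FreeBSD's collation code: the
function name ignore_non_ascii_key is made up for illustration, and both
the "drop every non-ASCII character" rule and the code-point tie-break are
assumptions, chosen only to see whether such a model would reproduce the
de_DE.UTF-8 ordering shown above for these eight rows.

```python
# Hypothetical model, NOT FreeBSD's actual collation code: treat every
# non-ASCII character as ignorable, then break ties by code point order.
rows = [
    "A",
    "A0",
    "A010STRUKTUR",
    "AA",
    "ゲアハルト",
    "ゲアハルト・A・リッター",
    "ゲアハルト・AA・リッター",
    "ゲアハルト・B・リッター",
]


def ignore_non_ascii_key(s: str) -> tuple:
    """Sort key: the ASCII-only remainder first, the raw string as tie-break."""
    ascii_only = "".join(ch for ch in s if ch.isascii())
    return (ascii_only, s)


# Sort the sample rows under the assumed "non-ASCII is ignored" rule.
modelled = sorted(rows, key=ignore_non_ascii_key)
for row in modelled:
    print(row)
```

Under those assumptions the rows come out in the same order as the
de_DE.UTF-8 query result, which fits the theory but doesn't prove it;
the ja_JP.UTF-8 ordering is not modelled here.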

Seems like something to investigate in FreeBSD land.
