Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From Chao Li
Subject Re: GB18030-2022 Support in PostgreSQL
Date
Msg-id CAEoWx2=BWDFXpB9OhfoKJGsU-Lk+7oQ8SW7a5GyoufLiFTWO8g@mail.gmail.com
In response to Re: GB18030-2022 Support in PostgreSQL  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers

On Mon, Sep 29, 2025 at 12:03 PM John Naylor <johncnaylorls@gmail.com> wrote:
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > I am not sure if you should also upgrade the UCM file to 2022 version, but if we need, let’s do it with a separate commit.
>
> If they can all use the same file, we should just do that for the sake
> of simplicity, in which case a separate commit is just extra noise.


In v3, I have updated EUC_CN to use gb18030-2022.ucm. Fortunately, the generated map files are unchanged, so EUC_CN shouldn't need much additional testing.
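For reference, each UCM mapping line pairs a Unicode code point with a byte sequence and a precision flag (`|0` marks a round-trip mapping). The Python sketch below is NOT the logic of UCS_to_EUC_CN.pl; it is a simplified illustration, assuming EUC_CN only needs round-trip two-byte mappings whose bytes both fall in 0xA1-0xFE. The real script may apply further restrictions, and `parse_ucm` / `euc_cn_subset` are hypothetical names.

```python
import re

# A UCM mapping line looks like: <U4E00> \xD2\xBB |0
# (Unicode code point, byte sequence, precision flag; |0 = round trip).
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|([0-3])"
)


def parse_ucm(lines):
    """Yield (code_point, byte_string, precision) for each mapping line."""
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # header, CHARMAP / END CHARMAP, comments, etc.
        code_point = int(m.group(1), 16)
        hex_bytes = m.group(2).split("\\x")[1:]  # "\xD2\xBB" -> ["D2", "BB"]
        yield code_point, bytes(int(h, 16) for h in hex_bytes), int(m.group(3))


def euc_cn_subset(mappings):
    """Hypothetical simplified filter: round-trip, two-byte, both bytes 0xA1-0xFE."""
    table = {}
    for code_point, raw, precision in mappings:
        if precision == 0 and len(raw) == 2 and all(0xA1 <= b <= 0xFE for b in raw):
            table[code_point] = raw
    return table


sample = [
    "CHARMAP",
    r"<U4E00> \xD2\xBB |0",  # two-byte code inside the EUC-CN byte range
    r"<U4E02> \x81\x40 |0",  # GBK-only code, first byte below 0xA1
    r"<U0041> \x41 |0",      # single-byte ASCII
    "END CHARMAP",
]
print(euc_cn_subset(parse_ucm(sample)))  # {19968: b'\xd2\xbb'}
```

Only the first sample line survives the filter; the GBK-only and ASCII mappings are dropped.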

For UHC, the ICU main branch (https://github.com/unicode-org/icu/tree/main/icu4c/source/data/mappings) still contains windows-949-2000.ucm, so only the download URL changes; the file content is unchanged.
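One way to confirm that a UCM file fetched from the new URL has the same content as a previously downloaded copy is to compare digests. A minimal Python sketch, assuming both copies are available locally; `sha256_of` and `same_content` are hypothetical helper names, and the paths in the usage comment are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same SHA-256 digest (identical content in practice)."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (placeholder paths):
# same_content("old/windows-949-2000.ucm", "windows-949-2000.ucm")
```

Reading whole files is fine at these sizes (a few hundred KB); for much larger inputs, hashing in chunks would be preferable.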

```
% make utf8_to_uhc.map utf8_to_euc_cn.map
wget -O windows-949-2000.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
--2025-09-29 16:00:40--  https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
HTTP request sent, awaiting response... 200 OK
Length: 356253 (348K) [text/plain]
Saving to: ‘windows-949-2000.ucm’

windows-949-2000.ucm                             100%[=========================================================================================================>] 347.90K   222KB/s    in 1.6s

2025-09-29 16:00:43 (222 KB/s) - ‘windows-949-2000.ucm’ saved [356253/356253]

'/usr/bin/perl' -I . UCS_to_UHC.pl
- Writing UTF8=>UHC conversion table: utf8_to_uhc.map
- Writing UHC=>UTF8 conversion table: uhc_to_utf8.map
wget -O gb18030-2022.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
--2025-09-29 16:00:43--  https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
HTTP request sent, awaiting response... 200 OK
Length: 675312 (659K) [text/plain]
Saving to: ‘gb18030-2022.ucm’

gb18030-2022.ucm                                 100%[=========================================================================================================>] 659.48K  1.33MB/s    in 0.5s

2025-09-29 16:00:44 (1.33 MB/s) - ‘gb18030-2022.ucm’ saved [675312/675312]

'/usr/bin/perl' -I . UCS_to_EUC_CN.pl
- Writing UTF8=>EUC_CN conversion table: utf8_to_euc_cn.map
- Writing EUC_CN=>UTF8 conversion table: euc_cn_to_utf8.map
% git diff
%
```
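The empty `git diff` above shows that the regenerated maps did not change. When switching source files, a finer-grained check is to diff the mapping tables themselves. A minimal sketch, assuming each table is a Python dict from Unicode code point to encoded bytes (for example, built by parsing a UCM file); `diff_tables` is a hypothetical name.

```python
def diff_tables(old, new):
    """Compare two {code_point: bytes} tables.

    Returns (added, removed, changed) as sorted lists of code points.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(cp for cp in old.keys() & new.keys() if old[cp] != new[cp])
    return added, removed, changed


old = {0x4E00: b"\xd2\xbb", 0x4E8C: b"\xb6\xfe"}
new = {0x4E00: b"\xd2\xbb", 0x4E8C: b"\xb6\xfe"}
added, removed, changed = diff_tables(old, new)
print(not (added or removed or changed))  # True: the subset is unchanged
```

Three empty lists correspond to the empty `git diff` on the generated map files.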

Please note that I didn't include the deletion of gb-18030-2000.xml in v3, because it would make the patch too large, and the email would then need moderator approval before landing in the mail archive. Please delete the XML file when you push the commit.

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
Attachment
