Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers
From | Joe Conway |
---|---|
Subject | Re: Update Unicode data to Unicode 16.0.0 |
Date | |
Msg-id | f0bd0304-97b8-4a55-bf16-d1a7feb948e3@joeconway.com |
Responses | Re: Update Unicode data to Unicode 16.0.0 |
List | pgsql-hackers |
On 11/11/24 01:27, Peter Eisentraut wrote:
> Here is the patch to update the Unicode data to version 16.0.0.
>
> Normally, this would have been routine, but a few months ago there was
> some debate about how this should be handled. [0] AFAICT, the consensus
> was to go ahead with it, but I just wanted to notify it here to be clear.
>
> [0]:
> https://www.postgresql.org/message-id/flat/d75d2d0d1d2bd45b2c332c47e3e0a67f0640b49c.camel%40j-davis.com

I ran a check and found that this patch causes changes in upper casing
of some characters. Repro:

setup
8<-------------
wget https://joeconway.com/presentations/formated-unicode.txt
initdb
psql
CREATE DATABASE builtincoll LOCALE_PROVIDER builtin BUILTIN_LOCALE 'C.UTF-8' TEMPLATE template0;
\c builtincoll
CREATE TABLE unsorted_table(strings text);
\copy unsorted_table from formated-unicode.txt (format csv)
VACUUM FREEZE ANALYZE unsorted_table;
8<-------------

8<-------------
-- on master
builtincoll=# WITH t AS (SELECT lower(strings) AS s FROM unsorted_table ORDER BY 1) SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 7ec7f5c2d8729ec960942942bb82aedd
(1 row)

builtincoll=# WITH t AS (SELECT upper(strings) AS s FROM unsorted_table ORDER BY 1) SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 97f83a4d1937aa65bcf8be134bf7b0c4
(1 row)

builtincoll=# WITH t AS (SELECT initcap(strings) AS s FROM unsorted_table ORDER BY 1) SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 8cf65a43affc221f3a20645ef402085e
(1 row)
8<-------------

8<-------------
-- master+patch
builtincoll=# WITH t AS (SELECT lower(strings) AS s FROM unsorted_table ORDER BY 1) SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 7ec7f5c2d8729ec960942942bb82aedd
(1 row)

Time: 19858.981 ms (00:19.859)
builtincoll=# WITH t AS (SELECT upper(strings) AS s FROM unsorted_table ORDER BY 1)SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 3055b3d5dff76c8c1250ef500c6ec13f
(1 row)

Time: 19774.467 ms (00:19.774)
builtincoll=# WITH t AS (SELECT initcap(strings) AS s FROM unsorted_table ORDER BY 1) SELECT md5(string_agg(t.s,NULL)) FROM t;
               md5
----------------------------------
 9985acddf7902ea603897cdaccd02114
(1 row)
8<-------------

So both UPPER and INITCAP produce different results unless I am missing
something.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
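
The md5 comparison in the repro shows that some output changed, but not which input strings are responsible. The following is a minimal Python sketch of the same fingerprint idea together with a per-string comparison. It is an illustration only and was not part of the original message. The names `fingerprint`, `changed_inputs`, and `make_upper` are hypothetical, the "newer" mapping is simulated with a private-use code point so that no real Unicode 16.0.0 change is implied, and Python's `str.upper()` is assumed only as a stand-in, since it does not necessarily match the case mapping used by the builtin provider. The hashes it produces are therefore not expected to match the ones in the transcript.

```python
import hashlib


def fingerprint(strings, transform):
    """MD5 over the transformed strings, sorted by code point and concatenated.

    Mirrors md5(string_agg(f(strings), NULL)) over rows ordered by f(strings),
    assuming code point ordering and UTF-8 encoding.
    """
    ordered = sorted(transform(s) for s in strings)
    return hashlib.md5("".join(ordered).encode("utf-8")).hexdigest()


def changed_inputs(strings, before, after):
    """Return the inputs whose transformed value differs between two mappings."""
    return [s for s in strings if before(s) != after(s)]


def make_upper(overrides):
    """Build an upper-casing function with per-character overrides.

    The overrides dict is a hypothetical stand-in for a newer case-mapping table.
    """
    def upper(s):
        return "".join(overrides.get(ch, ch.upper()) for ch in s)
    return upper


# Hypothetical data: U+E000 and U+E001 are private-use code points, chosen so
# that no real Unicode mapping is implied.
sample = ["alpha", "beta", "x\ue000"]
old_upper = make_upper({})
new_upper = make_upper({"\ue000": "\ue001"})

# The fingerprints differ, and the per-string check names the culprit.
print(fingerprint(sample, old_upper) == fingerprint(sample, new_upper))  # False
print(changed_inputs(sample, old_upper, new_upper) == ["x\ue000"])  # True
```

In practice the two mapping functions would be replaced by the actual `upper()` or `initcap()` output exported from each build, keyed by the input string, so that the differing rows can be listed directly.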