Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From	Daniel Verite
Subject	Re: Built-in CTYPE provider
Date	December 13, 2023 15:34:15
Msg-id	d26df384-2fa7-4f50-b703-b0b6706dbeff@manitou-mail.org
In response to	Built-in CTYPE provider (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

Jeff Davis wrote:

> While "full" case mapping sounds more complex, there are actually
> very few cases to consider and they are covered in another (small)
> data file. That data file covers ~100 code points that convert to
> multiple code points when the case changes (e.g. "ß" -> "SS"), 7
> code points that have context-sensitive mappings, and three locales
> which have special conversions ("lt", "tr", and "az") for a few code
> points.

But there are CLDR mappings on top of that.

According to the Unicode FAQ

   https://unicode.org/faq/casemap_charprop.html#5

   Q: Does the default case mapping work for every language? What
   about the default case folding?

   [...]

   To make case mapping language sensitive, the Unicode Standard
   specifically allows implementations to tailor the mappings for
   each language, but does not provide the necessary data. The file
   SpecialCasing.txt is included in the Standard as a guide to a few
   of the more important individual character mappings needed for
   specific languages, notably the Greek script and the Turkic
   languages. However, for most language-specific mappings and
   tailoring, users should refer to CLDR and other resources.

In particular, "el" (Modern Greek) has case mapping rules that
ICU seems to implement, but "el" is missing from the list
("lt", "tr", and "az") you identified.

The CLDR case mappings seem to be found in
https://github.com/unicode-org/cldr/tree/main/common/transforms
in *-Lower.xml and *-Upper.xml
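
To illustrate the gap between default and tailored mappings, here is a
minimal sketch in Python. It is only an illustration, not part of the
proposed patch. It relies on the fact that Python's str.upper() applies
the Unicode default full case mapping with no language tailoring. The
helper name default_upper_report is made up for this example. A
language-aware implementation would be expected to differ on the last two
inputs: "tr" maps "i" to "İ" (U+0130), and, as far as I understand ICU's
behavior, "el" drops the tonos when uppercasing.

```python
# Sketch: what the untailored (default) Unicode uppercase mapping produces.
# default_upper_report is a hypothetical helper name, used only here.

def default_upper_report(samples):
    """Return (input, uppercased, input length, output length) tuples."""
    report = []
    for s in samples:
        u = s.upper()  # default full case mapping, no locale tailoring
        report.append((s, u, len(s), len(u)))
    return report


if __name__ == "__main__":
    samples = [
        "\u00df",  # LATIN SMALL LETTER SHARP S: expands to two code points
        "i",       # "tr"/"az" tailoring would give U+0130 instead of "I"
        "\u03ac",  # GREEK SMALL LETTER ALPHA WITH TONOS
    ]
    for s, u, n_in, n_out in default_upper_report(samples):
        print(f"{s!r} -> {u!r} ({n_in} -> {n_out} code points)")
```

The first sample shows why "full" mapping can change the string length;
the other two show inputs where the default result is not what a
Turkish or Greek speaker would expect.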

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
