Multi-byte character case-folding - Mailing list pgsql-hackers

From	Thom Brown
Subject	Multi-byte character case-folding
Date	July 6, 2020 17:35:10
Msg-id	CAA-aLv5nFfHd72H97u=OnGEsXVn3s-JV-jzMr-HeUePQgX4cEA@mail.gmail.com
Responses	Re: Multi-byte character case-folding
List	pgsql-hackers

Hi,

At the moment, only single-byte characters in identifiers are
case-folded, and multi-byte characters are not.

For example, the unquoted identifier abĉDĚF is case-folded to "abĉdĚf".
It can then be referred to unquoted as abĉdĚf or ABĉDĚF, but not as
abĉděf or ABĈDĚF.

downcase_identifier() has the following comment:

        /*
         * SQL99 specifies Unicode-aware case normalization, which we don't yet
         * have the infrastructure for.  Instead we use tolower() to provide a
         * locale-aware translation.  However, there are some locales where this
         * is not right either (eg, Turkish may do strange things with 'i' and
         * 'I').  Our current compromise is to use tolower() for characters with
         * the high bit set, as long as they aren't part of a multi-byte
         * character, and use an ASCII-only downcasing for 7-bit characters.
         */
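
To make that compromise concrete, here is a minimal sketch of the
multi-byte-encoding case only.  fold_ascii_only() is a hypothetical
helper written for illustration; it is not downcase_identifier()
itself, and it assumes a UTF-8 string in which every byte of a
multi-byte character has the high bit set.  It leaves out the tolower()
branch that the comment describes for single-byte encodings.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not PostgreSQL source.
 *
 * Copy ident into out (at most outsize - 1 bytes plus a terminator),
 * lowercasing only the ASCII letters 'A'..'Z'.  Bytes with the high
 * bit set are copied unchanged, so in a multi-byte encoding such as
 * UTF-8 no multi-byte character is ever case-folded.
 */
static void
fold_ascii_only(const char *ident, char *out, size_t outsize)
{
    size_t      i;

    if (outsize == 0)
        return;

    for (i = 0; ident[i] != '\0' && i + 1 < outsize; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        /* ASCII-only downcasing for 7-bit characters */
        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';

        out[i] = (char) ch;
    }
    out[i] = '\0';
}
```

Under those assumptions, passing the UTF-8 bytes of abĉDĚF through
fold_ascii_only() lowercases D and F but leaves the two-byte sequence
for Ě alone, which is the mixed result shown in the example above.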

So my question is, do we yet have the infrastructure to make
case-folding consistent across all character widths?

Thanks

Thom
