Re: encoding affects ICU regex character classification - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: encoding affects ICU regex character classification
Date	November 29, 2023 23:56:04
Msg-id	360857.1701302164@sss.pgh.pa.us
In response to	encoding affects ICU regex character classification (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: encoding affects ICU regex character classification
List	pgsql-hackers

Jeff Davis <pgsql@j-davis.com> writes:
> The problem seems to be confusion between pg_wchar and a unicode code
> point in pg_wc_isalpha() and related functions.

Yeah, that's an ancient sore spot: we don't really know what the
representation of wchar is.  We assume it's Unicode code points
for UTF8 locales, but libc isn't required to do that AFAIK.  See
comment block starting about line 20 in regc_pg_locale.c.
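
To make the assumption concrete, here is a minimal sketch (not PostgreSQL code) of how one might probe whether the platform's wide characters are Unicode code points. C99 lets an implementation define __STDC_ISO_10646__ when wchar_t values are ISO 10646 code points; the sketch also compares mbrtowc() output against a hand-written UTF-8 decoder. The function names wchar_is_iso10646, utf8_decode, and wchar_matches_codepoint are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Report whether the implementation promises that wchar_t holds
 * ISO 10646 (Unicode) code points.  C99 defines the macro in that case;
 * its absence means no promise, not necessarily a different encoding. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Decode one UTF-8 sequence by hand, independent of libc and locale.
 * Returns the number of bytes consumed, or 0 if the input is malformed. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }
    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return need;
}

/* Compare libc's mbrtowc() result for the first character of mb with the
 * hand-decoded code point.  Returns 1 on match, 0 on mismatch, and -1 if
 * either conversion fails in the current locale. */
static int wchar_matches_codepoint(const char *mb)
{
    mbstate_t st;
    wchar_t wc;
    uint32_t cp;
    size_t len = strlen(mb);
    size_t n;

    memset(&st, 0, sizeof(st));
    n = mbrtowc(&wc, mb, len, &st);
    if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        return -1;
    if (utf8_decode((const unsigned char *) mb, len, &cp) != n)
        return -1;
    return (uint32_t) wc == cp;
}
```

The comparison is only meaningful for non-ASCII input after setlocale(LC_CTYPE, "") has selected a UTF-8 locale; in the default "C" locale mbrtowc() may simply reject such bytes, which the sketch reports as -1.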

I doubt that ICU has much to do with this directly.

We'd have to find an alternate source of knowledge to replace the
<wctype.h> functions if we wanted to fix it fully ... can ICU do that?

            regards, tom lane
