Re: C11: should we use char32_t for unicode code points? - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: C11: should we use char32_t for unicode code points?
Date	October 29 18:12:01
Msg-id	024a6d53f246c87ee2796563f930a66fde1c0c0d.camel@j-davis.com
In response to	Re: C11: should we use char32_t for unicode code points? (Thomas Munro <thomas.munro@gmail.com>)
List	pgsql-hackers

On Wed, 2025-10-29 at 14:00 +1300, Thomas Munro wrote:
> I wonder if the logic to select the member/semantics could be turned
> into an enum in the encoding table, to make it even clearer, and then
> that could be used as an index into a table of ctype methods objects
> in _libc.c.

As long as we're able to isolate that logic in the libc provider,
that's reasonable. The other providers don't need that complexity, they
just need to decode straight to UTF-32.
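
As a rough illustration of that arrangement, here is a minimal sketch in
C. Every name in it (wchar_semantics, ctype_methods,
ctype_methods_for) is hypothetical and not taken from the PostgreSQL
source, and uint32_t stands in for char32_t. It only shows an enum
value, which an encoding table entry could carry, being used as an index
into a table of method objects that stays private to the libc provider.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <wctype.h>

/* Hypothetical: how the libc provider should treat wchar_t for an encoding. */
typedef enum
{
    WCHAR_SEM_ASCII_ONLY = 0,   /* trust libc only for code points below 128 */
    WCHAR_SEM_UTF32,            /* wchar_t holds a full Unicode code point */
    WCHAR_SEM_COUNT
} wchar_semantics;

/* Hypothetical method object; a real one would have more members. */
typedef struct
{
    bool        (*is_alpha) (uint32_t c);
} ctype_methods;

static bool
is_alpha_ascii(uint32_t c)
{
    /* Never hand non-ASCII values to libc in this mode. */
    return c < 128 && ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}

static bool
is_alpha_utf32(uint32_t c)
{
    /* Assumes wchar_t is wide enough to hold the code point. */
    return iswalpha((wint_t) c) != 0;
}

/* One entry per enum value; the enum is the index. */
static const ctype_methods ctype_methods_table[WCHAR_SEM_COUNT] = {
    [WCHAR_SEM_ASCII_ONLY] = {is_alpha_ascii},
    [WCHAR_SEM_UTF32] = {is_alpha_utf32},
};

static const ctype_methods *
ctype_methods_for(wchar_semantics sem)
{
    assert((int) sem >= 0 && (int) sem < WCHAR_SEM_COUNT);
    return &ctype_methods_table[sem];
}
```

Callers outside the provider would never see the enum; they would only
hand over UTF-32 and let the provider pick the methods.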

> You showed char16_t for Windows, but we don't ever get char16_t out of
> wchar.c, it's always char32_t for UTF-8 input.  It's just that _libc.c
> truncates to UTF-16 or short-circuits to avoid overflow on that
> platform (and in the past AIX 32-bit and maybe more), so it wouldn't
> belong in a hypothetical union or enum.

Oh, I see.

> >
> Perhaps we could at least put the conversion in a new encoding table
> function pointer "pg_wchar_custom_to_wchar_t", so we could reserve a
> place to put that sort of optimisation in

That sounds like a good step forward. And maybe one to convert to
UTF-32 for ICU, also?
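
To make the shape of such a slot concrete, here is a minimal sketch in
C. The struct, the to_utf32 member, and the function signature are all
hypothetical and are not the existing encoding table; uint32_t stands in
for char32_t, and the UTF-8 decoder assumes already-validated input. It
shows an encoding table entry carrying an optional function pointer that
decodes straight to UTF-32, with NULL meaning no direct conversion is
provided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decode a NUL-terminated string, return code points written. */
typedef size_t (*to_utf32_fn) (const unsigned char *src,
                               uint32_t *dst, size_t dstlen);

/* Hypothetical encoding table entry with a reserved conversion slot. */
typedef struct
{
    const char *name;
    to_utf32_fn to_utf32;       /* NULL if no direct conversion exists */
} encoding_entry;

/* Decode valid UTF-8 into UTF-32; input is assumed to be validated. */
static size_t
utf8_to_utf32(const unsigned char *src, uint32_t *dst, size_t dstlen)
{
    size_t      n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0' && n < dstlen)
    {
        uint32_t    c = *src++;
        int         extra;

        /* The lead byte determines how many continuation bytes follow. */
        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
        {
            c &= 0x1F;
            extra = 1;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            c &= 0x0F;
            extra = 2;
        }
        else
        {
            c &= 0x07;
            extra = 3;
        }

        /* Each continuation byte contributes its low six bits. */
        while (extra-- > 0 && *src != '\0')
            c = (c << 6) | (*src++ & 0x3F);

        dst[n++] = c;
    }
    return n;
}

static const encoding_entry encoding_table[] = {
    {"UTF8", utf8_to_utf32},
    {"EUC_JIS_2004", NULL},     /* left unfilled in this sketch */
};
```

A caller would check the slot for NULL and fall back to a generic path
when it is empty.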

> If we do develop this idea though, one issue to contemplate is that
> EUC code points might generate more than one wchar_t, looking at
> EUC_JIS_2004[1].

Wow, that's unfortunate.


Regards,
    Jeff Davis
