Re: C11: should we use char32_t for unicode code points? - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: C11: should we use char32_t for unicode code points? |
| Date | |
| Msg-id | CA+hUKGLWggvAW+ZK=P1ZoUBgS8EhodpA7ipeGuq2-3HePjjXDw@mail.gmail.com |
| In response to | Re: C11: should we use char32_t for unicode code points? (Peter Eisentraut <peter@eisentraut.org>) |
| Responses | Re: C11: should we use char32_t for unicode code points? |
| List | pgsql-hackers |
On Wed, Oct 29, 2025 at 7:45 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 26.10.25 20:43, Jeff Davis wrote:
> > +/*
> > + * char16_t and char32_t
> > + * Unicode code points.
> > + */
> > +#ifndef __cplusplus
> > +#ifdef HAVE_UCHAR_H
> > +#include <uchar.h>
> > +#ifndef __STDC_UTF_16__
> > +#error "char16_t must use UTF-16 encoding"
> > +#endif
> > +#ifndef __STDC_UTF_32__
> > +#error "char32_t must use UTF-32 encoding"
> > +#endif
> > +#else
> > +typedef uint16_t char16_t;
> > +typedef uint32_t char32_t;
> > +#endif
> > +#endif
>
> This could be improved a bit. The reason for some of these conditionals
> is not clear. Like, what does __cplusplus have to do with this? I
> think it would be more correct to write a configure/meson check for the
> actual types rather than depend indirectly on a header check.
I suggested testing __cplusplus because I predicted that the typedef
would fail on a C++ compiler (C++11 and later), where char32_t is a
language keyword naming a distinct built-in type that needs no #include.
This is an Apple-only problem: without it, we could just include
<uchar.h> unconditionally, and presumably will once Apple supplies
this header, which C11 does not make optional. On a Mac, #include
<uchar.h> fails for C (there is no $SDK/usr/include/uchar.h) but works
for C++ (it finds $SDK/usr/include/c++/v1/uchar.h). Since we'd probe
for HAVE_UCHAR_H with the C compiler, we wouldn't find it, so we would
also need to exclude __cplusplus at compile time. If we didn't, here's
what the error looks like with Clang:
```
test.cpp:2:22: error: cannot combine with previous 'int' declaration specifier
    2 | typedef unsigned int char32_t;
      |                      ^
test.cpp:2:1: warning: typedef requires a name [-Wmissing-declarations]
    2 | typedef unsigned int char32_t;
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.
```
GCC has a clearer message:
```
test.cpp:2:22: error: redeclaration of C++ built-in type 'char32_t' [-fpermissive]
    2 | typedef unsigned int char32_t;
      |                      ^~~~~~~~
```
If you try to test for the existence of the type rather than the
header in meson/configure, won't you still have the configure-with-C
compile-with-C++ problem, with no way to resolve it except by keeping
the test for __cplusplus that you're trying to get rid of? So what do
you gain other than more lines of configure stuff?
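For illustration only, here is a minimal standalone C sketch of the guard that keeps the __cplusplus test. It assumes the compiler's __has_include (a GCC/Clang extension before C23) as a stand-in for the HAVE_UCHAR_H probe that configure/meson would provide; like that probe, it is evaluated by the C compiler, so on macOS it takes the fallback branch. It leaves out the __STDC_UTF_16__/__STDC_UTF_32__ checks, as proposed in this thread. SKETCH_HAVE_UCHAR_H and is_unicode_scalar() are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: __has_include stands in for the HAVE_UCHAR_H build probe. */
#if defined(__has_include)
#if __has_include(<uchar.h>)
#define SKETCH_HAVE_UCHAR_H 1
#endif
#endif

#ifndef __cplusplus
#ifdef SKETCH_HAVE_UCHAR_H
#include <uchar.h>				/* C11 typedefs char16_t and char32_t here */
#else
/* Fallback for C environments without <uchar.h>, such as the macOS SDK. */
typedef uint16_t char16_t;
typedef uint32_t char32_t;
#endif
#endif							/* !__cplusplus: C++11 has char32_t built in */

/* Either way, char32_t must be unsigned and able to hold U+10FFFF. */
static_assert((char32_t) -1 > 0, "char32_t must be unsigned");
static_assert((char32_t) 0x10FFFF == 0x10FFFF,
			  "char32_t must hold every Unicode code point");

/* Hypothetical helper: 1 if c is a Unicode scalar value (non-surrogate). */
static int
is_unicode_scalar(char32_t c)
{
	return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}
```

The same file also compiles as C++, where the #ifndef __cplusplus block is skipped and the built-in char32_t is used instead.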
Out of curiosity, even with -std=c++03 (an old C++ standard that might
not work for PostgreSQL for other reasons, but I wanted to see what
would happen with a standard from before char32_t became a fundamental
language type), I was surprised to see that the standard library
supplied char32_t. It incorrectly(?) imports a typename from later
standards using an internal type, so our typedef still fails, just
with a different Clang error:
```
test.cpp:2:22: error: typedef redefinition with different types ('unsigned int' vs 'char32_t')
    2 | typedef unsigned int char32_t;
      |                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:320:20: note: previous definition is here
  320 | typedef __char32_t char32_t;
      |                    ^
```
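In C, by contrast, char32_t is not a distinct type: C11 has <uchar.h> define it as the same type as uint_least32_t, and C11 allows a typedef to be repeated as long as it names the same type. The sketch below checks both facts. It assumes a C11 environment that ships <uchar.h> (so not the macOS C SDK), and char32_is_uint_least32() is a hypothetical helper.

```c
#include <stdint.h>
#include <uchar.h>

/*
 * C11 permits repeating a typedef when it names the same type, so this
 * compiles only because char32_t already is uint_least32_t.  A C++
 * compiler rejects such a typedef, since there char32_t is a keyword
 * (or, in libc++'s C++03 mode, a different built-in type).
 */
typedef uint_least32_t char32_t;

/* Hypothetical helper: 1 if char32_t and uint_least32_t are the same type. */
static int
char32_is_uint_least32(void)
{
	return _Generic((char32_t) 0,
					uint_least32_t: 1,
					default: 0);
}
```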
> The checks for __STDC_UTF_16__ and __STDC_UTF_32__ can be removed, as
> was discussed elsewhere, since we don't use any standard library
> functions that make use of these facts, and the need goes away with C23
> anyway.
+1
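To illustrate why those checks buy little here, consider a hypothetical encode_utf8() that treats a char32_t purely as a code point value and does its own UTF-8 encoding with integer arithmetic. Nothing in it consults __STDC_UTF_32__, which matters for things like the <uchar.h> conversion functions mbrtoc32() and c32rtomb(). This is a sketch assuming a C11 environment that provides <uchar.h>, not PostgreSQL's own encoder.

```c
#include <stddef.h>
#include <uchar.h>

/*
 * Hypothetical helper: write code point c to buf as UTF-8.  Returns the
 * number of bytes written (1 to 4), or 0 if c is a surrogate or lies
 * beyond U+10FFFF.
 */
static size_t
encode_utf8(char32_t c, unsigned char buf[4])
{
	if (c < 0x80)
	{
		buf[0] = (unsigned char) c;
		return 1;
	}
	if (c < 0x800)
	{
		buf[0] = (unsigned char) (0xC0 | (c >> 6));
		buf[1] = (unsigned char) (0x80 | (c & 0x3F));
		return 2;
	}
	if (c >= 0xD800 && c <= 0xDFFF)
		return 0;				/* surrogates are not encodable */
	if (c < 0x10000)
	{
		buf[0] = (unsigned char) (0xE0 | (c >> 12));
		buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[2] = (unsigned char) (0x80 | (c & 0x3F));
		return 3;
	}
	if (c <= 0x10FFFF)
	{
		buf[0] = (unsigned char) (0xF0 | (c >> 18));
		buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
		buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[3] = (unsigned char) (0x80 | (c & 0x3F));
		return 4;
	}
	return 0;					/* beyond U+10FFFF */
}
```

For example, U+20AC (EURO SIGN) encodes as the three bytes E2 82 AC.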