Re: C11: should we use char32_t for unicode code points? - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: C11: should we use char32_t for unicode code points?
Msg-id CA+hUKGLWggvAW+ZK=P1ZoUBgS8EhodpA7ipeGuq2-3HePjjXDw@mail.gmail.com
In response to Re: C11: should we use char32_t for unicode code points?  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: C11: should we use char32_t for unicode code points?
List pgsql-hackers
On Wed, Oct 29, 2025 at 7:45 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 26.10.25 20:43, Jeff Davis wrote:
> > +/*
> > + * char16_t and char32_t
> > + *      Unicode code points.
> > + */
> > +#ifndef __cplusplus
> > +#ifdef HAVE_UCHAR_H
> > +#include <uchar.h>
> > +#ifndef __STDC_UTF_16__
> > +#error "char16_t must use UTF-16 encoding"
> > +#endif
> > +#ifndef __STDC_UTF_32__
> > +#error "char32_t must use UTF-32 encoding"
> > +#endif
> > +#else
> > +typedef uint16_t char16_t;
> > +typedef uint32_t char32_t;
> > +#endif
> > +#endif
>
> This could be improved a bit. The reason for some of these conditionals
> is not clear.  Like, what does __cplusplus have to do with this?  I
> think it would be more correct to write a configure/meson check for the
> actual types rather than depend indirectly on a header check.

I suggested testing __cplusplus because I predicted that that typedef
would fail on a C++ compiler (since C++11), where char32_t is a
language keyword identifying a distinct type requiring no #include.
The missing header is an Apple-only problem; without it we could just
include <uchar.h> unconditionally, and presumably will eventually, once
Apple supplies this non-optional-per-C11 header.  On a Mac, #include
<uchar.h> fails for C (there is no $SDK/usr/include/uchar.h) but works
for C++ (it finds $SDK/usr/include/c++/v1/uchar.h), and since we'd
probe for HAVE_UCHAR_H with the C compiler, we'd not find it and thus
also need to exclude __cplusplus at compile time.  Otherwise, let's
see what the error looks like...

test.cpp:2:22: error: cannot combine with previous 'int' declaration specifier
    2 | typedef unsigned int char32_t;
      |                      ^
test.cpp:2:1: warning: typedef requires a name [-Wmissing-declarations]
    2 | typedef unsigned int char32_t;
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.

GCC has a clearer message:

test.cpp:2:22: error: redeclaration of C++ built-in type 'char32_t'
[-fpermissive]
    2 | typedef unsigned int char32_t;
      |                      ^~~~~~~~

If you try to test for the existence of the type rather than the
header in meson/configure, won't you still have the configure-with-C
compile-with-C++ problem, with no way to resolve it except by keeping
the test for __cplusplus that you're trying to get rid of?  So what do
you gain other than more lines of configure stuff?

Out of curiosity, even with -std=c++03 (old C++ standard that might
not work for PostgreSQL for other reasons, but I wanted to see what
would happen with a standard before char32_t became a fundamental
language type) I was surprised to see that the standard library
supplied char32_t.  It incorrectly(?) imports a typename from the
future standards using an internal type, so our typedef still fails,
just with a different Clang error:

test.cpp:2:22: error: typedef redefinition with different types
('unsigned int' vs 'char32_t')
    2 | typedef unsigned int char32_t;
      |                      ^

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:320:20:
note: previous definition is here
  320 | typedef __char32_t char32_t;
      |                    ^

> The checks for __STDC_UTF_16__ and __STDC_UTF_32__ can be removed, as
> was discussed elsewhere, since we don't use any standard library
> functions that make use of these facts, and the need goes away with C23
> anyway.

+1
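
For illustration, a rough sketch of what the block might look like with
the __cplusplus guard kept and the __STDC_UTF_* checks dropped.  It
assumes HAVE_UCHAR_H still comes from a probe run with the C compiler;
is_valid_code_point() is a hypothetical helper, there only to exercise
the type, and is not part of the patch.

```c
#include <stdint.h>

/*
 * char16_t and char32_t
 *
 * HAVE_UCHAR_H is assumed to be defined by a configure/meson probe that
 * was run with the C compiler.
 */
#ifndef __cplusplus             /* in C++11 and later these are keywords */
#ifdef HAVE_UCHAR_H
#include <uchar.h>
#else
typedef uint16_t char16_t;      /* fallback when the C SDK lacks <uchar.h> */
typedef uint32_t char32_t;
#endif
#endif

/*
 * Hypothetical helper, only here to exercise the type: true for code
 * points in the Unicode range that are not UTF-16 surrogates.
 */
static inline int
is_valid_code_point(char32_t c)
{
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}
```

Compiled as C, the types come from <uchar.h> or from the fallback
typedefs; compiled as C++, the whole block is skipped and the built-in
types are used, so the typedef errors shown above can't occur.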


