Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Pre-proposal: unicode normalized text
Date	October 7, 2023 01:18:01
Msg-id	96c0173c5156d365e132ec29e4873237be565743.camel@j-davis.com
In response to	Re: Pre-proposal: unicode normalized text (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Pre-proposal: unicode normalized text
List	pgsql-hackers

On Wed, 2023-10-04 at 13:16 -0400, Robert Haas wrote:
> > At minimum I think we need to have some internal functions to check
> > for unassigned code points. That belongs in core, because we
> > generate the unicode tables from a specific version.
>
> That's a good idea.

Patch attached.

I added a new Perl script that parses UnicodeData.txt and generates a
lookup table of code point ranges, which can be binary-searched.
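
For readers who want to see the shape of such a table, here is a minimal
sketch of a sorted range table with a binary-search lookup. It is not
taken from the patch: the names (sketch_range, sketch_table,
sketch_lookup) are hypothetical, the category values are illustrative,
and the two-entry table covers only the ASCII letters, so every other
code point falls through to "unassigned" here even though real Unicode
data assigns most of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative category codes (hypothetical names, not from the patch). */
typedef enum
{
	SKETCH_UNASSIGNED = 0,
	SKETCH_UPPERCASE_LETTER = 1,
	SKETCH_LOWERCASE_LETTER = 2
} sketch_category;

/* One entry covers an inclusive range of code points with one category. */
typedef struct
{
	uint32_t	first;
	uint32_t	last;
	sketch_category category;
} sketch_range;

/*
 * Toy table, sorted by code point and non-overlapping. A generated table
 * would cover all of UnicodeData.txt; this one knows only A-Z and a-z.
 */
const sketch_range sketch_table[] = {
	{0x0041, 0x005A, SKETCH_UPPERCASE_LETTER},
	{0x0061, 0x007A, SKETCH_LOWERCASE_LETTER}
};

#define SKETCH_TABLE_LEN (sizeof(sketch_table) / sizeof(sketch_table[0]))

/*
 * Binary search over the sorted ranges. A code point that falls in a gap
 * between ranges (or outside the table) is reported as unassigned.
 */
sketch_category
sketch_lookup(const sketch_range *table, size_t nranges, uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = nranges;	/* search window is [lo, hi) */

	assert(cp <= 0x10FFFF);		/* valid Unicode code point range */

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < table[mid].first)
			hi = mid;
		else if (cp > table[mid].last)
			lo = mid + 1;
		else
			return table[mid].category;
	}

	return SKETCH_UNASSIGNED;
}
```

Storing ranges rather than one entry per code point keeps the table
small, since long runs of adjacent code points share a category, and the
lookup stays logarithmic in the number of ranges.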

The C entry point does the same thing as ICU's u_charType(), and I also
matched its enum numeric values for convenience. I didn't use
u_charType() itself because I don't think this kind of Unicode
functionality should depend on ICU, and I think it should match other
Postgres Unicode functionality.

Strictly speaking, I only needed to know whether a code point is
unassigned or not, not its general category. But it seemed easy enough
to return the general category, and that will make it easier to build
other potentially useful functions on top of this.
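
As a rough illustration of what "on top of this" could look like, the
sketch below layers two predicates over a category lookup. Everything in
it is hypothetical: demo_get_category() is a stand-in that knows only
the ASCII letters, and the demo_* names and enum values are assumptions
for the example, not the patch's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative category codes (hypothetical names, not from the patch). */
typedef enum
{
	DEMO_UNASSIGNED = 0,
	DEMO_UPPERCASE_LETTER = 1,
	DEMO_LOWERCASE_LETTER = 2
} demo_category;

/*
 * Stand-in for the real category lookup. It recognizes only A-Z and a-z
 * and reports everything else as unassigned, which is NOT true of real
 * Unicode data; it exists only so the predicates below can be exercised.
 */
demo_category
demo_get_category(uint32_t cp)
{
	assert(cp <= 0x10FFFF);		/* valid Unicode code point range */

	if (cp >= 0x0041 && cp <= 0x005A)
		return DEMO_UPPERCASE_LETTER;
	if (cp >= 0x0061 && cp <= 0x007A)
		return DEMO_LOWERCASE_LETTER;
	return DEMO_UNASSIGNED;
}

/* The check actually needed: is this code point assigned at all? */
bool
demo_is_assigned(uint32_t cp)
{
	return demo_get_category(cp) != DEMO_UNASSIGNED;
}

/* An example of a further helper that comes for free from the category. */
bool
demo_is_cased_letter(uint32_t cp)
{
	demo_category cat = demo_get_category(cp);

	return cat == DEMO_UPPERCASE_LETTER || cat == DEMO_LOWERCASE_LETTER;
}
```

The point is only that the "assigned?" check becomes a one-line wrapper
once the category is available, and other classification helpers can
reuse the same lookup.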

The tests do require ICU though, because I compare with the results of
u_charType().

Regards,
    Jeff Davis

Attachment

v1-0001-Internal-functions-for-determining-Unicode-genera.patch
