Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Pre-proposal: unicode normalized text
Msg-id 96c0173c5156d365e132ec29e4873237be565743.camel@j-davis.com
In response to Re: Pre-proposal: unicode normalized text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Pre-proposal: unicode normalized text
List pgsql-hackers
On Wed, 2023-10-04 at 13:16 -0400, Robert Haas wrote:
> > At minimum I think we need to have some internal functions to check
> > for unassigned code points. That belongs in core, because we generate
> > the unicode tables from a specific version.
>
> That's a good idea.

Patch attached.

I added a new Perl script to parse UnicodeData.txt and generate a
lookup table of code point ranges, which can be binary-searched.
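
To illustrate the idea (this is only a sketch, not the code in the
attached patch): a sorted table of non-overlapping code point ranges can
be binary-searched as shown below. All names here (toy_category,
toy_range, toy_table, toy_lookup_category, toy_is_assigned) are
hypothetical, the table is a tiny hand-written excerpt rather than
generated output, and treating "not found in the table" as unassigned is
an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category codes; the values mirror ICU's UCharCategory. */
typedef enum
{
	TOY_UNASSIGNED = 0,			/* Cn */
	TOY_UPPERCASE_LETTER = 1,	/* Lu */
	TOY_LOWERCASE_LETTER = 2,	/* Ll */
	TOY_DECIMAL_NUMBER = 9		/* Nd */
} toy_category;

typedef struct
{
	uint32_t	first;			/* first code point in the range */
	uint32_t	last;			/* last code point in the range (inclusive) */
	toy_category category;
} toy_range;

/*
 * Illustrative excerpt only: sorted by "first", no overlaps. A real table
 * would be generated from UnicodeData.txt and cover every assigned range.
 */
static const toy_range toy_table[] = {
	{0x0030, 0x0039, TOY_DECIMAL_NUMBER},
	{0x0041, 0x005A, TOY_UPPERCASE_LETTER},
	{0x0061, 0x007A, TOY_LOWERCASE_LETTER},
};

/* Binary search for the range containing cp; unassigned if none does. */
toy_category
toy_lookup_category(uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = sizeof(toy_table) / sizeof(toy_table[0]);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < toy_table[mid].first)
			hi = mid;
		else if (cp > toy_table[mid].last)
			lo = mid + 1;
		else
			return toy_table[mid].category;
	}

	return TOY_UNASSIGNED;
}

/* The check that was originally needed, built on the category lookup. */
int
toy_is_assigned(uint32_t cp)
{
	return toy_lookup_category(cp) != TOY_UNASSIGNED;
}
```

Because the excerpt is incomplete, code points outside the three listed
ranges report as unassigned here even though most of them are assigned
in Unicode; only a complete generated table would give meaningful
answers.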

The C entry point does the same thing as u_charType(), and I also
matched the enum numeric values for convenience. I didn't use
u_charType() itself because I don't think this kind of Unicode
functionality should depend on ICU, and I think it should be consistent
with the other Unicode functionality in Postgres.

Strictly speaking, I only needed to know whether a code point is
unassigned or not, not its general category. But it seemed easy enough
to return the general category, and that will make it easier to build
other potentially useful functions on top of this.

The tests do require ICU though, because I compare with the results of
u_charType().
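
As a rough sketch of that kind of comparison (not the test in the
attached patch): walk every code point, call both implementations, and
count disagreements. The names below are hypothetical. In a real test
the reference callback would wrap ICU's u_charType(); here two small
stand-in functions are used so the sketch compiles without ICU, and
iterating over the full range 0..0x10FFFF is an assumption.

```c
#include <stdint.h>

/* Hypothetical signature shared by both implementations under test. */
typedef int (*category_fn) (uint32_t cp);

/* Count the code points in [0, 0x10FFFF] where the two functions disagree. */
uint32_t
count_category_mismatches(category_fn ours, category_fn reference)
{
	uint32_t	mismatches = 0;

	for (uint32_t cp = 0; cp <= 0x10FFFF; cp++)
	{
		if (ours(cp) != reference(cp))
			mismatches++;
	}

	return mismatches;
}

/* Stand-in "reference": 1 (Lu) for ASCII A-Z, 0 otherwise. */
int
demo_reference(uint32_t cp)
{
	return (cp >= 0x41 && cp <= 0x5A) ? 1 : 0;
}

/* Stand-in with a deliberate off-by-one bug: it misses 'Z' (U+005A). */
int
demo_buggy(uint32_t cp)
{
	return (cp >= 0x41 && cp < 0x5A) ? 1 : 0;
}
```

Comparing demo_reference against itself yields zero mismatches, while
comparing it against demo_buggy yields exactly one, for U+005A.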

Regards,
    Jeff Davis


