Re: Optimization for lower(), upper(), casefold() functions. - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Optimization for lower(), upper(), casefold() functions.
Msg-id 4f3772355e038acd3cfe0be43ae2c1aacae1794d.camel@j-davis.com
In response to Re: Optimization for lower(), upper(), casefold() functions.  (Alexander Borisov <lex.borisov@gmail.com>)
List pgsql-hackers
On Sun, 2025-03-02 at 23:33 +0300, Alexander Borisov wrote:
> Did you have a time for review this?
>
> I'd like to continue improving Unicode in Postgres, as I previously
> wrote, next in my plans are Normalization forms, and more.
> But now I am blocked by this patch.

Hi,

I have refactored unicode_case.c a bit (v3j-0001) and rebased your v3
work on top of that (v3j-0002).

The point of the refactoring is that the optimizations no longer need
to modify convert_case(), which is already complex; I'd like to avoid
adding more to that function. Instead, I created a casemap() function,
which maps a single character, and convert_case() calls that.
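
To illustrate the shape of that split, here is a minimal sketch. It is
not the code from the patch: all names (sketch_casemap,
sketch_convert_case, sketch_case_kind) are hypothetical, the input is
assumed to be already-decoded code points rather than UTF-8, and the
mapping covers only ASCII and Latin-1 letters so the example stays
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical case kinds; the real code supports more (e.g. fold). */
typedef enum
{
	SKETCH_LOWER,
	SKETCH_UPPER
} sketch_case_kind;

/*
 * Map a single code point. Assumption for brevity: only ASCII and the
 * Latin-1 letters are handled; everything else maps to itself.
 */
static uint32_t
sketch_casemap(uint32_t cp, sketch_case_kind kind)
{
	if (kind == SKETCH_UPPER)
	{
		if (cp >= 'a' && cp <= 'z')
			return cp - 0x20;
		/* U+00E0..U+00FE, except U+00F7 (division sign) */
		if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7)
			return cp - 0x20;
	}
	else
	{
		if (cp >= 'A' && cp <= 'Z')
			return cp + 0x20;
		/* U+00C0..U+00DE, except U+00D7 (multiplication sign) */
		if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
			return cp + 0x20;
	}
	return cp;
}

/*
 * Convert a whole string. The loop only handles iteration and buffer
 * bounds; all per-character logic lives in sketch_casemap(), so an
 * optimization of the mapping does not need to touch this function.
 * Returns the number of code points the full result requires.
 */
static size_t
sketch_convert_case(uint32_t *dst, size_t dstsize,
					const uint32_t *src, size_t srclen,
					sketch_case_kind kind)
{
	assert(dst != NULL || dstsize == 0);

	for (size_t i = 0; i < srclen; i++)
	{
		uint32_t	mapped = sketch_casemap(src[i], kind);

		if (i < dstsize)
			dst[i] = mapped;
	}
	return srclen;
}
```

With this layout, a faster lookup can replace the body of the
per-character function while the string loop stays as it is.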

I didn't test the refactoring for performance, but it looks as
optimizable as what was there before.

A couple of questions:

* Is there a reason the fast-path for codepoints < 0x80 is in
unicode_case.c rather than unicode_case_func.h?

* Is there a reason you defined case_index() as static rather than
static inline?

* Is there a reason to have a new file unicode_case_func.h rather than
just add it to unicode_case_table.h?
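
For context, the arrangement these questions point toward could look
roughly like the sketch below. This is an assumption about the layout,
not the patch itself: the table, its contents, and the sketch_* names
are made up for illustration, and the real case_index() is generated
code that I am not reproducing here. The idea is that a lookup helper
declared static inline in a header, with the ASCII fast path next to
it, can be inlined into every caller that includes the header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, tiny stand-in for a generated table. Index 0 is
 * reserved to mean "no mapping".
 */
#define SKETCH_TABLE_LEN 3
static const uint32_t sketch_upper_table[SKETCH_TABLE_LEN] = {
	0,							/* no mapping */
	0x00C9,						/* U+00E9 -> U+00C9 */
	0x0178						/* U+00FF -> U+0178 */
};

/*
 * Stand-in for a generated index function: returns the table index
 * for a code point, or 0 if it has no entry.
 */
static inline uint16_t
sketch_case_index(uint32_t cp)
{
	switch (cp)
	{
		case 0x00E9:
			return 1;
		case 0x00FF:
			return 2;
		default:
			return 0;
	}
}

/* Uppercase one code point, with the ASCII fast path kept alongside. */
static inline uint32_t
sketch_to_upper(uint32_t cp)
{
	uint16_t	idx;

	/* Fast path: code points below 0x80 never consult the table. */
	if (cp < 0x80)
		return (cp >= 'a' && cp <= 'z') ? cp - 0x20 : cp;

	idx = sketch_case_index(cp);
	assert(idx < SKETCH_TABLE_LEN);
	return idx != 0 ? sketch_upper_table[idx] : cp;
}
```

Whether this actually helps depends on the compiler and the callers, so
it would need measuring.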

I'm looking at a few more details, but this is a low-risk change
because there are exhaustive tests, so I intend to commit something
like this soon.

Regards,
    Jeff Davis

