Re: Collation & ctype method table, and extension hooks - Mailing list pgsql-hackers

From Andreas Karlsson
Subject Re: Collation & ctype method table, and extension hooks
Date
Msg-id 247f18f3-09e5-480d-9222-9d0d9684995f@proxel.se
Whole thread Raw
In response to Re: Collation & ctype method table, and extension hooks  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Collation & ctype method table, and extension hooks
List pgsql-hackers
On 10/26/24 12:42 AM, Jeff Davis wrote:
> On Thu, 2024-10-24 at 10:05 +0200, Andreas Karlsson wrote:
>> Why is there no pg_locale_builtin.c?
> 
> Just that it would be a fairly small file, but I'm fine with doing
> that.

I think adding such a small file would make life easier for people new 
to the collation part of the code base. It would be a nice symmetry 
between collation providers and where code for them can be found.

>> Why are casemap and ctype_methods not the same struct? They seem very
>> closely related.
> 
> The code impact was in fairly different places, so it seemed like a
> nice way to break it out. I could combine them, but it would be a
> fairly large patch.

For me combining them would make the intention of the code easier to 
understand since aren't the casemap functions just a set of "ctype_methods"?

>> This commit makes me tempted to handle the ctype_is_c logic for
>> character classes also in callbacks and remove the if in functions
>> like
>> pg_wc_ispunct(). But this si something that would need to be
>> benchmarked.
> 
> That's a good idea. The reason collate_is_c is important is because
> there are quite a few caller-specific optimizations, but that doesn't
> seem to be true of ctype_is_c.

Yeah, that was my though too but I have not confirmed it.

>> I wonder if the bitmask idea isn't terrible for the branch predictor
>> and
>> that me may want one function per character class, but this is yet
>> again
>> something we need to benchmark.
> 
> Agreed -- a lot of work has gone into optimizing the regex code, and we
> don't want a perf regression there. But I'm also not sure exactly which
> kinds of tests I should be running for that.

I think we should at least try to find the worst case to see how big the 
performance hit for that is. And then after that try to figure out a 
more typical case benchmark.

>> = v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch
>>
>> Looks good but seems like a quite painful API to use.
> 
> How is it painful and can we make it better?

The painful part was mostly just a reference to that without a catalog 
table where new providers can be added we would need to add collations 
for our new custom provider on some already existing provider and then 
do for example some pattern matching on the name of the new collation. 
Really ugly but works.

I am thinking of implementing ICU4x as an external extension to try out 
the hook, but for the in-core contrib module we likely want to use 
something which does not require an external dependency. Or what do you 
think?

Andreas




pgsql-hackers by date:

Previous
From: 송영욱
Date:
Subject: Inquiry on Future Plans for Enhancements to PostgreSQL MVCC Model and Vacuum Process
Next
From: Bruce Momjian
Date:
Subject: Re: PG does not support one function of its extension pg_hint_plan