On 10/26/24 12:42 AM, Jeff Davis wrote:
> On Thu, 2024-10-24 at 10:05 +0200, Andreas Karlsson wrote:
>> Why is there no pg_locale_builtin.c?
>
> Just that it would be a fairly small file, but I'm fine with doing
> that.
I think adding such a small file would make life easier for people new
to the collation part of the code base. It would be a nice symmetry
between collation providers and where code for them can be found.
>> Why are casemap and ctype_methods not the same struct? They seem very
>> closely related.
>
> The code impact was in fairly different places, so it seemed like a
> nice way to break it out. I could combine them, but it would be a
> fairly large patch.
For me combining them would make the intention of the code easier to
understand since aren't the casemap functions just a set of "ctype_methods"?
>> This commit makes me tempted to handle the ctype_is_c logic for
>> character classes also in callbacks and remove the if in functions
>> like
>> pg_wc_ispunct(). But this si something that would need to be
>> benchmarked.
>
> That's a good idea. The reason collate_is_c is important is because
> there are quite a few caller-specific optimizations, but that doesn't
> seem to be true of ctype_is_c.
Yeah, that was my though too but I have not confirmed it.
>> I wonder if the bitmask idea isn't terrible for the branch predictor
>> and
>> that me may want one function per character class, but this is yet
>> again
>> something we need to benchmark.
>
> Agreed -- a lot of work has gone into optimizing the regex code, and we
> don't want a perf regression there. But I'm also not sure exactly which
> kinds of tests I should be running for that.
I think we should at least try to find the worst case to see how big the
performance hit for that is. And then after that try to figure out a
more typical case benchmark.
>> = v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch
>>
>> Looks good but seems like a quite painful API to use.
>
> How is it painful and can we make it better?
The painful part was mostly just a reference to that without a catalog
table where new providers can be added we would need to add collations
for our new custom provider on some already existing provider and then
do for example some pattern matching on the name of the new collation.
Really ugly but works.
I am thinking of implementing ICU4x as an external extension to try out
the hook, but for the in-core contrib module we likely want to use
something which does not require an external dependency. Or what do you
think?
Andreas