On Tue, 2022-11-01 at 13:36 +0100, Peter Eisentraut wrote:
> But I think putting the Windows UTF-8 code (win32_utf8_wcscoll())
> from
> varstr_cmp() into pg_strcoll() might create future complications.
> Normally, it would be up to varstr_sortsupport() to set up a
> Windows/UTF-8 specific comparator function, but it says there it's
> not
> implemented. But someone who wanted to implement that would have to
> lift it back out of pg_strcoll, or rearrange the code in some other
> way.
Is there a reason that it wouldn't work to just use
varlenafastcmp_locale() after my patch? The comment says:
/*
* There is a further exception on Windows. When the database
* encoding is UTF-8 and we are not using the C collation, complex
* hacks are required...
But I think the complex hacks are just the transformation into UTF 16
and calling of wcscoll(). If that's done from within pg_strcoll()
(after my patch), then I think it just works, right?
I can't easily test on windows, so perhaps I'm missing something. Does
the buildfarm have enough coverage here? Is it reasonable to try a
commit that removes the:
#ifdef WIN32
if (GetDatabaseEncoding() == PG_UTF8 &&
!(locale && locale->provider == COLLPROVIDER_ICU))
return;
#endif
along with my patch and see what I get out of the buildfarm?
> Perhaps you have some follow-up work
> planned, in which case it might be better to consider this in more
> context.
I was also considering an optimization to use stack allocation for
short strings when doing the non-UTF8 icu comparison. I am seeing some
benefit there, but it seems to depend on the size of the stack buffer.
That suggests that maybe TEXTBUFSIZE is too large (1024) and perhaps we
should drop it down, but I need to investigate more.
In general, I'm trying to slowly get the locale stuff more isolated.
Regards,
Jeff Davis