Re: Refactor to introduce pg_strcoll(). - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Refactor to introduce pg_strcoll().
Date
Msg-id 4124e778e2ebc93d8e64b836b0037f5426aa08d6.camel@j-davis.com
Whole thread Raw
In response to Re: Refactor to introduce pg_strcoll().  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Refactor to introduce pg_strcoll().
List pgsql-hackers
On Tue, 2022-11-01 at 13:36 +0100, Peter Eisentraut wrote:
> But I think putting the Windows UTF-8 code (win32_utf8_wcscoll())
> from
> varstr_cmp() into pg_strcoll() might create future complications.
> Normally, it would be up to varstr_sortsupport() to set up a
> Windows/UTF-8 specific comparator function, but it says there it's
> not
> implemented.  But someone who wanted to implement that would have to
> lift it back out of pg_strcoll, or rearrange the code in some other
> way.

Is there a reason that it wouldn't work to just use
varlenafastcmp_locale() after my patch? The comment says:

  /*
   * There is a further exception on Windows.  When the database
   * encoding is UTF-8 and we are not using the C collation, complex
   * hacks are required...

But I think the complex hacks are just the transformation into UTF 16
and calling of wcscoll(). If that's done from within pg_strcoll()
(after my patch), then I think it just works, right?

I can't easily test on windows, so perhaps I'm missing something. Does
the buildfarm have enough coverage here? Is it reasonable to try a
commit that removes the:

  #ifdef WIN32
    if (GetDatabaseEncoding() == PG_UTF8 &&
        !(locale && locale->provider == COLLPROVIDER_ICU))
        return;
  #endif

along with my patch and see what I get out of the buildfarm?

> Perhaps you have some follow-up work
> planned, in which case it might be better to consider this in more
> context. 

I was also considering an optimization to use stack allocation for
short strings when doing the non-UTF8 icu comparison. I am seeing some
benefit there, but it seems to depend on the size of the stack buffer.
That suggests that maybe TEXTBUFSIZE is too large (1024) and perhaps we
should drop it down, but I need to investigate more.

In general, I'm trying to slowly get the locale stuff more isolated.

Regards,
    Jeff Davis



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: remap the .text segment into huge pages at run time
Next
From: Juan José Santamaría Flecha
Date:
Subject: Re: WIN32 pg_import_system_collations