Home > mailing lists

Re: Refactor to introduce pg_strcoll(). - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Refactor to introduce pg_strcoll().
Date	November 4, 2022 22:06:41
Msg-id	4124e778e2ebc93d8e64b836b0037f5426aa08d6.camel@j-davis.com Whole thread Raw
In response to	Re: Refactor to introduce pg_strcoll(). (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses	Re: Refactor to introduce pg_strcoll().
List	pgsql-hackers

Tree view

On Tue, 2022-11-01 at 13:36 +0100, Peter Eisentraut wrote:
> But I think putting the Windows UTF-8 code (win32_utf8_wcscoll())
> from
> varstr_cmp() into pg_strcoll() might create future complications.
> Normally, it would be up to varstr_sortsupport() to set up a
> Windows/UTF-8 specific comparator function, but it says there it's
> not
> implemented.  But someone who wanted to implement that would have to
> lift it back out of pg_strcoll, or rearrange the code in some other
> way.

Is there a reason that it wouldn't work to just use
varlenafastcmp_locale() after my patch? The comment says:

  /*
   * There is a further exception on Windows.  When the database
   * encoding is UTF-8 and we are not using the C collation, complex
   * hacks are required...

But I think the complex hacks are just the transformation into UTF 16
and calling of wcscoll(). If that's done from within pg_strcoll()
(after my patch), then I think it just works, right?

I can't easily test on windows, so perhaps I'm missing something. Does
the buildfarm have enough coverage here? Is it reasonable to try a
commit that removes the:

  #ifdef WIN32
    if (GetDatabaseEncoding() == PG_UTF8 &&
        !(locale && locale->provider == COLLPROVIDER_ICU))
        return;
  #endif

along with my patch and see what I get out of the buildfarm?

> Perhaps you have some follow-up work
> planned, in which case it might be better to consider this in more
> context. 

I was also considering an optimization to use stack allocation for
short strings when doing the non-UTF8 icu comparison. I am seeing some
benefit there, but it seems to depend on the size of the stack buffer.
That suggests that maybe TEXTBUFSIZE is too large (1024) and perhaps we
should drop it down, but I need to investigate more.

In general, I'm trying to slowly get the locale stuff more isolated.

Regards,
    Jeff Davis

pgsql-hackers by date:

From: Andres Freund
Date: 04 November 2022, 21:21:26
Subject: Re: remap the .text segment into huge pages at run time

From: Juan José Santamaría Flecha
Date: 04 November 2022, 22:08:24
Subject: Re: WIN32 pg_import_system_collations

Re: Refactor to introduce pg_strcoll(). - Mailing list pgsql-hackers

Previous

Next