Thread: [PROPOSAL] Skip test citext_utf8 on Windows

[PROPOSAL] Skip test citext_utf8 on Windows

From: Oleg Tselebrovskiy
Date: 11 March 2024, 08:21:11

Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding, the
citext_utf8 test fails because of the Turkish dotted I, like this:

  SELECT 'i'::citext = 'İ'::citext AS t;
   t
  ---
- t
+ f
  (1 row)

I tried to replicate the test's results by hand, and with every collation
that I tried (including --locale="Turkish") this test failed.

Also, an interesting result of my testing: if you initialize your DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:
  lower
-------
  İ
(1 row)

I find this strange, since lower() uses the collation that was passed
(the default one in this case, but still).
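
To make the failure above easier to reason about, here is a minimal Python sketch (an illustration only, not PostgreSQL code). It assumes that citext equality behaves like comparing lower(a) with lower(b), and it compares three ways of lowercasing 'İ' (U+0130): the one-to-one mapping to plain 'i' listed in UnicodeData.txt, Python's built-in full case mapping, and a mapping that leaves 'İ' unchanged, as in the output reported above. The helper names are hypothetical.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_simple(text: str) -> str:
    """Hypothetical helper: map İ to plain 'i' (one-to-one mapping), then lowercase the rest."""
    return text.replace(DOTTED_CAPITAL_I, "i").lower()


def lower_full(text: str) -> str:
    """Python's built-in lowercasing, which maps İ to 'i' + U+0307 (combining dot above)."""
    return text.lower()


def lower_unchanged(text: str) -> str:
    """Hypothetical helper mimicking the output reported above: İ is left as-is."""
    return "".join(c if c == DOTTED_CAPITAL_I else c.lower() for c in text)


def citext_like_equal(a: str, b: str, lower) -> bool:
    """Compare two strings the way a lower()-based case-insensitive type would."""
    return lower(a) == lower(b)


for fn in (lower_simple, lower_full, lower_unchanged):
    # Only the one-to-one mapping makes 'i' and 'İ' compare equal.
    print(fn.__name__, citext_like_equal("i", DOTTED_CAPITAL_I, fn))
```

Under these assumptions, only the first lowering function yields the "t" that the test expects, which would explain why the result depends on how the platform lowercases this one character.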

My PostgreSQL version is this:
postgres=# select version();
                                version
----------------------------------------------------------------------
  PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit

The proposed patch for skipping the test is attached.

Oleg Tselebrovskiy, Postgres Pro

Attachment: skip_citext_utf8.patch

Re: [PROPOSAL] Skip test citext_utf8 on Windows

From: Michael Paquier
Date: 11 March 2024, 23:24:49

On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:
> The proposed patch for skipping the test is attached.

Your attached patch seems to be in binary format.
--
Michael

Re: [PROPOSAL] Skip test citext_utf8 on Windows

From: Andrew Dunstan
Date: 12 March 2024, 01:55:53

On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
> Greetings, everyone!
>
> While running "installchecks" on databases with UTF-8 encoding, the
> citext_utf8 test fails because of the Turkish dotted I, like this:
>
>  SELECT 'i'::citext = 'İ'::citext AS t;
>   t
>  ---
> - t
> + f
>  (1 row)
>
> I tried to replicate the test's results by hand, and with every collation
> that I tried (including --locale="Turkish") this test failed.
>
> Also, an interesting result of my testing: if you initialize your DB
> with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
> the output will be this:
>  lower
> -------
>  İ
> (1 row)
>
> I find this strange, since lower() uses the collation that was passed
> (the default one in this case, but still).



Wouldn't we be better off finding a Windows fix for this, instead of 
sweeping it under the rug?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: [PROPOSAL] Skip test citext_utf8 on Windows

From: Thomas Munro
Date: 12 March 2024, 02:50:20

On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
> > Greetings, everyone!
> >
> > While running "installchecks" on databases with UTF-8 encoding, the
> > citext_utf8 test fails because of the Turkish dotted I, like this:
> >
> >  SELECT 'i'::citext = 'İ'::citext AS t;
> >   t
> >  ---
> > - t
> > + f
> >  (1 row)
> >
> > I tried to replicate the test's results by hand, and with every collation
> > that I tried (including --locale="Turkish") this test failed.
> >
> > Also, an interesting result of my testing: if you initialize your DB
> > with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
> > the output will be this:
> >  lower
> > -------
> >  İ
> > (1 row)
> >
> > I find this strange, since lower() uses the collation that was passed
> > (the default one in this case, but still).
>
> Wouldn't we be better off finding a Windows fix for this, instead of
> sweeping it under the rug?

Given the sorry state of our Windows locale support, I've started
wondering about deleting it and telling users to adopt our nascent
built-in support or ICU[1].

This other thread [2] says the sorting is intransitive so I don't
think it really meets our needs anyway.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e
[2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org
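
As a small illustration of what "intransitive" sorting means in the message above, the following Python sketch searches a set of items for triples where a < b and b < c hold but a < c does not. The cyclic comparator is a deliberately broken toy used only to trigger the check; it is not a model of how any Windows locale actually compares strings, and all names here are hypothetical.

```python
from itertools import permutations


def find_intransitive_triples(items, cmp):
    """Return every (a, b, c) with a < b and b < c but not a < c under cmp."""
    bad = []
    for a, b, c in permutations(items, 3):
        if cmp(a, b) < 0 and cmp(b, c) < 0 and not cmp(a, c) < 0:
            bad.append((a, b, c))
    return bad


ORDER = ["a", "b", "c"]


def cyclic_cmp(x, y):
    """Toy comparator with a cycle: a < b, b < c, and c < a."""
    if x == y:
        return 0
    return -1 if (ORDER.index(y) - ORDER.index(x)) % 3 == 1 else 1


def plain_cmp(x, y):
    """Well-behaved comparator based on code point order."""
    return (x > y) - (x < y)


print(find_intransitive_triples(ORDER, cyclic_cmp))  # three offending triples
print(find_intransitive_triples(ORDER, plain_cmp))   # empty list
```

A comparator that produces any such triple cannot give a consistent sort order, which is a problem for anything that relies on sorted data, such as B-tree indexes.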

Re: [PROPOSAL] Skip test citext_utf8 on Windows

From: Oleg Tselebrovskiy
Date: 12 March 2024, 04:45:03

Michael Paquier wrote on 2024-03-12 06:24:
> On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:
>> The proposed patch for skipping the test is attached.
> 
> Your attached patch seems to be in binary format.
> --
> Michael
Right, I had saved it in a non-UTF-8 encoding. Kind of ironic.

Here's a fixed version.

Attachment: v2_skip_citext_utf8.patch

Re: [PROPOSAL] Skip test citext_utf8 on Windows

From: Andrew Dunstan
Date: 13 March 2024, 00:26:38

On 2024-03-11 Mo 22:50, Thomas Munro wrote:
> On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
>>> Greetings, everyone!
>>>
>>> While running "installchecks" on databases with UTF-8 encoding, the
>>> citext_utf8 test fails because of the Turkish dotted I, like this:
>>>
>>>   SELECT 'i'::citext = 'İ'::citext AS t;
>>>    t
>>>   ---
>>> - t
>>> + f
>>>   (1 row)
>>>
>>> I tried to replicate the test's results by hand, and with every collation
>>> that I tried (including --locale="Turkish") this test failed.
>>>
>>> Also, an interesting result of my testing: if you initialize your DB
>>> with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
>>> the output will be this:
>>>   lower
>>> -------
>>>   İ
>>> (1 row)
>>>
>>> I find this strange, since lower() uses the collation that was passed
>>> (the default one in this case, but still).
>> Wouldn't we be better off finding a Windows fix for this, instead of
>> sweeping it under the rug?
> Given the sorry state of our Windows locale support, I've started
> wondering about deleting it and telling users to adopt our nascent
> built-in support or ICU[1].
>
> This other thread [2] says the sorting is intransitive so I don't
> think it really meets our needs anyway.
>
> [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e
> [2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org


Makes more sense than just hacking the tests to avoid running them on 
Windows. (I also didn't much like doing it by parsing the version 
string, although I know there's at least one precedent for doing that.)
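
To illustrate why detecting the platform by parsing the version string can feel fragile, here is a minimal Python sketch. It is not the attached patch, whose contents are not shown in this thread. The function name and the list of substrings are assumptions: "windows" matches the version string reported earlier in the thread, while "mingw" and "visual c++" are guesses at what other Windows builds might report.

```python
import re

# Assumed markers of a Windows build; the list is a guess and may be incomplete.
WINDOWS_MARKERS = re.compile(r"windows|mingw|visual c\+\+", re.IGNORECASE)


def looks_like_windows(version: str) -> bool:
    """Guess from a version() string whether the server was built for Windows."""
    return WINDOWS_MARKERS.search(version) is not None


reported = "PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit"
print(looks_like_windows(reported))
```

The check only works as long as every Windows toolchain leaves one of the expected substrings in the version string, so a new compiler or target triple could silently defeat it.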


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com