Thread: [PROPOSAL] Skip test citext_utf8 on Windows
Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding, the test
citext_utf8 fails because of the Turkish dotted I, like this:

  SELECT 'i'::citext = 'İ'::citext AS t;
   t
  ---
- t
+ f
  (1 row)

I tried to replicate the test's results by hand, and with every collation
that I tried (including --locale="Turkish") this test failed.

Also, an interesting result of my testing: if you initialize your DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:

 lower
-------
 İ
(1 row)

Which I find strange, since lower() uses the collation that was passed
(the default in this case, but still).

My PostgreSQL version is this:

postgres=# select version();
                               version
----------------------------------------------------------------------
 PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit

The proposed patch for skipping the test is attached.

Oleg Tselebrovskiy, Postgres Pro
Attachment
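The lower('İ') result reported above is easier to judge next to a locale-independent reference point. The sketch below is Python, not PostgreSQL, and only shows what Unicode's default (untailored) full case mapping does with U+0130. Treat the comparison as an assumption-laden illustration: PostgreSQL's libc collation provider delegates case conversion to the operating system's locale functions, and their behavior on Windows is exactly what is in question in this thread. The helper name describe is made up for this example.

```python
import unicodedata

# U+0130, the Turkish dotted capital I from the failing test.
DOTTED_CAPITAL_I = "\u0130"


def describe(s):
    """Return the Unicode name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


# Python's str.lower() applies Unicode's default full case mapping,
# with no locale tailoring.
lowered = DOTTED_CAPITAL_I.lower()

print(describe(DOTTED_CAPITAL_I))
print(describe(lowered))

# The default full mapping yields two code points, so the result is
# not equal to a plain ASCII "i".
print(lowered == "i")
```

The test's expected output (t) corresponds to a single-code-point mapping of İ to i, as in Unicode's simple lowercase mapping and in Turkish tailoring. The reported result, İ left unchanged, matches neither that nor the full mapping shown here.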
On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:
> The proposed patch for skipping test is attached

Your attached patch seems to be in binary format.
--
Michael
Attachment
On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
> Greetings, everyone!
>
> While running "installchecks" on databases with UTF-8 encoding the test
> citext_utf8 fails because of Turkish dotted I like this:
>
>   SELECT 'i'::citext = 'İ'::citext AS t;
>    t
>   ---
> - t
> + f
>   (1 row)
>
> I tried to replicate the test's results by hand and with any collation
> that I tried (including --locale="Turkish") this test failed
>
> Also an interesting result of my testing. If you initialize your DB
> with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
> the output will be this:
>  lower
> -------
>  İ
> (1 row)
>
> Which I find strange since lower() uses collation that was passed
> (default in this case but still)

Wouldn't we be better off finding a Windows fix for this, instead of
sweeping it under the rug?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
> > Greetings, everyone!
> >
> > While running "installchecks" on databases with UTF-8 encoding the test
> > citext_utf8 fails because of Turkish dotted I like this:
> >
> >   SELECT 'i'::citext = 'İ'::citext AS t;
> >    t
> >   ---
> > - t
> > + f
> >   (1 row)
> >
> > I tried to replicate the test's results by hand and with any collation
> > that I tried (including --locale="Turkish") this test failed
> >
> > Also an interesting result of my testing. If you initialize your DB
> > with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
> > the output will be this:
> >  lower
> > -------
> >  İ
> > (1 row)
> >
> > Which I find strange since lower() uses collation that was passed
> > (default in this case but still)
>
> Wouldn't we be better off finding a Windows fix for this, instead of
> sweeping it under the rug?

Given the sorry state of our Windows locale support, I've started
wondering about deleting it and telling users to adopt our nascent
built-in support or ICU[1].

This other thread [2] says the sorting is intransitive so I don't
think it really meets our needs anyway.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e
[2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org
Michael Paquier wrote on 2024-03-12 06:24:
> On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:
>> The proposed patch for skipping test is attached
>
> Your attached patch seems to be in binary format.
> --
> Michael

Right, I had it saved in a non-UTF-8 encoding. Kind of ironic.

Here's a fixed version.
Attachment
On 2024-03-11 Mo 22:50, Thomas Munro wrote:
> On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:
>>> Greetings, everyone!
>>>
>>> While running "installchecks" on databases with UTF-8 encoding the test
>>> citext_utf8 fails because of Turkish dotted I like this:
>>>
>>>   SELECT 'i'::citext = 'İ'::citext AS t;
>>>    t
>>>   ---
>>> - t
>>> + f
>>>   (1 row)
>>>
>>> I tried to replicate the test's results by hand and with any collation
>>> that I tried (including --locale="Turkish") this test failed
>>>
>>> Also an interesting result of my testing. If you initialize your DB
>>> with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
>>> the output will be this:
>>>  lower
>>> -------
>>>  İ
>>> (1 row)
>>>
>>> Which I find strange since lower() uses collation that was passed
>>> (default in this case but still)
>> Wouldn't we be better off finding a Windows fix for this, instead of
>> sweeping it under the rug?
> Given the sorry state of our Windows locale support, I've started
> wondering about deleting it and telling users to adopt our nascent
> built-in support or ICU[1].
>
> This other thread [2] says the sorting is intransitive so I don't
> think it really meets our needs anyway.
>
> [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e
> [2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org

Makes more sense than just hacking the tests to avoid running them on
Windows. (I also didn't much like doing it by parsing the version
string, although I know there's at least one precedent for doing that.)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com