Thread: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Marc-Olaf Jaschke

Date:

21 March 2016, 23:48:37

Hi,

PostgreSQL 9.5 ignores rows with the following test case:

=========================

\l+
…
Encoding | Collate | Ctype
UTF8 | de_DE.UTF-8 | de_DE.UTF-8
...

create table test (t) as values ('eai'), ('e aí');

select * from test where t = 'eai';
  t
-----
 eai
(1 row)

create index on test(t);

set enable_seqscan = false;

select * from test where t = 'eai';
 t
---
(0 rows)

select t from test where t = 'eai' collate "C";
  t
-----
 eai
(1 row)

alter table test alter column t type text collate "C";
select * from test where t = 'eai';
  t
-----
 eai
(1 row)


alter table test alter column t type text collate "de_DE.utf8";
select * from test where t = 'eai';
 t
---
(0 rows)

set enable_seqscan = true;

select * from test where t = 'eai';
  t
-----
 eai
(1 row)

=========================



I was able to reproduce this with

cat /etc/debian_version
6.0.1
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
/lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.

CentOS release 6.7 (Final)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12


I was not able to reproduce this with

OSX (10.11.3 (15D21))
PostgreSQL 9.5alpha1 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit

OSX (10.11.3 (15D21))
PostgreSQL 9.5.1 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version 7.0.0 (clang-700.1.76), 64-bit

Ubuntu 12.04.5 LTS
PostgreSQL 9.3.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
ldd --version
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.13) 2.15

CentOS release 6.7 (Final)
PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

Red Hat Enterprise Linux Server release 7.2 (Maipo)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
ldd --version
ldd (GNU libc) 2.17



Best regards,
Marc-Olaf Jaschke

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 00:03:47

Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
> PostgreSQL 9.5 ignores rows with the following test case:

I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
I presume that that points the finger at the abbreviated-keys work.

BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:

u8=# set enable_seqscan TO 0;
SET
u8=#  select * from test where t < 'eai';
 t
---
(0 rows)

u8=#  select * from test where t = 'eai';
 t
---
(0 rows)

u8=#  select * from test where t > 'eai';
 t
---
(0 rows)

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

22 March 2016, 00:26:31

On Mon, Mar 21, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
>> PostgreSQL 9.5 ignores rows with the following test case:
>
> I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
> I presume that that points the finger at the abbreviated-keys work.
>
> BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:
>
> u8=# set enable_seqscan TO 0;
> SET
> u8=#  select * from test where t < 'eai';
>  t
> ---
> (0 rows)
>
> u8=#  select * from test where t = 'eai';
>  t
> ---
> (0 rows)
>
> u8=#  select * from test where t > 'eai';
>  t
> ---
> (0 rows)

This could plausibly be a consequence of the abbreviated keys work if
strxfrm() and strcoll() return inconsistent results for those strings
for the same locale (say, one says +1 and the other says -1 given
those inputs).  I don't have a RHEL6 system handy to test whether that
might be the case here.

If that is the case, I'd argue that's a glibc problem, not our
problem.  Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
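
The inconsistency hypothesized above can be checked for any pair of strings. What follows is a minimal sketch, not code from PostgreSQL or from this thread: the names collation_consistent and cmp_sign are made up for illustration. It compares the sign of strcoll() on the raw strings with the sign of strcmp() on their strxfrm() blobs, which the C standard expects to agree within one locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1. */
int cmp_sign(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper: returns 1 if strcoll(a, b) and strcmp() on the
 * strxfrm() blobs of a and b agree in sign for the current LC_COLLATE
 * locale, 0 if they disagree, and -1 if memory could not be allocated.
 */
int collation_consistent(const char *a, const char *b)
{
    size_t la, lb;
    char *xa, *xb;
    int result = -1;

    assert(a != NULL && b != NULL);

    /* With n == 0, strxfrm() only reports the blob length (without NUL). */
    la = strxfrm(NULL, a, 0) + 1;
    lb = strxfrm(NULL, b, 0) + 1;
    xa = malloc(la);
    xb = malloc(lb);
    if (xa != NULL && xb != NULL)
    {
        strxfrm(xa, a, la);
        strxfrm(xb, b, lb);
        result = (cmp_sign(strcoll(a, b)) == cmp_sign(strcmp(xa, xb)));
    }
    free(xa);
    free(xb);
    return result;
}
```

To probe a real locale, a caller would first run setlocale(LC_ALL, "") from <locale.h> and start the program with, say, LANG=de_DE.UTF-8. Without that call the process stays in the "C" locale, where the two methods are expected to agree.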

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 00:27:18

On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If that is the case, I'd argue that's a glibc problem, not our
> problem.  Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

Conferred with Robert. This is my first suspicion. More in a little while.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 00:34:23

On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
> PostgreSQL 9.5 ignores rows with the following test case:

At one point, Robert wrote a small self-contained tool to show OS
strxfrm() blobs:

http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

It would be great if you showed us the output for your test case
strings, both on an affected and on an unaffected system. As Robert
mentioned, our use of strxfrm() quite reasonably relies on it
producing blobs that compare with strcmp() in a way that gives the
same result as a strcoll() on the original strings, per ISO C90.
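
For readers without access to the linked tool, here is a rough sketch of how a strxfrm() blob can be rendered as hex. This is not Robert's program, which is only linked here and not reproduced. The function name strxfrm_hex is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the strxfrm() blob of s, for the current
 * LC_COLLATE locale, as a malloc'd lowercase hex string.  Returns NULL
 * if memory could not be allocated.  The caller frees the result.
 */
char *strxfrm_hex(const char *s)
{
    size_t len;
    size_t i;
    unsigned char *blob;
    char *hex;

    assert(s != NULL);

    len = strxfrm(NULL, s, 0);      /* blob length, excluding the NUL */
    blob = malloc(len + 1);
    hex = malloc(2 * len + 1);
    if (blob == NULL || hex == NULL)
    {
        free(blob);
        free(hex);
        return NULL;
    }

    strxfrm((char *) blob, s, len + 1);
    for (i = 0; i < len; i++)
        sprintf(hex + 2 * i, "%02x", blob[i]);
    hex[2 * len] = '\0';

    free(blob);
    return hex;
}
```

The blob depends on the locale in effect, so a caller interested in de_DE.UTF-8 would call setlocale(LC_ALL, "") from <locale.h> before using the helper. In the default "C" locale the blob is normally just the input bytes.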

Thanks
--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 00:44:45

On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If that is the case, I'd argue that's a glibc problem, not our
> problem.  Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

That would be an easy patch to write. We'd simply have a test within
bttextsortsupport() that had systems that disabled abbreviated keys
for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
next to the Windows code within varstr_sortsupport() (the function is
called btsortsupport_worker in 9.5). It would look at a GUC, I
suppose.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 01:09:57

On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
> I was able to reproduce this with
>
> cat /etc/debian_version
> 6.0.1
> PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
> /lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.
>
> CentOS release 6.7 (Final)
> PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> ldd --version
> ldd (GNU libc) 2.12

I found this fairly recent bug report concerning glibc's strxfrm():

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927

(See also https://sourceware.org/bugzilla/show_bug.cgi?id=16009)

I'm not certain that this is the problem, but it's a good theory. Note
that this particular message talks about your exact affected version
of eglibc (eglibc-2.11.3):

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927#27

Even if it isn't this exact issue, I have a really hard time imagining
that this is not a bug in the relevant Glibc versions. Abbreviated
keys are fundamentally a fairly simple idea, and it's hard to think of
any other possible explanation.
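
To make the "fairly simple idea" concrete, here is a simplified sketch of abbreviated keys. It is an illustration only and not PostgreSQL's implementation: SortItem, make_abbrev, abbrev_compare and the 8-byte key width are all assumptions. Each string carries a fixed-size prefix of its strxfrm() blob, and comparisons look at that prefix first, consulting strcoll() only on a tie. If strxfrm() and strcoll() disagree, the prefix comparison returns an answer that strcoll() would contradict.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ABBREV_LEN 8            /* assumed key width, for illustration */

typedef struct
{
    const char *str;                    /* original string */
    unsigned char abbrev[ABBREV_LEN];   /* zero-padded strxfrm() prefix */
} SortItem;

/*
 * Fill in item for string s.  If allocation fails the key stays all
 * zeros, which only forces more comparisons onto the strcoll() path.
 */
void make_abbrev(SortItem *item, const char *s)
{
    size_t len;
    char *blob;

    assert(item != NULL && s != NULL);

    item->str = s;
    memset(item->abbrev, 0, ABBREV_LEN);

    len = strxfrm(NULL, s, 0);
    blob = malloc(len + 1);
    if (blob != NULL)
    {
        strxfrm(blob, s, len + 1);
        memcpy(item->abbrev, blob, len < ABBREV_LEN ? len : ABBREV_LEN);
        free(blob);
    }
}

/* Compare by abbreviated key first; fall back to strcoll() on a tie. */
int abbrev_compare(const SortItem *a, const SortItem *b)
{
    int c = memcmp(a->abbrev, b->abbrev, ABBREV_LEN);

    if (c != 0)
        return c;               /* trusted without asking strcoll() */
    return strcoll(a->str, b->str);
}
```

The design relies entirely on the prefix order never contradicting strcoll(). The sketch makes no attempt to detect a libc where that assumption fails.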

We'll know more when we use those strxfrm() blobs, from the tool I linked to.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 01:15:50

On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If that is the case, I'd argue that's a glibc problem, not our
>> problem.  Of course, we could provide an option to disable abbreviated
>> keys for the benefit of people who need to work around buggy libc
>> implementations.
>
> That would be an easy patch to write. We'd simply have a test within
> bttextsortsupport() that had systems that disabled abbreviated keys
> for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
> next to the Windows code within varstr_sortsupport() (the function is
> called btsortsupport_worker in 9.5). It would look at a GUC, I
> suppose.

Actually, I suppose it isn't quite that simple, because abbreviated
keys did not introduce the use of strxfrm() by Postgres. That happened
much sooner. I guess we'd have to think about convert_string_datum(),
too.

Maybe we can write a test-case that lets check_strxfrm_bug() detect
this issue, which would be ideal. But, again, I need to see what's
going on with strxfrm() on affected systems before I can do anything.
Don't have one of my own close at hand.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 02:54:27

Peter Geoghegan <pg@heroku.com> writes:
> On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> If that is the case, I'd argue that's a glibc problem, not our
>>>> problem.  Of course, we could provide an option to disable abbreviated
>>>> keys for the benefit of people who need to work around buggy libc
>>>> implementations.

FWIW, I do not think you can dismiss it as "not our bug" if a large
fraction of existing glibc installations share the issue.  It might
be a glibc bug, but we'll have to find a workaround.

> Maybe we can write a test-case that lets check_strxfrm_bug() detect
> this issue, which would be ideal. But, again, I need to see what's
> going on with strxfrm() on affected systems before I can do anything.

Happy to test if you can provide a test case.

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 04:04:23

On Mon, Mar 21, 2016 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I do not think you can dismiss it as "not our bug" if a large
> fraction of existing glibc installations share the issue.  It might
> be a glibc bug, but we'll have to find a workaround.

I didn't say that. I strongly agree.

>> Maybe we can write a test-case that lets check_strxfrm_bug() detect
>> this issue, which would be ideal. But, again, I need to see what's
>> going on with strxfrm() on affected systems before I can do anything.
>
> Happy to test if you can provide a test case.

Can you look at generating a textual representation of the strxfrm()
blobs in question, using Robert's tool?:

http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

That would give me some basis for writing a test.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 04:11:04

Peter Geoghegan <pg@heroku.com> writes:
> At one point, Robert wrote a small self-contained tool to show OS
> strxfrm() blobs:
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

> It would be great if you showed us the output for your test case
> strings, both on an affected and on an unaffected system.

On RHEL6, I get

./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)

This seems a bit problematic, because these strings sort in the other
order ("e aí" before "eai") according to sort(1) as well as Postgres
sorting code.

It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations.  Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 05:16:39

On Mon, Mar 21, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> On RHEL6, I get
>
> ./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
> "eai" -> 100c140108080801020202 (11 bytes)
> "e aí" -> 100c140108080901020202010235 (14 bytes)

As expected, ISTM that the "primary weights" here are the same.

Aligned comparison of this with correct en_US.UTF-8 blobs from my system:

Buggy version (Tom's de_DE.UTF-8 testcase):

"eai" ->  100c14 01 090909 01 090909 (11 bytes)
"e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)

Correct version (though uses different locale):

"eai" ->  100c14 01 080808 01 020202 (11 bytes)
"e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

The low bytes, 0x01, separate the weight levels. I think that this
always happens with glibc. The space character is only represented at
the last level, which is why strcoll() typically weighs spaces as very
unimportant (you'll recall that we hear complaints about this from
time to time).
time to time).
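
The level structure described above can be made visible by splitting a hex-encoded blob at each 0x01 byte. This is a small sketch under the assumption stated in the message, namely that glibc uses 0x01 only as a level separator. The function split_levels and its size limits are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEVELS 8
#define MAX_LEVEL_HEX 64

/*
 * Split a hex-encoded strxfrm() blob at each 0x01 byte.  Each level is
 * stored as a NUL-terminated hex string in levels[].  Returns the number
 * of levels, or -1 on odd-length input or if a limit is exceeded.
 */
int split_levels(const char *hex, char levels[MAX_LEVELS][MAX_LEVEL_HEX])
{
    size_t n;
    size_t i;
    size_t pos = 0;
    int nlevels = 1;

    assert(hex != NULL);

    n = strlen(hex);
    if (n % 2 != 0)
        return -1;

    levels[0][0] = '\0';
    for (i = 0; i < n; i += 2)
    {
        if (hex[i] == '0' && hex[i + 1] == '1')
        {
            /* Separator byte: close this level and open the next one. */
            if (nlevels == MAX_LEVELS)
                return -1;
            pos = 0;
            levels[nlevels++][0] = '\0';
        }
        else
        {
            /* Need room for two hex digits plus the terminating NUL. */
            if (pos + 3 > MAX_LEVEL_HEX)
                return -1;
            levels[nlevels - 1][pos++] = hex[i];
            levels[nlevels - 1][pos++] = hex[i + 1];
            levels[nlevels - 1][pos] = '\0';
        }
    }
    return nlevels;
}
```

Applied to the 11-byte blob 100c140108080801020202 reported in this thread, the split yields three levels (100c14, 080808, 020202). The 14-byte blob 100c140108080901020202010235 yields a fourth level, 0235, which is where the space would be represented according to the explanation above.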

My guess is that the 0x0b byte in Tom's buggy de_DE.UTF-8 testcase is
the problem. Not sure why.

I guess I'll look around here for further ideas tomorrow:
http://unicode.org/reports/tr10/#Well_Formedness_Examples

> This seems a bit problematic, because these strings sort in the other
> order ("e aí" before "eai") according to sort(1) as well as Postgres
> sorting code.
>
> It's possible I've copied-and-pasted these multibyte characters wrong.
> But if I haven't, this says that the strxfrm-based optimization is
> unusably broken on a very large fraction of reasonably-modern
> installations.  Quite aside from casting aspersions on the glibc guys,
> how did we fail to notice this in our own testing?

Because we don't test every possible libc installation. And even if
we did, why should we be able to usefully nail down something that's
fundamentally not under our control? (I don't want to assume that that
bug is at fault, but it seems like a reasonable speculation,
especially based on your "strxfrm-binary" result.)

Let's not relitigate the debate about Postgres controlling its own
collations right now, though.

I think that amcheck will be able to provide reasonable smoke-testing
for these kinds of issues once it gets some buildfarm cycles. I intend
to write plenty of tests for external sorting to go with amcheck, too;
that code currently has no tests whatsoever. amcheck provides a nice
way of testing if strxfrm() agrees with strcoll(), without having to
"expect" any particular total ordering for a collatable type, which is
what a simple pg_regress approach would require. Portable testing of
strcoll() + strxfrm() will improve matters.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 07:53:37

On Mon, Mar 21, 2016 at 10:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> "eai" ->  100c14 01 090909 01 090909 (11 bytes)
> "e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)

> "eai" ->  100c14 01 080808 01 020202 (11 bytes)
> "e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

Sorry, I have that backwards. The latter output is Tom's de_DE.UTF-8
testcase, showing broken glibc behavior.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 21:09:58

On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Can you look at generating a textual representation of the strxfrm()
> blobs in question, using Robert's tool?:
>
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

I played with this tool myself, on an affected CentOS 6.7 VM:

[vagrant@localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:

[vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

This surely adds doubt to the idea that strxfrm() in particular is broken.

I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).

This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.
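
The experiment above, moving the space from one position to another, can be generalized into a sweep. The following is only a sketch, not the modified tool used in this message: count_space_disagreements, sign_of and the buffer limits are assumptions for illustration. It inserts a space at every interior position of a base string and counts how often strcoll() and strcmp() on the strxfrm() blobs disagree about the order of the base and the variant.

```c
#include <assert.h>
#include <string.h>

#define MAX_BASE 32             /* assumed limit on the base string length */
#define MAX_BLOB 512            /* assumed limit on the strxfrm() blob size */

/* Reduce a comparison result to -1, 0, or +1. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * For each interior position of base, build a variant with one space
 * inserted there and compare it against base both ways.  Returns the
 * number of positions where the two methods disagree in the current
 * LC_COLLATE locale, or -1 if a buffer limit is exceeded.
 */
int count_space_disagreements(const char *base)
{
    char variant[MAX_BASE + 2];
    char xbase[MAX_BLOB];
    char xvar[MAX_BLOB];
    size_t len;
    size_t pos;
    int bad = 0;

    assert(base != NULL);

    len = strlen(base);
    if (len > MAX_BASE || strxfrm(xbase, base, MAX_BLOB) >= MAX_BLOB)
        return -1;

    for (pos = 1; pos < len; pos++)
    {
        /* Copy the head, the space, then the tail including its NUL. */
        memcpy(variant, base, pos);
        variant[pos] = ' ';
        memcpy(variant + pos + 1, base + pos, len - pos + 1);

        if (strxfrm(xvar, variant, MAX_BLOB) >= MAX_BLOB)
            return -1;
        if (sign_of(strcoll(base, variant)) != sign_of(strcmp(xbase, xvar)))
            bad++;
    }
    return bad;
}
```

For the base string 'xxx' the sweep covers exactly the two variants tried above, 'x xx' and 'xx x'. As with the original tool, the locale has to be selected first, for example with setlocale(LC_ALL, "") and LANG=de_DE.UTF-8. In the default "C" locale no disagreement is expected.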

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

22 March 2016, 22:06:43

On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Can you look at generating a textual representation of the strxfrm()
>> blobs in question, using Robert's tool?:
>>
>> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
>
> I played with this tool myself, on an affected CentOS 6.7 VM:
>
> [vagrant@localhost ~]$ ldd --version
> ldd (GNU libc) 2.12
> Copyright (C) 2010 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
>
> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
> modified tool, simplified to use ascii-safe strings:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: 6
>
> If we assume for the sake of argument that this is a strxfrm() bug and
> strcoll() is a reliable source of truth, then I find it very curious
> that Germany's Austrian neighbors differ on this point about how text
> should be collated:
>
> [vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> This surely adds doubt to the idea that strxfrm() in particular is broken.
>
> I find something else inconsistent with the strxfrm() theory: even the
> de_DE collation gives strxfrm()/strcoll() self-consistent answers when
> we move the rhs argument's space to the far side of its center 'x'
> char:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "xx x" -> 2323230108080801020202010335 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> It seems very unlikely that this is because of a legitimate
> consideration that strcoll() makes about how German should be collated
> (one that strxfrm() fails to make, say).
>
> This is probably a worse situation for affected Postgres systems,
> though, because now they have no scope to turn the faulty part of the
> system off. I have a hard time believing that it's a good idea to
> trust strcoll() to be wrong in a consistent way that has collatable
> type opclasses at least follow "Notes to Operator Class Implementors".
> I'd like to hear more opinions on that, though, because it's a tricky
> thing to reason about.

Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was.  I vote we do that and go home.
Behavior-changing GUCs suck, but it seems clear that Tom is not going
to sit still for any solution that involves blaming the glibc vendor
no matter how well-justified that approach might be; and I don't have
a better idea.  I was a little worried that it was too much to hope
for that all libc vendors on earth would ship a strxfrm()
implementation that was actually consistent with strcoll(), and here
we are.  It's a good thing that operating systems manage to make
read() and getpid() several orders of magnitude more reliable than
strxfrm() and strcoll(), or we'd probably all be running Windows or
VMS or something now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 23:19:57

Robert Haas <robertmhaas@gmail.com> writes:
> I was a little worried that it was too much to hope for that all libc
> vendors on earth would ship a strxfrm() implementation that was actually
> consistent with strcoll(), and here we are.

Indeed.  To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike.  Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box.  While de_DE seems to be the
worst-broken locale, it's far from the only one.

Please try this on as many platforms as you can get hold of ...

            regards, tom lane

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <time.h>

/*
 * Test: generate 1000 random UTF8 strings, sort them by strcoll, sanity-
 * check the sort result, sort them by strxfrm, sanity-check that result,
 * and compare the two sort orders.
 */
#define NSTRINGS 1000
#define MAXSTRLEN 20
#define MAXXFRMLEN (MAXSTRLEN * 5)

typedef struct
{
    char        strval[MAXSTRLEN];
    char        xfrmval[MAXXFRMLEN];
    int            strsortpos;
    int            xfrmsortpos;
} OneString;

/* qsort comparators */

static int
strcoll_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcoll(a->strval, b->strval);
}

static int
strxfrm_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcmp(a->xfrmval, b->xfrmval);
}


/* returns 1 if OK, 0 if inconsistency detected */
static int
run_test_case(void)
{
    int            ok = 1;
    OneString    data[NSTRINGS];
    int            i,
                j;

    /* Generate random UTF8 strings of length less than MAXSTRLEN bytes */
    for (i = 0; i < NSTRINGS; i++)
    {
        char       *p = data[i].strval;
        int            len;

        len = 1 + (random() % (MAXSTRLEN - 1));
        while (len > 0)
        {
            int            c;

            /* Generate random printable char in ISO8859-1 range */
            /* Bias towards producing a lot of spaces */
            if ((random() % 16) < 3)
                c = ' ';
            else
            {
                do
                {
                    c = random() & 0xFF;
                } while (!((c >= ' ' && c <= 127) || (c >= 0xA0 && c <= 0xFF)));
            }

            if (c <= 127)
            {
                *p++ = c;
                len--;
            }
            else
            {
                if (len < 2)
                    break;
                /* Poor man's utf8-ification */
                *p++ = 0xC0 + (c >> 6);
                len--;
                *p++ = 0x80 + (c & 0x3F);
                len--;
            }
        }
        *p = '\0';

        /* strxfrm each string as we produce it */
        if (strxfrm(data[i].xfrmval, data[i].strval, MAXXFRMLEN) >= MAXXFRMLEN)
        {
            fprintf(stderr, "strxfrm() result for %d-length string exceeded %d bytes\n",
                    (int) strlen(data[i].strval), MAXXFRMLEN);
            exit(1);
        }

#if 0
        printf("%d %s\n", i, data[i].strval);
#endif
    }

    /* Sort per strcoll(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strcoll_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcoll(data[i].strval, data[i-1].strval) != 0)
            j++;
        data[i].strsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcoll(data[i].strval, data[j].strval) > 0)
            {
                fprintf(stdout, "strcoll sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Sort per strxfrm(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strxfrm_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcmp(data[i].xfrmval, data[i-1].xfrmval) != 0)
            j++;
        data[i].xfrmsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcmp(data[i].xfrmval, data[j].xfrmval) > 0)
            {
                fprintf(stdout, "strxfrm sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Compare */
    for (i = 0; i < NSTRINGS; i++)
    {
        if (data[i].strsortpos != data[i].xfrmsortpos)
        {
            fprintf(stdout, "inconsistency between strcoll (%d) and strxfrm (%d) orders\n",
                    data[i].strsortpos, data[i].xfrmsortpos);
            ok = 0;
        }
    }

    return ok;
}

int
main(int argc, char **argv)
{
    const char *lc;
    int            ntries;

    /* Absorb locale from environment, and report what we're using */
    if (setlocale(LC_ALL, "") == NULL)
    {
        perror("setlocale(LC_ALL) failed");
        exit(1);
    }
    lc = setlocale(LC_COLLATE, NULL);
    if (lc)
    {
        printf("Using LC_COLLATE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_COLLATE) failed");
        exit(1);
    }
    lc = setlocale(LC_CTYPE, NULL);
    if (lc)
    {
        printf("Using LC_CTYPE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_CTYPE) failed");
        exit(1);
    }

    /* Ensure new random() values on every run */
    srandom((unsigned int) time(NULL));

    /* argv[1] can be the max number of tries to run */
    if (argc > 1)
        ntries = atoi(argv[1]);
    else
        ntries = 1;

    /* Run one test instance per loop */
    while (ntries-- > 0)
    {
        if (!run_test_case())
            exit(1);
    }

    return 0;
}

#! /bin/sh

for LANG in `locale -a | grep -i 'utf.*8'`
do
    export LANG
    if ./strcolltest 10
    then
        echo $LANG good
    else
        echo $LANG BAD
    fi
done

Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "aa_DJ.utf8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "aa_ER.utf8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "aa_ER.utf8@saaho"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "aa_ET.utf8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "af_ZA.utf8"
af_ZA.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "am_ET.utf8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "an_ES.utf8"
an_ES.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "ar_AE.utf8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "ar_BH.utf8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "ar_DZ.utf8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "ar_EG.utf8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "ar_IN.utf8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "ar_IQ.utf8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "ar_JO.utf8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "ar_KW.utf8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "ar_LB.utf8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "ar_LY.utf8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "ar_MA.utf8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "ar_OM.utf8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "ar_QA.utf8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "ar_SA.utf8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "ar_SD.utf8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "ar_SY.utf8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "ar_TN.utf8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "ar_YE.utf8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "as_IN.utf8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "ast_ES.utf8"
ast_ES.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "az_AZ.utf8"
inconsistency between strcoll (718) and strxfrm (717) orders
inconsistency between strcoll (717) and strxfrm (718) orders
az_AZ.utf8 BAD
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "be_BY.utf8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "be_BY.utf8@latin"
be_BY.utf8@latin good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "ber_DZ.utf8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "ber_MA.utf8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "bg_BG.utf8"
bg_BG.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "bn_BD.utf8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "bn_IN.utf8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "bo_CN.utf8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "bo_IN.utf8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "br_FR.utf8"
br_FR.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "bs_BA.utf8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "byn_ER.utf8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "ca_AD.utf8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "ca_ES.utf8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "ca_FR.utf8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "ca_IT.utf8"
ca_IT.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "crh_UA.utf8"
inconsistency between strcoll (264) and strxfrm (263) orders
inconsistency between strcoll (265) and strxfrm (264) orders
inconsistency between strcoll (263) and strxfrm (265) orders
inconsistency between strcoll (427) and strxfrm (426) orders
inconsistency between strcoll (426) and strxfrm (427) orders
crh_UA.utf8 BAD
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "cs_CZ.utf8"
cs_CZ.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "csb_PL.utf8"
csb_PL.utf8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "cv_RU.utf8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "cy_GB.utf8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "da_DK.utf8"
inconsistency between strcoll (876) and strxfrm (875) orders
inconsistency between strcoll (877) and strxfrm (876) orders
inconsistency between strcoll (875) and strxfrm (877) orders
inconsistency between strcoll (902) and strxfrm (901) orders
inconsistency between strcoll (901) and strxfrm (902) orders
da_DK.utf8 BAD
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "de_AT.utf8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "de_BE.utf8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "de_CH.utf8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
inconsistency between strcoll (69) and strxfrm (68) orders
inconsistency between strcoll (68) and strxfrm (69) orders
inconsistency between strcoll (129) and strxfrm (127) orders
inconsistency between strcoll (127) and strxfrm (128) orders
inconsistency between strcoll (128) and strxfrm (129) orders
inconsistency between strcoll (188) and strxfrm (187) orders
inconsistency between strcoll (187) and strxfrm (188) orders
inconsistency between strcoll (258) and strxfrm (257) orders
inconsistency between strcoll (257) and strxfrm (258) orders
inconsistency between strcoll (260) and strxfrm (259) orders
inconsistency between strcoll (261) and strxfrm (260) orders
inconsistency between strcoll (259) and strxfrm (261) orders
inconsistency between strcoll (284) and strxfrm (283) orders
inconsistency between strcoll (283) and strxfrm (284) orders
inconsistency between strcoll (312) and strxfrm (311) orders
inconsistency between strcoll (311) and strxfrm (312) orders
inconsistency between strcoll (316) and strxfrm (315) orders
inconsistency between strcoll (315) and strxfrm (316) orders
inconsistency between strcoll (361) and strxfrm (360) orders
inconsistency between strcoll (360) and strxfrm (361) orders
inconsistency between strcoll (385) and strxfrm (383) orders
inconsistency between strcoll (383) and strxfrm (384) orders
inconsistency between strcoll (384) and strxfrm (385) orders
inconsistency between strcoll (410) and strxfrm (408) orders
inconsistency between strcoll (408) and strxfrm (409) orders
inconsistency between strcoll (409) and strxfrm (410) orders
inconsistency between strcoll (428) and strxfrm (426) orders
inconsistency between strcoll (426) and strxfrm (427) orders
inconsistency between strcoll (429) and strxfrm (428) orders
inconsistency between strcoll (427) and strxfrm (429) orders
inconsistency between strcoll (431) and strxfrm (430) orders
inconsistency between strcoll (430) and strxfrm (431) orders
inconsistency between strcoll (528) and strxfrm (527) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (527) and strxfrm (529) orders
inconsistency between strcoll (542) and strxfrm (541) orders
inconsistency between strcoll (541) and strxfrm (542) orders
inconsistency between strcoll (552) and strxfrm (551) orders
inconsistency between strcoll (551) and strxfrm (552) orders
inconsistency between strcoll (586) and strxfrm (583) orders
inconsistency between strcoll (587) and strxfrm (584) orders
inconsistency between strcoll (583) and strxfrm (585) orders
inconsistency between strcoll (584) and strxfrm (586) orders
inconsistency between strcoll (585) and strxfrm (587) orders
inconsistency between strcoll (596) and strxfrm (595) orders
inconsistency between strcoll (595) and strxfrm (596) orders
inconsistency between strcoll (921) and strxfrm (920) orders
inconsistency between strcoll (920) and strxfrm (921) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "de_LU.utf8"
de_LU.utf8 good
Using LC_COLLATE = "dv_MV.utf8"
Using LC_CTYPE = "dv_MV.utf8"
dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8"
Using LC_CTYPE = "dz_BT.utf8"
dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "el_CY.utf8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "el_GR.utf8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_AG.utf8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_AU.utf8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_BW.utf8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_CA.utf8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_DK.utf8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_GB.utf8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_HK.utf8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_IE.utf8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_IN.utf8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_NG.utf8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_NZ.utf8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_PH.utf8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_SG.utf8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.utf8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_ZA.utf8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_ZW.utf8"
en_ZW.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "es_AR.utf8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "es_BO.utf8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "es_CL.utf8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "es_CO.utf8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "es_CR.utf8"
es_CR.utf8 good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "es_DO.utf8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "es_EC.utf8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "es_ES.utf8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "es_GT.utf8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "es_HN.utf8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "es_MX.utf8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "es_NI.utf8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "es_PA.utf8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "es_PE.utf8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "es_PR.utf8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "es_PY.utf8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "es_SV.utf8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "es_US.utf8"
inconsistency between strcoll (605) and strxfrm (603) orders
inconsistency between strcoll (603) and strxfrm (604) orders
inconsistency between strcoll (604) and strxfrm (605) orders
es_US.utf8 BAD
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "es_UY.utf8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "es_VE.utf8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "et_EE.utf8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "eu_ES.utf8"
eu_ES.utf8 good
Using LC_COLLATE = "fa_IR.utf8"
Using LC_CTYPE = "fa_IR.utf8"
fa_IR.utf8 good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "fi_FI.utf8"
inconsistency between strcoll (699) and strxfrm (697) orders
inconsistency between strcoll (697) and strxfrm (698) orders
inconsistency between strcoll (698) and strxfrm (699) orders
inconsistency between strcoll (883) and strxfrm (881) orders
inconsistency between strcoll (881) and strxfrm (882) orders
inconsistency between strcoll (882) and strxfrm (883) orders
fi_FI.utf8 BAD
Using LC_COLLATE = "fil_PH.utf8"
Using LC_CTYPE = "fil_PH.utf8"
inconsistency between strcoll (605) and strxfrm (603) orders
inconsistency between strcoll (603) and strxfrm (604) orders
inconsistency between strcoll (604) and strxfrm (605) orders
fil_PH.utf8 BAD
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "fo_FO.utf8"
inconsistency between strcoll (892) and strxfrm (891) orders
inconsistency between strcoll (891) and strxfrm (892) orders
inconsistency between strcoll (945) and strxfrm (944) orders
inconsistency between strcoll (944) and strxfrm (945) orders
fo_FO.utf8 BAD
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "fr_BE.utf8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "fr_CA.utf8"
inconsistency between strcoll (220) and strxfrm (219) orders
inconsistency between strcoll (219) and strxfrm (220) orders
fr_CA.utf8 BAD
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "fr_CH.utf8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "fr_FR.utf8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "fr_LU.utf8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8"
Using LC_CTYPE = "fur_IT.utf8"
fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8"
Using LC_CTYPE = "fy_DE.utf8"
fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8"
Using LC_CTYPE = "fy_NL.utf8"
fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "ga_IE.utf8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "gd_GB.utf8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8"
Using LC_CTYPE = "gez_ER.utf8"
gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede"
Using LC_CTYPE = "gez_ER.utf8@abegede"
gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8"
Using LC_CTYPE = "gez_ET.utf8"
gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede"
Using LC_CTYPE = "gez_ET.utf8@abegede"
gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "gl_ES.utf8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8"
Using LC_CTYPE = "gu_IN.utf8"
gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "gv_GB.utf8"
gv_GB.utf8 good
Using LC_COLLATE = "ha_NG.utf8"
Using LC_CTYPE = "ha_NG.utf8"
ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "he_IL.utf8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8"
Using LC_CTYPE = "hi_IN.utf8"
hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8"
Using LC_CTYPE = "hne_IN.utf8"
hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "hr_HR.utf8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "hsb_DE.utf8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8"
Using LC_CTYPE = "ht_HT.utf8"
ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "hu_HU.utf8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8"
Using LC_CTYPE = "hy_AM.utf8"
hy_AM.utf8 good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "id_ID.utf8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8"
Using LC_CTYPE = "ig_NG.utf8"
inconsistency between strcoll (165) and strxfrm (164) orders
inconsistency between strcoll (164) and strxfrm (165) orders
inconsistency between strcoll (453) and strxfrm (452) orders
inconsistency between strcoll (452) and strxfrm (453) orders
inconsistency between strcoll (786) and strxfrm (785) orders
inconsistency between strcoll (785) and strxfrm (786) orders
ig_NG.utf8 BAD
Using LC_COLLATE = "ik_CA.utf8"
Using LC_CTYPE = "ik_CA.utf8"
ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "is_IS.utf8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "it_CH.utf8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "it_IT.utf8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8"
Using LC_CTYPE = "iu_CA.utf8"
iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "iw_IL.utf8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "ja_JP.utf8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "ka_GE.utf8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "kk_KZ.utf8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "kl_GL.utf8"
inconsistency between strcoll (704) and strxfrm (703) orders
inconsistency between strcoll (703) and strxfrm (704) orders
inconsistency between strcoll (871) and strxfrm (870) orders
inconsistency between strcoll (870) and strxfrm (871) orders
inconsistency between strcoll (870) and strxfrm (871) orders
inconsistency between strcoll (885) and strxfrm (884) orders
inconsistency between strcoll (884) and strxfrm (885) orders
inconsistency between strcoll (927) and strxfrm (926) orders
inconsistency between strcoll (928) and strxfrm (927) orders
inconsistency between strcoll (926) and strxfrm (928) orders
kl_GL.utf8 BAD
Using LC_COLLATE = "km_KH.utf8"
Using LC_CTYPE = "km_KH.utf8"
km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8"
Using LC_CTYPE = "kn_IN.utf8"
kn_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "ko_KR.utf8"
ko_KR.utf8 good
Using LC_COLLATE = "kok_IN.utf8"
Using LC_CTYPE = "kok_IN.utf8"
kok_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8"
Using LC_CTYPE = "ks_IN.utf8"
ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari"
Using LC_CTYPE = "ks_IN.utf8@devanagari"
ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "ku_TR.utf8"
inconsistency between strcoll (505) and strxfrm (504) orders
inconsistency between strcoll (506) and strxfrm (505) orders
inconsistency between strcoll (504) and strxfrm (506) orders
ku_TR.utf8 BAD
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "kw_GB.utf8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8"
Using LC_CTYPE = "ky_KG.utf8"
ky_KG.utf8 good
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "lg_UG.utf8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE.utf8"
Using LC_CTYPE = "li_BE.utf8"
li_BE.utf8 good
Using LC_COLLATE = "li_NL.utf8"
Using LC_CTYPE = "li_NL.utf8"
li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8"
Using LC_CTYPE = "lo_LA.utf8"
lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "lt_LT.utf8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "lv_LV.utf8"
lv_LV.utf8 good
Using LC_COLLATE = "mai_IN.utf8"
Using LC_CTYPE = "mai_IN.utf8"
mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "mg_MG.utf8"
mg_MG.utf8 good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "mi_NZ.utf8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "mk_MK.utf8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8"
Using LC_CTYPE = "ml_IN.utf8"
ml_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8"
Using LC_CTYPE = "mn_MN.utf8"
mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8"
Using LC_CTYPE = "mr_IN.utf8"
mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "ms_MY.utf8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "mt_MT.utf8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8"
Using LC_CTYPE = "my_MM.utf8"
my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin"
Using LC_CTYPE = "nan_TW.utf8@latin"
nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "nb_NO.utf8"
inconsistency between strcoll (295) and strxfrm (294) orders
inconsistency between strcoll (294) and strxfrm (295) orders
nb_NO.utf8 BAD
Using LC_COLLATE = "nds_DE.utf8"
Using LC_CTYPE = "nds_DE.utf8"
nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8"
Using LC_CTYPE = "nds_NL.utf8"
nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8"
Using LC_CTYPE = "ne_NP.utf8"
ne_NP.utf8 good
Using LC_COLLATE = "nl_AW.utf8"
Using LC_CTYPE = "nl_AW.utf8"
nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "nl_BE.utf8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "nl_NL.utf8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "nn_NO.utf8"
inconsistency between strcoll (295) and strxfrm (294) orders
inconsistency between strcoll (294) and strxfrm (295) orders
nn_NO.utf8 BAD
Using LC_COLLATE = "no_NO.utf8"
Using LC_CTYPE = "no_NO.utf8"
inconsistency between strcoll (295) and strxfrm (294) orders
inconsistency between strcoll (294) and strxfrm (295) orders
no_NO.utf8 BAD
Using LC_COLLATE = "nr_ZA.utf8"
Using LC_CTYPE = "nr_ZA.utf8"
nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8"
Using LC_CTYPE = "nso_ZA.utf8"
nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "oc_FR.utf8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8"
Using LC_CTYPE = "om_ET.utf8"
om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "om_KE.utf8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8"
Using LC_CTYPE = "or_IN.utf8"
or_IN.utf8 good
Using LC_COLLATE = "pa_IN.utf8"
Using LC_CTYPE = "pa_IN.utf8"
pa_IN.utf8 good
Using LC_COLLATE = "pa_PK.utf8"
Using LC_CTYPE = "pa_PK.utf8"
pa_PK.utf8 good
Using LC_COLLATE = "pap_AN.utf8"
Using LC_CTYPE = "pap_AN.utf8"
pap_AN.utf8 good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "pl_PL.utf8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8"
Using LC_CTYPE = "ps_AF.utf8"
ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "pt_BR.utf8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "pt_PT.utf8"
pt_PT.utf8 good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "ro_RO.utf8"
inconsistency between strcoll (502) and strxfrm (501) orders
inconsistency between strcoll (503) and strxfrm (502) orders
inconsistency between strcoll (501) and strxfrm (503) orders
ro_RO.utf8 BAD
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "ru_RU.utf8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "ru_UA.utf8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8"
Using LC_CTYPE = "rw_RW.utf8"
rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8"
Using LC_CTYPE = "sa_IN.utf8"
sa_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8"
Using LC_CTYPE = "sc_IT.utf8"
sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8"
Using LC_CTYPE = "sd_IN.utf8"
sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari"
Using LC_CTYPE = "sd_IN.utf8@devanagari"
sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8"
Using LC_CTYPE = "se_NO.utf8"
inconsistency between strcoll (196) and strxfrm (194) orders
inconsistency between strcoll (197) and strxfrm (195) orders
inconsistency between strcoll (194) and strxfrm (196) orders
inconsistency between strcoll (195) and strxfrm (197) orders
inconsistency between strcoll (894) and strxfrm (892) orders
inconsistency between strcoll (892) and strxfrm (893) orders
inconsistency between strcoll (892) and strxfrm (893) orders
inconsistency between strcoll (893) and strxfrm (894) orders
inconsistency between strcoll (911) and strxfrm (909) orders
inconsistency between strcoll (909) and strxfrm (910) orders
inconsistency between strcoll (910) and strxfrm (911) orders
inconsistency between strcoll (934) and strxfrm (933) orders
inconsistency between strcoll (933) and strxfrm (934) orders
se_NO.utf8 BAD
Using LC_COLLATE = "shs_CA.utf8"
Using LC_CTYPE = "shs_CA.utf8"
inconsistency between strcoll (944) and strxfrm (942) orders
inconsistency between strcoll (942) and strxfrm (943) orders
inconsistency between strcoll (942) and strxfrm (943) orders
inconsistency between strcoll (943) and strxfrm (944) orders
shs_CA.utf8 BAD
Using LC_COLLATE = "si_LK.utf8"
Using LC_CTYPE = "si_LK.utf8"
si_LK.utf8 good
Using LC_COLLATE = "sid_ET.utf8"
Using LC_CTYPE = "sid_ET.utf8"
sid_ET.utf8 good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "sk_SK.utf8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "sl_SI.utf8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "so_DJ.utf8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8"
Using LC_CTYPE = "so_ET.utf8"
so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "so_KE.utf8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "so_SO.utf8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "sq_AL.utf8"
inconsistency between strcoll (286) and strxfrm (285) orders
inconsistency between strcoll (285) and strxfrm (286) orders
sq_AL.utf8 BAD
Using LC_COLLATE = "sq_MK.utf8"
Using LC_CTYPE = "sq_MK.utf8"
inconsistency between strcoll (286) and strxfrm (285) orders
inconsistency between strcoll (285) and strxfrm (286) orders
sq_MK.utf8 BAD
Using LC_COLLATE = "sr_ME.utf8"
Using LC_CTYPE = "sr_ME.utf8"
sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8"
Using LC_CTYPE = "sr_RS.utf8"
sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin"
Using LC_CTYPE = "sr_RS.utf8@latin"
sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8"
Using LC_CTYPE = "ss_ZA.utf8"
ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "st_ZA.utf8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "sv_FI.utf8"
inconsistency between strcoll (898) and strxfrm (897) orders
inconsistency between strcoll (897) and strxfrm (898) orders
sv_FI.utf8 BAD
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "sv_SE.utf8"
inconsistency between strcoll (788) and strxfrm (785) orders
inconsistency between strcoll (785) and strxfrm (786) orders
inconsistency between strcoll (786) and strxfrm (787) orders
inconsistency between strcoll (787) and strxfrm (788) orders
inconsistency between strcoll (837) and strxfrm (836) orders
inconsistency between strcoll (836) and strxfrm (837) orders
inconsistency between strcoll (903) and strxfrm (902) orders
inconsistency between strcoll (902) and strxfrm (903) orders
sv_SE.utf8 BAD
Using LC_COLLATE = "ta_IN.utf8"
Using LC_CTYPE = "ta_IN.utf8"
ta_IN.utf8 good
Using LC_COLLATE = "te_IN.utf8"
Using LC_CTYPE = "te_IN.utf8"
te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "tg_TJ.utf8"
tg_TJ.utf8 good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "th_TH.utf8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8"
Using LC_CTYPE = "ti_ER.utf8"
ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8"
Using LC_CTYPE = "ti_ET.utf8"
ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8"
Using LC_CTYPE = "tig_ER.utf8"
tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8"
Using LC_CTYPE = "tk_TM.utf8"
inconsistency between strcoll (383) and strxfrm (382) orders
inconsistency between strcoll (384) and strxfrm (383) orders
inconsistency between strcoll (382) and strxfrm (384) orders
inconsistency between strcoll (700) and strxfrm (699) orders
inconsistency between strcoll (699) and strxfrm (700) orders
inconsistency between strcoll (858) and strxfrm (857) orders
inconsistency between strcoll (857) and strxfrm (858) orders
tk_TM.utf8 BAD
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "tl_PH.utf8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8"
Using LC_CTYPE = "tn_ZA.utf8"
tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "tr_CY.utf8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "tr_TR.utf8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8"
Using LC_CTYPE = "ts_ZA.utf8"
ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8"
Using LC_CTYPE = "tt_RU.utf8"
inconsistency between strcoll (248) and strxfrm (247) orders
inconsistency between strcoll (249) and strxfrm (248) orders
inconsistency between strcoll (247) and strxfrm (249) orders
inconsistency between strcoll (431) and strxfrm (430) orders
inconsistency between strcoll (432) and strxfrm (431) orders
inconsistency between strcoll (430) and strxfrm (432) orders
inconsistency between strcoll (714) and strxfrm (713) orders
inconsistency between strcoll (713) and strxfrm (714) orders
tt_RU.utf8 BAD
Using LC_COLLATE = "tt_RU.utf8@iqtelif"
Using LC_CTYPE = "tt_RU.utf8@iqtelif"
inconsistency between strcoll (431) and strxfrm (430) orders
inconsistency between strcoll (432) and strxfrm (431) orders
inconsistency between strcoll (430) and strxfrm (432) orders
inconsistency between strcoll (700) and strxfrm (699) orders
inconsistency between strcoll (699) and strxfrm (700) orders
tt_RU.utf8@iqtelif BAD
Using LC_COLLATE = "ug_CN.utf8"
Using LC_CTYPE = "ug_CN.utf8"
inconsistency between strcoll (248) and strxfrm (247) orders
inconsistency between strcoll (249) and strxfrm (248) orders
inconsistency between strcoll (247) and strxfrm (249) orders
inconsistency between strcoll (700) and strxfrm (699) orders
inconsistency between strcoll (699) and strxfrm (700) orders
ug_CN.utf8 BAD
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "uk_UA.utf8"
uk_UA.utf8 good
Using LC_COLLATE = "ur_PK.utf8"
Using LC_CTYPE = "ur_PK.utf8"
ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic"
Using LC_CTYPE = "uz_UZ.utf8@cyrillic"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8"
Using LC_CTYPE = "ve_ZA.utf8"
ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8"
Using LC_CTYPE = "vi_VN.utf8"
inconsistency between strcoll (379) and strxfrm (378) orders
inconsistency between strcoll (380) and strxfrm (379) orders
inconsistency between strcoll (378) and strxfrm (380) orders
vi_VN.utf8 BAD
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "wa_BE.utf8"
wa_BE.utf8 good
Using LC_COLLATE = "wo_SN.utf8"
Using LC_CTYPE = "wo_SN.utf8"
wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "xh_ZA.utf8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "yi_US.utf8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8"
Using LC_CTYPE = "yo_NG.utf8"
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (793) and strxfrm (791) orders
inconsistency between strcoll (791) and strxfrm (792) orders
inconsistency between strcoll (792) and strxfrm (793) orders
inconsistency between strcoll (795) and strxfrm (794) orders
inconsistency between strcoll (794) and strxfrm (795) orders
yo_NG.utf8 BAD
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "zh_CN.utf8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "zh_HK.utf8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "zh_SG.utf8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "zh_TW.utf8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "zu_ZA.utf8"
zu_ZA.utf8 good
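
Editorial note: the output above is long, so a short script that condenses it into a per-locale verdict may help. This is a minimal sketch, assuming the log consists only of the three line types visible above ("Using LC_... = ...", "inconsistency between ...", and "<locale> good" or "<locale> BAD"). The name summarize and the embedded sample are illustrative and not part of the posted test program.

```python
import re

# Final verdict line per locale, e.g. "de_DE.utf8 BAD" or "de_AT.utf8 good".
VERDICT = re.compile(r"^(\S+) (good|BAD)$")


def summarize(log_text):
    """Return (good_locales, bad_locales); bad_locales maps each locale
    flagged BAD to the number of inconsistency lines printed before it."""
    good, bad = [], {}
    pending = 0  # inconsistency lines seen since the last verdict line
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("inconsistency between"):
            pending += 1
            continue
        match = VERDICT.match(line)
        if match:
            locale_name, verdict = match.groups()
            if verdict == "BAD":
                bad[locale_name] = pending
            else:
                good.append(locale_name)
            pending = 0
    return good, bad


# Excerpt copied from the output above.
SAMPLE = """\
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "ast_ES.utf8"
ast_ES.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "az_AZ.utf8"
inconsistency between strcoll (718) and strxfrm (717) orders
inconsistency between strcoll (717) and strxfrm (718) orders
az_AZ.utf8 BAD
"""

if __name__ == "__main__":
    good, bad = summarize(SAMPLE)
    print("good:", good)
    print("BAD:", bad)
```

Feeding the full saved output to summarize() instead of SAMPLE would give the complete list of affected locales for one platform, which makes runs from different glibc versions easier to compare.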

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 23:26:26

Peter Geoghegan <pg@heroku.com> writes:
> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll().

FWIW, the test program I just posted includes checks to see if the two
cases produce self-consistent sort orders.  So far I've seen no evidence
that they don't; that is, strcoll() produces a consistent sort order,
and strxfrm() produces a consistent sort order, but not the same one.
That being the case, arguing about which one is wrong seems a bit
academic, not to mention well above my pay grade so far as the theoretical
behavior of locale-specific sort ordering is concerned.

            regards, tom lane
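
Editorial note: the distinction drawn here (each function is self-consistent, but the two disagree with each other) can be probed by sorting the same strings twice and comparing the results. The following is a rough sketch of that idea, not the test program posted in this thread. It assumes a POSIX system where ctypes.CDLL(None) exposes the C library; the helper names xfrm and order_mismatches are invented for illustration.

```python
import ctypes
import functools
import locale

# Assumption: POSIX system; CDLL(None) gives access to the libc already
# loaded into the Python process, so setlocale() calls made through the
# locale module affect these functions too.
libc = ctypes.CDLL(None)
libc.strcoll.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.strcoll.restype = ctypes.c_int
libc.strxfrm.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strxfrm.restype = ctypes.c_size_t


def xfrm(s):
    """Return the strxfrm() blob for the byte string s."""
    needed = libc.strxfrm(None, s, 0)  # length query only, nothing written
    buf = ctypes.create_string_buffer(needed + 1)
    libc.strxfrm(buf, s, needed + 1)
    return buf.value


def order_mismatches(strings):
    """Sort once with strcoll() as comparator and once by strxfrm() blobs;
    return (position, strcoll_item, strxfrm_item) wherever the two differ."""
    by_strcoll = sorted(strings, key=functools.cmp_to_key(libc.strcoll))
    by_strxfrm = sorted(strings, key=xfrm)  # bytes compare like strcmp()
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(by_strcoll, by_strxfrm))
        if a != b
    ]


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_ALL, "de_DE.utf8")  # locale from the report
    except locale.Error:
        print("de_DE.utf8 not installed; staying in the current locale")
    words = [w.encode("utf-8") for w in ("eai", "e aí", "e ai", "eaí", "ebi")]
    for pos, a, b in order_mismatches(words):
        print(f"position {pos}: strcoll has {a!r}, strxfrm has {b!r}")
```

An empty result means the two orderings agree for the given strings; it says nothing about strings that were not tested. Whether the five sample words trigger a mismatch depends on the installed glibc version, so no expected output is given.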

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 23:27:13

On Tue, Mar 22, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I now think that we have this backwards: This isn't a bug in glibc's
>> strxfrm(); it's a bug in glibc's strcoll().
>
> FWIW, the test program I just posted includes checks to see if the two
> cases produce self-consistent sort orders.  So far I've seen no evidence
> that they don't; that is, strcoll() produces a consistent sort order,
> and strxfrm() produces a consistent sort order, but not the same one.
> That being the case, arguing about which one is wrong seems a bit
> academic, not to mention well above my pay grade so far as the theoretical
> behavior of locale-specific sort ordering is concerned.

I hope you're right about it being academic.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

22 March 2016, 23:48:37

On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Please try this on as many platforms as you can get hold of ...

On MacOS X 10.10.5, this fails because the strxfrm() blobs are far
longer than the maximum you defined (about 8n+8 bytes, IIRC).  I fixed
that and ran this; all locales tested good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
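
Editorial note: the failure described here comes from a fixed upper bound on the strxfrm() result. A common way to avoid any fixed bound is to call strxfrm() with a zero-length destination first, which returns the required size, and then allocate exactly that much. A minimal sketch under the assumption of a POSIX system where ctypes.CDLL(None) exposes the C library; xfrm_length and xfrm_exact are hypothetical helper names, and this is not the fix that was applied to the test program.

```python
import ctypes

# Assumption: POSIX system where CDLL(None) resolves libc symbols.
libc = ctypes.CDLL(None)
libc.strxfrm.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strxfrm.restype = ctypes.c_size_t


def xfrm_length(s):
    """Bytes strxfrm() needs for s in the current locale, excluding the NUL."""
    return libc.strxfrm(None, s, 0)


def xfrm_exact(s):
    """Transform s into a buffer sized from the length query, so that no
    compile-time maximum is involved."""
    needed = xfrm_length(s)
    buf = ctypes.create_string_buffer(needed + 1)  # +1 for the trailing NUL
    written = libc.strxfrm(buf, s, needed + 1)
    if written > needed:
        raise RuntimeError("strxfrm() asked for more space on the second call")
    return buf.value


if __name__ == "__main__":
    sample = "e aí".encode("utf-8")
    print(len(sample), "input bytes ->", xfrm_length(sample), "blob bytes")
```

The ratio of blob bytes to input bytes varies by platform and locale, which is why a bound tuned on one system can be too small on another.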

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 23:51:22

Robert Haas <robertmhaas@gmail.com> writes:
>> I was a little worried that it was too much to hope for that all libc
>> vendors on earth would ship a strxfrm() implementation that was actually
>> consistent with strcoll(), and here we are.

BTW, the glibc discussion starting here:
https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
should put substantial fear in us about the advisability of putting strxfrm
results on-disk, as I understand we're now doing in btrees.

I was led to that while looking to see if there were any already-filed
glibc bug reports concerning this issue.  AFAICS there are not, which
is odd if the bug is gone in more recent releases ...

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

22 March 2016, 23:52:51

On Tue, Mar 22, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> I was a little worried that it was too much to hope for that all libc
>>> vendors on earth would ship a strxfrm() implementation that was actually
>>> consistent with strcoll(), and here we are.
>
> BTW, the glibc discussion starting here:
> https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
> should put substantial fear in us about the advisability of putting strxfrm
> results on-disk, as I understand we're now doing in btrees.

No.  Peter proposed that, but it hasn't actually been done.  This
certainly makes that sound inadvisable, though.

We are, however, putting indexes on disk whose ordering was determined
partly by the result of strxfrm() comparisons.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

22 March 2016, 23:58:34

On Tue, Mar 22, 2016 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, the glibc discussion starting here:
> https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
> should put substantial fear in us about the advisability of putting strxfrm
> results on-disk, as I understand we're now doing in btrees.
>
> I was led to that while looking to see if there were any already-filed
> glibc bug reports concerning this issue.  AFAICS there are not, which
> is odd if the bug is gone in more recent releases ...

I always knew it wouldn't fly to store strxfrm on disk, and we don't
do that. I actually quoted a paper saying just that at one point. I
specifically acknowledged that that was clearly a non-starter a couple
of times.

B-Trees are built based on strxfrm() comparisons at a point in time.
strxfrm() should be able to produce the same results as strcoll().
That is what it's documented to do, in C90. glibc has license to
change the strxfrm() representation while still producing answers
consistent with previous answers. Just not during an ongoing sort,
obviously.

It's not 100% clear that we have a contract with glibc to never change
collation rules, even for strcoll(), but our current use of strxfrm()
should not have made that any worse. Problems only cropped up because
of bugs in glibc.
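
As a rough sketch of the property described above (the helper names sign_of() and collation_agrees() are made up for illustration), the following checks for one pair of strings whether strcmp() on the strxfrm() blobs gives the same answer as strcoll() on the originals:

```c
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Returns 1 if strcoll(a, b) and strcmp() on the strxfrm() blobs agree,
 * 0 if they disagree, and -1 if memory allocation fails.
 */
int collation_agrees(const char *a, const char *b)
{
    /* With a size of 0, strxfrm() only reports the length it needs. */
    size_t la = strxfrm(NULL, a, 0) + 1;
    size_t lb = strxfrm(NULL, b, 0) + 1;
    char *xa = malloc(la);
    char *xb = malloc(lb);
    int result = -1;

    if (xa != NULL && xb != NULL) {
        strxfrm(xa, a, la);
        strxfrm(xb, b, lb);
        result = (sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb)));
    }
    free(xa);
    free(xb);
    return result;
}
```

A caller would normally run setlocale(LC_COLLATE, "") or select a specific locale first; in the default "C" locale the two functions trivially agree.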

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

22 March 2016, 23:59:09

Robert Haas <robertmhaas@gmail.com> writes:
> We are, however, putting indexes on disk whose ordering was determined
> partly by the result of strxfrm() comparisons.

Yeah.  It appears to me that the originally-submitted test case creates
an index whose entries are ordered correctly according to strxfrm(),
but not so much according to strcoll().
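
A minimal sketch of one way to check for that condition (this is not the test program attached upthread; SortItem, cmp_by_key() and count_strcoll_inversions() are hypothetical names): sort the strings by their strxfrm() blobs, then count adjacent pairs that strcoll() considers out of order.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *str;   /* original string */
    char *key;         /* strxfrm() blob for str */
} SortItem;

/* qsort() comparator: order items by their strxfrm() blobs. */
static int cmp_by_key(const void *a, const void *b)
{
    const SortItem *x = a;
    const SortItem *y = b;

    return strcmp(x->key, y->key);
}

/*
 * Sort strs[0..n-1] by strxfrm() blob, then count adjacent pairs that
 * strcoll() says are in the wrong order.  Returns -1 if allocation fails.
 */
long count_strcoll_inversions(const char **strs, size_t n)
{
    SortItem *items = calloc(n ? n : 1, sizeof *items);
    long bad = 0;
    size_t i;

    if (items == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        size_t len = strxfrm(NULL, strs[i], 0) + 1;

        items[i].str = strs[i];
        items[i].key = malloc(len);
        if (items[i].key == NULL) {
            bad = -1;
            break;
        }
        strxfrm(items[i].key, strs[i], len);
    }
    if (bad == 0) {
        qsort(items, n, sizeof *items, cmp_by_key);
        for (i = 1; i < n; i++)
            if (strcoll(items[i - 1].str, items[i].str) > 0)
                bad++;
    }
    for (i = 0; i < n; i++)
        free(items[i].key);   /* unfilled slots are NULL, which is fine */
    free(items);
    return bad;
}
```

A nonzero count under a given LC_COLLATE setting would mean the two functions disagree on that data set; zero only means no disagreement was seen for those particular strings.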

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 00:05:32

On Tue, Mar 22, 2016 at 7:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Please try this on as many platforms as you can get hold of ...
>
> On MacOS X 10.10.5, this fails because the strxfrm() blobs are far
> longer than the maximum you defined (about 8n+8 bytes, IIRC).  I fixed
> that and ran this; all locales tested good.
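
A fixed maximum blob size like the one mentioned in the quoted text can be avoided by asking strxfrm() for the required length first. This is a sketch only, and xfrm_dup() is a hypothetical helper, not something from the attached program:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly malloc'd strxfrm() blob for s, sized by first asking
 * strxfrm() how much room it needs.  Returns NULL on allocation failure
 * or if the second call unexpectedly wants more room than the first.
 */
char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);   /* length without the trailing NUL */
    char *buf = malloc(need + 1);

    if (buf == NULL)
        return NULL;
    if (strxfrm(buf, s, need + 1) > need) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The caller owns the returned buffer and must free() it.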

Here are the results on Fedora 16 and RHEL 7.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 00:51:44

Robert Haas <robertmhaas@gmail.com> writes:
> Here are the results on Fedora 16 and RHEL 7.1.

So much for the theory that it's fixed in RHEL7.  I now think that the
glibc folk actually do not know about this, and have accordingly filed
https://bugzilla.redhat.com/show_bug.cgi?id=1320356

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 00:57:11

On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Here are the results on Fedora 16 and RHEL 7.1.
>
> So much for the theory that it's fixed in RHEL7.  I now think that the
> glibc folk actually do not know about this, and have accordingly filed
> https://bugzilla.redhat.com/show_bug.cgi?id=1320356

Good plan, but what do we do between now and when they fix it?  This
seems quite bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 01:02:23

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So much for the theory that it's fixed in RHEL7.  I now think that the
>> glibc folk actually do not know about this, and have accordingly filed
>> https://bugzilla.redhat.com/show_bug.cgi?id=1320356

> Good plan, but what do we do between now and when they fix it?  This
> seems quite bad.

At the moment I think we're still in information-gathering mode.
The upstream reaction to this will be valuable data.  In the meantime,
I'd still like to find out which other platforms have similar issues.
I really kinda doubt the upthread report that Ubuntu doesn't have a
comparable problem, for instance, given the lack of any evidence that
this is a known/fixed issue in glibc.

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Stephen Frost

Date:

23 March 2016, 01:15:13

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Results for Ubuntu 15.10:

Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_DE.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good

Will try on others.

Thanks!

Stephen

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Thomas Munro

Date:

23 March 2016, 01:18:45

On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I was a little worried that it was too much to hope for that all libc
>> vendors on earth would ship a strxfrm() implementation that was actually
>> consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Failed on Debian 8.2, but only for de_DE.utf8.  libc 2.19-18+deb8u1.  Attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

debian-8.2-results.txt

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Stephen Frost

Date:

23 March 2016, 01:20:04

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Results for Ubuntu 14.04:

sfrost@dwemer:/home/sfrost> sh tryalllocales.sh
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (36) and strxfrm (35) orders
inconsistency between strcoll (35) and strxfrm (36) orders
inconsistency between strcoll (160) and strxfrm (159) orders
inconsistency between strcoll (159) and strxfrm (160) orders
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (355) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (440) and strxfrm (439) orders
inconsistency between strcoll (441) and strxfrm (440) orders
inconsistency between strcoll (439) and strxfrm (441) orders
inconsistency between strcoll (450) and strxfrm (449) orders
inconsistency between strcoll (449) and strxfrm (450) orders
inconsistency between strcoll (454) and strxfrm (452) orders
inconsistency between strcoll (455) and strxfrm (453) orders
inconsistency between strcoll (452) and strxfrm (454) orders
inconsistency between strcoll (453) and strxfrm (455) orders
inconsistency between strcoll (521) and strxfrm (520) orders
inconsistency between strcoll (520) and strxfrm (521) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (682) and strxfrm (681) orders
inconsistency between strcoll (681) and strxfrm (682) orders
inconsistency between strcoll (743) and strxfrm (742) orders
inconsistency between strcoll (742) and strxfrm (743) orders
inconsistency between strcoll (830) and strxfrm (829) orders
inconsistency between strcoll (829) and strxfrm (830) orders
inconsistency between strcoll (870) and strxfrm (869) orders
inconsistency between strcoll (869) and strxfrm (870) orders
inconsistency between strcoll (933) and strxfrm (931) orders
inconsistency between strcoll (931) and strxfrm (932) orders
inconsistency between strcoll (932) and strxfrm (933) orders
de_DE.utf8 BAD
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good

Thanks!

Stephen

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Stephen Frost

Date:

23 March 2016, 01:30:18

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

I found the 'all' button on Debian 8.3:

sfrost@mahout:~$ sh tryalllocales.sh
Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
af_ZA.utf8 good
Using LC_COLLATE = "ak_GH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ak_GH.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
an_ES.utf8 good
Using LC_COLLATE = "anp_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
anp_IN.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SS.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ast_ES.utf8 good
Using LC_COLLATE = "ayc_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ayc_PE.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
az_AZ.utf8 good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8@latin good
Using LC_COLLATE = "bem_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bem_ZM.utf8 good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bg_BG.utf8 good
Using LC_COLLATE = "bho_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bho_IN.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
br_FR.utf8 good
Using LC_COLLATE = "brx_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
brx_IN.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_IT.utf8 good
Using LC_COLLATE = "cmn_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cmn_TW.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
crh_UA.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
csb_PL.utf8 good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (72) and strxfrm (71) orders
inconsistency between strcoll (71) and strxfrm (72) orders
inconsistency between strcoll (136) and strxfrm (135) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (139) and strxfrm (137) orders
inconsistency between strcoll (140) and strxfrm (138) orders
inconsistency between strcoll (137) and strxfrm (139) orders
inconsistency between strcoll (138) and strxfrm (140) orders
inconsistency between strcoll (149) and strxfrm (148) orders
inconsistency between strcoll (148) and strxfrm (149) orders
inconsistency between strcoll (254) and strxfrm (252) orders
inconsistency between strcoll (252) and strxfrm (253) orders
inconsistency between strcoll (253) and strxfrm (254) orders
inconsistency between strcoll (274) and strxfrm (273) orders
inconsistency between strcoll (275) and strxfrm (274) orders
inconsistency between strcoll (273) and strxfrm (275) orders
inconsistency between strcoll (339) and strxfrm (338) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (390) and strxfrm (388) orders
inconsistency between strcoll (388) and strxfrm (389) orders
inconsistency between strcoll (389) and strxfrm (390) orders
inconsistency between strcoll (411) and strxfrm (410) orders
inconsistency between strcoll (410) and strxfrm (411) orders
inconsistency between strcoll (449) and strxfrm (448) orders
inconsistency between strcoll (448) and strxfrm (449) orders
inconsistency between strcoll (454) and strxfrm (453) orders
inconsistency between strcoll (453) and strxfrm (454) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (543) and strxfrm (542) orders
inconsistency between strcoll (544) and strxfrm (543) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (567) and strxfrm (566) orders
inconsistency between strcoll (566) and strxfrm (567) orders
inconsistency between strcoll (589) and strxfrm (588) orders
inconsistency between strcoll (588) and strxfrm (589) orders
inconsistency between strcoll (592) and strxfrm (591) orders
inconsistency between strcoll (591) and strxfrm (592) orders
inconsistency between strcoll (594) and strxfrm (593) orders
inconsistency between strcoll (593) and strxfrm (594) orders
inconsistency between strcoll (597) and strxfrm (595) orders
inconsistency between strcoll (595) and strxfrm (596) orders
inconsistency between strcoll (596) and strxfrm (597) orders
inconsistency between strcoll (601) and strxfrm (600) orders
inconsistency between strcoll (600) and strxfrm (601) orders
inconsistency between strcoll (726) and strxfrm (724) orders
inconsistency between strcoll (724) and strxfrm (725) orders
inconsistency between strcoll (725) and strxfrm (726) orders
inconsistency between strcoll (743) and strxfrm (741) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (744) and strxfrm (743) orders
inconsistency between strcoll (742) and strxfrm (744) orders
inconsistency between strcoll (765) and strxfrm (764) orders
inconsistency between strcoll (764) and strxfrm (765) orders
inconsistency between strcoll (786) and strxfrm (784) orders
inconsistency between strcoll (784) and strxfrm (786) orders
inconsistency between strcoll (896) and strxfrm (895) orders
inconsistency between strcoll (895) and strxfrm (896) orders
inconsistency between strcoll (941) and strxfrm (939) orders
inconsistency between strcoll (942) and strxfrm (940) orders
inconsistency between strcoll (943) and strxfrm (941) orders
inconsistency between strcoll (939) and strxfrm (942) orders
inconsistency between strcoll (940) and strxfrm (943) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LU.utf8 good
Using LC_COLLATE = "doi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
doi_IN.utf8 good
Using LC_COLLATE = "dv_MV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CR.utf8 good
Using LC_COLLATE = "es_CU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CU.utf8 good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fa_IR.utf8 good
Using LC_COLLATE = "ff_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ff_SN.utf8 good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fil_PH.utf8 good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gv_GB.utf8 good
Using LC_COLLATE = "hak_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hak_TW.utf8 good
Using LC_COLLATE = "ha_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hy_AM.utf8 good
Using LC_COLLATE = "ia_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ia_FR.utf8 good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "en_US.UTF-8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ig_NG.utf8 good
Using LC_COLLATE = "ik_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kn_IN.utf8 good
Using LC_COLLATE = "kok_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kok_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ky_KG.utf8 good
Using LC_COLLATE = "lb_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (137) and strxfrm (136) orders
inconsistency between strcoll (136) and strxfrm (137) orders
inconsistency between strcoll (171) and strxfrm (170) orders
inconsistency between strcoll (170) and strxfrm (171) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (356) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (355) and strxfrm (356) orders
inconsistency between strcoll (465) and strxfrm (464) orders
inconsistency between strcoll (464) and strxfrm (465) orders
inconsistency between strcoll (467) and strxfrm (466) orders
inconsistency between strcoll (466) and strxfrm (467) orders
inconsistency between strcoll (470) and strxfrm (469) orders
inconsistency between strcoll (469) and strxfrm (470) orders
inconsistency between strcoll (573) and strxfrm (572) orders
inconsistency between strcoll (574) and strxfrm (573) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (612) and strxfrm (611) orders
inconsistency between strcoll (611) and strxfrm (612) orders
inconsistency between strcoll (709) and strxfrm (708) orders
inconsistency between strcoll (710) and strxfrm (709) orders
inconsistency between strcoll (708) and strxfrm (710) orders
inconsistency between strcoll (771) and strxfrm (770) orders
inconsistency between strcoll (770) and strxfrm (771) orders
inconsistency between strcoll (789) and strxfrm (787) orders
inconsistency between strcoll (787) and strxfrm (788) orders
inconsistency between strcoll (788) and strxfrm (789) orders
inconsistency between strcoll (948) and strxfrm (947) orders
inconsistency between strcoll (947) and strxfrm (948) orders
lb_LU.utf8 BAD
Using LC_COLLATE =3D "lg_UG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lg_UG.utf8 good
Using LC_COLLATE =3D "li_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
li_BE.utf8 good
Using LC_COLLATE =3D "lij_IT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lij_IT.utf8 good
Using LC_COLLATE =3D "li_NL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
li_NL.utf8 good
Using LC_COLLATE =3D "lo_LA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lo_LA.utf8 good
Using LC_COLLATE =3D "lt_LT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lt_LT.utf8 good
Using LC_COLLATE =3D "lv_LV.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lv_LV.utf8 good
Using LC_COLLATE =3D "lzh_TW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lzh_TW.utf8 good
Using LC_COLLATE =3D "mag_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mag_IN.utf8 good
Using LC_COLLATE =3D "mai_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mai_IN.utf8 good
Using LC_COLLATE =3D "mg_MG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mg_MG.utf8 good
Using LC_COLLATE =3D "mhr_RU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mhr_RU.utf8 good
Using LC_COLLATE =3D "mi_NZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mi_NZ.utf8 good
Using LC_COLLATE =3D "mk_MK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mk_MK.utf8 good
Using LC_COLLATE =3D "ml_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ml_IN.utf8 good
Using LC_COLLATE =3D "mni_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mni_IN.utf8 good
Using LC_COLLATE =3D "mn_MN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mn_MN.utf8 good
Using LC_COLLATE =3D "mr_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mr_IN.utf8 good
Using LC_COLLATE =3D "ms_MY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ms_MY.utf8 good
Using LC_COLLATE =3D "mt_MT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
mt_MT.utf8 good
Using LC_COLLATE =3D "my_MM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
my_MM.utf8 good
Using LC_COLLATE =3D "nan_TW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nan_TW.utf8 good
Using LC_COLLATE =3D "nan_TW.utf8@latin"
Using LC_CTYPE =3D "en_US.UTF-8"
nan_TW.utf8@latin good
Using LC_COLLATE =3D "nb_NO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nb_NO.utf8 good
Using LC_COLLATE =3D "nds_DE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nds_DE.utf8 good
Using LC_COLLATE =3D "nds_NL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nds_NL.utf8 good
Using LC_COLLATE =3D "ne_NP.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ne_NP.utf8 good
Using LC_COLLATE =3D "nhn_MX.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nhn_MX.utf8 good
Using LC_COLLATE =3D "niu_NU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
niu_NU.utf8 good
Using LC_COLLATE =3D "niu_NZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
niu_NZ.utf8 good
Using LC_COLLATE =3D "nl_AW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nl_AW.utf8 good
Using LC_COLLATE =3D "nl_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nl_BE.utf8 good
Using LC_COLLATE =3D "nl_NL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nl_NL.utf8 good
Using LC_COLLATE =3D "nn_NO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nn_NO.utf8 good
Using LC_COLLATE =3D "nr_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nr_ZA.utf8 good
Using LC_COLLATE =3D "nso_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
nso_ZA.utf8 good
Using LC_COLLATE =3D "oc_FR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
oc_FR.utf8 good
Using LC_COLLATE =3D "om_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
om_ET.utf8 good
Using LC_COLLATE =3D "om_KE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
om_KE.utf8 good
Using LC_COLLATE =3D "or_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
or_IN.utf8 good
Using LC_COLLATE =3D "os_RU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
os_RU.utf8 good
Using LC_COLLATE =3D "pa_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pa_IN.utf8 good
Using LC_COLLATE =3D "pap_AN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pap_AN.utf8 good
Using LC_COLLATE =3D "pap_AW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pap_AW.utf8 good
Using LC_COLLATE =3D "pap_CW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pap_CW.utf8 good
Using LC_COLLATE =3D "pa_PK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pa_PK.utf8 good
Using LC_COLLATE =3D "pl_PL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pl_PL.utf8 good
Using LC_COLLATE =3D "ps_AF.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ps_AF.utf8 good
Using LC_COLLATE =3D "pt_BR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pt_BR.utf8 good
Using LC_COLLATE =3D "pt_PT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
pt_PT.utf8 good
Using LC_COLLATE =3D "quz_PE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
quz_PE.utf8 good
Using LC_COLLATE =3D "ro_RO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ro_RO.utf8 good
Using LC_COLLATE =3D "ru_RU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ru_RU.utf8 good
Using LC_COLLATE =3D "ru_UA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ru_UA.utf8 good
Using LC_COLLATE =3D "rw_RW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
rw_RW.utf8 good
Using LC_COLLATE =3D "sa_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sa_IN.utf8 good
Using LC_COLLATE =3D "sat_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sat_IN.utf8 good
Using LC_COLLATE =3D "sc_IT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sc_IT.utf8 good
Using LC_COLLATE =3D "sd_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sd_IN.utf8 good
Using LC_COLLATE =3D "sd_IN.utf8@devanagari"
Using LC_CTYPE =3D "en_US.UTF-8"
sd_IN.utf8@devanagari good
Using LC_COLLATE =3D "se_NO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
se_NO.utf8 good
Using LC_COLLATE =3D "shs_CA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
shs_CA.utf8 good
Using LC_COLLATE =3D "sid_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sid_ET.utf8 good
Using LC_COLLATE =3D "si_LK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
si_LK.utf8 good
Using LC_COLLATE =3D "sk_SK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sk_SK.utf8 good
Using LC_COLLATE =3D "sl_SI.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sl_SI.utf8 good
Using LC_COLLATE =3D "so_DJ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
so_DJ.utf8 good
Using LC_COLLATE =3D "so_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
so_ET.utf8 good
Using LC_COLLATE =3D "so_KE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
so_KE.utf8 good
Using LC_COLLATE =3D "so_SO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
so_SO.utf8 good
Using LC_COLLATE =3D "sq_AL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sq_AL.utf8 good
Using LC_COLLATE =3D "sq_MK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sq_MK.utf8 good
Using LC_COLLATE =3D "sr_ME.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sr_ME.utf8 good
Using LC_COLLATE =3D "sr_RS.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sr_RS.utf8 good
Using LC_COLLATE =3D "sr_RS.utf8@latin"
Using LC_CTYPE =3D "en_US.UTF-8"
sr_RS.utf8@latin good
Using LC_COLLATE =3D "ss_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ss_ZA.utf8 good
Using LC_COLLATE =3D "st_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
st_ZA.utf8 good
Using LC_COLLATE =3D "sv_FI.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sv_FI.utf8 good
Using LC_COLLATE =3D "sv_SE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sv_SE.utf8 good
Using LC_COLLATE =3D "sw_KE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sw_KE.utf8 good
Using LC_COLLATE =3D "sw_TZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
sw_TZ.utf8 good
Using LC_COLLATE =3D "szl_PL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
szl_PL.utf8 good
Using LC_COLLATE =3D "ta_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ta_IN.utf8 good
Using LC_COLLATE =3D "ta_LK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ta_LK.utf8 good
Using LC_COLLATE =3D "te_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
te_IN.utf8 good
Using LC_COLLATE =3D "tg_TJ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tg_TJ.utf8 good
Using LC_COLLATE =3D "the_NP.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
the_NP.utf8 good
Using LC_COLLATE =3D "th_TH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
th_TH.utf8 good
Using LC_COLLATE =3D "ti_ER.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ti_ER.utf8 good
Using LC_COLLATE =3D "ti_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ti_ET.utf8 good
Using LC_COLLATE =3D "tig_ER.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tig_ER.utf8 good
Using LC_COLLATE =3D "tk_TM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tk_TM.utf8 good
Using LC_COLLATE =3D "tl_PH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tl_PH.utf8 good
Using LC_COLLATE =3D "tn_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tn_ZA.utf8 good
Using LC_COLLATE =3D "tr_CY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tr_CY.utf8 good
Using LC_COLLATE =3D "tr_TR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tr_TR.utf8 good
Using LC_COLLATE =3D "ts_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ts_ZA.utf8 good
Using LC_COLLATE =3D "tt_RU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
tt_RU.utf8 good
Using LC_COLLATE =3D "tt_RU.utf8@iqtelif"
Using LC_CTYPE =3D "en_US.UTF-8"
tt_RU.utf8@iqtelif good
Using LC_COLLATE =3D "ug_CN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ug_CN.utf8 good
Using LC_COLLATE =3D "uk_UA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
uk_UA.utf8 good
Using LC_COLLATE =3D "unm_US.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
unm_US.utf8 good
Using LC_COLLATE =3D "ur_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ur_IN.utf8 good
Using LC_COLLATE =3D "ur_PK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ur_PK.utf8 good
Using LC_COLLATE =3D "uz_UZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
uz_UZ.utf8 good
Using LC_COLLATE =3D "uz_UZ.utf8@cyrillic"
Using LC_CTYPE =3D "en_US.UTF-8"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE =3D "ve_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ve_ZA.utf8 good
Using LC_COLLATE =3D "vi_VN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
vi_VN.utf8 good
Using LC_COLLATE =3D "wa_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
wa_BE.utf8 good
Using LC_COLLATE =3D "wae_CH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
wae_CH.utf8 good
Using LC_COLLATE =3D "wal_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
wal_ET.utf8 good
Using LC_COLLATE =3D "wo_SN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
wo_SN.utf8 good
Using LC_COLLATE =3D "xh_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
xh_ZA.utf8 good
Using LC_COLLATE =3D "yi_US.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
yi_US.utf8 good
Using LC_COLLATE =3D "yo_NG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
yo_NG.utf8 good
Using LC_COLLATE =3D "yue_HK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
yue_HK.utf8 good
Using LC_COLLATE =3D "zh_CN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
zh_CN.utf8 good
Using LC_COLLATE =3D "zh_HK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
zh_HK.utf8 good
Using LC_COLLATE =3D "zh_SG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
zh_SG.utf8 good
Using LC_COLLATE =3D "zh_TW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
zh_TW.utf8 good
Using LC_COLLATE =3D "zu_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
zu_ZA.utf8 good

Thanks!

Stephen

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Thomas Munro

Date:

23 March 2016, 01:43:02

On Wed, Mar 23, 2016 at 2:18 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I was a little worried that it was too much to hope for that all libc
>>> vendors on earth would ship a strxfrm() implementation that was actually
>>> consistent with strcoll(), and here we are.
>>
>> Indeed.  To try to put some scope on the problem, I made an idiot little
>> program that just generates some random UTF8 strings and sees whether
>> strcoll and strxfrm sort them alike.  Attached are that program, an even
>> more idiot little shell script that runs it over all available UTF8
>> locales, and the results on my RHEL6 box.  While de_DE seems to be the
>> worst-broken locale, it's far from the only one.
>>
>> Please try this on as many platforms as you can get hold of ...
>
> Failed on Debian 8.2, but only for de_DE.utf8.  libc 2.19-18+deb8u1.  Attached.

I ran it again after apt-get upgrade took me to Debian 8.3 and libc6
2.19-18+deb8u2.  The results are similar: de_DE.utf8 has inconsistencies
but nothing else does.  So Debian stable is affected.  (I just noticed
that Stephen Frost's output from the same OS reports a broken lb_LU.utf8
too, but after conferring on IRC it seems that may be because I
installed "locales-all" (precompiled), which didn't give me lb_LU.utf8,
whereas he generated all locales, which apparently does.)
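
To see which UTF-8 locales one machine has that another lacks, something
like the following Python sketch might help.  It is not the shell script
from this thread.  It assumes the "locale -a" command is available, and the
helper names (utf8_locales, installed_locales, only_in_first) are made up
for illustration.

```python
import subprocess


def utf8_locales(names):
    """Keep only the locale names that look like UTF-8 locales."""
    wanted = []
    for name in names:
        # Accept both spellings seen in the logs: "utf8" and "UTF-8".
        normalized = name.lower().replace("-", "")
        if "utf8" in normalized:
            wanted.append(name)
    return sorted(wanted)


def installed_locales():
    """List installed locales (assumption: `locale -a` exists on this system)."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # some locale names may not be valid UTF-8
        check=True,
    )
    return result.stdout.split()


def only_in_first(first, second):
    """Return the UTF-8 locales present in `first` but missing from `second`."""
    return sorted(set(utf8_locales(first)) - set(utf8_locales(second)))
```

Feeding the "locale -a" output from two machines into only_in_first would
list a locale such as lb_LU.utf8 if it exists on one of them only.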

--
Thomas Munro
http://www.enterprisedb.com

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Stephen Frost

Date:

23 March 2016, 01:45:48

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
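
For readers without the attachment, the check described in the quoted text
could be sketched roughly as below.  This is not the attached program, only
a Python approximation.  As far as I know, CPython's locale.strcoll and
locale.strxfrm call the C library's wide-character variants, so results may
differ from a C program calling strcoll() and strxfrm() on UTF-8 bytes.  The
alphabet, string counts, and function names are assumptions chosen for
illustration.

```python
import functools
import locale
import random

# Assumed sample alphabet: plain letters, accented letters, and a space.
ALPHABET = "aeiáéí äöüß"


def random_strings(count, length, alphabet, seed=0):
    """Return `count` pseudo-random strings of `length` characters."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def find_inconsistencies(strings):
    """Sort once by strcoll and once by strxfrm keys, then return the
    positions at which the two sorted orders hold different strings."""
    by_strcoll = sorted(strings, key=functools.cmp_to_key(locale.strcoll))
    by_strxfrm = sorted(strings, key=locale.strxfrm)
    return [i for i, (a, b) in enumerate(zip(by_strcoll, by_strxfrm))
            if a != b]


def check_locale(name, count=1000, alphabet=ALPHABET):
    """Return "good" or "BAD" for the given LC_COLLATE setting.

    Raises locale.Error if the locale is not installed.
    """
    locale.setlocale(locale.LC_COLLATE, name)
    sample = random_strings(count, 8, alphabet)
    return "BAD" if find_inconsistencies(sample) else "good"
```

Calling check_locale("de_DE.utf8") on an affected system would be expected
to return "BAD", but a clean result from a random sample like this does not
prove that a locale is consistent.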

Debian 7.9 results with all locales locally generated:

Using LC_COLLATE =3D "aa_DJ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
aa_DJ.utf8 good
Using LC_COLLATE =3D "aa_ER.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
aa_ER.utf8 good
Using LC_COLLATE =3D "aa_ER.utf8@saaho"
Using LC_CTYPE =3D "en_US.UTF-8"
aa_ER.utf8@saaho good
Using LC_COLLATE =3D "aa_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
aa_ET.utf8 good
Using LC_COLLATE =3D "af_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
af_ZA.utf8 good
Using LC_COLLATE =3D "am_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
am_ET.utf8 good
Using LC_COLLATE =3D "an_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
an_ES.utf8 good
Using LC_COLLATE =3D "ar_AE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_AE.utf8 good
Using LC_COLLATE =3D "ar_BH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_BH.utf8 good
Using LC_COLLATE =3D "ar_DZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_DZ.utf8 good
Using LC_COLLATE =3D "ar_EG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_EG.utf8 good
Using LC_COLLATE =3D "ar_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_IN.utf8 good
Using LC_COLLATE =3D "ar_IQ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_IQ.utf8 good
Using LC_COLLATE =3D "ar_JO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_JO.utf8 good
Using LC_COLLATE =3D "ar_KW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_KW.utf8 good
Using LC_COLLATE =3D "ar_LB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_LB.utf8 good
Using LC_COLLATE =3D "ar_LY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_LY.utf8 good
Using LC_COLLATE =3D "ar_MA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_MA.utf8 good
Using LC_COLLATE =3D "ar_OM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_OM.utf8 good
Using LC_COLLATE =3D "ar_QA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_QA.utf8 good
Using LC_COLLATE =3D "ar_SA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_SA.utf8 good
Using LC_COLLATE =3D "ar_SD.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_SD.utf8 good
Using LC_COLLATE =3D "ar_SY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_SY.utf8 good
Using LC_COLLATE =3D "ar_TN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_TN.utf8 good
Using LC_COLLATE =3D "ar_YE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ar_YE.utf8 good
Using LC_COLLATE =3D "as_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
as_IN.utf8 good
Using LC_COLLATE =3D "ast_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ast_ES.utf8 good
Using LC_COLLATE =3D "az_AZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
az_AZ.utf8 good
Using LC_COLLATE =3D "be_BY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
be_BY.utf8 good
Using LC_COLLATE =3D "be_BY.utf8@latin"
Using LC_CTYPE =3D "en_US.UTF-8"
be_BY.utf8@latin good
Using LC_COLLATE =3D "bem_ZM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bem_ZM.utf8 good
Using LC_COLLATE =3D "ber_DZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ber_DZ.utf8 good
Using LC_COLLATE =3D "ber_MA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ber_MA.utf8 good
Using LC_COLLATE =3D "bg_BG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bg_BG.utf8 good
Using LC_COLLATE =3D "bn_BD.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bn_BD.utf8 good
Using LC_COLLATE =3D "bn_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bn_IN.utf8 good
Using LC_COLLATE =3D "bo_CN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bo_CN.utf8 good
Using LC_COLLATE =3D "bo_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bo_IN.utf8 good
Using LC_COLLATE =3D "br_FR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
br_FR.utf8 good
Using LC_COLLATE =3D "bs_BA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
bs_BA.utf8 good
Using LC_COLLATE =3D "byn_ER.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
byn_ER.utf8 good
Using LC_COLLATE =3D "ca_AD.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ca_AD.utf8 good
Using LC_COLLATE =3D "ca_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ca_ES.utf8 good
Using LC_COLLATE =3D "ca_ES.utf8@valencia"
Using LC_CTYPE =3D "en_US.UTF-8"
ca_ES.utf8@valencia good
Using LC_COLLATE =3D "ca_FR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ca_FR.utf8 good
Using LC_COLLATE =3D "ca_IT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ca_IT.utf8 good
Using LC_COLLATE =3D "crh_UA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
crh_UA.utf8 good
Using LC_COLLATE =3D "csb_PL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
csb_PL.utf8 good
Using LC_COLLATE =3D "cs_CZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
cs_CZ.utf8 good
Using LC_COLLATE =3D "C.UTF-8"
Using LC_CTYPE =3D "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE =3D "cv_RU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
cv_RU.utf8 good
Using LC_COLLATE =3D "cy_GB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
cy_GB.utf8 good
Using LC_COLLATE =3D "da_DK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
da_DK.utf8 good
Using LC_COLLATE =3D "de_AT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
de_AT.utf8 good
Using LC_COLLATE =3D "de_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
de_BE.utf8 good
Using LC_COLLATE =3D "de_CH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
de_CH.utf8 good
Using LC_COLLATE =3D "de_DE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
inconsistency between strcoll (71) and strxfrm (70) orders
inconsistency between strcoll (70) and strxfrm (71) orders
inconsistency between strcoll (98) and strxfrm (97) orders
inconsistency between strcoll (97) and strxfrm (98) orders
inconsistency between strcoll (130) and strxfrm (128) orders
inconsistency between strcoll (131) and strxfrm (129) orders
inconsistency between strcoll (128) and strxfrm (130) orders
inconsistency between strcoll (129) and strxfrm (131) orders
inconsistency between strcoll (143) and strxfrm (142) orders
inconsistency between strcoll (142) and strxfrm (143) orders
inconsistency between strcoll (147) and strxfrm (146) orders
inconsistency between strcoll (146) and strxfrm (147) orders
inconsistency between strcoll (152) and strxfrm (150) orders
inconsistency between strcoll (150) and strxfrm (151) orders
inconsistency between strcoll (151) and strxfrm (152) orders
inconsistency between strcoll (155) and strxfrm (154) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (157) and strxfrm (156) orders
inconsistency between strcoll (156) and strxfrm (157) orders
inconsistency between strcoll (195) and strxfrm (194) orders
inconsistency between strcoll (194) and strxfrm (195) orders
inconsistency between strcoll (314) and strxfrm (313) orders
inconsistency between strcoll (315) and strxfrm (314) orders
inconsistency between strcoll (316) and strxfrm (315) orders
inconsistency between strcoll (313) and strxfrm (316) orders
inconsistency between strcoll (350) and strxfrm (349) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (352) and strxfrm (351) orders
inconsistency between strcoll (353) and strxfrm (352) orders
inconsistency between strcoll (354) and strxfrm (353) orders
inconsistency between strcoll (349) and strxfrm (354) orders
inconsistency between strcoll (357) and strxfrm (356) orders
inconsistency between strcoll (356) and strxfrm (357) orders
inconsistency between strcoll (360) and strxfrm (359) orders
inconsistency between strcoll (359) and strxfrm (360) orders
inconsistency between strcoll (433) and strxfrm (432) orders
inconsistency between strcoll (432) and strxfrm (433) orders
inconsistency between strcoll (535) and strxfrm (534) orders
inconsistency between strcoll (534) and strxfrm (535) orders
inconsistency between strcoll (634) and strxfrm (632) orders
inconsistency between strcoll (635) and strxfrm (633) orders
inconsistency between strcoll (632) and strxfrm (634) orders
inconsistency between strcoll (633) and strxfrm (635) orders
inconsistency between strcoll (642) and strxfrm (641) orders
inconsistency between strcoll (641) and strxfrm (642) orders
inconsistency between strcoll (760) and strxfrm (758) orders
inconsistency between strcoll (758) and strxfrm (759) orders
inconsistency between strcoll (761) and strxfrm (760) orders
inconsistency between strcoll (759) and strxfrm (761) orders
inconsistency between strcoll (794) and strxfrm (793) orders
inconsistency between strcoll (795) and strxfrm (794) orders
inconsistency between strcoll (796) and strxfrm (795) orders
inconsistency between strcoll (797) and strxfrm (796) orders
inconsistency between strcoll (793) and strxfrm (797) orders
inconsistency between strcoll (799) and strxfrm (798) orders
inconsistency between strcoll (798) and strxfrm (799) orders
inconsistency between strcoll (803) and strxfrm (802) orders
inconsistency between strcoll (802) and strxfrm (803) orders
inconsistency between strcoll (880) and strxfrm (879) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (890) and strxfrm (889) orders
inconsistency between strcoll (889) and strxfrm (890) orders
de_DE.utf8 BAD
Using LC_COLLATE =3D "de_LI.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
de_LI.utf8 good
Using LC_COLLATE =3D "de_LU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
de_LU.utf8 good
Using LC_COLLATE =3D "dv_MV.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
dv_MV.utf8 good
Using LC_COLLATE =3D "dz_BT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
dz_BT.utf8 good
Using LC_COLLATE =3D "el_CY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
el_CY.utf8 good
Using LC_COLLATE =3D "el_GR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
el_GR.utf8 good
Using LC_COLLATE =3D "en_AG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE =3D "en_AU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE =3D "en_BW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE =3D "en_CA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE =3D "en_DK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE =3D "en_GB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE =3D "en_HK.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE =3D "en_IE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE =3D "en_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE =3D "en_NG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE =3D "en_NZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE =3D "en_PH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE =3D "en_SG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE =3D "en_US.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE =3D "en_ZA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE =3D "en_ZM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE =3D "en_ZW.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
en_ZW.utf8 good
Using LC_COLLATE =3D "eo.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
eo.utf8 good
Using LC_COLLATE =3D "es_AR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_AR.utf8 good
Using LC_COLLATE =3D "es_BO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_BO.utf8 good
Using LC_COLLATE =3D "es_CL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_CL.utf8 good
Using LC_COLLATE =3D "es_CO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_CO.utf8 good
Using LC_COLLATE =3D "es_CR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_CR.utf8 good
Using LC_COLLATE =3D "es_DO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_DO.utf8 good
Using LC_COLLATE =3D "es_EC.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_EC.utf8 good
Using LC_COLLATE =3D "es_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_ES.utf8 good
Using LC_COLLATE =3D "es_GT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_GT.utf8 good
Using LC_COLLATE =3D "es_HN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_HN.utf8 good
Using LC_COLLATE =3D "es_MX.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_MX.utf8 good
Using LC_COLLATE =3D "es_NI.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_NI.utf8 good
Using LC_COLLATE =3D "es_PA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_PA.utf8 good
Using LC_COLLATE =3D "es_PE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_PE.utf8 good
Using LC_COLLATE =3D "es_PR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_PR.utf8 good
Using LC_COLLATE =3D "es_PY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_PY.utf8 good
Using LC_COLLATE =3D "es_SV.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_SV.utf8 good
Using LC_COLLATE =3D "es_US.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_US.utf8 good
Using LC_COLLATE =3D "es_UY.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_UY.utf8 good
Using LC_COLLATE =3D "es_VE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
es_VE.utf8 good
Using LC_COLLATE =3D "et_EE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
et_EE.utf8 good
Using LC_COLLATE =3D "eu_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
eu_ES.utf8 good
Using LC_COLLATE =3D "eu_FR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
eu_FR.utf8 good
Using LC_COLLATE =3D "fa_IR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fa_IR.utf8 good
Using LC_COLLATE =3D "ff_SN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ff_SN.utf8 good
Using LC_COLLATE =3D "fi_FI.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fi_FI.utf8 good
Using LC_COLLATE =3D "fil_PH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fil_PH.utf8 good
Using LC_COLLATE =3D "fo_FO.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fo_FO.utf8 good
Using LC_COLLATE =3D "fr_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fr_BE.utf8 good
Using LC_COLLATE =3D "fr_CA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fr_CA.utf8 good
Using LC_COLLATE =3D "fr_CH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fr_CH.utf8 good
Using LC_COLLATE =3D "fr_FR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fr_FR.utf8 good
Using LC_COLLATE =3D "fr_LU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fr_LU.utf8 good
Using LC_COLLATE =3D "fur_IT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fur_IT.utf8 good
Using LC_COLLATE =3D "fy_DE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fy_DE.utf8 good
Using LC_COLLATE =3D "fy_NL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
fy_NL.utf8 good
Using LC_COLLATE =3D "ga_IE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ga_IE.utf8 good
Using LC_COLLATE =3D "gd_GB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gd_GB.utf8 good
Using LC_COLLATE =3D "gez_ER.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gez_ER.utf8 good
Using LC_COLLATE =3D "gez_ER.utf8@abegede"
Using LC_CTYPE =3D "en_US.UTF-8"
gez_ER.utf8@abegede good
Using LC_COLLATE =3D "gez_ET.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gez_ET.utf8 good
Using LC_COLLATE =3D "gez_ET.utf8@abegede"
Using LC_CTYPE =3D "en_US.UTF-8"
gez_ET.utf8@abegede good
Using LC_COLLATE =3D "gl_ES.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gl_ES.utf8 good
Using LC_COLLATE =3D "gu_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gu_IN.utf8 good
Using LC_COLLATE =3D "gv_GB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
gv_GB.utf8 good
Using LC_COLLATE =3D "ha_NG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ha_NG.utf8 good
Using LC_COLLATE =3D "he_IL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
he_IL.utf8 good
Using LC_COLLATE =3D "hi_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hi_IN.utf8 good
Using LC_COLLATE =3D "hne_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hne_IN.utf8 good
Using LC_COLLATE =3D "hr_HR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hr_HR.utf8 good
Using LC_COLLATE =3D "hsb_DE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hsb_DE.utf8 good
Using LC_COLLATE =3D "ht_HT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ht_HT.utf8 good
Using LC_COLLATE =3D "hu_HU.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hu_HU.utf8 good
Using LC_COLLATE =3D "hy_AM.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
hy_AM.utf8 good
Using LC_COLLATE =3D "ia.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ia.utf8 good
Using LC_COLLATE =3D "id_ID.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
id_ID.utf8 good
Using LC_COLLATE =3D "ig_NG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ig_NG.utf8 good
Using LC_COLLATE =3D "ik_CA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ik_CA.utf8 good
Using LC_COLLATE =3D "is_IS.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
is_IS.utf8 good
Using LC_COLLATE =3D "it_CH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
it_CH.utf8 good
Using LC_COLLATE =3D "it_IT.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
it_IT.utf8 good
Using LC_COLLATE =3D "iu_CA.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
iu_CA.utf8 good
Using LC_COLLATE =3D "iw_IL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
iw_IL.utf8 good
Using LC_COLLATE =3D "ja_JP.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ja_JP.utf8 good
Using LC_COLLATE =3D "ka_GE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ka_GE.utf8 good
Using LC_COLLATE =3D "kk_KZ.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
kk_KZ.utf8 good
Using LC_COLLATE =3D "kl_GL.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
kl_GL.utf8 good
Using LC_COLLATE =3D "km_KH.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
km_KH.utf8 good
Using LC_COLLATE =3D "kn_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
kn_IN.utf8 good
Using LC_COLLATE =3D "kok_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
kok_IN.utf8 good
Using LC_COLLATE =3D "ko_KR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ko_KR.utf8 good
Using LC_COLLATE =3D "ks_IN.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ks_IN.utf8 good
Using LC_COLLATE =3D "ks_IN.utf8@devanagari"
Using LC_CTYPE =3D "en_US.UTF-8"
ks_IN.utf8@devanagari good
Using LC_COLLATE =3D "ku_TR.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ku_TR.utf8 good
Using LC_COLLATE =3D "kw_GB.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
kw_GB.utf8 good
Using LC_COLLATE =3D "ky_KG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
ky_KG.utf8 good
Using LC_COLLATE =3D "lg_UG.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
lg_UG.utf8 good
Using LC_COLLATE =3D "li_BE.utf8"
Using LC_CTYPE =3D "en_US.UTF-8"
li_BE.utf8 good
Using LC_COLLATE =3D "li_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lv_LV.utf8 good
Using LC_COLLATE = "mai_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mg_MG.utf8 good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ml_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ne_NP.utf8 good
Using LC_COLLATE = "nl_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
or_IN.utf8 good
Using LC_COLLATE = "os_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (936) and strxfrm (935) orders
inconsistency between strcoll (935) and strxfrm (936) orders
os_RU.utf8 BAD
Using LC_COLLATE = "pa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_IN.utf8 good
Using LC_COLLATE = "pap_AN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AN.utf8 good
Using LC_COLLATE = "pa_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_PK.utf8 good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_PT.utf8 good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sa_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
se_NO.utf8 good
Using LC_COLLATE = "shs_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
strxfrm() result for 18-length string exceeded 100 bytes
shs_CA.utf8 BAD
Using LC_COLLATE = "sid_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sid_ET.utf8 good
Using LC_COLLATE = "si_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
si_LK.utf8 good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_MK.utf8 good
Using LC_COLLATE = "sr_ME.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_KE.utf8 good
Using LC_COLLATE = "sw_TZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_TZ.utf8 good
Using LC_COLLATE = "ta_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_IN.utf8 good
Using LC_COLLATE = "te_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tg_TJ.utf8 good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tk_TM.utf8 good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8 good
Using LC_COLLATE = "tt_RU.utf8@iqtelif"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8@iqtelif good
Using LC_COLLATE = "ug_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ug_CN.utf8 good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uk_UA.utf8 good
Using LC_COLLATE = "ur_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
vi_VN.utf8 good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wa_BE.utf8 good
Using LC_COLLATE = "wo_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yo_NG.utf8 good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zu_ZA.utf8 good

Thanks!

Stephen

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Stephen Frost

Date:

23 March 2016, 01:50:01

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
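
A check along the lines Tom describes can be sketched as follows. This is not his attached program, only a minimal Python approximation. It assumes that CPython's locale.strcoll() and locale.strxfrm() wrap the platform's collation routines (on most builds these are the wide-character wcscoll()/wcsxfrm(), so results may differ from the narrow C functions discussed in this thread). The helper names random_strings and find_disagreements and the sample alphabet are made up for illustration.

```python
import locale
import random
from itertools import combinations


def sign(n):
    """Collapse a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def random_strings(count, max_len, alphabet, seed=0):
    """Generate reproducible random test strings (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


def find_disagreements(strings):
    """Return the pairs that strcoll and strxfrm order differently."""
    keys = {s: locale.strxfrm(s) for s in strings}
    bad = []
    for a, b in combinations(strings, 2):
        by_coll = sign(locale.strcoll(a, b))
        by_xfrm = (keys[a] > keys[b]) - (keys[a] < keys[b])
        if by_coll != by_xfrm:
            bad.append((a, b))
    return bad


if __name__ == "__main__":
    # Use the environment's collation; fall back to "C" if it is unavailable.
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    sample = random_strings(200, 12, 'abcxyz "äöüéí')
    bad = find_disagreements(sample)
    print(locale.setlocale(locale.LC_COLLATE), "BAD" if bad else "good")
```

Comparing every pair rather than two sorted lists keeps the check independent of how the sort breaks ties, at the cost of quadratic work, which is acceptable for a few hundred short strings.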

From IRC (not mine), "debian testing, glibc 2.22-3":

Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "aa_DJ.utf8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER"
Using LC_CTYPE = "aa_ER"
aa_ER good
Using LC_COLLATE = "aa_ER@saaho"
Using LC_CTYPE = "aa_ER@saaho"
aa_ER@saaho good
Using LC_COLLATE = "aa_ET"
Using LC_CTYPE = "aa_ET"
aa_ET good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "af_ZA.utf8"
af_ZA.utf8 good
Using LC_COLLATE = "ak_GH"
Using LC_CTYPE = "ak_GH"
ak_GH good
Using LC_COLLATE = "am_ET"
Using LC_CTYPE = "am_ET"
am_ET good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "an_ES.utf8"
an_ES.utf8 good
Using LC_COLLATE = "anp_IN"
Using LC_CTYPE = "anp_IN"
anp_IN good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "ar_AE.utf8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "ar_BH.utf8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "ar_DZ.utf8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "ar_EG.utf8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN"
Using LC_CTYPE = "ar_IN"
ar_IN good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "ar_IQ.utf8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "ar_JO.utf8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "ar_KW.utf8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "ar_LB.utf8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "ar_LY.utf8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "ar_MA.utf8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "ar_OM.utf8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "ar_QA.utf8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "ar_SA.utf8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "ar_SD.utf8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SS"
Using LC_CTYPE = "ar_SS"
ar_SS good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "ar_SY.utf8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "ar_TN.utf8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "ar_YE.utf8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN"
Using LC_CTYPE = "as_IN"
as_IN good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "ast_ES.utf8"
ast_ES.utf8 good
Using LC_COLLATE = "ayc_PE"
Using LC_CTYPE = "ayc_PE"
ayc_PE good
Using LC_COLLATE = "az_AZ"
Using LC_CTYPE = "az_AZ"
az_AZ good
Using LC_COLLATE = "be_BY@latin"
Using LC_CTYPE = "be_BY@latin"
be_BY@latin good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "be_BY.utf8"
be_BY.utf8 good
Using LC_COLLATE = "bem_ZM"
Using LC_CTYPE = "bem_ZM"
bem_ZM good
Using LC_COLLATE = "ber_DZ"
Using LC_CTYPE = "ber_DZ"
ber_DZ good
Using LC_COLLATE = "ber_MA"
Using LC_CTYPE = "ber_MA"
ber_MA good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "bg_BG.utf8"
bg_BG.utf8 good
Using LC_COLLATE = "bhb_IN.utf8"
Using LC_CTYPE = "bhb_IN.utf8"
bhb_IN.utf8 good
Using LC_COLLATE = "bho_IN"
Using LC_CTYPE = "bho_IN"
bho_IN good
Using LC_COLLATE = "bn_BD"
Using LC_CTYPE = "bn_BD"
bn_BD good
Using LC_COLLATE = "bn_IN"
Using LC_CTYPE = "bn_IN"
bn_IN good
Using LC_COLLATE = "bo_CN"
Using LC_CTYPE = "bo_CN"
bo_CN good
Using LC_COLLATE = "bo_IN"
Using LC_CTYPE = "bo_IN"
bo_IN good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "br_FR.utf8"
br_FR.utf8 good
Using LC_COLLATE = "brx_IN"
Using LC_CTYPE = "brx_IN"
brx_IN good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "bs_BA.utf8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER"
Using LC_CTYPE = "byn_ER"
byn_ER good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "ca_AD.utf8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "ca_ES.utf8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "ca_ES.utf8@valencia"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "ca_FR.utf8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "ca_IT.utf8"
ca_IT.utf8 good
Using LC_COLLATE = "ce_RU"
Using LC_CTYPE = "ce_RU"
ce_RU good
Using LC_COLLATE = "cmn_TW"
Using LC_CTYPE = "cmn_TW"
cmn_TW good
Using LC_COLLATE = "crh_UA"
Using LC_CTYPE = "crh_UA"
crh_UA good
Using LC_COLLATE = "csb_PL"
Using LC_CTYPE = "csb_PL"
csb_PL good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "cs_CZ.utf8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "C.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU"
Using LC_CTYPE = "cv_RU"
cv_RU good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "cy_GB.utf8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "da_DK.utf8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "de_AT.utf8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "de_BE.utf8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "de_CH.utf8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
de_DE.utf8 good
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "de_LI.utf8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "de_LU.utf8"
de_LU.utf8 good
Using LC_COLLATE = "doi_IN"
Using LC_CTYPE = "doi_IN"
doi_IN good
Using LC_COLLATE = "dv_MV"
Using LC_CTYPE = "dv_MV"
dv_MV good
Using LC_COLLATE = "dz_BT"
Using LC_CTYPE = "dz_BT"
dz_BT good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "el_CY.utf8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "el_GR.utf8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG"
Using LC_CTYPE = "en_AG"
en_AG good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_AU.utf8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_BW.utf8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_CA.utf8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_DK.utf8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_GB.utf8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_HK.utf8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_IE.utf8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN"
Using LC_CTYPE = "en_IN"
en_IN good
Using LC_COLLATE = "en_NG"
Using LC_CTYPE = "en_NG"
en_NG good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_NZ.utf8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_PH.utf8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_SG.utf8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.utf8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_ZA.utf8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM"
Using LC_CTYPE = "en_ZM"
en_ZM good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_ZW.utf8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "eo.utf8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "es_AR.utf8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "es_BO.utf8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "es_CL.utf8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "es_CO.utf8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "es_CR.utf8"
es_CR.utf8 good
Using LC_COLLATE = "es_CU"
Using LC_CTYPE = "es_CU"
es_CU good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "es_DO.utf8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "es_EC.utf8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "es_ES.utf8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "es_GT.utf8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "es_HN.utf8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "es_MX.utf8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "es_NI.utf8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "es_PA.utf8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "es_PE.utf8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "es_PR.utf8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "es_PY.utf8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "es_SV.utf8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "es_US.utf8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "es_UY.utf8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "es_VE.utf8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "et_EE.utf8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "eu_ES.utf8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "eu_FR.utf8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR"
Using LC_CTYPE = "fa_IR"
fa_IR good
Using LC_COLLATE = "ff_SN"
Using LC_CTYPE = "ff_SN"
ff_SN good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "fi_FI.utf8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH"
Using LC_CTYPE = "fil_PH"
fil_PH good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "fo_FO.utf8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "fr_BE.utf8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "fr_CA.utf8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "fr_CH.utf8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "fr_FR.utf8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "fr_LU.utf8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT"
Using LC_CTYPE = "fur_IT"
fur_IT good
Using LC_COLLATE = "fy_DE"
Using LC_CTYPE = "fy_DE"
fy_DE good
Using LC_COLLATE = "fy_NL"
Using LC_CTYPE = "fy_NL"
fy_NL good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "ga_IE.utf8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "gd_GB.utf8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER"
Using LC_CTYPE = "gez_ER"
gez_ER good
Using LC_COLLATE = "gez_ER@abegede"
Using LC_CTYPE = "gez_ER@abegede"
gez_ER@abegede good
Using LC_COLLATE = "gez_ET"
Using LC_CTYPE = "gez_ET"
gez_ET good
Using LC_COLLATE = "gez_ET@abegede"
Using LC_CTYPE = "gez_ET@abegede"
gez_ET@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "gl_ES.utf8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN"
Using LC_CTYPE = "gu_IN"
gu_IN good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "gv_GB.utf8"
gv_GB.utf8 good
Using LC_COLLATE = "hak_TW"
Using LC_CTYPE = "hak_TW"
hak_TW good
Using LC_COLLATE = "ha_NG"
Using LC_CTYPE = "ha_NG"
ha_NG good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "he_IL.utf8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN"
Using LC_CTYPE = "hi_IN"
hi_IN good
Using LC_COLLATE = "hne_IN"
Using LC_CTYPE = "hne_IN"
hne_IN good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "hr_HR.utf8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "hsb_DE.utf8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT"
Using LC_CTYPE = "ht_HT"
ht_HT good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "hu_HU.utf8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM"
Using LC_CTYPE = "hy_AM"
hy_AM good
Using LC_COLLATE = "ia_FR"
Using LC_CTYPE = "ia_FR"
ia_FR good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "id_ID.utf8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG"
Using LC_CTYPE = "ig_NG"
ig_NG good
Using LC_COLLATE = "ik_CA"
Using LC_CTYPE = "ik_CA"
ik_CA good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "is_IS.utf8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "it_CH.utf8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "it_IT.utf8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA"
Using LC_CTYPE = "iu_CA"
iu_CA good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "iw_IL.utf8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "ja_JP.utf8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "ka_GE.utf8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "kk_KZ.utf8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "kl_GL.utf8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH"
Using LC_CTYPE = "km_KH"
km_KH good
Using LC_COLLATE = "kn_IN"
Using LC_CTYPE = "kn_IN"
kn_IN good
Using LC_COLLATE = "kok_IN"
Using LC_CTYPE = "kok_IN"
kok_IN good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "ko_KR.utf8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN"
Using LC_CTYPE = "ks_IN"
ks_IN good
Using LC_COLLATE = "ks_IN@devanagari"
Using LC_CTYPE = "ks_IN@devanagari"
ks_IN@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "ku_TR.utf8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "kw_GB.utf8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG"
Using LC_CTYPE = "ky_KG"
ky_KG good
Using LC_COLLATE = "lb_LU"
Using LC_CTYPE = "lb_LU"
lb_LU good
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "lg_UG.utf8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE"
Using LC_CTYPE = "li_BE"
li_BE good
Using LC_COLLATE = "lij_IT"
Using LC_CTYPE = "lij_IT"
lij_IT good
Using LC_COLLATE = "li_NL"
Using LC_CTYPE = "li_NL"
li_NL good
Using LC_COLLATE = "lo_LA"
Using LC_CTYPE = "lo_LA"
lo_LA good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "lt_LT.utf8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "lv_LV.utf8"
lv_LV.utf8 good
Using LC_COLLATE = "lzh_TW"
Using LC_CTYPE = "lzh_TW"
lzh_TW good
Using LC_COLLATE = "mag_IN"
Using LC_CTYPE = "mag_IN"
mag_IN good
Using LC_COLLATE = "mai_IN"
Using LC_CTYPE = "mai_IN"
mai_IN good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "mg_MG.utf8"
mg_MG.utf8 good
Using LC_COLLATE = "mhr_RU"
Using LC_CTYPE = "mhr_RU"
mhr_RU good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "mi_NZ.utf8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "mk_MK.utf8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN"
Using LC_CTYPE = "ml_IN"
ml_IN good
Using LC_COLLATE = "mni_IN"
Using LC_CTYPE = "mni_IN"
mni_IN good
Using LC_COLLATE = "mn_MN"
Using LC_CTYPE = "mn_MN"
mn_MN good
Using LC_COLLATE = "mr_IN"
Using LC_CTYPE = "mr_IN"
mr_IN good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "ms_MY.utf8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "mt_MT.utf8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM"
Using LC_CTYPE = "my_MM"
my_MM good
Using LC_COLLATE = "nan_TW"
Using LC_CTYPE = "nan_TW"
nan_TW good
Using LC_COLLATE = "nan_TW@latin"
Using LC_CTYPE = "nan_TW@latin"
nan_TW@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "nb_NO.utf8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE"
Using LC_CTYPE = "nds_DE"
nds_DE good
Using LC_COLLATE = "nds_NL"
Using LC_CTYPE = "nds_NL"
nds_NL good
Using LC_COLLATE = "ne_NP"
Using LC_CTYPE = "ne_NP"
ne_NP good
Using LC_COLLATE = "nhn_MX"
Using LC_CTYPE = "nhn_MX"
nhn_MX good
Using LC_COLLATE = "niu_NU"
Using LC_CTYPE = "niu_NU"
niu_NU good
Using LC_COLLATE = "niu_NZ"
Using LC_CTYPE = "niu_NZ"
niu_NZ good
Using LC_COLLATE = "nl_AW"
Using LC_CTYPE = "nl_AW"
nl_AW good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "nl_BE.utf8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "nl_NL.utf8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "nn_NO.utf8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA"
Using LC_CTYPE = "nr_ZA"
nr_ZA good
Using LC_COLLATE = "nso_ZA"
Using LC_CTYPE = "nso_ZA"
nso_ZA good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "oc_FR.utf8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET"
Using LC_CTYPE = "om_ET"
om_ET good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "om_KE.utf8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN"
Using LC_CTYPE = "or_IN"
or_IN good
Using LC_COLLATE = "os_RU"
Using LC_CTYPE = "os_RU"
os_RU good
Using LC_COLLATE = "pa_IN"
Using LC_CTYPE = "pa_IN"
pa_IN good
Using LC_COLLATE = "pap_AN"
Using LC_CTYPE = "pap_AN"
pap_AN good
Using LC_COLLATE = "pap_AW"
Using LC_CTYPE = "pap_AW"
pap_AW good
Using LC_COLLATE = "pap_CW"
Using LC_CTYPE = "pap_CW"
pap_CW good
Using LC_COLLATE = "pa_PK"
Using LC_CTYPE = "pa_PK"
pa_PK good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "pl_PL.utf8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF"
Using LC_CTYPE = "ps_AF"
ps_AF good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "pt_BR.utf8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "pt_PT.utf8"
pt_PT.utf8 good
Using LC_COLLATE = "quz_PE"
Using LC_CTYPE = "quz_PE"
quz_PE good
Using LC_COLLATE = "raj_IN"
Using LC_CTYPE = "raj_IN"
raj_IN good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "ro_RO.utf8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "ru_RU.utf8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "ru_UA.utf8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW"
Using LC_CTYPE = "rw_RW"
rw_RW good
Using LC_COLLATE = "sa_IN"
Using LC_CTYPE = "sa_IN"
sa_IN good
Using LC_COLLATE = "sat_IN"
Using LC_CTYPE = "sat_IN"
sat_IN good
Using LC_COLLATE = "sc_IT"
Using LC_CTYPE = "sc_IT"
sc_IT good
Using LC_COLLATE = "sd_IN"
Using LC_CTYPE = "sd_IN"
sd_IN good
Using LC_COLLATE = "sd_IN@devanagari"
Using LC_CTYPE = "sd_IN@devanagari"
sd_IN@devanagari good
Using LC_COLLATE = "se_NO"
Using LC_CTYPE = "se_NO"
se_NO good
Using LC_COLLATE = "shs_CA"
Using LC_CTYPE = "shs_CA"
shs_CA good
Using LC_COLLATE = "sid_ET"
Using LC_CTYPE = "sid_ET"
sid_ET good
Using LC_COLLATE = "si_LK"
Using LC_CTYPE = "si_LK"
si_LK good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "sk_SK.utf8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "sl_SI.utf8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "so_DJ.utf8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET"
Using LC_CTYPE = "so_ET"
so_ET good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "so_KE.utf8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "so_SO.utf8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "sq_AL.utf8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK"
Using LC_CTYPE = "sq_MK"
sq_MK good
Using LC_COLLATE = "sr_ME"
Using LC_CTYPE = "sr_ME"
sr_ME good
Using LC_COLLATE = "sr_RS"
Using LC_CTYPE = "sr_RS"
sr_RS good
Using LC_COLLATE = "sr_RS@latin"
Using LC_CTYPE = "sr_RS@latin"
sr_RS@latin good
Using LC_COLLATE = "ss_ZA"
Using LC_CTYPE = "ss_ZA"
ss_ZA good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "st_ZA.utf8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "sv_FI.utf8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "sv_SE.utf8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE"
Using LC_CTYPE = "sw_KE"
sw_KE good
Using LC_COLLATE = "sw_TZ"
Using LC_CTYPE = "sw_TZ"
sw_TZ good
Using LC_COLLATE = "szl_PL"
Using LC_CTYPE = "szl_PL"
szl_PL good
Using LC_COLLATE = "ta_IN"
Using LC_CTYPE = "ta_IN"
ta_IN good
Using LC_COLLATE = "ta_LK"
Using LC_CTYPE = "ta_LK"
ta_LK good
Using LC_COLLATE = "tcy_IN.utf8"
Using LC_CTYPE = "tcy_IN.utf8"
tcy_IN.utf8 good
Using LC_COLLATE = "te_IN"
Using LC_CTYPE = "te_IN"
te_IN good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "tg_TJ.utf8"
tg_TJ.utf8 good
Using LC_COLLATE = "the_NP"
Using LC_CTYPE = "the_NP"
the_NP good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "th_TH.utf8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER"
Using LC_CTYPE = "ti_ER"
ti_ER good
Using LC_COLLATE = "ti_ET"
Using LC_CTYPE = "ti_ET"
ti_ET good
Using LC_COLLATE = "tig_ER"
Using LC_CTYPE = "tig_ER"
tig_ER good
Using LC_COLLATE = "tk_TM"
Using LC_CTYPE = "tk_TM"
tk_TM good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "tl_PH.utf8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA"
Using LC_CTYPE = "tn_ZA"
tn_ZA good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "tr_CY.utf8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "tr_TR.utf8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA"
Using LC_CTYPE = "ts_ZA"
ts_ZA good
Using LC_COLLATE = "tt_RU"
Using LC_CTYPE = "tt_RU"
tt_RU good
Using LC_COLLATE = "tt_RU@iqtelif"
Using LC_CTYPE = "tt_RU@iqtelif"
tt_RU@iqtelif good
Using LC_COLLATE = "ug_CN"
Using LC_CTYPE = "ug_CN"
ug_CN good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "uk_UA.utf8"
uk_UA.utf8 good
Using LC_COLLATE = "unm_US"
Using LC_CTYPE = "unm_US"
unm_US good
Using LC_COLLATE = "ur_IN"
Using LC_CTYPE = "ur_IN"
ur_IN good
Using LC_COLLATE = "ur_PK"
Using LC_CTYPE = "ur_PK"
ur_PK good
Using LC_COLLATE = "uz_UZ@cyrillic"
Using LC_CTYPE = "uz_UZ@cyrillic"
uz_UZ@cyrillic good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "uz_UZ.utf8"
uz_UZ.utf8 good
Using LC_COLLATE = "ve_ZA"
Using LC_CTYPE = "ve_ZA"
ve_ZA good
Using LC_COLLATE = "vi_VN"
Using LC_CTYPE = "vi_VN"
vi_VN good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "wa_BE.utf8"
wa_BE.utf8 good
Using LC_COLLATE = "wae_CH"
Using LC_CTYPE = "wae_CH"
wae_CH good
Using LC_COLLATE = "wal_ET"
Using LC_CTYPE = "wal_ET"
wal_ET good
Using LC_COLLATE = "wo_SN"
Using LC_CTYPE = "wo_SN"
wo_SN good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "xh_ZA.utf8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "yi_US.utf8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG"
Using LC_CTYPE = "yo_NG"
yo_NG good
Using LC_COLLATE = "yue_HK"
Using LC_CTYPE = "yue_HK"
yue_HK good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "zh_CN.utf8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "zh_HK.utf8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "zh_SG.utf8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "zh_TW.utf8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "zu_ZA.utf8"
zu_ZA.utf8 good

Thanks!

Stephen

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 02:33:52

On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, if we implement a compatibility GUC that shuts off our
> dependency on strxfrm(), people can go back to having 9.5 be no more
> broken than 9.4 was.  I vote we do that and go home.

I don't have a problem with that idea, but I fear "no more broken than
9.4 was" might be a very low bar for certain systems and collations.
Abbreviated keys may have simply unmasked the problem in some cases.
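
To illustrate why an inconsistent strxfrm() matters here, a much-simplified model of abbreviated-key comparison may help: the sort first compares a short prefix of the transformed key and only falls back to the full comparator on a tie, while later lookups use the full comparator alone. The sketch below is not PostgreSQL's code. The name abbreviated_compare, the 8-byte prefix, and the toy transform and comparator (which deliberately disagree about spaces) are all assumptions made for illustration.

```python
def abbreviated_compare(a, b, xfrm, coll, abbrev_len=8):
    """Compare via a prefix of the transformed key; on a tie, use coll."""
    ka = xfrm(a)[:abbrev_len]
    kb = xfrm(b)[:abbrev_len]
    if ka != kb:
        return -1 if ka < kb else 1
    return coll(a, b)


# Toy stand-ins (NOT libc): the transform ignores spaces, as a
# dictionary-style collation might, while the comparator is plain byte order.
def toy_xfrm(s):
    return s.replace(" ", "").encode("utf-8")


def toy_coll(a, b):
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


if __name__ == "__main__":
    a, b = "x y", "xx"
    print("abbreviated-key sort says:", abbreviated_compare(a, b, toy_xfrm, toy_coll))
    print("full comparator says:     ", toy_coll(a, b))
```

With these toy functions the two answers have opposite signs, which is the shape of the failure in this thread: data ordered one way when it was sorted, then searched with a comparator that expects a different order.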

Consider:

[vagrant@localhost ~]$ LC_COLLATE=en_us sort strings.txt <-- correct
x xx
x xx"
xxx
xxx"
[vagrant@localhost ~]$ LC_COLLATE=de_DE sort strings.txt <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). Pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have for more fundamental reasons, because
strcoll() says that the first string in the de_DE sorted list is
*greater* than the third string. That's wrong, and not just because
strxfrm() gives an intuitively correct answer -- it's wrong
specifically because the transitive law has been broken.
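
One way to look for the kind of violation described above is a brute-force scan over all ordered triples. The following is a minimal sketch, not a claim about what any particular libc does: transitivity_violations is a hypothetical helper, it defaults to Python's locale.strcoll() (which on most CPython builds wraps wcscoll() rather than strcoll()), and de_DE.UTF-8 may not be installed on the machine running it.

```python
import locale
from itertools import permutations


def transitivity_violations(strings, compare=locale.strcoll):
    """Return triples (a, b, c) with a < b and b < c but not a < c."""
    bad = []
    for a, b, c in permutations(strings, 3):
        if compare(a, b) < 0 and compare(b, c) < 0 and not compare(a, c) < 0:
            bad.append((a, b, c))
    return bad


if __name__ == "__main__":
    # de_DE.UTF-8 is the locale from the report; fall back to "C" without it.
    try:
        locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    strings = ["x xx", 'x xx"', "xxx", 'xxx"']
    print(transitivity_violations(strings))
```

An empty list means no violation was found among the given strings; it says nothing about strings that were not tried.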

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 02:41:50

Peter Geoghegan <pg@heroku.com> writes:
> My concern was not merely "academic" (i.e. it was not limited in scope
> to things that don't make B-Tree indexes corrupt). Pretty sure that we
> need to start thinking of this as a problem with strcoll() that
> strxfrm() does not have for more fundamental reasons, because
> strcoll() says that the first string in the de_DE sorted list is
> *greater* than the third string.

[ squint... ]  I was looking specifically for that sort of misbehavior
in my test program, and I haven't seen it.

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Noah Misch

Date:

23 March 2016, 02:45:07

On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

I, too, found MAXXFRMLEN insufficient; I raised it fourfold.  Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed.  (varstr_sortsupport()
already blacklists the UTF8/native Windows case.)  The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs.  I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.

Attachment

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Bernd Helmle

Date:

23 March 2016, 10:32:17


--On 22 March 2016 19:19:44 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Please try this on as many platforms as you can get hold of ...

Since I have to work on SuSE/SLES platforms at the moment, here are some
results from them (openLeap/SLES12 are identical, but that isn't a surprise
since SLES12 is based on openLeap42.1):

SLES12:

grep BAD results_sles12.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD

SLES11 SP4:

grep BAD results_sles11sp4.txt
az_AZ.utf8 BAD
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
se_NO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
tt_RU.utf8 BAD
tt_RU@iqtelif.UTF-8 BAD

openSuSE/openLeap 42.1:

grep BAD results_openleap421.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD

--
Thanks

    Bernd

Attachment

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 14:47:13

On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I was a little worried that it was too much to hope for that all libc
>> > vendors on earth would ship a strxfrm() implementation that was actually
>> > consistent with strcoll(), and here we are.
>>
>> Indeed.  To try to put some scope on the problem, I made an idiot little
>> program that just generates some random UTF8 strings and sees whether
>> strcoll and strxfrm sort them alike.  Attached are that program, an even
>> more idiot little shell script that runs it over all available UTF8
>> locales, and the results on my RHEL6 box.  While de_DE seems to be the
>> worst-broken locale, it's far from the only one.
>>
>> Please try this on as many platforms as you can get hold of ...
>
> I, too, found MAXXFRMLEN insufficient; I raised it fourfold.  Cygwin
> 2.2.1(0.289/5/3) caught fire; 10% of locales passed.  (varstr_sortsupport()
> already blacklists the UTF8/native Windows case.)  The test passed on Solaris
> 10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
> See attached tryalllocales.sh outputs.  I did not test AIX, because the AIX
> machines I use have no UTF8 locales installed.

Wow, thanks for the extensive testing.  This suggests that, apart from
Cygwin which apparently doesn't matter right now, the only thing that
is busted is glibc.  I believe we have yet to see a single locale that
fails anywhere else (apart from Cygwin).  Good thing so few of our
users run glibc!

Ha ha, little joke there.

So, options:

1. We could make it the user's problem to figure out whether they've
got a buggy glibc and add a GUC to shut this off, as previously
suggested.

2. We could add a blacklist (either hardcoded or a GUC) shutting this
off for locales known to be buggy anywhere.

3. We could write some test code that runs at startup time which
reliably detects all of the broken locales we've so far uncovered and
disables this if so.

4. We could shut this off for all Linux users in all locales and tell
everybody to REINDEX.  That would be pretty sad, though.

Thoughts?  Other ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
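
For reference, the per-pair check that an approach like option 3 would have to build on might look roughly like the sketch below. It is not from the thread; the function names are invented for illustration, and which probe strings would reliably expose a broken locale is exactly what is not known. It sizes the strxfrm() buffers dynamically instead of relying on a fixed maximum.

```c
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1. */
int
sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Returns 1 if strcoll(a, b) and strcmp() on the strxfrm() blobs agree in
 * sign under the current LC_COLLATE, 0 if they disagree, and -1 if memory
 * allocation fails.  Hypothetical helper, for illustration only.
 */
int
strxfrm_agrees_with_strcoll(const char *a, const char *b)
{
    /* strxfrm(NULL, s, 0) returns the length needed, excluding the NUL */
    size_t      la = strxfrm(NULL, a, 0) + 1;
    size_t      lb = strxfrm(NULL, b, 0) + 1;
    char       *xa = malloc(la);
    char       *xb = malloc(lb);
    int         result = -1;

    if (xa != NULL && xb != NULL)
    {
        strxfrm(xa, a, la);
        strxfrm(xb, b, lb);
        result = (sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb)));
    }
    free(xa);
    free(xb);
    return result;
}
```

A result of 1 for a handful of pairs proves nothing about the locale as a whole; only a 0 is conclusive.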

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 15:44:08

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
>> I, too, found MAXXFRMLEN insufficient; I raised it fourfold.  Cygwin
>> 2.2.1(0.289/5/3) caught fire; 10% of locales passed.  (varstr_sortsupport()
>> already blacklists the UTF8/native Windows case.)  The test passed on Solaris
>> 10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
>> See attached tryalllocales.sh outputs.  I did not test AIX, because the AIX
>> machines I use have no UTF8 locales installed.

> Wow, thanks for the extensive testing.  This suggests that, apart from
> Cygwin which apparently doesn't matter right now, the only thing that
> is busted is glibc.  I believe we have yet to see a single locale that
> fails anywhere else (apart from Cygwin).  Good thing so few of our
> users run glibc!

I extended my test program to be able to check locales using ISO-8859-x
encodings.  RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
in both encodings, as do a few others).  I'm not sure what that implies
for the underlying bug(s).

> So, options:

> 1. We could make it the user's problem to figure out whether they've
> got a buggy glibc and add a GUC to shut this off, as previously
> suggested.

> 2. We could add a blacklist (either hardcoded or a GUC) shutting this
> off for locales known to be buggy anywhere.

> 3. We could write some test code that runs at startup time which
> reliably detects all of the broken locales we've so far uncovered and
> disables this if so.

> 4. We could shut this off for all Linux users in all locales and tell
> everybody to REINDEX.  That would be pretty sad, though.

TBH, I think #1 is right out, unless maybe the GUC defaults to off.
We aren't that cavalier with data consistency in other departments.

#2 and #3 presume a level of knowledge of the bug details that we
have not got, and probably can't get by Monday.

As far as #4 goes, we're going to have to tell people to REINDEX
no matter what the other aspects of the fix look like.  On-disk
indexes are broken right now, if you're using one of the affected
locales.

            regards, tom lane

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <langinfo.h>
#include <time.h>

/*
 * Test: generate 1000 random UTF8 strings, sort them by strcoll, sanity-
 * check the sort result, sort them by strxfrm, sanity-check that result,
 * and compare the two sort orders.
 */
#define NSTRINGS 1000
#define MAXSTRLEN 20
#define MAXXFRMLEN (MAXSTRLEN * 10)

typedef struct
{
    char        strval[MAXSTRLEN];
    char        xfrmval[MAXXFRMLEN];
    int            strsortpos;
    int            xfrmsortpos;
} OneString;

/* qsort comparators */

static int
strcoll_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcoll(a->strval, b->strval);
}

static int
strxfrm_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcmp(a->xfrmval, b->xfrmval);
}


/* returns 1 if OK, 0 if inconsistency detected */
static int
run_test_case(int is_utf8)
{
    int            ok = 1;
    OneString    data[NSTRINGS];
    int            i,
                j;

    /* Generate random strings of length less than MAXSTRLEN bytes */
    for (i = 0; i < NSTRINGS; i++)
    {
        char       *p = data[i].strval;
        int            len;

        len = 1 + (random() % (MAXSTRLEN - 1));
        while (len > 0)
        {
            int            c;

            /* Generate random printable char in ISO8859-1 range */
            /* Bias towards producing a lot of spaces */
            if ((random() % 16) < 3)
                c = ' ';
            else
            {
                do
                {
                    c = random() & 0xFF;
                } while (!((c >= ' ' && c <= 127) || (c >= 0xA0 && c <= 0xFF)));
            }

            if (c <= 127 || !is_utf8)
            {
                *p++ = c;
                len--;
            }
            else
            {
                if (len < 2)
                    break;
                /* Poor man's utf8-ification */
                *p++ = 0xC0 + (c >> 6);
                len--;
                *p++ = 0x80 + (c & 0x3F);
                len--;
            }
        }
        *p = '\0';

        /* strxfrm each string as we produce it */
        if (strxfrm(data[i].xfrmval, data[i].strval, MAXXFRMLEN) >= MAXXFRMLEN)
        {
            fprintf(stderr, "strxfrm() result for %d-length string exceeded %d bytes\n",
                    (int) strlen(data[i].strval), MAXXFRMLEN);
            exit(1);
        }

#if 0
        printf("%d %s\n", i, data[i].strval);
#endif
    }

    /* Sort per strcoll(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strcoll_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcoll(data[i].strval, data[i-1].strval) != 0)
            j++;
        data[i].strsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcoll(data[i].strval, data[j].strval) > 0)
            {
                fprintf(stdout, "strcoll sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Sort per strxfrm(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strxfrm_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcmp(data[i].xfrmval, data[i-1].xfrmval) != 0)
            j++;
        data[i].xfrmsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcmp(data[i].xfrmval, data[j].xfrmval) > 0)
            {
                fprintf(stdout, "strxfrm sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Compare */
    for (i = 0; i < NSTRINGS; i++)
    {
        if (data[i].strsortpos != data[i].xfrmsortpos)
        {
            fprintf(stdout, "inconsistency between strcoll (%d) and strxfrm (%d) orders\n",
                    data[i].strsortpos, data[i].xfrmsortpos);
            ok = 0;
        }
    }

    return ok;
}

int
main(int argc, char **argv)
{
    const char *lc;
    const char *cset;
    int            is_utf8;
    int            ntries;
    int            result = 0;

    /* Absorb locale from environment, and report what we're using */
    if (setlocale(LC_ALL, "") == NULL)
    {
        perror("setlocale(LC_ALL) failed");
        exit(1);
    }
    lc = setlocale(LC_COLLATE, NULL);
    if (lc)
    {
        printf("Using LC_COLLATE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_COLLATE) failed");
        exit(1);
    }
    lc = setlocale(LC_CTYPE, NULL);
    if (lc)
    {
        printf("Using LC_CTYPE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_CTYPE) failed");
        exit(1);
    }

    /* Identify encoding */
    cset = nl_langinfo(CODESET);
    if (!cset)
    {
        perror("nl_langinfo(CODESET) failed");
        exit(1);
    }
    if (strstr(cset, "utf") || strstr(cset, "UTF"))
        is_utf8 = 1;
    else if (strstr(cset, "iso-8859") || strstr(cset, "ISO-8859") ||
         strstr(cset, "iso8859") || strstr(cset, "ISO8859"))
        is_utf8 = 0;
    else
    {
        fprintf(stderr, "unrecognized codeset name \"%s\"\n", cset);
        exit(1);
    }

    /* Ensure new random() values on every run */
    srandom((unsigned int) time(NULL));

    /* argv[1] can be the max number of tries to run */
    if (argc > 1)
        ntries = atoi(argv[1]);
    else
        ntries = 1;

    /* Run one test instance per loop */
    while (ntries-- > 0)
    {
        if (!run_test_case(is_utf8))
            result = 1;
    }

    return result;
}

az_AZ.utf8 BAD
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
crh_UA.utf8 BAD
csb_PL.utf8 BAD
cv_RU.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fil_PH.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
fur_IT.utf8 BAD
hu_HU.utf8 BAD
ig_NG.utf8 BAD
ik_CA.utf8 BAD
iu_CA.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sc_IT.utf8 BAD
se_NO.utf8 BAD
shs_CA.utf8 BAD
sq_AL.utf8 BAD
sq_MK.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
tk_TM.utf8 BAD
tt_RU.utf8 BAD
tt_RU.utf8@iqtelif BAD
ug_CN.utf8 BAD
vi_VN.utf8 BAD
yo_NG.utf8 BAD

ar_AE.iso88596 BAD
ar_BH.iso88596 BAD
ar_DZ.iso88596 BAD
ar_EG.iso88596 BAD
ar_IQ.iso88596 BAD
ar_JO.iso88596 BAD
ar_KW.iso88596 BAD
ar_LB.iso88596 BAD
ar_LY.iso88596 BAD
ar_MA.iso88596 BAD
ar_OM.iso88596 BAD
ar_QA.iso88596 BAD
ar_SD.iso88596 BAD
ar_SY.iso88596 BAD
ar_TN.iso88596 BAD
ar_YE.iso88596 BAD
bs_BA.iso88592 BAD
ca_AD.iso885915 BAD
ca_ES.iso88591 BAD
ca_ES.iso885915@euro BAD
ca_FR.iso885915 BAD
ca_IT.iso885915 BAD
da_DK.iso88591 BAD
da_DK.iso885915 BAD
de_DE.iso88591 BAD
es_EC.iso88591 BAD
es_US.iso88591 BAD
fi_FI.iso88591 BAD
fi_FI.iso885915@euro BAD
fo_FO.iso88591 BAD
fr_CA.iso88591 BAD
he_IL.iso88598 BAD
hu_HU.iso88592 BAD
iw_IL.iso88598 BAD
kl_GL.iso88591 BAD
ku_TR.iso88599 BAD
mk_MK.iso88595 BAD
mt_MT.iso88593 BAD
nb_NO.iso88591 BAD
nn_NO.iso88591 BAD
no_NO.iso88591 BAD
ro_RO.iso88592 BAD
ru_RU.iso88595 BAD
sq_AL.iso88591 BAD
sv_FI.iso88591 BAD
sv_FI.iso885915@euro BAD
sv_SE.iso88591 BAD
sv_SE.iso885915 BAD

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 16:13:52

I wrote:
> I extended my test program to be able to check locales using ISO-8859-x
> encodings.  RHEL6 shows me failures in a set of locales that is remarkably
> unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
> in both encodings, as do a few others).  I'm not sure what that implies
> for the underlying bug(s).

Closer analysis says that all of the cases where only utf8 is reported to
fail are in fact because there is no iso8859 equivalent locale on my
machine.  Many of the cases where only iso8859 is reported to fail are
just chance passes due to not having randomly generated a failure case;
you can reduce the odds of that by passing strcolltest a repeat count
larger than 1.  There remain, however, a few locales in which it seems
that indeed iso8859 is broken and utf8 is not; ru_RU being the most
prominent example.

In short, the problem is actually worse in non-UTF8 locales.

            regards, tom lane
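
As a rough illustration of why a larger repeat count helps: if a single run detects a broken locale with some probability p (an assumed figure, not measured anywhere in this thread) and runs are independent, the chance that every run passes by luck shrinks geometrically. A small sketch, with a made-up function name:

```c
/*
 * Probability that `runs` independent test runs all miss a broken locale,
 * given that one run detects it with probability p_detect.
 * p_detect is an assumed input; the thread does not measure it.
 */
double
chance_of_false_pass(double p_detect, int runs)
{
    double      miss = 1.0;
    int         i;

    for (i = 0; i < runs; i++)
        miss *= (1.0 - p_detect);
    return miss;
}
```

With the test program above, the repeat count is its first argument, e.g. "./strcolltest 100".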

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

"David G. Johnston"

Date:

23 March 2016, 16:19:07

On Wed, Mar 23, 2016 at 9:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > I extended my test program to be able to check locales using ISO-8859-x
> > encodings.  RHEL6 shows me failures in a set of locales that is
> remarkably
> > unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
> > in both encodings, as do a few others).  I'm not sure what that implies
> > for the underlying bug(s).
>
> Closer analysis says that all of the cases where only utf8 is reported to
> fail are in fact because there is no iso8859 equivalent locale on my
> machine.  Many of the cases where only iso8859 is reported to fail are
> just chance passes due to not having randomly generated a failure case;
> you can reduce the odds of that by passing strcolltest a repeat count
> larger than 1.  There remain, however, a few locales in which it seems
> that indeed iso8859 is broken and utf8 is not; ru_RU being the most
> prominent example.
>
> In short, the problem is actually worse in non-UTF8 locales.
>

Is the POSIX/C (non)-locale affected?

David J.

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 17:22:18

On Wed, Mar 23, 2016 at 12:19 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Is the POSIX/C (non)-locale affected?

We don't use strxfrm() or strcoll() in that case, so I sure hope not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
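
For contrast, a comparison that involves no locale machinery at all is just a bytewise compare with the length as tie-breaker. The sketch below illustrates that idea only; it is not PostgreSQL's actual code, and the function name is made up.

```c
#include <stddef.h>
#include <string.h>

/*
 * Locale-independent comparison: compare the common prefix bytewise and,
 * if it is identical, let the shorter string sort first.
 * Hypothetical helper for illustration.
 */
int
bytewise_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = (alen < blen) ? alen : blen;
    int         cmp = memcmp(a, b, minlen);

    if (cmp != 0)
        return cmp;
    return (alen > blen) - (alen < blen);
}
```

Since neither strcoll() nor strxfrm() is called, there is nothing for the two to disagree about.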

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 17:23:50

On Wed, Mar 23, 2016 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I extended my test program to be able to check locales using ISO-8859-x
>> encodings.  RHEL6 shows me failures in a set of locales that is remarkably
>> unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
>> in both encodings, as do a few others).  I'm not sure what that implies
>> for the underlying bug(s).
>
> Closer analysis says that all of the cases where only utf8 is reported to
> fail are in fact because there is no iso8859 equivalent locale on my
> machine.  Many of the cases where only iso8859 is reported to fail are
> just chance passes due to not having randomly generated a failure case;
> you can reduce the odds of that by passing strcolltest a repeat count
> larger than 1.  There remain, however, a few locales in which it seems
> that indeed iso8859 is broken and utf8 is not; ru_RU being the most
> prominent example.
>
> In short, the problem is actually worse in non-UTF8 locales.

I guess that's not terribly surprising.  If the glibc maintainers
haven't managed to get this right for UTF-8 locales, I can't imagine
why they would have been more careful for non-UTF-8 locales that - I
would guess - get less use.

Are you still in information-gathering mode, or are you going to issue
a recommendation on how we should proceed here, or what?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 17:46:59

Robert Haas <robertmhaas@gmail.com> writes:
> Are you still in information-gathering mode, or are you going to issue
> a recommendation on how we should proceed here, or what?

If I had to make a recommendation right now, I would go for your
option #4, ie shut 'em all down Scotty.  We do not know the full extent
of the problem but it looks pretty bad, and I think our first priority
has to be to guarantee data integrity.  I do not have a lot of faith in
the proposition that glibc's is the only buggy implementation, either.

            regards, tom lane

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 17:52:24

On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If I had to make a recommendation right now, I would go for your
> option #4, ie shut 'em all down Scotty.  We do not know the full extent
> of the problem but it looks pretty bad, and I think our first priority
> has to be to guarantee data integrity.

+1, but only for glibc, and configurable. The glibc default might
later be revisited in the stable 9.5 branch.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Magnus Hagander

Date:

23 March 2016, 17:56:37

On Mar 23, 2016 18:53, "Peter Geoghegan" <pg@heroku.com> wrote:
>
> On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > If I had to make a recommendation right now, I would go for your
> > option #4, ie shut 'em all down Scotty.  We do not know the full extent
> > of the problem but it looks pretty bad, and I think our first priority
> > has to be to guarantee data integrity.
>
> +1, but only for glibc, and configurable. The glibc default might
> later be revisited in the stable 9.5 branch.
>

Are you talking about configurable at ./configure time, or guc?

Making it a compile time option makes sense I think. But turning it into a
guc will expose users to a lot of failure scenarios if they *change* the
value, and that seems risky.

Putting it in autoconf and defaulting to off in the upcoming minor seems like
a good idea. Then once we have more information, we can consider if we want
to turn it back on in back branches or just in 9.6 (when/if properly
fixed).

/Magnus

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 17:58:46

On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Are you talking about configurable at ./configure time, or guc?

I meant a GUC. I think a ./configure option is overkill.

What about the existing caller of strxfrm(), convert_string_datum()?

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 18:00:12

On Wed, Mar 23, 2016 at 10:58 AM, Peter Geoghegan <pg@heroku.com> wrote:
> What about the existing caller of strxfrm(), convert_string_datum()?

I mean, the caller exists in all back-branches, not just 9.5.


--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 18:02:36

On Wed, Mar 23, 2016 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I guess that's not terribly surprising.  If the glibc maintainers
> haven't managed to get this right for UTF-8 locales, I can't imagine
> why they would have been more careful for non-UTF-8 locales that - I
> would guess - get less use.

We don't want to suggest that locales are broken as such. My inability
to reproduce the original complaint on alternative German locales
(e.g. Austrian) suggests to me that it just "accidentally fails to
fail" for whatever reason (maybe they fail in other ways). I should
say "accidentally fails to not fail", because this is a failure of
strxfrm() to be bug-compatible with strcoll(), which I think needs to
not be forgotten.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Magnus Hagander

Date:

23 March 2016, 18:04:49

On Wed, Mar 23, 2016 at 6:58 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > Are you talking about configurable at ./configure time, or guc?
>
> I meant a GUC. I think a ./configure option is overkill.
>

We clearly have different views of the amount of kill effort required for
the different options :) I would've said that a ./configure option is the
easier way, and that doing a GUC is the one that's an overkill (being
significantly more effort).

That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 18:06:51

On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
> That said, my main point is that I do not think the knob is something that
> should be tuned by the average end user. For most people, that should be
> left to the packagers for the platform, who can make an informed choice
> about if it's safe to turn it on.

I could get behind that if we really make an effort to help them make
an informed choice. The abbreviated keys optimization is highly
valuable, and I put a lot of work into it, as did Robert.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Magnus Hagander

Date:

23 March 2016, 18:09:57

On Wed, Mar 23, 2016 at 7:06 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > That said, my main point is that I do not think the knob is something
> that
> > should be tuned by the average end user. For most people, that should be
> > left to the packagers for the platform, who can make an informed choice
> > about if it's safe to turn it on.
>
> I could get behind that if we really make an effort to help them make
> an informed choice. The abbreviated keys optimization is highly
> valuable, and I put a lot of work into it, as did Robert.
>

Oh, I totally appreciate that. It's one of the great improvements in 9.5,
and one of the best things is that it's an "automatic improvement" that
doesn't require the users to change their applications to benefit from it.
But it's also currently badly broken on some of our most common platforms.

We want to get it back to working. But short-term, it's more important to
limit the scope of the brokenness, since this is a version that people are
putting in production. Once we have enough info to safely say we've put a
workaround in place, we turn it back on.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 18:14:04

Peter Geoghegan <pg@heroku.com> writes:
> What about the existing caller of strxfrm(), convert_string_datum()?

convert_string_datum is, and always has been, used only for planner
estimation purposes.  We do not care if it sometimes gets inaccurate
answers.  Even if it's as wrong as it can possibly be, that will only
affect planner estimates to the extent of wrongly interpolating between
the endpoints of a histogram bin, so that the effects are no worse than
about 1/statistics_target.  And there are bigger limitations on the
accuracy of those estimates anyway, notably that we use the same stats
regardless of the collation that applies to a particular WHERE-clause
operator.

            regards, tom lane
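
The bound mentioned above can be seen in a simplified model of histogram interpolation. This is a sketch under stated assumptions (equal-population bins, a hypothetical function name), not the planner's real code: the string-to-number conversion only influences the position inside one bin, so even a completely wrong position moves the estimate by at most one bin's share.

```c
/*
 * Estimated fraction of rows below a value that falls into histogram bin
 * `bin` (0-based) of `nbins` equal-population bins.  `frac` in [0, 1] is
 * the interpolated position inside the bin; in this model it is the only
 * input that a misbehaving strxfrm() could distort.
 */
double
histogram_fraction_below(int nbins, int bin, double frac)
{
    return ((double) bin + frac) / (double) nbins;
}
```

Moving frac from 0 to 1 changes the result by exactly 1/nbins, which corresponds to the "about 1/statistics_target" figure.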

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 18:14:41

On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
> We want to get it back to working. But short-term, it's more important to
> limit the scope of the brokenness, since this is a version that people are
> putting in production. Once we have enough info to safely say we've put a
> workaround in place, we turn it back on.

Do you think it's possible that my amcheck tool might have a role to
play here? I wrote it for exactly this kind of scenario. If we could
get it reviewed, then a pre-release version compatible with 9.5 could
be made available. I'd be willing to work on that side of things if
core are receptive. Early prototypes of the tool were used to detect
collation incompatibility issues in production.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 18:32:33

Peter Geoghegan <pg@heroku.com> writes:
> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> That said, my main point is that I do not think the knob is something that
>> should be tuned by the average end user. For most people, that should be
>> left to the packagers for the platform, who can make an informed choice
>> about if it's safe to turn it on.

> I could get behind that if we really make an effort to help them make
> an informed choice. The abbreviated keys optimization is highly
> valuable, and I put a lot of work into it, as did Robert.

I realize that, and I'm sympathetic, but I'm afraid it also means that
your judgment in this matter is rather biased.

I do not think that end users can be expected to know whether this is safe
to turn on, and TBH I do not think that most packagers will either.  My
opinion is that our only guaranteed-safe option is to turn it off, period,
no exceptions for platforms that we've not yet found a failure case for.
We can consider turning it back on later, once we've done vastly more
study and testing than has evidently been done to date.  One thing I'm
going to want to know is what was the root cause of glibc's bug, and what
is the reason to think that other implementations are going to be any more
reliable.  At this point I'm disinclined to trust any implementation that
can't point to a structural reason (e.g., sharing code) to believe that
strcoll and strxfrm must yield equivalent answers.

(In other words, I want an #ifdef NOT_USED, which is even less effort
than either a GUC or a configure option ;-(.  As well as being something
that we won't need to document and support indefinitely.)

            regards, tom lane
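
A compile-time switch of the kind described could be as small as the sketch below. The macro and function names are hypothetical, invented for illustration, and the sketch assumes (per the earlier reply about the C locale) that the C collation does not go through strxfrm() and can keep the optimization.

```c
/*
 * Hypothetical macro: left undefined by default, so strxfrm()-based
 * abbreviation is compiled out for every non-C collation.
 */
/* #define HYPOTHETICAL_TRUST_STRXFRM */

/* Returns 1 if abbreviated keys may be used, 0 otherwise. */
int
abbreviation_allowed(int collation_is_c)
{
#ifdef HYPOTHETICAL_TRUST_STRXFRM
    /* platform has been vetted: trust strxfrm() for all collations */
    return 1;
#else
    /* only the C collation, which does not call strxfrm(), qualifies */
    return collation_is_c ? 1 : 0;
#endif
}
```

Unlike a GUC, nothing here can be flipped on a running system, so an existing index cannot end up built under one setting and searched under another.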

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 18:40:27

On Wed, Mar 23, 2016 at 11:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not think that end users can be expected to know whether this is safe
> to turn on, and TBH I do not think that most packagers will either.  My
> opinion is that our only guaranteed-safe option is to turn it off, period,
> no exceptions for platforms that we've not yet found a failure case for.
> We can consider turning it back on later, once we've done vastly more
> study and testing than has evidently been done to date.  One thing I'm
> going to want to know is what was the root cause of glibc's bug, and what
> is the reason to think that other implementations are going to be any more
> reliable.  At this point I'm disinclined to trust any implementation that
> can't point to a structural reason (e.g., sharing code) to believe that
> strcoll and strxfrm must yield equivalent answers.

The more I think about it, the more I agree that not trusting
strxfrm() across the board is the right move short-term. So, I'm not
going to be upset, provided we do actually follow through later with
an effort to turn it back on in 9.5 as and when it is known to be
reliable. All I'm asking for is that we actively work towards making
it safe, which evidently requires leg-work, that I can only do part
of. (For example, I'm not on the -packagers list, so cannot really
coordinate with packagers).

I think that that's a reasonable thing for me to expect.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 18:57:00

On Wed, Mar 23, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> That said, my main point is that I do not think the knob is something that
>>> should be tuned by the average end user. For most people, that should be
>>> left to the packagers for the platform, who can make an informed choice
>>> about if it's safe to turn it on.
>
>> I could get behind that if we really make an effort to help them make
>> an informed choice. The abbreviated keys optimization is highly
>> valuable, and I put a lot of work into it, as did Robert.
>
> I realize that, and I'm sympathetic, but I'm afraid it also means that
> your judgment in this matter is rather biased.
>
> I do not think that end users can be expected to know whether this is safe
> to turn on, and TBH I do not think that most packagers will either.  My
> opinion is that our only guaranteed-safe option is to turn it off, period,
> no exceptions for platforms that we've not yet found a failure case for.
> We can consider turning it back on later, once we've done vastly more
> study and testing than has evidently been done to date.  One thing I'm
> going to want to know is what was the root cause of glibc's bug, and what
> is the reason to think that other implementations are going to be any more
> reliable.  At this point I'm disinclined to trust any implementation that
> can't point to a structural reason (e.g., sharing code) to believe that
> strcoll and strxfrm must yield equivalent answers.
>
> (In other words, I want an #ifdef NOT_USED, which is even less effort
> than either a GUC or a configure option ;-(.  As well as being something
> that we won't need to document and support indefinitely.)

I think that something like the attached would be a reasonable
approach to the problem.  If we later decide this is altogether
hopeless, we can do a more thorough job removing the code that can be
reached when collate_c && abbreviate, but let's not do that right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

dont-trust-strxfrm.patch

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 19:01:41

On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that something like the attached would be a reasonable
> approach to the problem.  If we later decide this is altogether
> hopeless, we can do a more thorough job removing the code that can be
> reached when collate_c && abbreviate, but let's not do that right now.

This patch looks good to me.

I think that disabling abbreviation when the C collation is in use makes
no sense, though. This has nothing to do with abbreviation as such,
and everything to do with glibc.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 19:04:41

On Wed, Mar 23, 2016 at 3:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that something like the attached would be a reasonable
>> approach to the problem.  If we later decide this is altogether
>> hopeless, we can do a more thorough job removing the code that can be
>> reached when collate_c && abbreviate, but let's not do that right now.
>
> This patch looks good to me.
>
> I think that disabling abbreviation when the C collation is in use makes
> no sense, though.

But the patch doesn't do that, right?

> This has nothing to do with abbreviation as such,
> and everything to do with glibc.

Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 19:07:32

On Wed, Mar 23, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that disabling abbreviation when the C collation is in use makes
>> no sense, though.
>
> But the patch doesn't do that, right?

Right, it doesn't. But I was surprised that you even mentioned it as a
possibility. That's all.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 19:20:23

Robert Haas <robertmhaas@gmail.com> writes:
> +#ifndef TRUST_STRXFRM
> +    if (!collate_c)
> +        abbreviate = false;
> +#endif

Ah, I did not realize that abbreviation would be of any value in C locale.
If it is, then +1 for something like the above.

            regards, tom lane

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

23 March 2016, 20:07:20

On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> +#ifndef TRUST_STRXFRM
>> +     if (!collate_c)
>> +             abbreviate = false;
>> +#endif
>
> Ah, I did not realize that abbreviation would be of any value in C locale.
> If it is, then +1 for something like the above.

It's actually more likely to help for a C locale than for a non-C locale.

I have committed this and back-patched it to 9.5.
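
For readers following the quoted hunk, the guard can be read as a small
decision function. The sketch below is only an illustration, not the
committed code: effective_abbreviate is a hypothetical name, and only the
#ifndef TRUST_STRXFRM block mirrors the patch quoted above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): given the caller's wish to
 * abbreviate and whether the collation is "C", return whether abbreviated
 * keys would actually be used.
 */
bool effective_abbreviate(bool abbreviate, bool collate_c)
{
#ifndef TRUST_STRXFRM
    /* Unless the build opts in, never rely on strxfrm() for non-C collations. */
    if (!collate_c)
        abbreviate = false;
#endif
    return abbreviate;
}
```

In a build without -DTRUST_STRXFRM, abbreviation survives only for the C
collation; defining the symbol at compile time restores the old behavior.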

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 21:34:41

On Tue, Mar 22, 2016 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> My concern was not merely "academic" (i.e. it was not limited in scope
>> to things that don't make B-Tree indexes corrupt). Pretty sure that we
>> need to start thinking of this as a problem with strcoll() that
>> strxfrm() does not have for more fundamental reasons, because
>> strcoll() says that the first string in the de_DE sorted list is
>> *greater* than the third string.
>
> [ squint... ]  I was looking specifically for that sort of misbehavior
> in my test program, and I haven't seen it.

Sorry, I was in too much of a hurry to get to the bottom of this with
that example. I failed to notice that LC_COLLATE for sort was "de_DE",
not "de_DE.UTF-8". For my simple case it would not have mattered if
"de_DE" was specified instead of "de_DE.UTF-8" on a non-broken system.
But, this was a broken system.

Anyway, what prompted the misguided example was this:

[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xx"' 'xxx"'
"x xx"" -> 2323230108080801020202010235034b (16 bytes)
"xxx"" -> 232323010808080102020201044b (14 bytes)
strcmp(arg1, arg2) result: -2
strcoll(arg1, arg2) result: -6
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xxf' 'xxxf'
"x xxf" -> 2323231101080808080102020202010235 (17 bytes)
"xxxf" -> 2323231101080808080102020202 (14 bytes)
strcmp(arg1, arg2) result: 1
strcoll(arg1, arg2) result: -6

Notice that the case where a double quote is used makes strxfrm() and
strcoll() agree, whereas if that character is a letter from the Latin
alphabet instead, they disagree.

My intuition is that this is significant from the point of view of
fixing the glibc strcoll() bug. It feels like there is an incorrectly
applied optimization here, that occurs for strcoll() but not the
separate transformation process that strxfrm() does.
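
The kind of check shown in the strxfrm-binary runs above can be sketched in
a few lines of C. This is not the strxfrm-binary program itself, whose
source is not part of this message; collation_agrees, xfrm_dup and sign_of
are hypothetical names, and the sketch assumes that the "strcmp" line in the
output compares the strxfrm() blobs of the two arguments.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so that only the sign matters. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Return a malloc'd strxfrm() blob for s, or NULL if allocation fails. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;  /* required size, including NUL */
    char  *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/*
 * Returns 1 if strcoll(a, b) and strcmp() on the strxfrm() blobs order the
 * two strings the same way, 0 if they disagree, and -1 if the locale cannot
 * be set or memory runs out.
 */
int collation_agrees(const char *locale, const char *a, const char *b)
{
    char *xa, *xb;
    int   result;

    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;

    xa = xfrm_dup(a);
    xb = xfrm_dup(b);
    if (xa == NULL || xb == NULL)
        result = -1;
    else
        result = (sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb)));

    free(xa);
    free(xb);
    return result;
}
```

Calling collation_agrees("de_DE.UTF-8", "x xxf", "xxxf") on a system with
that locale installed would report whether the two functions disagree there;
the result depends entirely on the installed libc.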

There seem to be at least a few instances of over-optimizing
strcoll() in the past few years. For example:
https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e

This bug looks like a possible candidate, given that complaints were
about de_DE:

https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f

Is this bug of the right vintage? Seems like it might be a bit too
early for RHEL 6 to be affected, but I'm no expert.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

23 March 2016, 21:51:36

Peter Geoghegan <pg@heroku.com> writes:
> There seem to be at least a few instances of over-optimizing
> strcoll() in the past few years. For example:
> https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e

> This bug looks like a possible candidate, given that complaints were
> about de_DE:
> https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f
> Is this bug of the right vintage? Seems like it might be a bit too
> early for RHEL 6 to be affected, but I'm no expert.

It is too early.  RHEL6 seems to be based off glibc 2.12, released 2010.
(By the same token, it's not got the other bug you mention ;-))

            regards, tom lane

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

23 March 2016, 22:08:11

On Wed, Mar 23, 2016 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It is too early.  RHEL6 seems to be based off glibc 2.12, released 2010.
> (By the same token, it's not got the other bug you mention ;-))

Well, it looked like everything was fine for "debian testing, glibc
2.22-3", including de_DE.UTF-8. In theory, it's only a matter of using
git-bisect to find what the fix was. That's just leg-work. I will find
time for it after the ongoing CF.

Mercifully, the situation with Glibc 2.22 suggests that the Glibc
people *aren't* fixing the strcoll() bugs in stable branches. But that
also means that it will take a long time to make non-C collation text
sorting use abbreviation on most systems, which is certainly
disappointing.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Magnus Hagander

Date:

24 March 2016, 13:04:31

On Wed, Mar 23, 2016 at 7:14 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > We want to get it back to working. But short-term, it's more important to
> > limit the scope of the brokenness, since this is a version that people are
> > putting in production. Once we have enough info to safely say we've put a
> > workaround in place, we turn it back on.
>
> Do you think it's possible that my amcheck tool might have a role to
> play here? I wrote it for exactly this kind of scenario. If we could
> get it reviewed, then a pre-release version compatible with 9.5 could
> be made available. I'd be willing to work on that side of things if
> core are receptive. Early prototypes of the tool were used to detect
> collation incompatibility issues in production.
>

That's a good question. Would it catch corruption like this? I haven't
actually tested it :) My understanding is that the thing that can happen is
that while we don't actually store incorrect values in the indexes, we can
end up with index pointers in the wrong order in the indexes with this bug?
That does sound like one of those things that the amcheck tool is designed
to find?

And if not that one, can we find some other way for people to find out if
they need to REINDEX after the upgrade? It would be very nice not to have
to tell everybody to reindex everything, but to actually detect the cases
where it's needed. Or at least provide a supported way to do that, for
those where a cluster-wide reindex is really expensive.

Even if we can't sneak amcheck into 9.5, if we can show that it detects the
problem, then just being able to direct people to "get amcheck from 9.6 if
you want to check if the reindex is necessary" would still be a strong
improvement over nothing.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Robert Haas

Date:

24 March 2016, 14:14:17

On Thu, Mar 24, 2016 at 9:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Even if we can't sneak amcheck into 9.5, if we can show that it detects the
> problem, then just being able to direct people to "get amcheck from 9.6 if
> you want to check if the reindex is necessary" would still be a strong
> improvement over nothing.

I agree that back-patching amcheck into 9.5 would be unprecedented,
but it wouldn't be crazy: shipping an extra contrib module with no
additional dependencies shouldn't break anything for existing users.

However, the fact that the patch is not "Ready for Committer" at this
point means that it is not going to be available in time for next
week's maintenance releases, or very possibly, for 9.6.  Time grows
very short.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Bernd Helmle

Date:

24 March 2016, 14:17:35

--On 24 March 2016 14:04:22 +0100 Magnus Hagander <magnus@hagander.net> wrote:

> That's a good question. Would it catch corruption like this? I haven't
> actually tested it :) My understanding is that the thing that can happen
> is that while we don't actually store incorrect values in the indexes, we
> can end up with index pointers in the wrong order in the indexes with
> this bug? That does sound like one of those things that the amcheck tool
> is designed to find?

This is exactly where the prototype btreecheck helped a lot. The last time
I used it to track down problems we got

> WARNING:  page order invariant violated for index

which nailed down collation problems on that specific machine and helped
identify the indexes where we had the problem.

For example, if you take the bug report from Marc-Olaf and check the
affected table/index with the current amcheck patch, you get:

bernd@localhost:test #= SELECT bt_index_check('foo_val_idx');
ERROR:  XX002: page order invariant violated for index "foo_val_idx"
DETAIL:  Lower index tid=(1,1) (points to heap tid=(0,1)) higher index tid=(1,2) (points to heap tid=(0,2)) page lsn=0/0.
LOCATION:  bt_target_page_check, amcheck.c:687
STATEMENT:  SELECT bt_index_check('foo_val_idx');

So if you ask me, this absolutely is a "must-have".

--
Thanks

    Bernd

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

24 March 2016, 19:10:46

On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> However, the fact that the patch is not "Ready for Committer" at this
> point means that it is not going to be available in time for next
> week's maintenance releases, or very possibly, for 9.6.  Time grows
> very short.

The only people that are likely comfortable giving final sign-off on
it that are active this CF are Tom and Kevin. That is an awkward
situation.

I could produce a 9.5 variant that had even more limited scope than
what's in the CF. That would be strictly limited to checking page
order, and the high key invariant. It wouldn't check relationships
spanning multiple pages, either on the same level, or through
parent/child relationships. Then, I think significantly less expertise
is required for review, because locking protocols and so on don't
enter into it.

I think that the risk of getting something wrong with amcheck as
things stand is acceptable for 9.6, and maybe even 9.5. About the
worst case scenario is a false positive report of corruption. But with
the tool scoped at only looking at really obvious invariants at the
level of a single page, which is what I'd propose for 9.5, it seems
like the risk of bugs would be very well managed. That would still
catch issues caused by this glibc bug very reliably.

Keep in mind that in general, amcheck does nothing special with buffer
locks + pins -- it just acquires a pin + shared buffer lock on one
buffer/page at a time, copies it into local memory, and releases the
lock and drops the pin. So, all processing by amcheck happens outside any
critical path.

I could work hard to get that stripped down amcheck into 9.5. I'm
already behind on my CF reviews, and time is short, so it would be
good if we moved quickly on this, either way....

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

24 March 2016, 19:20:27

Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> However, the fact that the patch is not "Ready for Committer" at this
>> point means that it is not going to be available in time for next
>> week's maintenance releases, or very possibly, for 9.6.  Time grows
>> very short.

> The only people that are likely comfortable giving final sign-off on
> it that are active this CF are Tom and Kevin. That is an awkward
> situation.

I would not be comfortable with reviewing an entire module with the
intention of shipping it in a stable branch on Monday, even if I had
nothing else to do between now and then.  I think the only sane way
to get this into 9.5.2 would be to slip the release date, and that
seems rather counterproductive.  We need to get this fix into the
hands of users ASAP.

I fear our only realistic course of action is to publish release
notes along the lines of "if you use any of list-of-affected-locales,
you should REINDEX btree indexes on text/varchar/bpchar columns".

            regards, tom lane

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

24 March 2016, 19:29:46

On Thu, Mar 24, 2016 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The only people that are likely comfortable giving final sign-off on
>> it that are active this CF are Tom and Kevin. That is an awkward
>> situation.
>
> I would not be comfortable with reviewing an entire module with the
> intention of shipping it in a stable branch on Monday, even if I had
> nothing else to do between now and then.  I think the only sane way
> to get this into 9.5.2 would be to slip the release date, and that
> seems rather counterproductive.  We need to get this fix into the
> hands of users ASAP.

That's fair. I didn't really imagine that we'd want to put the tool
into 9.5 myself. Still, I think that amcheck could have some role to
play in managing the problem. Even the near-term availability of
amcheck for 9.5 as a satellite project would count. That could happen
without blocking the point release. I just don't want to go over
anyone's head with that.

"REINDEX everything" isn't a realistic plan for a lot of users.

--
Peter Geoghegan

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Tom Lane

Date:

24 March 2016, 19:47:15

Peter Geoghegan <pg@heroku.com> writes:
> That's fair. I didn't really imagine that we'd want to put the tool
> into 9.5 myself. Still, I think that amcheck could have some role to
> play in managing the problem. Even the near-term availability of
> amcheck for 9.5 as a satellite project would count. That could happen
> without blocking the point release. I just don't want to go over
> anyone's head with that.

I have no objection to something like that happening.

            regards, tom lane

Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Peter Geoghegan

Date:

24 March 2016, 20:28:13

On Thu, Mar 24, 2016 at 6:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
> And if not that one, can we find some other way for people to find out if
> they need to REINDEX after the upgrade? It would be very nice not to have to
> tell everybody to reindex everything, but to actually detect the cases where
> it's needed. Or at least provide a supported way to do that, for those where
> a cluster-wide reindex is really expensive.

If amcheck was made to only verify pages in isolation, then it would have a
very strong chance of finding any issues, but not an iron-clad
guarantee -- it might be that the ordering was wrong across pages
(although that seems like a very small space for problems to hide).
Because we know that there is a sane total ordering for both strcoll()
and strxfrm() cases on affected systems, I'm pretty sure that the
version of amcheck in the ongoing CF (that checks child/parent, as
well as sibling relationships) would actually catch any problems of
that kind *reliably*. In other words, it would be okay that it didn't
check every item against every other item, because per Tom's analysis
the transitive law is not broken in either case, even if strcoll() is
buggy.
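
The transitivity argument can be sketched in C: when the comparator yields a
sane total order, checking each adjacent pair is enough to establish that
the whole run is sorted, so no item has to be compared against every other
item. This is an illustration only, not amcheck's code;
first_order_violation is a hypothetical name, and plain C strings stand in
for index tuples.

```c
#include <stddef.h>
#include <string.h>

/*
 * Scan a run of strings and return the index of the first item that sorts
 * after its right-hand neighbour according to strcoll() in the current
 * LC_COLLATE locale, or -1 if every adjacent pair is in order.
 *
 * If strcoll() is transitive, "every adjacent pair is in order" implies
 * that the whole run is in order.
 */
long first_order_violation(const char *const *items, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
    {
        if (strcoll(items[i], items[i + 1]) > 0)
            return (long) i;
    }
    return -1;
}
```

A caller would first select the collation under test with
setlocale(LC_COLLATE, ...); without that call the program runs in the "C"
locale, where strcoll() behaves like strcmp().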

> Even if we can't sneak amcheck into 9.5, if we can show that it detects the
> problem, then just being able to direct people to "get amcheck from 9.6 if
> you want to check if the reindex is necessary" would still be a strong
> improvement over nothing.

Agreed.

--
Peter Geoghegan

Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From

Marc-Olaf Jaschke

Date:

30 March 2016, 15:46:30

Thanks for the quick bug fix!

I've seen that a wiki page on the subject has been created. Maybe it would be
useful to explicitly mention that 9.5.1 performance can be partly maintained
by changing the collation of text columns to "C" when there is no need for
special collation handling.

Best regards,
Marc-Olaf Jaschke


> On 23.03.2016 at 21:07, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> +#ifndef TRUST_STRXFRM
>>> +     if (!collate_c)
>>> +             abbreviate = false;
>>> +#endif
>>
>> Ah, I did not realize that abbreviation would be of any value in C locale.
>> If it is, then +1 for something like the above.
>
> It's actually more likely to help for a C locale than for a non-C locale.
>
> I have committed this and back-patched it to 9.5.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company