Thread: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Marc-Olaf Jaschke
Date:
Hi,

PostgreSQL 9.5 ignores rows with the following test case:

=========================

\l+
…
 Encoding |   Collate   |    Ctype
 UTF8     | de_DE.UTF-8 | de_DE.UTF-8
...

create table test (t) as values ('eai'), ('e aí');

select * from test where t = 'eai';
  t
-----
 eai
(1 row)

create index on test(t);
set enable_seqscan = false;

select * from test where t = 'eai';
 t
---
(0 rows)

select t from test where t = 'eai' collate "C";
  t
-----
 eai
(1 row)

alter table test alter column t type text collate "C";
select * from test where t = 'eai';
  t
-----
 eai
(1 row)

alter table test alter column t type text collate "de_DE.utf8";
select * from test where t = 'eai';
 t
---
(0 rows)

set enable_seqscan = true;
select * from test where t = 'eai';
  t
-----
 eai
(1 row)

=========================

I was able to reproduce this with

cat /etc/debian_version
6.0.1
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
/lib/libc.so.6
GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.

CentOS release 6.7 (Final)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

I was not able to reproduce this with

OSX (10.11.3 (15D21))
PostgreSQL 9.5alpha1 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit

OSX (10.11.3 (15D21))
PostgreSQL 9.5.1 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version 7.0.0 (clang-700.1.76), 64-bit

Ubuntu 12.04.5 LTS
PostgreSQL 9.3.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
ldd --version
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.13) 2.15

CentOS release 6.7 (Final)
PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

Red Hat Enterprise Linux Server release 7.2 (Maipo)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
ldd --version
ldd (GNU libc) 2.17

Best regards,
Marc-Olaf Jaschke
Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
> PostgreSQL 9.5 ignores rows with the following test case:

I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
I presume that that points the finger at the abbreviated-keys work.

BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:

u8=# set enable_seqscan TO 0;
SET
u8=# select * from test where t < 'eai';
 t
---
(0 rows)

u8=# select * from test where t = 'eai';
 t
---
(0 rows)

u8=# select * from test where t > 'eai';
 t
---
(0 rows)

regards, tom lane
On Mon, Mar 21, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
>> PostgreSQL 9.5 ignores rows with the following test case:
>
> I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
> I presume that that points the finger at the abbreviated-keys work.
>
> BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:
>
> u8=# set enable_seqscan TO 0;
> SET
> u8=# select * from test where t < 'eai';
>  t
> ---
> (0 rows)
>
> u8=# select * from test where t = 'eai';
>  t
> ---
> (0 rows)
>
> u8=# select * from test where t > 'eai';
>  t
> ---
> (0 rows)

This could plausibly be a consequence of the abbreviated keys work if strxfrm() and strcoll() return inconsistent results for those strings for the same locale (say, one says +1 and the other says -1 given those inputs). I don't have a RHEL6 system handy to test whether that might be the case here.

If that is the case, I'd argue that's a glibc problem, not our problem. Of course, we could provide an option to disable abbreviated keys for the benefit of people who need to work around buggy libc implementations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If that is the case, I'd argue that's a glibc problem, not our
> problem. Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

Conferred with Robert. This is my first suspicion.

More in a little while.
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> wrote:
> PostgreSQL 9.5 ignores rows with the following test case:

At one point, Robert wrote a small self-contained tool to show OS strxfrm() blobs:

http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

It would be great if you showed us the output for your test case strings, both on an affected and on an unaffected system. As Robert mentioned, our use of strxfrm() quite reasonably relies on it producing blobs that compare with strcmp() in a way that gives the same result as a strcoll() on the original strings, per ISO C90.

Thanks
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If that is the case, I'd argue that's a glibc problem, not our
> problem. Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

That would be an easy patch to write. We'd simply have a test within bttextsortsupport() that had systems that disabled abbreviated keys for text PG_RETURN_VOID(). Actually, to be more precise, we'd put that next to the Windows code within varstr_sortsupport() (the function is called btsortsupport_worker in 9.5). It would look at a GUC, I suppose.
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> wrote:
> I was able to reproduce this with
>
> cat /etc/debian_version
> 6.0.1
> PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
> /lib/libc.so.6
> GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.
>
> CentOS release 6.7 (Final)
> PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> ldd --version
> ldd (GNU libc) 2.12

I found this fairly recent bug report concerning glibc's strxfrm():

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927

(See also https://sourceware.org/bugzilla/show_bug.cgi?id=16009)

I'm not certain that this is the problem, but it's a good theory. Note that this particular message talks about your exact affected version of eglibc (eglibc-2.11.3):

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927#27

Even if it isn't this exact issue, I have a really hard time imagining that this is not a bug in the relevant glibc versions. Abbreviated keys are fundamentally a fairly simple idea, and it's hard to think of any other possible explanation. We'll know more when we use those strxfrm() blobs, from the tool I linked to.
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If that is the case, I'd argue that's a glibc problem, not our
>> problem. Of course, we could provide an option to disable abbreviated
>> keys for the benefit of people who need to work around buggy libc
>> implementations.
>
> That would be an easy patch to write. We'd simply have a test within
> bttextsortsupport() that had systems that disabled abbreviated keys
> for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
> next to the Windows code within varstr_sortsupport() (the function is
> called btsortsupport_worker in 9.5). It would look at a GUC, I
> suppose.

Actually, I suppose it isn't quite that simple, because abbreviated keys did not introduce the use of strxfrm() by Postgres. That happened much earlier. I guess we'd have to think about convert_string_datum(), too.

Maybe we can write a test case that lets check_strxfrm_bug() detect this issue, which would be ideal. But, again, I need to see what's going on with strxfrm() on affected systems before I can do anything. Don't have one of my own close at hand.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> If that is the case, I'd argue that's a glibc problem, not our
>>>> problem. Of course, we could provide an option to disable abbreviated
>>>> keys for the benefit of people who need to work around buggy libc
>>>> implementations.

FWIW, I do not think you can dismiss it as "not our bug" if a large fraction of existing glibc installations share the issue. It might be a glibc bug, but we'll have to find a workaround.

> Maybe we can write a test-case that lets check_strxfrm_bug() detect
> this issue, which would be ideal. But, again, I need to see what's
> going on with strxfrm() on affected systems before I can do anything.

Happy to test if you can provide a test case.

regards, tom lane
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I do not think you can dismiss it as "not our bug" if a large
> fraction of existing glibc installations share the issue. It might
> be a glibc bug, but we'll have to find a workaround.

I didn't say that. I strongly agree.

>> Maybe we can write a test-case that lets check_strxfrm_bug() detect
>> this issue, which would be ideal. But, again, I need to see what's
>> going on with strxfrm() on affected systems before I can do anything.
>
> Happy to test if you can provide a test case.

Can you look at generating a textual representation of the strxfrm() blobs in question, using Robert's tool?

http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

That would give me some basis for writing a test.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> At one point, Robert wrote a small self-contained tool to show OS
> strxfrm() blobs:
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
> It would be great if you showed us the output for your test case
> strings, both on an affected and on an unaffected system.

On RHEL6, I get

./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)

This seems a bit problematic, because these strings sort in the other order ("e aí" before "eai") according to sort(1) as well as Postgres sorting code.

It's possible I've copied-and-pasted these multibyte characters wrong. But if I haven't, this says that the strxfrm-based optimization is unusably broken on a very large fraction of reasonably-modern installations. Quite aside from casting aspersions on the glibc guys, how did we fail to notice this in our own testing?

regards, tom lane
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> On RHEL6, I get
>
> ./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
> "eai" -> 100c140108080801020202 (11 bytes)
> "e aí" -> 100c140108080901020202010235 (14 bytes)

As expected, ISTM that the "primary weights" here are the same. Aligned comparison of this with correct en_US.UTF-8 blobs from my system:

Buggy version (Tom's de_DE.UTF-8 testcase):

"eai" -> 100c14 01 090909 01 090909 (11 bytes)
"e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)

Correct version (though uses different locale):

"eai" -> 100c14 01 080808 01 020202 (11 bytes)
"e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

The low bytes, 0x01, separate the weight levels. I think that this always happens with glibc. The space character is only represented at the last level, which is why strcoll() typically weighs spaces as very unimportant (you'll recall that we hear complaints about this from time to time).

My guess is that the 0x0b byte in Tom's buggy de_DE.UTF-8 testcase is the problem. Not sure why. I guess I'll look around here for further ideas tomorrow:

http://unicode.org/reports/tr10/#Well_Formedness_Examples

> This seems a bit problematic, because these strings sort in the other
> order ("e aí" before "eai") according to sort(1) as well as Postgres
> sorting code.
>
> It's possible I've copied-and-pasted these multibyte characters wrong.
> But if I haven't, this says that the strxfrm-based optimization is
> unusably broken on a very large fraction of reasonably-modern
> installations. Quite aside from casting aspersions on the glibc guys,
> how did we fail to notice this in our own testing?

Because we don't test every possible libc installation. And even if we did, why should we be able to usefully nail down something that's fundamentally not under our control?

(I don't want to assume that that bug is at fault, but it seems like a reasonable speculation, especially based on your "strxfrm-binary" result.)

Let's not relitigate the debate about Postgres controlling its own collations right now, though.

I think that amcheck will be able to provide reasonable smoke-testing for these kinds of issues once it gets some buildfarm cycles. I intend to write plenty of tests for external sorting to go with amcheck, too; that code currently has no tests whatsoever. amcheck provides a nice way of testing if strxfrm() agrees with strcoll(), without having to "expect" any particular total ordering for a collatable type, which is what a simple pg_regress approach would require. Portable testing of strcoll() + strxfrm() will improve matters.
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 10:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> "eai" -> 100c14 01 090909 01 090909 (11 bytes)
> "e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)
>
> "eai" -> 100c14 01 080808 01 020202 (11 bytes)
> "e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

Sorry, I have that backwards. The latter output is Tom's de_DE.UTF-8 testcase, showing broken glibc behavior.
--
Peter Geoghegan
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Can you look at generating a textual representation of the strxfrm()
> blobs in question, using Robert's tool?:
>
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

I played with this tool myself, on an affected CentOS 6.7 VM:

[vagrant@localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

I now think that we have this backwards: This isn't a bug in glibc's strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with modified tool, simplified to use ascii-safe strings:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

If we assume for the sake of argument that this is a strxfrm() bug and strcoll() is a reliable source of truth, then I find it very curious that Germany's Austrian neighbors differ on this point about how text should be collated:

[vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

This surely adds doubt to the idea that strxfrm() in particular is broken.

I find something else inconsistent with the strxfrm() theory: even the de_DE collation gives strxfrm()/strcoll() self-consistent answers when we move the rhs argument's space to the far side of its center 'x' char:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

It seems very unlikely that this is because of a legitimate consideration that strcoll() makes about how German should be collated (one that strxfrm() fails to make, say).

This is probably a worse situation for affected Postgres systems, though, because now they have no scope to turn the faulty part of the system off. I have a hard time believing that it's a good idea to trust strcoll() to be wrong in a way that is at least consistent enough for collatable type opclasses to follow "Notes to Operator Class Implementors". I'd like to hear more opinions on that, though, because it's a tricky thing to reason about.
--
Peter Geoghegan
On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Can you look at generating a textual representation of the strxfrm()
>> blobs in question, using Robert's tool?:
>>
>> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
>
> I played with this tool myself, on an affected CentOS 6.7 VM:
>
> [vagrant@localhost ~]$ ldd --version
> ldd (GNU libc) 2.12
> Copyright (C) 2010 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
>
> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
> modified tool, simplified to use ascii-safe strings:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: 6
>
> If we assume for the sake of argument that this is a strxfrm() bug and
> strcoll() is a reliable source of truth, then I find it very curious
> that Germany's Austrian neighbors differ on this point about how text
> should be collated:
>
> [vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> This surely adds doubt to the idea that strxfrm() in particular is broken.
>
> I find something else inconsistent with the strxfrm() theory: even the
> de_DE collation gives strxfrm()/strcoll() self-consistent answers when
> we move the rhs argument's space to the far side of its center 'x'
> char:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "xx x" -> 2323230108080801020202010335 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> It seems very unlikely that this is because of a legitimate
> consideration that strcoll() makes about how German should be collated
> (one that strxfrm() fails to make, say).
>
> This is probably a worse situation for affected Postgres systems,
> though, because now they have no scope to turn the faulty part of the
> system off. I have a hard time believing that it's a good idea to
> trust strcoll() to be wrong in a way that is at least consistent
> enough for collatable type opclasses to follow "Notes to Operator
> Class Implementors". I'd like to hear more opinions on that, though,
> because it's a tricky thing to reason about.

Well, if we implement a compatibility GUC that shuts off our dependency on strxfrm(), people can go back to having 9.5 be no more broken than 9.4 was. I vote we do that and go home. Behavior-changing GUCs suck, but it seems clear that Tom is not going to sit still for any solution that involves blaming the glibc vendor no matter how well-justified that approach might be; and I don't have a better idea.

I was a little worried that it was too much to hope for that all libc vendors on earth would ship a strxfrm() implementation that was actually consistent with strcoll(), and here we are. It's a good thing that operating systems manage to make read() and getpid() several orders of magnitude more reliable than strxfrm() and strcoll(), or we'd probably all be running Windows or VMS or something now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I was a little worried that it was too much to hope for that all libc
> vendors on earth would ship a strxfrm() implementation that was actually
> consistent with strcoll(), and here we are.

Indeed. To try to put some scope on the problem, I made an idiot little program that just generates some random UTF8 strings and sees whether strcoll and strxfrm sort them alike. Attached are that program, an even more idiot little shell script that runs it over all available UTF8 locales, and the results on my RHEL6 box. While de_DE seems to be the worst-broken locale, it's far from the only one.

Please try this on as many platforms as you can get hold of ...

regards, tom lane

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <time.h>

/*
 * Test: generate 1000 random UTF8 strings, sort them by strcoll, sanity-
 * check the sort result, sort them by strxfrm, sanity-check that result,
 * and compare the two sort orders.
 */

#define NSTRINGS 1000
#define MAXSTRLEN 20
#define MAXXFRMLEN (MAXSTRLEN * 5)

typedef struct
{
    char        strval[MAXSTRLEN];
    char        xfrmval[MAXXFRMLEN];
    int         strsortpos;
    int         xfrmsortpos;
} OneString;

/* qsort comparators */

static int
strcoll_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcoll(a->strval, b->strval);
}

static int
strxfrm_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcmp(a->xfrmval, b->xfrmval);
}


/* returns 1 if OK, 0 if inconsistency detected */
static int
run_test_case(void)
{
    int         ok = 1;
    OneString   data[NSTRINGS];
    int         i,
                j;

    /* Generate random UTF8 strings of length less than MAXSTRLEN bytes */
    for (i = 0; i < NSTRINGS; i++)
    {
        char       *p = data[i].strval;
        int         len;

        len = 1 + (random() % (MAXSTRLEN - 1));
        while (len > 0)
        {
            int         c;

            /* Generate random printable char in ISO8859-1 range */
            /* Bias towards producing a lot of spaces */
            if ((random() % 16) < 3)
                c = ' ';
            else
            {
                do
                {
                    c = random() & 0xFF;
                } while (!((c >= ' ' && c <= 127) || (c >= 0xA0 && c <= 0xFF)));
            }

            if (c <= 127)
            {
                *p++ = c;
                len--;
            }
            else
            {
                if (len < 2)
                    break;
                /* Poor man's utf8-ification */
                *p++ = 0xC0 + (c >> 6);
                len--;
                *p++ = 0x80 + (c & 0x3F);
                len--;
            }
        }
        *p = '\0';

        /* strxfrm each string as we produce it */
        if (strxfrm(data[i].xfrmval, data[i].strval, MAXXFRMLEN) >= MAXXFRMLEN)
        {
            fprintf(stderr, "strxfrm() result for %d-length string exceeded %d bytes\n",
                    (int) strlen(data[i].strval), MAXXFRMLEN);
            exit(1);
        }

#if 0
        printf("%d %s\n", i, data[i].strval);
#endif
    }

    /* Sort per strcoll(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strcoll_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcoll(data[i].strval, data[i-1].strval) != 0)
            j++;
        data[i].strsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcoll(data[i].strval, data[j].strval) > 0)
            {
                fprintf(stdout, "strcoll sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Sort per strxfrm(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strxfrm_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcmp(data[i].xfrmval, data[i-1].xfrmval) != 0)
            j++;
        data[i].xfrmsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcmp(data[i].xfrmval, data[j].xfrmval) > 0)
            {
                fprintf(stdout, "strxfrm sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Compare */
    for (i = 0; i < NSTRINGS; i++)
    {
        if (data[i].strsortpos != data[i].xfrmsortpos)
        {
            fprintf(stdout, "inconsistency between strcoll (%d) and strxfrm (%d) orders\n",
                    data[i].strsortpos, data[i].xfrmsortpos);
            ok = 0;
        }
    }

    return ok;
}

int
main(int argc, char **argv)
{
    const char *lc;
    int         ntries;

    /* Absorb locale from environment, and report what we're using */
    if (setlocale(LC_ALL, "") == NULL)
    {
        perror("setlocale(LC_ALL) failed");
        exit(1);
    }
    lc = setlocale(LC_COLLATE, NULL);
    if (lc)
    {
        printf("Using LC_COLLATE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_COLLATE) failed");
        exit(1);
    }
    lc = setlocale(LC_CTYPE, NULL);
    if (lc)
    {
        printf("Using LC_CTYPE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_CTYPE) failed");
        exit(1);
    }

    /* Ensure new random() values on every run */
    srandom((unsigned int) time(NULL));

    /* argv[1] can be the max number of tries to run */
    if (argc > 1)
        ntries = atoi(argv[1]);
    else
        ntries = 1;

    /* Run one test instance per loop */
    while (ntries-- > 0)
    {
        if (!run_test_case())
            exit(1);
    }

    return 0;
}

#! /bin/sh

for LANG in `locale -a | grep -i 'utf.*8'`
do
    export LANG
    if ./strcolltest 10
    then
        echo $LANG good
    else
        echo $LANG BAD
    fi
done

Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "aa_DJ.utf8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "aa_ER.utf8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "aa_ER.utf8@saaho"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "aa_ET.utf8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "af_ZA.utf8"
af_ZA.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "am_ET.utf8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "an_ES.utf8"
an_ES.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "ar_AE.utf8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "ar_BH.utf8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "ar_DZ.utf8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "ar_EG.utf8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "ar_IN.utf8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "ar_IQ.utf8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "ar_JO.utf8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "ar_KW.utf8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "ar_LB.utf8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "ar_LY.utf8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "ar_MA.utf8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "ar_OM.utf8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "ar_QA.utf8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "ar_SA.utf8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "ar_SD.utf8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "ar_SY.utf8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "ar_TN.utf8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "ar_YE.utf8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "as_IN.utf8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "ast_ES.utf8"
ast_ES.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "az_AZ.utf8"
inconsistency between strcoll (718) and strxfrm (717) orders
inconsistency between strcoll (717) and strxfrm (718) orders
az_AZ.utf8 BAD
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "be_BY.utf8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "be_BY.utf8@latin"
be_BY.utf8@latin good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "ber_DZ.utf8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "ber_MA.utf8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "bg_BG.utf8"
bg_BG.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "bn_BD.utf8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "bn_IN.utf8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "bo_CN.utf8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "bo_IN.utf8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "br_FR.utf8"
br_FR.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "bs_BA.utf8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "byn_ER.utf8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "ca_AD.utf8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "ca_ES.utf8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "ca_FR.utf8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "ca_IT.utf8"
ca_IT.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "crh_UA.utf8"
inconsistency between strcoll (264) and strxfrm (263) orders
inconsistency between strcoll (265) and strxfrm (264) orders
inconsistency between strcoll (263) and strxfrm (265) orders
inconsistency between strcoll (427) and strxfrm (426) orders
inconsistency between strcoll (426) and strxfrm (427) orders
crh_UA.utf8 BAD
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "cs_CZ.utf8"
cs_CZ.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "csb_PL.utf8"
csb_PL.utf8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "cv_RU.utf8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "cy_GB.utf8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "da_DK.utf8"
inconsistency between strcoll (876) and strxfrm (875) orders
inconsistency between strcoll (877) and strxfrm (876) orders
inconsistency between strcoll (875) and strxfrm (877) orders
inconsistency between strcoll (902) and strxfrm (901) orders
inconsistency between strcoll (901) and strxfrm (902) orders
da_DK.utf8 BAD
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "de_AT.utf8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "de_BE.utf8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "de_CH.utf8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
inconsistency between strcoll (69) and strxfrm (68) orders
inconsistency between strcoll (68) and strxfrm (69) orders
inconsistency between strcoll (129) and strxfrm (127) orders
inconsistency between strcoll (127) and strxfrm (128) orders
inconsistency between strcoll (128) and strxfrm (129) orders
inconsistency between strcoll (188) and strxfrm (187) orders
inconsistency between strcoll (187) and strxfrm (188) orders
inconsistency between strcoll (258) and strxfrm (257) orders
inconsistency between strcoll (257) and strxfrm (258) orders
inconsistency between strcoll (260) and strxfrm (259) orders
inconsistency between strcoll (261) and strxfrm (260) orders
inconsistency between strcoll (259) and strxfrm (261) orders
inconsistency between strcoll (284) and strxfrm (283) orders
inconsistency between strcoll (283) and strxfrm (284) orders
inconsistency between strcoll (312) and strxfrm (311) orders
inconsistency between strcoll (311) and strxfrm (312) orders
inconsistency between strcoll (316) and strxfrm (315) orders
inconsistency between strcoll (315) and strxfrm (316) orders
inconsistency between strcoll (361) and strxfrm (360) orders
inconsistency between strcoll (360) and strxfrm (361) orders
inconsistency between strcoll (385) and strxfrm (383) orders
inconsistency between strcoll (383) and strxfrm (384) orders
inconsistency between strcoll (384) and strxfrm (385) orders
inconsistency between strcoll (410) and strxfrm (408) orders
inconsistency between strcoll (408) and strxfrm (409) orders
inconsistency between strcoll (409) and strxfrm (410) orders
inconsistency between strcoll (428) and strxfrm (426) orders
inconsistency between strcoll (426) and strxfrm (427) orders
inconsistency between strcoll (429) and strxfrm (428) orders
inconsistency between strcoll (427) and strxfrm (429) orders
inconsistency between strcoll (431) and strxfrm (430) orders
inconsistency between strcoll (430) and strxfrm (431) orders
inconsistency between strcoll (528) and strxfrm (527) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (527) and strxfrm (529) orders
inconsistency between strcoll (542) and strxfrm (541) orders
inconsistency between strcoll (541) and strxfrm (542) orders
inconsistency between strcoll (552) and strxfrm (551) orders
inconsistency between strcoll (551) and strxfrm (552) orders
inconsistency between strcoll (586) and strxfrm (583) orders
inconsistency between strcoll (587) and strxfrm (584) orders
inconsistency between strcoll (583) and strxfrm (585) orders
inconsistency between strcoll (584) and strxfrm (586) orders
inconsistency between strcoll (585) and strxfrm (587) orders
inconsistency between strcoll (596) and strxfrm (595) orders
inconsistency between strcoll (595) and strxfrm (596) orders
inconsistency between strcoll (921) and strxfrm (920) orders
inconsistency between strcoll (920) and strxfrm (921) orders de_DE.utf8 BAD Using LC_COLLATE = "de_LU.utf8" Using LC_CTYPE = "de_LU.utf8" de_LU.utf8 good Using LC_COLLATE = "dv_MV.utf8" Using LC_CTYPE = "dv_MV.utf8" dv_MV.utf8 good Using LC_COLLATE = "dz_BT.utf8" Using LC_CTYPE = "dz_BT.utf8" dz_BT.utf8 good Using LC_COLLATE = "el_CY.utf8" Using LC_CTYPE = "el_CY.utf8" el_CY.utf8 good Using LC_COLLATE = "el_GR.utf8" Using LC_CTYPE = "el_GR.utf8" el_GR.utf8 good Using LC_COLLATE = "en_AG.utf8" Using LC_CTYPE = "en_AG.utf8" en_AG.utf8 good Using LC_COLLATE = "en_AU.utf8" Using LC_CTYPE = "en_AU.utf8" en_AU.utf8 good Using LC_COLLATE = "en_BW.utf8" Using LC_CTYPE = "en_BW.utf8" en_BW.utf8 good Using LC_COLLATE = "en_CA.utf8" Using LC_CTYPE = "en_CA.utf8" en_CA.utf8 good Using LC_COLLATE = "en_DK.utf8" Using LC_CTYPE = "en_DK.utf8" en_DK.utf8 good Using LC_COLLATE = "en_GB.utf8" Using LC_CTYPE = "en_GB.utf8" en_GB.utf8 good Using LC_COLLATE = "en_HK.utf8" Using LC_CTYPE = "en_HK.utf8" en_HK.utf8 good Using LC_COLLATE = "en_IE.utf8" Using LC_CTYPE = "en_IE.utf8" en_IE.utf8 good Using LC_COLLATE = "en_IN.utf8" Using LC_CTYPE = "en_IN.utf8" en_IN.utf8 good Using LC_COLLATE = "en_NG.utf8" Using LC_CTYPE = "en_NG.utf8" en_NG.utf8 good Using LC_COLLATE = "en_NZ.utf8" Using LC_CTYPE = "en_NZ.utf8" en_NZ.utf8 good Using LC_COLLATE = "en_PH.utf8" Using LC_CTYPE = "en_PH.utf8" en_PH.utf8 good Using LC_COLLATE = "en_SG.utf8" Using LC_CTYPE = "en_SG.utf8" en_SG.utf8 good Using LC_COLLATE = "en_US.utf8" Using LC_CTYPE = "en_US.utf8" en_US.utf8 good Using LC_COLLATE = "en_ZA.utf8" Using LC_CTYPE = "en_ZA.utf8" en_ZA.utf8 good Using LC_COLLATE = "en_ZW.utf8" Using LC_CTYPE = "en_ZW.utf8" en_ZW.utf8 good Using LC_COLLATE = "es_AR.utf8" Using LC_CTYPE = "es_AR.utf8" es_AR.utf8 good Using LC_COLLATE = "es_BO.utf8" Using LC_CTYPE = "es_BO.utf8" es_BO.utf8 good Using LC_COLLATE = "es_CL.utf8" Using LC_CTYPE = "es_CL.utf8" es_CL.utf8 good Using LC_COLLATE = "es_CO.utf8" Using LC_CTYPE = 
"es_CO.utf8" es_CO.utf8 good Using LC_COLLATE = "es_CR.utf8" Using LC_CTYPE = "es_CR.utf8" es_CR.utf8 good Using LC_COLLATE = "es_DO.utf8" Using LC_CTYPE = "es_DO.utf8" es_DO.utf8 good Using LC_COLLATE = "es_EC.utf8" Using LC_CTYPE = "es_EC.utf8" es_EC.utf8 good Using LC_COLLATE = "es_ES.utf8" Using LC_CTYPE = "es_ES.utf8" es_ES.utf8 good Using LC_COLLATE = "es_GT.utf8" Using LC_CTYPE = "es_GT.utf8" es_GT.utf8 good Using LC_COLLATE = "es_HN.utf8" Using LC_CTYPE = "es_HN.utf8" es_HN.utf8 good Using LC_COLLATE = "es_MX.utf8" Using LC_CTYPE = "es_MX.utf8" es_MX.utf8 good Using LC_COLLATE = "es_NI.utf8" Using LC_CTYPE = "es_NI.utf8" es_NI.utf8 good Using LC_COLLATE = "es_PA.utf8" Using LC_CTYPE = "es_PA.utf8" es_PA.utf8 good Using LC_COLLATE = "es_PE.utf8" Using LC_CTYPE = "es_PE.utf8" es_PE.utf8 good Using LC_COLLATE = "es_PR.utf8" Using LC_CTYPE = "es_PR.utf8" es_PR.utf8 good Using LC_COLLATE = "es_PY.utf8" Using LC_CTYPE = "es_PY.utf8" es_PY.utf8 good Using LC_COLLATE = "es_SV.utf8" Using LC_CTYPE = "es_SV.utf8" es_SV.utf8 good Using LC_COLLATE = "es_US.utf8" Using LC_CTYPE = "es_US.utf8" inconsistency between strcoll (605) and strxfrm (603) orders inconsistency between strcoll (603) and strxfrm (604) orders inconsistency between strcoll (604) and strxfrm (605) orders es_US.utf8 BAD Using LC_COLLATE = "es_UY.utf8" Using LC_CTYPE = "es_UY.utf8" es_UY.utf8 good Using LC_COLLATE = "es_VE.utf8" Using LC_CTYPE = "es_VE.utf8" es_VE.utf8 good Using LC_COLLATE = "et_EE.utf8" Using LC_CTYPE = "et_EE.utf8" et_EE.utf8 good Using LC_COLLATE = "eu_ES.utf8" Using LC_CTYPE = "eu_ES.utf8" eu_ES.utf8 good Using LC_COLLATE = "fa_IR.utf8" Using LC_CTYPE = "fa_IR.utf8" fa_IR.utf8 good Using LC_COLLATE = "fi_FI.utf8" Using LC_CTYPE = "fi_FI.utf8" inconsistency between strcoll (699) and strxfrm (697) orders inconsistency between strcoll (697) and strxfrm (698) orders inconsistency between strcoll (698) and strxfrm (699) orders inconsistency between strcoll (883) and strxfrm (881) orders 
inconsistency between strcoll (881) and strxfrm (882) orders inconsistency between strcoll (882) and strxfrm (883) orders fi_FI.utf8 BAD Using LC_COLLATE = "fil_PH.utf8" Using LC_CTYPE = "fil_PH.utf8" inconsistency between strcoll (605) and strxfrm (603) orders inconsistency between strcoll (603) and strxfrm (604) orders inconsistency between strcoll (604) and strxfrm (605) orders fil_PH.utf8 BAD Using LC_COLLATE = "fo_FO.utf8" Using LC_CTYPE = "fo_FO.utf8" inconsistency between strcoll (892) and strxfrm (891) orders inconsistency between strcoll (891) and strxfrm (892) orders inconsistency between strcoll (945) and strxfrm (944) orders inconsistency between strcoll (944) and strxfrm (945) orders fo_FO.utf8 BAD Using LC_COLLATE = "fr_BE.utf8" Using LC_CTYPE = "fr_BE.utf8" fr_BE.utf8 good Using LC_COLLATE = "fr_CA.utf8" Using LC_CTYPE = "fr_CA.utf8" inconsistency between strcoll (220) and strxfrm (219) orders inconsistency between strcoll (219) and strxfrm (220) orders fr_CA.utf8 BAD Using LC_COLLATE = "fr_CH.utf8" Using LC_CTYPE = "fr_CH.utf8" fr_CH.utf8 good Using LC_COLLATE = "fr_FR.utf8" Using LC_CTYPE = "fr_FR.utf8" fr_FR.utf8 good Using LC_COLLATE = "fr_LU.utf8" Using LC_CTYPE = "fr_LU.utf8" fr_LU.utf8 good Using LC_COLLATE = "fur_IT.utf8" Using LC_CTYPE = "fur_IT.utf8" fur_IT.utf8 good Using LC_COLLATE = "fy_DE.utf8" Using LC_CTYPE = "fy_DE.utf8" fy_DE.utf8 good Using LC_COLLATE = "fy_NL.utf8" Using LC_CTYPE = "fy_NL.utf8" fy_NL.utf8 good Using LC_COLLATE = "ga_IE.utf8" Using LC_CTYPE = "ga_IE.utf8" ga_IE.utf8 good Using LC_COLLATE = "gd_GB.utf8" Using LC_CTYPE = "gd_GB.utf8" gd_GB.utf8 good Using LC_COLLATE = "gez_ER.utf8" Using LC_CTYPE = "gez_ER.utf8" gez_ER.utf8 good Using LC_COLLATE = "gez_ER.utf8@abegede" Using LC_CTYPE = "gez_ER.utf8@abegede" gez_ER.utf8@abegede good Using LC_COLLATE = "gez_ET.utf8" Using LC_CTYPE = "gez_ET.utf8" gez_ET.utf8 good Using LC_COLLATE = "gez_ET.utf8@abegede" Using LC_CTYPE = "gez_ET.utf8@abegede" gez_ET.utf8@abegede good 
Using LC_COLLATE = "gl_ES.utf8" Using LC_CTYPE = "gl_ES.utf8" gl_ES.utf8 good Using LC_COLLATE = "gu_IN.utf8" Using LC_CTYPE = "gu_IN.utf8" gu_IN.utf8 good Using LC_COLLATE = "gv_GB.utf8" Using LC_CTYPE = "gv_GB.utf8" gv_GB.utf8 good Using LC_COLLATE = "ha_NG.utf8" Using LC_CTYPE = "ha_NG.utf8" ha_NG.utf8 good Using LC_COLLATE = "he_IL.utf8" Using LC_CTYPE = "he_IL.utf8" he_IL.utf8 good Using LC_COLLATE = "hi_IN.utf8" Using LC_CTYPE = "hi_IN.utf8" hi_IN.utf8 good Using LC_COLLATE = "hne_IN.utf8" Using LC_CTYPE = "hne_IN.utf8" hne_IN.utf8 good Using LC_COLLATE = "hr_HR.utf8" Using LC_CTYPE = "hr_HR.utf8" hr_HR.utf8 good Using LC_COLLATE = "hsb_DE.utf8" Using LC_CTYPE = "hsb_DE.utf8" hsb_DE.utf8 good Using LC_COLLATE = "ht_HT.utf8" Using LC_CTYPE = "ht_HT.utf8" ht_HT.utf8 good Using LC_COLLATE = "hu_HU.utf8" Using LC_CTYPE = "hu_HU.utf8" hu_HU.utf8 good Using LC_COLLATE = "hy_AM.utf8" Using LC_CTYPE = "hy_AM.utf8" hy_AM.utf8 good Using LC_COLLATE = "id_ID.utf8" Using LC_CTYPE = "id_ID.utf8" id_ID.utf8 good Using LC_COLLATE = "ig_NG.utf8" Using LC_CTYPE = "ig_NG.utf8" inconsistency between strcoll (165) and strxfrm (164) orders inconsistency between strcoll (164) and strxfrm (165) orders inconsistency between strcoll (453) and strxfrm (452) orders inconsistency between strcoll (452) and strxfrm (453) orders inconsistency between strcoll (786) and strxfrm (785) orders inconsistency between strcoll (785) and strxfrm (786) orders ig_NG.utf8 BAD Using LC_COLLATE = "ik_CA.utf8" Using LC_CTYPE = "ik_CA.utf8" ik_CA.utf8 good Using LC_COLLATE = "is_IS.utf8" Using LC_CTYPE = "is_IS.utf8" is_IS.utf8 good Using LC_COLLATE = "it_CH.utf8" Using LC_CTYPE = "it_CH.utf8" it_CH.utf8 good Using LC_COLLATE = "it_IT.utf8" Using LC_CTYPE = "it_IT.utf8" it_IT.utf8 good Using LC_COLLATE = "iu_CA.utf8" Using LC_CTYPE = "iu_CA.utf8" iu_CA.utf8 good Using LC_COLLATE = "iw_IL.utf8" Using LC_CTYPE = "iw_IL.utf8" iw_IL.utf8 good Using LC_COLLATE = "ja_JP.utf8" Using LC_CTYPE = "ja_JP.utf8" 
ja_JP.utf8 good Using LC_COLLATE = "ka_GE.utf8" Using LC_CTYPE = "ka_GE.utf8" ka_GE.utf8 good Using LC_COLLATE = "kk_KZ.utf8" Using LC_CTYPE = "kk_KZ.utf8" kk_KZ.utf8 good Using LC_COLLATE = "kl_GL.utf8" Using LC_CTYPE = "kl_GL.utf8" inconsistency between strcoll (704) and strxfrm (703) orders inconsistency between strcoll (703) and strxfrm (704) orders inconsistency between strcoll (871) and strxfrm (870) orders inconsistency between strcoll (870) and strxfrm (871) orders inconsistency between strcoll (870) and strxfrm (871) orders inconsistency between strcoll (885) and strxfrm (884) orders inconsistency between strcoll (884) and strxfrm (885) orders inconsistency between strcoll (927) and strxfrm (926) orders inconsistency between strcoll (928) and strxfrm (927) orders inconsistency between strcoll (926) and strxfrm (928) orders kl_GL.utf8 BAD Using LC_COLLATE = "km_KH.utf8" Using LC_CTYPE = "km_KH.utf8" km_KH.utf8 good Using LC_COLLATE = "kn_IN.utf8" Using LC_CTYPE = "kn_IN.utf8" kn_IN.utf8 good Using LC_COLLATE = "ko_KR.utf8" Using LC_CTYPE = "ko_KR.utf8" ko_KR.utf8 good Using LC_COLLATE = "kok_IN.utf8" Using LC_CTYPE = "kok_IN.utf8" kok_IN.utf8 good Using LC_COLLATE = "ks_IN.utf8" Using LC_CTYPE = "ks_IN.utf8" ks_IN.utf8 good Using LC_COLLATE = "ks_IN.utf8@devanagari" Using LC_CTYPE = "ks_IN.utf8@devanagari" ks_IN.utf8@devanagari good Using LC_COLLATE = "ku_TR.utf8" Using LC_CTYPE = "ku_TR.utf8" inconsistency between strcoll (505) and strxfrm (504) orders inconsistency between strcoll (506) and strxfrm (505) orders inconsistency between strcoll (504) and strxfrm (506) orders ku_TR.utf8 BAD Using LC_COLLATE = "kw_GB.utf8" Using LC_CTYPE = "kw_GB.utf8" kw_GB.utf8 good Using LC_COLLATE = "ky_KG.utf8" Using LC_CTYPE = "ky_KG.utf8" ky_KG.utf8 good Using LC_COLLATE = "lg_UG.utf8" Using LC_CTYPE = "lg_UG.utf8" lg_UG.utf8 good Using LC_COLLATE = "li_BE.utf8" Using LC_CTYPE = "li_BE.utf8" li_BE.utf8 good Using LC_COLLATE = "li_NL.utf8" Using LC_CTYPE = "li_NL.utf8" 
li_NL.utf8 good Using LC_COLLATE = "lo_LA.utf8" Using LC_CTYPE = "lo_LA.utf8" lo_LA.utf8 good Using LC_COLLATE = "lt_LT.utf8" Using LC_CTYPE = "lt_LT.utf8" lt_LT.utf8 good Using LC_COLLATE = "lv_LV.utf8" Using LC_CTYPE = "lv_LV.utf8" lv_LV.utf8 good Using LC_COLLATE = "mai_IN.utf8" Using LC_CTYPE = "mai_IN.utf8" mai_IN.utf8 good Using LC_COLLATE = "mg_MG.utf8" Using LC_CTYPE = "mg_MG.utf8" mg_MG.utf8 good Using LC_COLLATE = "mi_NZ.utf8" Using LC_CTYPE = "mi_NZ.utf8" mi_NZ.utf8 good Using LC_COLLATE = "mk_MK.utf8" Using LC_CTYPE = "mk_MK.utf8" mk_MK.utf8 good Using LC_COLLATE = "ml_IN.utf8" Using LC_CTYPE = "ml_IN.utf8" ml_IN.utf8 good Using LC_COLLATE = "mn_MN.utf8" Using LC_CTYPE = "mn_MN.utf8" mn_MN.utf8 good Using LC_COLLATE = "mr_IN.utf8" Using LC_CTYPE = "mr_IN.utf8" mr_IN.utf8 good Using LC_COLLATE = "ms_MY.utf8" Using LC_CTYPE = "ms_MY.utf8" ms_MY.utf8 good Using LC_COLLATE = "mt_MT.utf8" Using LC_CTYPE = "mt_MT.utf8" mt_MT.utf8 good Using LC_COLLATE = "my_MM.utf8" Using LC_CTYPE = "my_MM.utf8" my_MM.utf8 good Using LC_COLLATE = "nan_TW.utf8@latin" Using LC_CTYPE = "nan_TW.utf8@latin" nan_TW.utf8@latin good Using LC_COLLATE = "nb_NO.utf8" Using LC_CTYPE = "nb_NO.utf8" inconsistency between strcoll (295) and strxfrm (294) orders inconsistency between strcoll (294) and strxfrm (295) orders nb_NO.utf8 BAD Using LC_COLLATE = "nds_DE.utf8" Using LC_CTYPE = "nds_DE.utf8" nds_DE.utf8 good Using LC_COLLATE = "nds_NL.utf8" Using LC_CTYPE = "nds_NL.utf8" nds_NL.utf8 good Using LC_COLLATE = "ne_NP.utf8" Using LC_CTYPE = "ne_NP.utf8" ne_NP.utf8 good Using LC_COLLATE = "nl_AW.utf8" Using LC_CTYPE = "nl_AW.utf8" nl_AW.utf8 good Using LC_COLLATE = "nl_BE.utf8" Using LC_CTYPE = "nl_BE.utf8" nl_BE.utf8 good Using LC_COLLATE = "nl_NL.utf8" Using LC_CTYPE = "nl_NL.utf8" nl_NL.utf8 good Using LC_COLLATE = "nn_NO.utf8" Using LC_CTYPE = "nn_NO.utf8" inconsistency between strcoll (295) and strxfrm (294) orders inconsistency between strcoll (294) and strxfrm (295) orders nn_NO.utf8 
BAD Using LC_COLLATE = "no_NO.utf8" Using LC_CTYPE = "no_NO.utf8" inconsistency between strcoll (295) and strxfrm (294) orders inconsistency between strcoll (294) and strxfrm (295) orders no_NO.utf8 BAD Using LC_COLLATE = "nr_ZA.utf8" Using LC_CTYPE = "nr_ZA.utf8" nr_ZA.utf8 good Using LC_COLLATE = "nso_ZA.utf8" Using LC_CTYPE = "nso_ZA.utf8" nso_ZA.utf8 good Using LC_COLLATE = "oc_FR.utf8" Using LC_CTYPE = "oc_FR.utf8" oc_FR.utf8 good Using LC_COLLATE = "om_ET.utf8" Using LC_CTYPE = "om_ET.utf8" om_ET.utf8 good Using LC_COLLATE = "om_KE.utf8" Using LC_CTYPE = "om_KE.utf8" om_KE.utf8 good Using LC_COLLATE = "or_IN.utf8" Using LC_CTYPE = "or_IN.utf8" or_IN.utf8 good Using LC_COLLATE = "pa_IN.utf8" Using LC_CTYPE = "pa_IN.utf8" pa_IN.utf8 good Using LC_COLLATE = "pa_PK.utf8" Using LC_CTYPE = "pa_PK.utf8" pa_PK.utf8 good Using LC_COLLATE = "pap_AN.utf8" Using LC_CTYPE = "pap_AN.utf8" pap_AN.utf8 good Using LC_COLLATE = "pl_PL.utf8" Using LC_CTYPE = "pl_PL.utf8" pl_PL.utf8 good Using LC_COLLATE = "ps_AF.utf8" Using LC_CTYPE = "ps_AF.utf8" ps_AF.utf8 good Using LC_COLLATE = "pt_BR.utf8" Using LC_CTYPE = "pt_BR.utf8" pt_BR.utf8 good Using LC_COLLATE = "pt_PT.utf8" Using LC_CTYPE = "pt_PT.utf8" pt_PT.utf8 good Using LC_COLLATE = "ro_RO.utf8" Using LC_CTYPE = "ro_RO.utf8" inconsistency between strcoll (502) and strxfrm (501) orders inconsistency between strcoll (503) and strxfrm (502) orders inconsistency between strcoll (501) and strxfrm (503) orders ro_RO.utf8 BAD Using LC_COLLATE = "ru_RU.utf8" Using LC_CTYPE = "ru_RU.utf8" ru_RU.utf8 good Using LC_COLLATE = "ru_UA.utf8" Using LC_CTYPE = "ru_UA.utf8" ru_UA.utf8 good Using LC_COLLATE = "rw_RW.utf8" Using LC_CTYPE = "rw_RW.utf8" rw_RW.utf8 good Using LC_COLLATE = "sa_IN.utf8" Using LC_CTYPE = "sa_IN.utf8" sa_IN.utf8 good Using LC_COLLATE = "sc_IT.utf8" Using LC_CTYPE = "sc_IT.utf8" sc_IT.utf8 good Using LC_COLLATE = "sd_IN.utf8" Using LC_CTYPE = "sd_IN.utf8" sd_IN.utf8 good Using LC_COLLATE = "sd_IN.utf8@devanagari" Using 
LC_CTYPE = "sd_IN.utf8@devanagari" sd_IN.utf8@devanagari good Using LC_COLLATE = "se_NO.utf8" Using LC_CTYPE = "se_NO.utf8" inconsistency between strcoll (196) and strxfrm (194) orders inconsistency between strcoll (197) and strxfrm (195) orders inconsistency between strcoll (194) and strxfrm (196) orders inconsistency between strcoll (195) and strxfrm (197) orders inconsistency between strcoll (894) and strxfrm (892) orders inconsistency between strcoll (892) and strxfrm (893) orders inconsistency between strcoll (892) and strxfrm (893) orders inconsistency between strcoll (893) and strxfrm (894) orders inconsistency between strcoll (911) and strxfrm (909) orders inconsistency between strcoll (909) and strxfrm (910) orders inconsistency between strcoll (910) and strxfrm (911) orders inconsistency between strcoll (934) and strxfrm (933) orders inconsistency between strcoll (933) and strxfrm (934) orders se_NO.utf8 BAD Using LC_COLLATE = "shs_CA.utf8" Using LC_CTYPE = "shs_CA.utf8" inconsistency between strcoll (944) and strxfrm (942) orders inconsistency between strcoll (942) and strxfrm (943) orders inconsistency between strcoll (942) and strxfrm (943) orders inconsistency between strcoll (943) and strxfrm (944) orders shs_CA.utf8 BAD Using LC_COLLATE = "si_LK.utf8" Using LC_CTYPE = "si_LK.utf8" si_LK.utf8 good Using LC_COLLATE = "sid_ET.utf8" Using LC_CTYPE = "sid_ET.utf8" sid_ET.utf8 good Using LC_COLLATE = "sk_SK.utf8" Using LC_CTYPE = "sk_SK.utf8" sk_SK.utf8 good Using LC_COLLATE = "sl_SI.utf8" Using LC_CTYPE = "sl_SI.utf8" sl_SI.utf8 good Using LC_COLLATE = "so_DJ.utf8" Using LC_CTYPE = "so_DJ.utf8" so_DJ.utf8 good Using LC_COLLATE = "so_ET.utf8" Using LC_CTYPE = "so_ET.utf8" so_ET.utf8 good Using LC_COLLATE = "so_KE.utf8" Using LC_CTYPE = "so_KE.utf8" so_KE.utf8 good Using LC_COLLATE = "so_SO.utf8" Using LC_CTYPE = "so_SO.utf8" so_SO.utf8 good Using LC_COLLATE = "sq_AL.utf8" Using LC_CTYPE = "sq_AL.utf8" inconsistency between strcoll (286) and strxfrm (285) 
orders inconsistency between strcoll (285) and strxfrm (286) orders sq_AL.utf8 BAD Using LC_COLLATE = "sq_MK.utf8" Using LC_CTYPE = "sq_MK.utf8" inconsistency between strcoll (286) and strxfrm (285) orders inconsistency between strcoll (285) and strxfrm (286) orders sq_MK.utf8 BAD Using LC_COLLATE = "sr_ME.utf8" Using LC_CTYPE = "sr_ME.utf8" sr_ME.utf8 good Using LC_COLLATE = "sr_RS.utf8" Using LC_CTYPE = "sr_RS.utf8" sr_RS.utf8 good Using LC_COLLATE = "sr_RS.utf8@latin" Using LC_CTYPE = "sr_RS.utf8@latin" sr_RS.utf8@latin good Using LC_COLLATE = "ss_ZA.utf8" Using LC_CTYPE = "ss_ZA.utf8" ss_ZA.utf8 good Using LC_COLLATE = "st_ZA.utf8" Using LC_CTYPE = "st_ZA.utf8" st_ZA.utf8 good Using LC_COLLATE = "sv_FI.utf8" Using LC_CTYPE = "sv_FI.utf8" inconsistency between strcoll (898) and strxfrm (897) orders inconsistency between strcoll (897) and strxfrm (898) orders sv_FI.utf8 BAD Using LC_COLLATE = "sv_SE.utf8" Using LC_CTYPE = "sv_SE.utf8" inconsistency between strcoll (788) and strxfrm (785) orders inconsistency between strcoll (785) and strxfrm (786) orders inconsistency between strcoll (786) and strxfrm (787) orders inconsistency between strcoll (787) and strxfrm (788) orders inconsistency between strcoll (837) and strxfrm (836) orders inconsistency between strcoll (836) and strxfrm (837) orders inconsistency between strcoll (903) and strxfrm (902) orders inconsistency between strcoll (902) and strxfrm (903) orders sv_SE.utf8 BAD Using LC_COLLATE = "ta_IN.utf8" Using LC_CTYPE = "ta_IN.utf8" ta_IN.utf8 good Using LC_COLLATE = "te_IN.utf8" Using LC_CTYPE = "te_IN.utf8" te_IN.utf8 good Using LC_COLLATE = "tg_TJ.utf8" Using LC_CTYPE = "tg_TJ.utf8" tg_TJ.utf8 good Using LC_COLLATE = "th_TH.utf8" Using LC_CTYPE = "th_TH.utf8" th_TH.utf8 good Using LC_COLLATE = "ti_ER.utf8" Using LC_CTYPE = "ti_ER.utf8" ti_ER.utf8 good Using LC_COLLATE = "ti_ET.utf8" Using LC_CTYPE = "ti_ET.utf8" ti_ET.utf8 good Using LC_COLLATE = "tig_ER.utf8" Using LC_CTYPE = "tig_ER.utf8" tig_ER.utf8 
good Using LC_COLLATE = "tk_TM.utf8" Using LC_CTYPE = "tk_TM.utf8" inconsistency between strcoll (383) and strxfrm (382) orders inconsistency between strcoll (384) and strxfrm (383) orders inconsistency between strcoll (382) and strxfrm (384) orders inconsistency between strcoll (700) and strxfrm (699) orders inconsistency between strcoll (699) and strxfrm (700) orders inconsistency between strcoll (858) and strxfrm (857) orders inconsistency between strcoll (857) and strxfrm (858) orders tk_TM.utf8 BAD Using LC_COLLATE = "tl_PH.utf8" Using LC_CTYPE = "tl_PH.utf8" tl_PH.utf8 good Using LC_COLLATE = "tn_ZA.utf8" Using LC_CTYPE = "tn_ZA.utf8" tn_ZA.utf8 good Using LC_COLLATE = "tr_CY.utf8" Using LC_CTYPE = "tr_CY.utf8" tr_CY.utf8 good Using LC_COLLATE = "tr_TR.utf8" Using LC_CTYPE = "tr_TR.utf8" tr_TR.utf8 good Using LC_COLLATE = "ts_ZA.utf8" Using LC_CTYPE = "ts_ZA.utf8" ts_ZA.utf8 good Using LC_COLLATE = "tt_RU.utf8" Using LC_CTYPE = "tt_RU.utf8" inconsistency between strcoll (248) and strxfrm (247) orders inconsistency between strcoll (249) and strxfrm (248) orders inconsistency between strcoll (247) and strxfrm (249) orders inconsistency between strcoll (431) and strxfrm (430) orders inconsistency between strcoll (432) and strxfrm (431) orders inconsistency between strcoll (430) and strxfrm (432) orders inconsistency between strcoll (714) and strxfrm (713) orders inconsistency between strcoll (713) and strxfrm (714) orders tt_RU.utf8 BAD Using LC_COLLATE = "tt_RU.utf8@iqtelif" Using LC_CTYPE = "tt_RU.utf8@iqtelif" inconsistency between strcoll (431) and strxfrm (430) orders inconsistency between strcoll (432) and strxfrm (431) orders inconsistency between strcoll (430) and strxfrm (432) orders inconsistency between strcoll (700) and strxfrm (699) orders inconsistency between strcoll (699) and strxfrm (700) orders tt_RU.utf8@iqtelif BAD Using LC_COLLATE = "ug_CN.utf8" Using LC_CTYPE = "ug_CN.utf8" inconsistency between strcoll (248) and strxfrm (247) orders 
inconsistency between strcoll (249) and strxfrm (248) orders inconsistency between strcoll (247) and strxfrm (249) orders inconsistency between strcoll (700) and strxfrm (699) orders inconsistency between strcoll (699) and strxfrm (700) orders ug_CN.utf8 BAD Using LC_COLLATE = "uk_UA.utf8" Using LC_CTYPE = "uk_UA.utf8" uk_UA.utf8 good Using LC_COLLATE = "ur_PK.utf8" Using LC_CTYPE = "ur_PK.utf8" ur_PK.utf8 good Using LC_COLLATE = "uz_UZ.utf8@cyrillic" Using LC_CTYPE = "uz_UZ.utf8@cyrillic" uz_UZ.utf8@cyrillic good Using LC_COLLATE = "ve_ZA.utf8" Using LC_CTYPE = "ve_ZA.utf8" ve_ZA.utf8 good Using LC_COLLATE = "vi_VN.utf8" Using LC_CTYPE = "vi_VN.utf8" inconsistency between strcoll (379) and strxfrm (378) orders inconsistency between strcoll (380) and strxfrm (379) orders inconsistency between strcoll (378) and strxfrm (380) orders vi_VN.utf8 BAD Using LC_COLLATE = "wa_BE.utf8" Using LC_CTYPE = "wa_BE.utf8" wa_BE.utf8 good Using LC_COLLATE = "wo_SN.utf8" Using LC_CTYPE = "wo_SN.utf8" wo_SN.utf8 good Using LC_COLLATE = "xh_ZA.utf8" Using LC_CTYPE = "xh_ZA.utf8" xh_ZA.utf8 good Using LC_COLLATE = "yi_US.utf8" Using LC_CTYPE = "yi_US.utf8" yi_US.utf8 good Using LC_COLLATE = "yo_NG.utf8" Using LC_CTYPE = "yo_NG.utf8" inconsistency between strcoll (347) and strxfrm (346) orders inconsistency between strcoll (348) and strxfrm (347) orders inconsistency between strcoll (346) and strxfrm (348) orders inconsistency between strcoll (793) and strxfrm (791) orders inconsistency between strcoll (791) and strxfrm (792) orders inconsistency between strcoll (792) and strxfrm (793) orders inconsistency between strcoll (795) and strxfrm (794) orders inconsistency between strcoll (794) and strxfrm (795) orders yo_NG.utf8 BAD Using LC_COLLATE = "zh_CN.utf8" Using LC_CTYPE = "zh_CN.utf8" zh_CN.utf8 good Using LC_COLLATE = "zh_HK.utf8" Using LC_CTYPE = "zh_HK.utf8" zh_HK.utf8 good Using LC_COLLATE = "zh_SG.utf8" Using LC_CTYPE = "zh_SG.utf8" zh_SG.utf8 good Using LC_COLLATE = 
"zh_TW.utf8" Using LC_CTYPE = "zh_TW.utf8" zh_TW.utf8 good Using LC_COLLATE = "zu_ZA.utf8" Using LC_CTYPE = "zu_ZA.utf8" zu_ZA.utf8 good
Peter Geoghegan <pg@heroku.com> writes: > I now think that we have this backwards: This isn't a bug in glibc's > strxfrm(); it's a bug in glibc's strcoll(). FWIW, the test program I just posted includes checks to see if the two cases produce self-consistent sort orders. So far I've seen no evidence that they don't; that is, strcoll() produces a consistent sort order, and strxfrm() produces a consistent sort order, but not the same one. That being the case, arguing about which one is wrong seems a bit academic, not to mention well above my pay grade so far as the theoretical behavior of locale-specific sort ordering is concerned. regards, tom lane
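The check Tom describes can be sketched in C roughly as follows. This is not his attached program, which is not reproduced in this thread. The names `compare_orders`, `Item`, and `xfrm_dup` are hypothetical.

The sketch works in three steps:

- It sorts the same strings once with `strcoll()` and once with `strcmp()` over their `strxfrm()` blobs.
- It checks that each sorted result is ordered for every pair, not just neighbours, so an intransitive comparator would be caught.
- It counts the positions where the two orders disagree.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *str;  /* original string */
    char *key;        /* its strxfrm() blob */
} Item;

/* strxfrm(NULL, s, 0) reports the blob length (excluding the NUL),
   so the buffer is sized exactly instead of guessing a maximum. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);
    if (buf != NULL)
        strxfrm(buf, s, need + 1);
    return buf;
}

static int by_strcoll(const void *a, const void *b)
{
    return strcoll(((const Item *) a)->str, ((const Item *) b)->str);
}

static int by_xfrm(const void *a, const void *b)
{
    return strcmp(((const Item *) a)->key, ((const Item *) b)->key);
}

/* Checks every pair i < j, so an intransitive comparator cannot hide
   behind a result that merely looks sorted between neighbours. */
static int self_consistent(const Item *arr, size_t n,
                           int (*cmp)(const void *, const void *))
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (cmp(&arr[i], &arr[j]) > 0)
                return 0;
    return 1;
}

/* Returns the number of positions where the strcoll() order and the
   strxfrm() order disagree; -1 on allocation failure; -2 if either
   order is not self-consistent. */
int compare_orders(const char **words, size_t n)
{
    Item *c, *x;
    int result = 0;

    if (n == 0)
        return 0;
    c = malloc(n * sizeof *c);
    x = malloc(n * sizeof *x);
    if (c == NULL || x == NULL) {
        free(c);
        free(x);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        c[i].str = x[i].str = words[i];
        /* One blob per string, shared by both arrays. */
        c[i].key = x[i].key = xfrm_dup(words[i]);
        if (c[i].key == NULL)
            result = -1;
    }
    if (result == 0) {
        qsort(c, n, sizeof *c, by_strcoll);
        qsort(x, n, sizeof *x, by_xfrm);
        if (!self_consistent(c, n, by_strcoll) ||
            !self_consistent(x, n, by_xfrm))
            result = -2;
    }
    /* Count positions where the two sorted orders hold different strings. */
    for (size_t i = 0; result >= 0 && i < n; i++)
        if (strcoll(c[i].str, x[i].str) != 0)
            result++;
    for (size_t i = 0; i < n; i++)
        free(c[i].key);  /* c still holds each blob exactly once */
    free(c);
    free(x);
    return result;
}
```

A caller would first select the locale under test, for example `setlocale(LC_COLLATE, "de_DE.utf8")`. In the default "C" locale the two orders must always agree. The per-position test uses `strcoll()` rather than `strcmp()`, so strings that `strcoll()` treats as equal are not reported just because `qsort()` placed them differently.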
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Tue, Mar 22, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@heroku.com> writes: >> I now think that we have this backwards: This isn't a bug in glibc's >> strxfrm(); it's a bug in glibc's strcoll(). > > FWIW, the test program I just posted includes checks to see if the two > cases produce self-consistent sort orders. So far I've seen no evidence > that they don't; that is, strcoll() produces a consistent sort order, > and strxfrm() produces a consistent sort order, but not the same one. > That being the case, arguing about which one is wrong seems a bit > academic, not to mention well above my pay grade so far as the theoretical > behavior of locale-specific sort ordering is concerned. I hope you're right about it being academic. -- Peter Geoghegan
On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Please try this on as many platforms as you can get hold of ... On MacOS X 10.10.5, this fails because the strxfrm() blobs are far longer than the maximum you defined (about 8n+8 bytes, IIRC). I fixed that and ran this; all locales tested good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
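Robert's failure is the usual fixed-buffer trap with `strxfrm()`:

- It returns the full length of the transformed string even when the result did not fit.
- The C standard leaves the buffer contents indeterminate in that case.

A minimal sketch of detecting this follows. `xfrm_into` is a hypothetical helper, not something from Tom's program.

```c
#include <stddef.h>
#include <string.h>

/* Transforms s into a caller-supplied buffer of bufsize bytes.
   Returns 1 on success. Returns 0 if the buffer was too small; the
   buffer contents must then be treated as indeterminate. In both cases
   *needed is set to the size to allocate (blob length plus the NUL). */
int xfrm_into(char *buf, size_t bufsize, const char *s, size_t *needed)
{
    size_t len = strxfrm(buf, s, bufsize);
    *needed = len + 1;
    return len < bufsize;
}
```

A caller that gets 0 back can allocate `needed` bytes and try again. Alternatively, it can size the buffer up front with `strxfrm(NULL, s, 0) + 1`, which the standard allows because the destination may be a null pointer when the size argument is zero.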
Robert Haas <robertmhaas@gmail.com> writes: >> I was a little worried that it was too much to hope for that all libc >> vendors on earth would ship a strxfrm() implementation that was actually >> consistent with strcoll(), and here we are. BTW, the glibc discussion starting here: https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html should put substantial fear in us about the advisability of putting strxfrm results on-disk, as I understand we're now doing in btrees. I was led to that while looking to see if there were any already-filed glibc bug reports concerning this issue. AFAICS there are not, which is odd if the bug is gone in more recent releases ... regards, tom lane
On Tue, Mar 22, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> I was a little worried that it was too much to hope for that all libc >>> vendors on earth would ship a strxfrm() implementation that was actually >>> consistent with strcoll(), and here we are. > > BTW, the glibc discussion starting here: > https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html > should put substantial fear in us about the advisability of putting strxfrm > results on-disk, as I understand we're now doing in btrees. No. Peter proposed that, but it hasn't actually been done. This certainly makes that sound inadvisable, though. We are, however, putting indexes on disk whose ordering was determined partly by the result of strxfrm() comparisons. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Tue, Mar 22, 2016 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, the glibc discussion starting here: > https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html > should put substantial fear in us about the advisability of putting strxfrm > results on-disk, as I understand we're now doing in btrees. > > I was led to that while looking to see if there were any already-filed > glibc bug reports concerning this issue. AFAICS there are not, which > is odd if the bug is gone in more recent releases ... I always knew it wouldn't fly to store strxfrm on disk, and we don't do that. I actually quoted a paper saying just that at one point. I specifically acknowledged that that was clearly a non-starter a couple of times. B-Trees are built based on strxfrm() comparisons at a point in time. strxfrm() should be able to produce the same results as strcoll(). That is what it's documented to do, in C90. glibc has license to change the strxfrm() representation while still producing answers consistent with previous answers. Just not during an ongoing sort, obviously. It's not 100% clear that we have a contract with glibc to never change collation rules, even for strcoll(), but our current use of strxfrm() should not have made that any worse. Problems only cropped up because of bugs in glibc. -- Peter Geoghegan
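The C90 contract Peter refers to is pairwise. For any two strings, `strcmp()` on their `strxfrm()` results must have the same sign as `strcoll()` on the originals. The hypothetical helper below checks that property for a single pair. It is a sketch, not code from PostgreSQL or glibc.

```c
#include <stdlib.h>
#include <string.h>

static int sgn(int v) { return (v > 0) - (v < 0); }

/* Returns 1 if strcmp() over the strxfrm() blobs of a and b has the same
   sign as strcoll(a, b), 0 if they disagree, -1 on allocation failure. */
int xfrm_agrees(const char *a, const char *b)
{
    size_t la = strxfrm(NULL, a, 0) + 1;  /* blob length plus NUL */
    size_t lb = strxfrm(NULL, b, 0) + 1;
    char *ka = malloc(la);
    char *kb = malloc(lb);
    int result = -1;

    if (ka != NULL && kb != NULL) {
        strxfrm(ka, a, la);
        strxfrm(kb, b, lb);
        result = sgn(strcmp(ka, kb)) == sgn(strcoll(a, b));
    }
    free(ka);
    free(kb);
    return result;
}
```

On a conforming libc this should return 1 for every pair in every locale. Going by the results in this thread, affected glibc versions would return 0 for some pairs under locales such as de_DE.utf8.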
Robert Haas <robertmhaas@gmail.com> writes: > We are, however, putting indexes on disk whose ordering was determined > partly by the result of strxfrm() comparisons. Yeah. It appears to me that the originally-submitted test case creates an index whose entries are ordered correctly according to strxfrm(), but not so much according to strcoll(). regards, tom lane
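A toy model shows why an index ordered one way but searched another way can lose rows while a sequential scan does not. This is an analogy, not PostgreSQL code. The model has three parts:

- `index_lookup` is a hypothetical binary search standing in for a B-tree descent.
- `byte_cmp` is the order the entries were stored in.
- Case-folding (`fold_cmp`) stands in for a comparison that disagrees with that stored order.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const char *, const char *);

/* Case-insensitive comparison: a stand-in for a collation that
   disagrees with the order the entries were stored in. */
int fold_cmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Plain byte order: the order the "index" was built with. */
int byte_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Binary search that trusts keys[] to be sorted under cmp, as a B-tree
   descent trusts its stored order. Returns the index or -1. */
long index_lookup(const char **keys, size_t n, const char *target, cmp_fn cmp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp(target, keys[mid]);
        if (c == 0)
            return (long) mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* "Sequential scan": visits every entry, so stored order is irrelevant. */
long seq_lookup(const char **keys, size_t n, const char *target, cmp_fn cmp)
{
    for (size_t i = 0; i < n; i++)
        if (cmp(target, keys[i]) == 0)
            return (long) i;
    return -1;
}
```

Take keys stored as {"B", "a", "c"}, which is sorted under `byte_cmp`. With `fold_cmp`, `index_lookup(keys, 3, "B", fold_cmp)` steers right at "a" and returns -1, while `seq_lookup` finds "B" at index 0. This loosely mirrors `enable_seqscan = false` versus `true` in the original report.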
On Tue, Mar 22, 2016 at 7:48 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Please try this on as many platforms as you can get hold of ... > > On MacOS X 10.10.5, this fails because the strxfrm() blobs are far > longer than the maximum you defined (about 8n+8 bytes, IIRC). I fixed > that and ran this; all locales tested good. Here are the results on Fedora 16 and RHEL 7.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Here are the results on Fedora 16 and RHEL 7.1. So much for the theory that it's fixed in RHEL7. I now think that the glibc folk actually do not know about this, and have accordingly filed https://bugzilla.redhat.com/show_bug.cgi?id=1320356 regards, tom lane
On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Here are the results on Fedora 16 and RHEL 7.1. > > So much for the theory that it's fixed in RHEL7. I now think that the > glibc folk actually do not know about this, and have accordingly filed > https://bugzilla.redhat.com/show_bug.cgi?id=1320356 Good plan, but what do we do between now and when they fix it? This seems quite bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So much for the theory that it's fixed in RHEL7. I now think that the >> glibc folk actually do not know about this, and have accordingly filed >> https://bugzilla.redhat.com/show_bug.cgi?id=1320356 > Good plan, but what do we do between now and when they fix it? This > seems quite bad. At the moment I think we're still in information-gathering mode. The upstream reaction to this will be valuable data. In the meantime, I'd still like to find out which other platforms have similar issues. I really kinda doubt the upthread report that Ubuntu doesn't have a comparable problem, for instance, given the lack of any evidence that this is a known/fixed issue in glibc. regards, tom lane
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, a even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Results for Ubuntu 15.10:

Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_DE.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good

Will try on others.

Thanks!

Stephen
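The check Tom describes in the quoted text (sort random UTF-8 strings with strcoll and with strxfrm, then see whether the orders agree) can be sketched as below. This is not his attached C program, which is not reproduced here; it is a minimal Python approximation, assuming Python 3. The alphabet, string count, seed, and the names `random_string` and `find_inconsistencies` are arbitrary choices for illustration. One caveat: CPython implements `locale.strcoll()` and `locale.strxfrm()` on top of the wide-character `wcscoll()` and `wcsxfrm()`, so its verdicts may not match a C program that calls `strcoll()` and `strxfrm()` directly.

```python
import functools
import locale
import random
import sys

# Arbitrary test alphabet (an assumption, not taken from the original
# program): vowels, a space, and accented Latin letters like the one in
# the original report.
ALPHABET = "aeiouAEIOU \u00e1\u00e9\u00ed\u00f3\u00fa\u00e4\u00f6\u00fc\u00df"


def random_string(rng, max_len=8):
    """Return a random string of 1..max_len characters drawn from ALPHABET."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def find_inconsistencies(collation, count=1000, seed=0):
    """Sort the same random strings twice under LC_COLLATE=collation: once with
    strcoll() as the comparator, once by strxfrm() keys. Return the list
    positions where the two results differ; an empty list means they agree."""
    rng = random.Random(seed)
    strings = [random_string(rng) for _ in range(count)]

    saved = locale.setlocale(locale.LC_COLLATE)      # query the current value
    locale.setlocale(locale.LC_COLLATE, collation)   # raises locale.Error if missing
    try:
        by_strcoll = sorted(strings, key=functools.cmp_to_key(locale.strcoll))
        by_strxfrm = sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)   # restore the caller's setting

    return [i for i, (a, b) in enumerate(zip(by_strcoll, by_strxfrm)) if a != b]


if __name__ == "__main__":
    # Usage: python check_collation.py de_DE.utf8 en_US.utf8 ...
    for name in sys.argv[1:] or ["C"]:
        try:
            bad = find_inconsistencies(name)
        except locale.Error:
            print(name, "not installed")
            continue
        print(name, "BAD" if bad else "good")
```

Feeding it the output of `locale -a` mimics the shell-script loop over all installed locales. The "C" locale should always report good, because there both functions reduce to plain code-point comparison.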
On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I was a little worried that it was too much to hope for that all libc
>> vendors on earth would ship a strxfrm() implementation that was actually
>> consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, a even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Failed on Debian 8.2, but only for de_DE.utf8. libc 2.19-18+deb8u1.
Attached.

--
Thomas Munro
http://www.enterprisedb.com
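Why an inconsistency like this can make an index scan skip rows can be shown with a toy model. It assumes, as with 9.5's abbreviated-key sorting, that the index order comes from strxfrm()-derived keys while lookups compare with strcoll(). Here `broken_key` is a hypothetical defect that swaps two values, and a plain binary search stands in for a B-tree descent. This is an illustration, not PostgreSQL code.

```python
# Toy model only: a plain binary search stands in for a B-tree descent.


def true_cmp(a, b):
    """Stand-in for strcoll(): the comparison used at lookup time."""
    return (a > b) - (a < b)


def broken_key(s):
    """Stand-in for an inconsistent strxfrm(): agrees with true_cmp except
    that it swaps the positions of "b" and "d" (a hypothetical defect)."""
    return {"b": "d", "d": "b"}.get(s, s)


def binary_search(items, target, cmp):
    """Lower-bound binary search; correct only if items is sorted by cmp."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp(items[mid], target) < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(items) and cmp(items[lo], target) == 0


rows = ["c", "e", "a", "d", "b"]

# "Build the index" by sorting on transformed keys, as a strxfrm()-based sort would.
index = sorted(rows, key=broken_key)
print(index)                                   # ['a', 'd', 'c', 'b', 'e']
print(binary_search(index, "b", true_cmp))     # False: the lookup misses "b"
print("b" in rows)                             # True: a full scan still finds it

# With a consistent transform the same lookup succeeds.
print(binary_search(sorted(rows), "b", true_cmp))  # True
```

The full scan finding "b" while the binary search does not mirrors the original report, where the missing row reappeared once enable_seqscan was set back to true.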
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, a even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Results for Ubuntu 14.04:

sfrost@dwemer:/home/sfrost> sh tryalllocales.sh
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (36) and strxfrm (35) orders
inconsistency between strcoll (35) and strxfrm (36) orders
inconsistency between strcoll (160) and strxfrm (159) orders
inconsistency between strcoll (159) and strxfrm (160) orders
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (355) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (440) and strxfrm (439) orders
inconsistency between strcoll (441) and strxfrm (440) orders
inconsistency between strcoll (439) and strxfrm (441) orders
inconsistency between strcoll (450) and strxfrm (449) orders
inconsistency between strcoll (449) and strxfrm (450) orders
inconsistency between strcoll (454) and strxfrm (452) orders
inconsistency between strcoll (455) and strxfrm (453) orders
inconsistency between strcoll (452) and strxfrm (454) orders
inconsistency between strcoll (453) and strxfrm (455) orders
inconsistency between strcoll (521) and strxfrm (520) orders
inconsistency between strcoll (520) and strxfrm (521) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (682) and strxfrm (681) orders
inconsistency between strcoll (681) and strxfrm (682) orders
inconsistency between strcoll (743) and strxfrm (742) orders
inconsistency between strcoll (742) and strxfrm (743) orders
inconsistency between strcoll (830) and strxfrm (829) orders
inconsistency between strcoll (829) and strxfrm (830) orders
inconsistency between strcoll (870) and strxfrm (869) orders
inconsistency between strcoll (869) and strxfrm (870) orders
inconsistency between strcoll (933) and strxfrm (931) orders
inconsistency between strcoll (931) and strxfrm (932) orders
inconsistency between strcoll (932) and strxfrm (933) orders
de_DE.utf8 BAD
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good

Thanks!

Stephen
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, a even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

I found the 'all' button on Debian 8.3:

sfrost@mahout:~$ sh tryalllocales.sh
Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
af_ZA.utf8 good
Using LC_COLLATE = "ak_GH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ak_GH.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
an_ES.utf8 good
Using LC_COLLATE = "anp_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
anp_IN.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SS.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ast_ES.utf8 good
Using LC_COLLATE = "ayc_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ayc_PE.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
az_AZ.utf8 good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8@latin good
Using LC_COLLATE = "bem_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bem_ZM.utf8 good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bg_BG.utf8 good
Using LC_COLLATE = "bho_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bho_IN.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
br_FR.utf8 good
Using LC_COLLATE = "brx_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
brx_IN.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_IT.utf8 good
Using LC_COLLATE = "cmn_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cmn_TW.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
crh_UA.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
csb_PL.utf8 good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (72) and strxfrm (71) orders
inconsistency between strcoll (71) and strxfrm (72) orders
inconsistency between strcoll (136) and strxfrm (135) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (139) and strxfrm (137) orders
inconsistency between strcoll (140) and strxfrm (138) orders
inconsistency between strcoll (137) and strxfrm (139) orders
inconsistency between strcoll (138) and strxfrm (140) orders
inconsistency between strcoll (149) and strxfrm (148) orders
inconsistency between strcoll (148) and strxfrm (149) orders
inconsistency between strcoll (254) and strxfrm (252) orders
inconsistency between strcoll (252) and strxfrm (253) orders
inconsistency between strcoll (253) and strxfrm (254) orders
inconsistency between strcoll (274) and strxfrm (273) orders
inconsistency between strcoll (275) and strxfrm (274) orders
inconsistency between strcoll (273) and strxfrm (275) orders
inconsistency between strcoll (339) and strxfrm (338) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (390) and strxfrm (388) orders
inconsistency between strcoll (388) and strxfrm (389) orders
inconsistency between strcoll (389) and strxfrm (390) orders
inconsistency between strcoll (411) and strxfrm (410) orders
inconsistency between strcoll (410) and strxfrm (411) orders
inconsistency between strcoll (449) and strxfrm (448) orders
inconsistency between strcoll (448) and strxfrm (449) orders
inconsistency between strcoll (454) and strxfrm (453) orders
inconsistency between strcoll (453) and strxfrm (454) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (543) and strxfrm (542) orders
inconsistency between strcoll (544) and strxfrm (543) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (567) and strxfrm (566) orders
inconsistency between strcoll (566) and strxfrm (567) orders
inconsistency between strcoll (589) and strxfrm (588) orders
inconsistency between strcoll (588) and strxfrm (589) orders
inconsistency between strcoll (592) and strxfrm (591) orders
inconsistency between strcoll (591) and strxfrm (592) orders
inconsistency between strcoll (594) and strxfrm (593) orders
inconsistency between strcoll (593) and strxfrm (594) orders
inconsistency between strcoll (597) and strxfrm (595) orders
inconsistency between strcoll (595) and strxfrm (596) orders
inconsistency between strcoll (596) and strxfrm (597) orders
inconsistency between strcoll (601) and strxfrm (600) orders
inconsistency between strcoll (600) and strxfrm (601) orders
inconsistency between strcoll (726) and strxfrm (724) orders
inconsistency between strcoll (724) and strxfrm (725) orders
inconsistency between strcoll (725) and strxfrm (726) orders
inconsistency between strcoll (743) and strxfrm (741) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (744) and strxfrm (743) orders
inconsistency between strcoll (742) and strxfrm (744) orders
inconsistency between strcoll (765) and strxfrm (764) orders
inconsistency between strcoll (764) and strxfrm (765) orders
inconsistency between strcoll (786) and strxfrm (784) orders
inconsistency between strcoll (784) and strxfrm (786) orders
inconsistency between strcoll (896) and strxfrm (895) orders
inconsistency between strcoll (895) and strxfrm (896) orders
inconsistency between strcoll (941) and strxfrm (939) orders
inconsistency between strcoll (942) and strxfrm (940) orders
inconsistency between strcoll (943) and strxfrm (941) orders
inconsistency between strcoll (939) and strxfrm (942) orders
inconsistency between strcoll (940) and strxfrm (943) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LU.utf8 good
Using LC_COLLATE = "doi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
doi_IN.utf8 good
Using LC_COLLATE = "dv_MV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CR.utf8 good
Using LC_COLLATE = "es_CU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CU.utf8 good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fa_IR.utf8 good
Using LC_COLLATE = "ff_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ff_SN.utf8 good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fil_PH.utf8 good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gv_GB.utf8 good
Using LC_COLLATE = "hak_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hak_TW.utf8 good
Using LC_COLLATE = "ha_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hy_AM.utf8 good
Using LC_COLLATE = "ia_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ia_FR.utf8 good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "en_US.UTF-8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ig_NG.utf8 good
Using LC_COLLATE = "ik_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kn_IN.utf8 good
Using LC_COLLATE = "kok_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kok_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ky_KG.utf8 good
Using LC_COLLATE = "lb_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (137) and strxfrm (136) orders
inconsistency between strcoll (136) and strxfrm (137) orders
inconsistency between strcoll (171) and strxfrm (170) orders
inconsistency between strcoll (170) and strxfrm (171) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (356) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (355) and strxfrm (356) orders
inconsistency between strcoll (465) and strxfrm (464) orders
inconsistency between strcoll (464) and strxfrm (465) orders
inconsistency between strcoll (467) and strxfrm (466) orders
inconsistency between strcoll (466) and strxfrm (467) orders
inconsistency between strcoll (470) and strxfrm (469) orders
inconsistency between strcoll (469) and strxfrm (470) orders
inconsistency between strcoll (573) and strxfrm (572) orders
inconsistency between strcoll (574) and strxfrm (573) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (612) and strxfrm (611) orders
inconsistency between strcoll (611) and strxfrm (612) orders
inconsistency between strcoll (709) and strxfrm (708) orders
inconsistency between strcoll (710) and strxfrm (709) orders
inconsistency between strcoll (708) and strxfrm (710) orders
inconsistency between strcoll (771) and strxfrm (770) orders
inconsistency between strcoll (770) and strxfrm (771) orders
inconsistency between strcoll (789) and strxfrm (787) orders
inconsistency between strcoll (787) and strxfrm (788) orders
inconsistency between strcoll (788) and strxfrm (789) orders
inconsistency between strcoll (948) and strxfrm (947) orders
inconsistency between strcoll (947) and strxfrm (948) orders
lb_LU.utf8 BAD
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_BE.utf8 good
Using LC_COLLATE = "lij_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lij_IT.utf8 good
Using LC_COLLATE = "li_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lv_LV.utf8 good
Using LC_COLLATE = "lzh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lzh_TW.utf8 good
Using LC_COLLATE = "mag_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mag_IN.utf8 good
Using LC_COLLATE = "mai_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mg_MG.utf8 good
Using LC_COLLATE = "mhr_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mhr_RU.utf8 good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ml_IN.utf8 good
Using LC_COLLATE = "mni_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mni_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ne_NP.utf8 good
Using LC_COLLATE = "nhn_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nhn_MX.utf8 good
Using LC_COLLATE = "niu_NU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
niu_NU.utf8 good
Using LC_COLLATE = "niu_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
niu_NZ.utf8 good
Using LC_COLLATE = "nl_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
or_IN.utf8 good
Using LC_COLLATE = "os_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
os_RU.utf8 good
Using LC_COLLATE = "pa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_IN.utf8 good
Using LC_COLLATE = "pap_AN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AN.utf8 good
Using LC_COLLATE = "pap_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AW.utf8 good
Using LC_COLLATE = "pap_CW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_CW.utf8 good
Using LC_COLLATE = "pa_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_PK.utf8 good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_PT.utf8 good
Using LC_COLLATE = "quz_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
quz_PE.utf8 good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sa_IN.utf8 good
Using LC_COLLATE = "sat_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sat_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
se_NO.utf8 good
Using LC_COLLATE = "shs_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
shs_CA.utf8 good
Using LC_COLLATE = "sid_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sid_ET.utf8 good
Using LC_COLLATE = "si_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
si_LK.utf8 good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_MK.utf8 good
Using LC_COLLATE = "sr_ME.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_KE.utf8 good
Using LC_COLLATE = "sw_TZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_TZ.utf8 good
Using LC_COLLATE = "szl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
szl_PL.utf8 good
Using LC_COLLATE = "ta_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_IN.utf8 good
Using LC_COLLATE = "ta_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_LK.utf8 good
Using LC_COLLATE = "te_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tg_TJ.utf8 good
Using LC_COLLATE = "the_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
the_NP.utf8 good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tk_TM.utf8 good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8 good
Using LC_COLLATE = "tt_RU.utf8@iqtelif"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8@iqtelif good
Using LC_COLLATE = "ug_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ug_CN.utf8 good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uk_UA.utf8 good
Using LC_COLLATE = "unm_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
unm_US.utf8 good
Using LC_COLLATE = "ur_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_IN.utf8 good
Using LC_COLLATE = "ur_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
vi_VN.utf8 good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wa_BE.utf8 good
Using LC_COLLATE = "wae_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wae_CH.utf8 good
Using LC_COLLATE = "wal_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wal_ET.utf8 good
Using LC_COLLATE = "wo_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yo_NG.utf8 good
Using LC_COLLATE = "yue_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yue_HK.utf8 good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zu_ZA.utf8 good

Thanks!

Stephen
On Wed, Mar 23, 2016 at 2:18 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I was a little worried that it was too much to hope for that all libc
>>> vendors on earth would ship a strxfrm() implementation that was actually
>>> consistent with strcoll(), and here we are.
>>
>> Indeed. To try to put some scope on the problem, I made an idiot little
>> program that just generates some random UTF8 strings and sees whether
>> strcoll and strxfrm sort them alike. Attached are that program, an even
>> more idiot little shell script that runs it over all available UTF8
>> locales, and the results on my RHEL6 box. While de_DE seems to be the
>> worst-broken locale, it's far from the only one.
>>
>> Please try this on as many platforms as you can get hold of ...
>
> Failed on Debian 8.2, but only for de_DE.utf8. libc 2.19-18+deb8u1. Attached.

Ran again after apt-get upgrade took me to 8.3 and libc6 2.19-18+deb8u2. Results were similar: de_DE.utf8 has inconsistencies, but no other locale does. So Debian stable is affected.

(Just noticed that Stephen Frost's output from the same OS also reports a broken lb_LU.utf8. After conferring on IRC, it seems that may be because I installed "locales-all" (precompiled), which didn't give me lb_LU.utf8, whereas he generated all locales, which apparently does.)

--
Thomas Munro
http://www.enterprisedb.com
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

Debian 7.9 results with all locales locally generated:

Using LC_COLLATE = "aa_DJ.utf8" Using LC_CTYPE = "en_US.UTF-8" aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8" Using LC_CTYPE = "en_US.UTF-8" aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho" Using LC_CTYPE = "en_US.UTF-8" aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" af_ZA.utf8 good
Using LC_COLLATE = "am_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" an_ES.utf8 good
Using LC_COLLATE = "ar_AE.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_SD.utf8 good
Using LC_COLLATE = "ar_SY.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8" Using LC_CTYPE = "en_US.UTF-8" ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" ast_ES.utf8 good
Using LC_COLLATE = "az_AZ.utf8" Using LC_CTYPE = "en_US.UTF-8" az_AZ.utf8 good
Using LC_COLLATE = "be_BY.utf8" Using LC_CTYPE = "en_US.UTF-8" be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin" Using LC_CTYPE = "en_US.UTF-8" be_BY.utf8@latin good
Using LC_COLLATE = "bem_ZM.utf8" Using LC_CTYPE = "en_US.UTF-8" bem_ZM.utf8 good
Using LC_COLLATE = "ber_DZ.utf8" Using LC_CTYPE = "en_US.UTF-8" ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8" Using LC_CTYPE = "en_US.UTF-8" ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8" Using LC_CTYPE = "en_US.UTF-8" bg_BG.utf8 good
Using LC_COLLATE = "bn_BD.utf8" Using LC_CTYPE = "en_US.UTF-8" bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8" Using LC_CTYPE = "en_US.UTF-8" bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8" Using LC_CTYPE = "en_US.UTF-8" br_FR.utf8 good
Using LC_COLLATE = "bs_BA.utf8" Using LC_CTYPE = "en_US.UTF-8" bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8" Using LC_CTYPE = "en_US.UTF-8" byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8" Using LC_CTYPE = "en_US.UTF-8" ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia" Using LC_CTYPE = "en_US.UTF-8" ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8" Using LC_CTYPE = "en_US.UTF-8" ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8" Using LC_CTYPE = "en_US.UTF-8" ca_IT.utf8 good
Using LC_COLLATE = "crh_UA.utf8" Using LC_CTYPE = "en_US.UTF-8" crh_UA.utf8 good
Using LC_COLLATE = "csb_PL.utf8" Using LC_CTYPE = "en_US.UTF-8" csb_PL.utf8 good
Using LC_COLLATE = "cs_CZ.utf8" Using LC_CTYPE = "en_US.UTF-8" cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8" Using LC_CTYPE = "en_US.UTF-8" C.UTF-8 good
Using LC_COLLATE = "cv_RU.utf8" Using LC_CTYPE = "en_US.UTF-8" cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8" Using LC_CTYPE = "en_US.UTF-8" cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8" Using LC_CTYPE = "en_US.UTF-8" da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8" Using LC_CTYPE = "en_US.UTF-8" de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8" Using LC_CTYPE = "en_US.UTF-8" de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8" Using LC_CTYPE = "en_US.UTF-8" de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8" Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (71) and strxfrm (70) orders
inconsistency between strcoll (70) and strxfrm (71) orders
inconsistency between strcoll (98) and strxfrm (97) orders
inconsistency between strcoll (97) and strxfrm (98) orders
inconsistency between strcoll (130) and strxfrm (128) orders
inconsistency between strcoll (131) and strxfrm (129) orders
inconsistency between strcoll (128) and strxfrm (130) orders
inconsistency between strcoll (129) and strxfrm (131) orders
inconsistency between strcoll (143) and strxfrm (142) orders
inconsistency between strcoll (142) and strxfrm (143) orders
inconsistency between strcoll (147) and strxfrm (146) orders
inconsistency between strcoll (146) and strxfrm (147) orders
inconsistency between strcoll (152) and strxfrm (150) orders
inconsistency between strcoll (150) and strxfrm (151) orders
inconsistency between strcoll (151) and strxfrm (152) orders
inconsistency between strcoll (155) and strxfrm (154) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (157) and strxfrm (156) orders
inconsistency between strcoll (156) and strxfrm (157) orders
inconsistency between strcoll (195) and strxfrm (194) orders
inconsistency between strcoll (194) and strxfrm (195) orders
inconsistency between strcoll (314) and strxfrm (313) orders
inconsistency between strcoll (315) and strxfrm (314) orders
inconsistency between strcoll (316) and strxfrm (315) orders
inconsistency between strcoll (313) and strxfrm (316) orders
inconsistency between strcoll (350) and strxfrm (349) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (352) and strxfrm (351) orders
inconsistency between strcoll (353) and strxfrm (352) orders
inconsistency between strcoll (354) and strxfrm (353) orders
inconsistency between strcoll (349) and strxfrm (354) orders
inconsistency between strcoll (357) and strxfrm (356) orders
inconsistency between strcoll (356) and strxfrm (357) orders
inconsistency between strcoll (360) and strxfrm (359) orders
inconsistency between strcoll (359) and strxfrm (360) orders
inconsistency between strcoll (433) and strxfrm (432) orders
inconsistency between strcoll (432) and strxfrm (433) orders
inconsistency between strcoll (535) and strxfrm (534) orders
inconsistency between strcoll (534) and strxfrm (535) orders
inconsistency between strcoll (634) and strxfrm (632) orders
inconsistency between strcoll (635) and strxfrm (633) orders
inconsistency between strcoll (632) and strxfrm (634) orders
inconsistency between strcoll (633) and strxfrm (635) orders
inconsistency between strcoll (642) and strxfrm (641) orders
inconsistency between strcoll (641) and strxfrm (642) orders
inconsistency between strcoll (760) and strxfrm (758) orders
inconsistency between strcoll (758) and strxfrm (759) orders
inconsistency between strcoll (761) and strxfrm (760) orders
inconsistency between strcoll (759) and strxfrm (761) orders
inconsistency between strcoll (794) and strxfrm (793) orders
inconsistency between strcoll (795) and strxfrm (794) orders
inconsistency between strcoll (796) and strxfrm (795) orders
inconsistency between strcoll (797) and strxfrm (796) orders
inconsistency between strcoll (793) and strxfrm (797) orders
inconsistency between strcoll (799) and strxfrm (798) orders
inconsistency between strcoll (798) and strxfrm (799) orders
inconsistency between strcoll (803) and strxfrm (802) orders
inconsistency between strcoll (802) and strxfrm (803) orders
inconsistency between strcoll (880) and strxfrm (879) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (890) and strxfrm (889) orders
inconsistency between strcoll (889) and strxfrm (890) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LI.utf8" Using LC_CTYPE = "en_US.UTF-8" de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8" Using LC_CTYPE = "en_US.UTF-8" de_LU.utf8 good
Using LC_COLLATE = "dv_MV.utf8" Using LC_CTYPE = "en_US.UTF-8" dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8" Using LC_CTYPE = "en_US.UTF-8" dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8" Using LC_CTYPE = "en_US.UTF-8" el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8" Using LC_CTYPE = "en_US.UTF-8" el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8" Using LC_CTYPE = "en_US.UTF-8" en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8" Using LC_CTYPE = "en_US.UTF-8" en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8" Using LC_CTYPE = "en_US.UTF-8" en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8" Using LC_CTYPE = "en_US.UTF-8" en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8" Using LC_CTYPE = "en_US.UTF-8" en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8" Using LC_CTYPE = "en_US.UTF-8" en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8" Using LC_CTYPE = "en_US.UTF-8" en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8" Using LC_CTYPE = "en_US.UTF-8" en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8" Using LC_CTYPE = "en_US.UTF-8" en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8" Using LC_CTYPE = "en_US.UTF-8" en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8" Using LC_CTYPE = "en_US.UTF-8" en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8" Using LC_CTYPE = "en_US.UTF-8" en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8" Using LC_CTYPE = "en_US.UTF-8" en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8" Using LC_CTYPE = "en_US.UTF-8" en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8" Using LC_CTYPE = "en_US.UTF-8" en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8" Using LC_CTYPE = "en_US.UTF-8" eo.utf8 good
Using LC_COLLATE = "es_AR.utf8" Using LC_CTYPE = "en_US.UTF-8" es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8" Using LC_CTYPE = "en_US.UTF-8" es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8" Using LC_CTYPE = "en_US.UTF-8" es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8" Using LC_CTYPE = "en_US.UTF-8" es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8" Using LC_CTYPE = "en_US.UTF-8" es_CR.utf8 good
Using LC_COLLATE = "es_DO.utf8" Using LC_CTYPE = "en_US.UTF-8" es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8" Using LC_CTYPE = "en_US.UTF-8" es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8" Using LC_CTYPE = "en_US.UTF-8" es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8" Using LC_CTYPE = "en_US.UTF-8" es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8" Using LC_CTYPE = "en_US.UTF-8" es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8" Using LC_CTYPE = "en_US.UTF-8" es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8" Using LC_CTYPE = "en_US.UTF-8" es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8" Using LC_CTYPE = "en_US.UTF-8" es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8" Using LC_CTYPE = "en_US.UTF-8" es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8" Using LC_CTYPE = "en_US.UTF-8" es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8" Using LC_CTYPE = "en_US.UTF-8" es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8" Using LC_CTYPE = "en_US.UTF-8" es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8" Using LC_CTYPE = "en_US.UTF-8" es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8" Using LC_CTYPE = "en_US.UTF-8" es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8" Using LC_CTYPE = "en_US.UTF-8" et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8" Using LC_CTYPE = "en_US.UTF-8" eu_FR.utf8 good
Using LC_COLLATE = "fa_IR.utf8" Using LC_CTYPE = "en_US.UTF-8" fa_IR.utf8 good
Using LC_COLLATE = "ff_SN.utf8" Using LC_CTYPE = "en_US.UTF-8" ff_SN.utf8 good
Using LC_COLLATE = "fi_FI.utf8" Using LC_CTYPE = "en_US.UTF-8" fi_FI.utf8 good
Using LC_COLLATE = "fil_PH.utf8" Using LC_CTYPE = "en_US.UTF-8" fil_PH.utf8 good
Using LC_COLLATE = "fo_FO.utf8" Using LC_CTYPE = "en_US.UTF-8" fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8" Using LC_CTYPE = "en_US.UTF-8" fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8" Using LC_CTYPE = "en_US.UTF-8" fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8" Using LC_CTYPE = "en_US.UTF-8" fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8" Using LC_CTYPE = "en_US.UTF-8" fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8" Using LC_CTYPE = "en_US.UTF-8" fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8" Using LC_CTYPE = "en_US.UTF-8" fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8" Using LC_CTYPE = "en_US.UTF-8" fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8" Using LC_CTYPE = "en_US.UTF-8" fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8" Using LC_CTYPE = "en_US.UTF-8" ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8" Using LC_CTYPE = "en_US.UTF-8" gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8" Using LC_CTYPE = "en_US.UTF-8" gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede" Using LC_CTYPE = "en_US.UTF-8" gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede" Using LC_CTYPE = "en_US.UTF-8" gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8" Using LC_CTYPE = "en_US.UTF-8" gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8" Using LC_CTYPE = "en_US.UTF-8" gv_GB.utf8 good
Using LC_COLLATE = "ha_NG.utf8" Using LC_CTYPE = "en_US.UTF-8" ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8" Using LC_CTYPE = "en_US.UTF-8" he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8" Using LC_CTYPE = "en_US.UTF-8" hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8" Using LC_CTYPE = "en_US.UTF-8" hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8" Using LC_CTYPE = "en_US.UTF-8" ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8" Using LC_CTYPE = "en_US.UTF-8" hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8" Using LC_CTYPE = "en_US.UTF-8" hy_AM.utf8 good
Using LC_COLLATE = "ia.utf8" Using LC_CTYPE = "en_US.UTF-8" ia.utf8 good
Using LC_COLLATE = "id_ID.utf8" Using LC_CTYPE = "en_US.UTF-8" id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8" Using LC_CTYPE = "en_US.UTF-8" ig_NG.utf8 good
Using LC_COLLATE = "ik_CA.utf8" Using LC_CTYPE = "en_US.UTF-8" ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8" Using LC_CTYPE = "en_US.UTF-8" is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8" Using LC_CTYPE = "en_US.UTF-8" it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8" Using LC_CTYPE = "en_US.UTF-8" it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8" Using LC_CTYPE = "en_US.UTF-8" iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8" Using LC_CTYPE = "en_US.UTF-8" iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8" Using LC_CTYPE = "en_US.UTF-8" ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8" Using LC_CTYPE = "en_US.UTF-8" ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8" Using LC_CTYPE = "en_US.UTF-8" kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8" Using LC_CTYPE = "en_US.UTF-8" kl_GL.utf8 good
Using LC_COLLATE = "km_KH.utf8" Using LC_CTYPE = "en_US.UTF-8" km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" kn_IN.utf8 good
Using LC_COLLATE = "kok_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" kok_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8" Using LC_CTYPE = "en_US.UTF-8" ko_KR.utf8 good
Using LC_COLLATE = "ks_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari" Using LC_CTYPE = "en_US.UTF-8" ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8" Using LC_CTYPE = "en_US.UTF-8" ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8" Using LC_CTYPE = "en_US.UTF-8" kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8" Using LC_CTYPE = "en_US.UTF-8" ky_KG.utf8 good
Using LC_COLLATE = "lg_UG.utf8" Using LC_CTYPE = "en_US.UTF-8" lg_UG.utf8 good
Using LC_COLLATE = "li_BE.utf8" Using LC_CTYPE = "en_US.UTF-8" li_BE.utf8 good
Using LC_COLLATE = "li_NL.utf8" Using LC_CTYPE = "en_US.UTF-8" li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8" Using LC_CTYPE = "en_US.UTF-8" lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8" Using LC_CTYPE = "en_US.UTF-8" lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8" Using LC_CTYPE = "en_US.UTF-8" lv_LV.utf8 good
Using LC_COLLATE = "mai_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8" Using LC_CTYPE = "en_US.UTF-8" mg_MG.utf8 good
Using LC_COLLATE = "mi_NZ.utf8" Using LC_CTYPE = "en_US.UTF-8" mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8" Using LC_CTYPE = "en_US.UTF-8" mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" ml_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8" Using LC_CTYPE = "en_US.UTF-8" mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8" Using LC_CTYPE = "en_US.UTF-8" ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8" Using LC_CTYPE = "en_US.UTF-8" mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8" Using LC_CTYPE = "en_US.UTF-8" my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin" Using LC_CTYPE = "en_US.UTF-8" nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8" Using LC_CTYPE = "en_US.UTF-8" nb_NO.utf8 good
Using LC_COLLATE = "nds_DE.utf8" Using LC_CTYPE = "en_US.UTF-8" nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8" Using LC_CTYPE = "en_US.UTF-8" nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8" Using LC_CTYPE = "en_US.UTF-8" ne_NP.utf8 good
Using LC_COLLATE = "nl_AW.utf8" Using LC_CTYPE = "en_US.UTF-8" nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8" Using LC_CTYPE = "en_US.UTF-8" nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8" Using LC_CTYPE = "en_US.UTF-8" nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8" Using LC_CTYPE = "en_US.UTF-8" nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8" Using LC_CTYPE = "en_US.UTF-8" oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8" Using LC_CTYPE = "en_US.UTF-8" om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" or_IN.utf8 good
Using LC_COLLATE = "os_RU.utf8" Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (936) and strxfrm (935) orders
inconsistency between strcoll (935) and strxfrm (936) orders
os_RU.utf8 BAD
Using LC_COLLATE = "pa_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" pa_IN.utf8 good
Using LC_COLLATE = "pap_AN.utf8" Using LC_CTYPE = "en_US.UTF-8" pap_AN.utf8 good
Using LC_COLLATE = "pa_PK.utf8" Using LC_CTYPE = "en_US.UTF-8" pa_PK.utf8 good
Using LC_COLLATE = "pl_PL.utf8" Using LC_CTYPE = "en_US.UTF-8" pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8" Using LC_CTYPE = "en_US.UTF-8" ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8" Using LC_CTYPE = "en_US.UTF-8" pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8" Using LC_CTYPE = "en_US.UTF-8" pt_PT.utf8 good
Using LC_COLLATE = "ro_RO.utf8" Using LC_CTYPE = "en_US.UTF-8" ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8" Using LC_CTYPE = "en_US.UTF-8" ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8" Using LC_CTYPE = "en_US.UTF-8" ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8" Using LC_CTYPE = "en_US.UTF-8" rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" sa_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8" Using LC_CTYPE = "en_US.UTF-8" sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari" Using LC_CTYPE = "en_US.UTF-8" sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8" Using LC_CTYPE = "en_US.UTF-8" se_NO.utf8 good
Using LC_COLLATE = "shs_CA.utf8" Using LC_CTYPE = "en_US.UTF-8"
strxfrm() result for 18-length string exceeded 100 bytes
shs_CA.utf8 BAD
Using LC_COLLATE = "sid_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" sid_ET.utf8 good
Using LC_COLLATE = "si_LK.utf8" Using LC_CTYPE = "en_US.UTF-8" si_LK.utf8 good
Using LC_COLLATE = "sk_SK.utf8" Using LC_CTYPE = "en_US.UTF-8" sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8" Using LC_CTYPE = "en_US.UTF-8" sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8" Using LC_CTYPE = "en_US.UTF-8" so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8" Using LC_CTYPE = "en_US.UTF-8" so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8" Using LC_CTYPE = "en_US.UTF-8" so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8" Using LC_CTYPE = "en_US.UTF-8" sq_AL.utf8 good
Using LC_COLLATE = "sq_MK.utf8" Using LC_CTYPE = "en_US.UTF-8" sq_MK.utf8 good
Using LC_COLLATE = "sr_ME.utf8" Using LC_CTYPE = "en_US.UTF-8" sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8" Using LC_CTYPE = "en_US.UTF-8" sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin" Using LC_CTYPE = "en_US.UTF-8" sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8" Using LC_CTYPE = "en_US.UTF-8" sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8" Using LC_CTYPE = "en_US.UTF-8" sv_SE.utf8 good
Using LC_COLLATE = "sw_KE.utf8" Using LC_CTYPE = "en_US.UTF-8" sw_KE.utf8 good
Using LC_COLLATE = "sw_TZ.utf8" Using LC_CTYPE = "en_US.UTF-8" sw_TZ.utf8 good
Using LC_COLLATE = "ta_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" ta_IN.utf8 good
Using LC_COLLATE = "te_IN.utf8" Using LC_CTYPE = "en_US.UTF-8" te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8" Using LC_CTYPE = "en_US.UTF-8" tg_TJ.utf8 good
Using LC_COLLATE = "th_TH.utf8" Using LC_CTYPE = "en_US.UTF-8" th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8" Using LC_CTYPE = "en_US.UTF-8" ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8" Using LC_CTYPE = "en_US.UTF-8" ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8" Using LC_CTYPE = "en_US.UTF-8" tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8" Using LC_CTYPE = "en_US.UTF-8" tk_TM.utf8 good
Using LC_COLLATE = "tl_PH.utf8" Using LC_CTYPE = "en_US.UTF-8" tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8" Using LC_CTYPE = "en_US.UTF-8" tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8" Using LC_CTYPE = "en_US.UTF-8" tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8" Using LC_CTYPE = "en_US.UTF-8" tt_RU.utf8 good
Using LC_COLLATE = "tt_RU.utf8@iqtelif" Using LC_CTYPE = "en_US.UTF-8" tt_RU.utf8@iqtelif good
Using LC_COLLATE = "ug_CN.utf8" Using LC_CTYPE = "en_US.UTF-8" ug_CN.utf8 good
Using LC_COLLATE = "uk_UA.utf8" Using LC_CTYPE = "en_US.UTF-8" uk_UA.utf8 good
Using LC_COLLATE = "ur_PK.utf8" Using LC_CTYPE = "en_US.UTF-8" ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8" Using LC_CTYPE = "en_US.UTF-8" uz_UZ.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic" Using LC_CTYPE = "en_US.UTF-8" uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8" Using LC_CTYPE = "en_US.UTF-8" vi_VN.utf8 good
Using LC_COLLATE = "wa_BE.utf8" Using LC_CTYPE = "en_US.UTF-8" wa_BE.utf8 good
Using LC_COLLATE = "wo_SN.utf8" Using LC_CTYPE = "en_US.UTF-8" wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8" Using LC_CTYPE = "en_US.UTF-8" yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8" Using LC_CTYPE = "en_US.UTF-8" yo_NG.utf8 good
Using LC_COLLATE = "zh_CN.utf8" Using LC_CTYPE = "en_US.UTF-8" zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8" Using LC_CTYPE = "en_US.UTF-8" zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8" Using LC_CTYPE = "en_US.UTF-8" zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8" Using LC_CTYPE = "en_US.UTF-8" zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8" Using LC_CTYPE = "en_US.UTF-8" zu_ZA.utf8 good

Thanks!

Stephen
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

From IRC (not mine), "debian testing, glibc 2.22-3":

Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "aa_DJ.utf8"
aa_DJ.utf8 good

Every other locale produced the same three lines (LC_COLLATE and LC_CTYPE both set to the locale name) and likewise reported good:

aa_ER, aa_ER@saaho, aa_ET, af_ZA.utf8, ak_GH, am_ET, an_ES.utf8, anp_IN, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SS, ar_SY.utf8, ar_TN.utf8, ar_YE.utf8, as_IN, ast_ES.utf8, ayc_PE, az_AZ, be_BY@latin, be_BY.utf8, bem_ZM, ber_DZ, ber_MA, bg_BG.utf8, bhb_IN.utf8, bho_IN, bn_BD, bn_IN, bo_CN, bo_IN, br_FR.utf8, brx_IN, bs_BA.utf8, byn_ER, ca_AD.utf8, ca_ES.utf8, ca_ES.utf8@valencia, ca_FR.utf8, ca_IT.utf8, ce_RU, cmn_TW, crh_UA, csb_PL, cs_CZ.utf8, C.UTF-8, cv_RU, cy_GB.utf8, da_DK.utf8, de_AT.utf8, de_BE.utf8, de_CH.utf8, de_DE.utf8, de_LI.utf8, de_LU.utf8, doi_IN, dv_MV, dz_BT, el_CY.utf8, el_GR.utf8, en_AG, en_AU.utf8, en_BW.utf8, en_CA.utf8, en_DK.utf8, en_GB.utf8, en_HK.utf8, en_IE.utf8, en_IN, en_NG, en_NZ.utf8, en_PH.utf8, en_SG.utf8, en_US.utf8, en_ZA.utf8, en_ZM, en_ZW.utf8, eo.utf8, es_AR.utf8, es_BO.utf8, es_CL.utf8, es_CO.utf8, es_CR.utf8, es_CU, es_DO.utf8, es_EC.utf8, es_ES.utf8, es_GT.utf8, es_HN.utf8, es_MX.utf8, es_NI.utf8, es_PA.utf8, es_PE.utf8, es_PR.utf8, es_PY.utf8, es_SV.utf8, es_US.utf8, es_UY.utf8, es_VE.utf8, et_EE.utf8, eu_ES.utf8, eu_FR.utf8, fa_IR, ff_SN, fi_FI.utf8, fil_PH, fo_FO.utf8, fr_BE.utf8, fr_CA.utf8, fr_CH.utf8, fr_FR.utf8, fr_LU.utf8, fur_IT, fy_DE, fy_NL, ga_IE.utf8, gd_GB.utf8, gez_ER, gez_ER@abegede, gez_ET, gez_ET@abegede, gl_ES.utf8, gu_IN, gv_GB.utf8, hak_TW, ha_NG, he_IL.utf8, hi_IN, hne_IN, hr_HR.utf8, hsb_DE.utf8, ht_HT, hu_HU.utf8, hy_AM, ia_FR, id_ID.utf8, ig_NG, ik_CA, is_IS.utf8, it_CH.utf8, it_IT.utf8, iu_CA, iw_IL.utf8, ja_JP.utf8, ka_GE.utf8, kk_KZ.utf8, kl_GL.utf8, km_KH, kn_IN, kok_IN, ko_KR.utf8, ks_IN, ks_IN@devanagari, ku_TR.utf8, kw_GB.utf8, ky_KG, lb_LU, lg_UG.utf8, li_BE, lij_IT, li_NL, lo_LA, lt_LT.utf8, lv_LV.utf8, lzh_TW, mag_IN, mai_IN, mg_MG.utf8, mhr_RU, mi_NZ.utf8, mk_MK.utf8, ml_IN, mni_IN, mn_MN, mr_IN, ms_MY.utf8, mt_MT.utf8, my_MM, nan_TW, nan_TW@latin, nb_NO.utf8, nds_DE, nds_NL, ne_NP, nhn_MX, niu_NU, niu_NZ, nl_AW, nl_BE.utf8, nl_NL.utf8, nn_NO.utf8, nr_ZA, nso_ZA, oc_FR.utf8, om_ET, om_KE.utf8, or_IN, os_RU, pa_IN, pap_AN, pap_AW, pap_CW, pa_PK, pl_PL.utf8, ps_AF, pt_BR.utf8, pt_PT.utf8, quz_PE, raj_IN, ro_RO.utf8, ru_RU.utf8, ru_UA.utf8, rw_RW, sa_IN, sat_IN, sc_IT, sd_IN, sd_IN@devanagari, se_NO, shs_CA, sid_ET, si_LK, sk_SK.utf8, sl_SI.utf8, so_DJ.utf8, so_ET, so_KE.utf8, so_SO.utf8, sq_AL.utf8, sq_MK, sr_ME, sr_RS, sr_RS@latin, ss_ZA, st_ZA.utf8, sv_FI.utf8, sv_SE.utf8, sw_KE, sw_TZ, szl_PL, ta_IN, ta_LK, tcy_IN.utf8, te_IN, tg_TJ.utf8, the_NP, th_TH.utf8, ti_ER, ti_ET, tig_ER, tk_TM, tl_PH.utf8, tn_ZA, tr_CY.utf8, tr_TR.utf8, ts_ZA, tt_RU, tt_RU@iqtelif, ug_CN, uk_UA.utf8, unm_US, ur_IN, ur_PK, uz_UZ@cyrillic, uz_UZ.utf8, ve_ZA, vi_VN, wa_BE.utf8, wae_CH, wal_ET, wo_SN, xh_ZA.utf8, yi_US.utf8, yo_NG, yue_HK, zh_CN.utf8, zh_HK.utf8, zh_SG.utf8, zh_TW.utf8, zu_ZA.utf8

Thanks!

Stephen
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, if we implement a compatibility GUC that shuts off our
> dependency on strxfrm(), people can go back to having 9.5 be no more
> broken than 9.4 was. I vote we do that and go home.

I don't have a problem with that idea, but I fear "no more broken than
9.4 was" might be a very low bar for certain systems and collations.
Abbreviated keys may have simply unmasked the problem in some cases.

Consider:

[vagrant@localhost ~]$ LC_COLLATE=en_us sort strings.txt    <-- correct
x xx
x xx"
xxx
xxx"
[vagrant@localhost ~]$ LC_COLLATE=de_DE sort strings.txt    <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). Pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have, for more fundamental reasons: strcoll() says
that the first string in the de_DE sorted list is *greater* than the
third string. That's wrong, and not just because strxfrm() gives an
intuitively correct answer; it's wrong specifically because the
transitive law has been broken.

--
Peter Geoghegan
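The `strxfrm-binary` tool itself isn't posted in the thread, so here is a minimal sketch, under stated assumptions, of the kind of check its output reflects: dump a string's strxfrm() blob as hex, and ask whether strcoll() and strcmp() over the two blobs agree in sign. The helper names (`xfrm_dup`, `print_xfrm_hex`, `strcoll_matches_strxfrm`) are hypothetical, and this is not Peter's program.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map any int comparison result onto -1, 0 or 1. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper: return a malloc'd strxfrm() blob for s in the
 * current LC_COLLATE, or NULL on allocation failure.  Calling strxfrm()
 * with a NULL destination and size 0 reports the length it needs.
 */
char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/* Print the blob as hex bytes, with its length excluding the NUL. */
void print_xfrm_hex(const char *s)
{
    char *x = xfrm_dup(s);
    size_t len;

    if (x == NULL)
        return;
    len = strlen(x);
    printf("\"%s\" -> ", s);
    for (size_t i = 0; i < len; i++)
        printf("%02x", (unsigned char) x[i]);
    printf(" (%zu bytes)\n", len);
    free(x);
}

/*
 * Return 1 if strcoll(a, b) and strcmp() over the two strxfrm() blobs
 * agree in sign, 0 if they disagree, -1 on allocation failure.
 */
int strcoll_matches_strxfrm(const char *a, const char *b)
{
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int result = -1;

    if (xa != NULL && xb != NULL)
        result = sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb));
    free(xa);
    free(xb);
    return result;
}
```

After `setlocale(LC_COLLATE, "de_DE.UTF-8")`, Peter's output above (strcmp -1, strcoll 6) implies `strcoll_matches_strxfrm("xxx", "x xx")` would return 0 on his glibc; that is inferred from his numbers, not re-run here.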
Peter Geoghegan <pg@heroku.com> writes:
> My concern was not merely "academic" (i.e. it was not limited in scope
> to things that don't make B-Tree indexes corrupt). Pretty sure that we
> need to start thinking of this as a problem with strcoll() that
> strxfrm() does not have for more fundamental reasons, because
> strcoll() says that the first string in the de_DE sorted list is
> *greater* than the third string.

[ squint... ] I was looking specifically for that sort of misbehavior
in my test program, and I haven't seen it.

regards, tom lane
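The property in dispute is transitivity: if strcoll() says a <= b and b <= c, it must not also say a > c. A minimal sketch of probing that directly over a small set of strings follows; `strcoll_breaks_transitivity` and `count_transitivity_violations` are hypothetical names, and a count of zero only means these probes didn't trip over a violation, not that none exists.

```c
#include <locale.h>   /* callers pick LC_COLLATE with setlocale() */
#include <string.h>

/*
 * Hypothetical helper: return 1 if strcoll() in the current LC_COLLATE
 * violates transitivity for this triple (a <= b and b <= c, yet a > c),
 * 0 otherwise.
 */
int strcoll_breaks_transitivity(const char *a, const char *b, const char *c)
{
    return strcoll(a, b) <= 0 && strcoll(b, c) <= 0 && strcoll(a, c) > 0;
}

/*
 * Check every ordered triple drawn from the n strings and return the
 * number of violations found.  This is O(n^3) strcoll() calls, so it is
 * only meant for small probe sets.
 */
int count_transitivity_violations(const char *const *strs, int n)
{
    int violations = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                if (strcoll_breaks_transitivity(strs[i], strs[j], strs[k]))
                    violations++;
    return violations;
}
```

Fed Peter's four strings under de_DE on an affected glibc, this would presumably report a nonzero count; under the "C" locale strcoll() is plain byte order, so it reports zero.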
On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...

I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
already blacklists the UTF8/native Windows case.) The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.
Attachment
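Tom's program exits as soon as strxfrm() reports a result of MAXXFRMLEN bytes or more, which is why raising the constant was needed. An alternative is to let the return value size the buffer: strxfrm() returns the length it needed (excluding the NUL), and when that is at least the buffer size the buffer contents are indeterminate. A minimal sketch, assuming a hypothetical helper name `xfrm_grow`:

```c
#include <locale.h>   /* strxfrm() output depends on the LC_COLLATE set via setlocale() */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: transform s with strxfrm() into a heap buffer,
 * starting at initial_size bytes and growing when the return value shows
 * the result did not fit.  On success, returns the buffer and stores the
 * transformed length (excluding the NUL) in *out_len; returns NULL on
 * allocation failure.
 */
char *xfrm_grow(const char *s, size_t initial_size, size_t *out_len)
{
    size_t size = initial_size > 0 ? initial_size : 1;
    char *buf = malloc(size);

    while (buf != NULL)
    {
        size_t need = strxfrm(buf, s, size);

        if (need < size)
        {
            /* The whole result, including its NUL, fit. */
            *out_len = need;
            return buf;
        }

        /* Truncated: contents are indeterminate, retry with exact room. */
        size = need + 1;
        char *bigger = realloc(buf, size);

        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
    return NULL;
}
```

Starting from a small guess keeps the common case to one call; the second call only happens for strings whose transformed form outgrows the guess.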
--On 22 March 2016 19:19:44 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Please try this on as many platforms as you can get hold of ...

Since I have to work on SuSE/SLES platforms at the moment, here are some
results from them (openLeap 42.1 and SLES12 give identical results, but
that isn't a surprise since the two share the same code base):

SLES12:

grep BAD results_sles12.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD

SLES11 SP4:

grep BAD results_sles11sp4.txt
az_AZ.utf8 BAD
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
se_NO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
tt_RU.utf8 BAD
tt_RU@iqtelif.UTF-8 BAD

openSuSE/openLeap 42.1:

grep BAD results_openleap421.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD

--
Thanks

Bernd
Attachment
On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I was a little worried that it was too much to hope for that all libc
>> > vendors on earth would ship a strxfrm() implementation that was actually
>> > consistent with strcoll(), and here we are.
>>
>> Indeed. To try to put some scope on the problem, I made an idiot little
>> program that just generates some random UTF8 strings and sees whether
>> strcoll and strxfrm sort them alike. Attached are that program, an even
>> more idiot little shell script that runs it over all available UTF8
>> locales, and the results on my RHEL6 box. While de_DE seems to be the
>> worst-broken locale, it's far from the only one.
>>
>> Please try this on as many platforms as you can get hold of ...
>
> I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
> 2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
> already blacklists the UTF8/native Windows case.) The test passed on Solaris
> 10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
> See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
> machines I use have no UTF8 locales installed.

Wow, thanks for the extensive testing. This suggests that, apart from
Cygwin which apparently doesn't matter right now, the only thing that
is busted is glibc. I believe we have yet to see a single locale that
fails anywhere else (apart from Cygwin). Good thing so few of our
users run glibc! Ha ha, little joke there.

So, options:

1. We could make it the user's problem to figure out whether they've
got a buggy glibc and add a GUC to shut this off, as previously
suggested.

2. We could add a blacklist (either hardcoded or a GUC) shutting this
off for locales known to be buggy anywhere.

3. We could write some test code that runs at startup time which
reliably detects all of the broken locales we've so far uncovered and
disables this if so.

4. We could shut this off for all Linux users in all locales and tell
everybody to REINDEX. That would be pretty sad, though.

Thoughts? Other ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
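Option 3 could look roughly like the sketch below: at startup, transform a fixed set of probe strings and confirm that strcoll() and strcmp() over the strxfrm() blobs order every pair the same way. The probe list and the names `probe_strings` and `strxfrm_looks_trustworthy` are hypothetical; the probes include the pair from the original report and Peter's example, but a passing result says nothing about strings outside the set, which is exactly the gap in knowledge that makes this option hard to rely on.

```c
#include <locale.h>   /* the check runs under whatever LC_COLLATE is set */
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe set: the original report's pair, Peter's strings,
 * and a few trivial cases.  "e a\xc3\xad" is 'e a' followed by UTF-8 'i acute'. */
static const char *const probe_strings[] = {
    "eai", "e a\xc3\xad", "xxx", "xxx\"", "x xx", "x xx\"", "a", "A", " a",
};

#define NPROBES (sizeof(probe_strings) / sizeof(probe_strings[0]))

static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Return a malloc'd strxfrm() blob for s, or NULL on allocation failure. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/*
 * Hypothetical startup self-test: return 1 if strcoll() and strxfrm()
 * agree on every ordered pair of probes in the current LC_COLLATE, and 0
 * if any pair disagrees or memory runs out.
 */
int strxfrm_looks_trustworthy(void)
{
    char *xfrm[NPROBES] = {0};
    int ok = 1;

    for (size_t i = 0; i < NPROBES && ok; i++)
        if ((xfrm[i] = xfrm_dup(probe_strings[i])) == NULL)
            ok = 0;

    for (size_t i = 0; i < NPROBES && ok; i++)
        for (size_t j = 0; j < NPROBES && ok; j++)
            if (sign_of(strcoll(probe_strings[i], probe_strings[j])) !=
                sign_of(strcmp(xfrm[i], xfrm[j])))
                ok = 0;

    for (size_t i = 0; i < NPROBES; i++)
        free(xfrm[i]);
    return ok;
}
```

A caller would presumably run this once per collation and fall back to plain strcoll() comparisons when it returns 0; that wiring is an assumption, not something the thread settled on.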
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
>> I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
>> 2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
>> already blacklists the UTF8/native Windows case.) The test passed on Solaris
>> 10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
>> See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
>> machines I use have no UTF8 locales installed.

> Wow, thanks for the extensive testing. This suggests that, apart from
> Cygwin which apparently doesn't matter right now, the only thing that
> is busted is glibc. I believe we have yet to see a single locale that
> fails anywhere else (apart from Cygwin). Good thing so few of our
> users run glibc!

I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol' de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).

> So, options:

> 1. We could make it the user's problem to figure out whether they've
> got a buggy glibc and add a GUC to shut this off, as previously
> suggested.

> 2. We could add a blacklist (either hardcoded or a GUC) shutting this
> off for locales known to be buggy anywhere.

> 3. We could write some test code that runs at startup time which
> reliably detects all of the broken locales we've so far uncovered and
> disables this if so.

> 4. We could shut this off for all Linux users in all locales and tell
> everybody to REINDEX. That would be pretty sad, though.

TBH, I think #1 is right out, unless maybe the GUC defaults to off.
We aren't that cavalier with data consistency in other departments.
#2 and #3 presume a level of knowledge of the bug details that we have
not got, and probably can't get by Monday.
As far as #4 goes, we're going to have to tell people to REINDEX no matter
what the other aspects of the fix look like. On-disk indexes are broken
right now, if you're using one of the affected locales.

regards, tom lane

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <langinfo.h>
#include <time.h>

/*
 * Test: generate 1000 random UTF8 strings, sort them by strcoll, sanity-
 * check the sort result, sort them by strxfrm, sanity-check that result,
 * and compare the two sort orders.
 */

#define NSTRINGS 1000
#define MAXSTRLEN 20
#define MAXXFRMLEN (MAXSTRLEN * 10)

typedef struct
{
    char        strval[MAXSTRLEN];
    char        xfrmval[MAXXFRMLEN];
    int         strsortpos;
    int         xfrmsortpos;
} OneString;

/* qsort comparators */

static int
strcoll_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcoll(a->strval, b->strval);
}

static int
strxfrm_compare(const void *pa, const void *pb)
{
    const OneString *a = (const OneString *) pa;
    const OneString *b = (const OneString *) pb;

    return strcmp(a->xfrmval, b->xfrmval);
}

/* returns 1 if OK, 0 if inconsistency detected */
static int
run_test_case(int is_utf8)
{
    int         ok = 1;
    OneString   data[NSTRINGS];
    int         i,
                j;

    /* Generate random strings of length less than MAXSTRLEN bytes */
    for (i = 0; i < NSTRINGS; i++)
    {
        char       *p = data[i].strval;
        int         len;

        len = 1 + (random() % (MAXSTRLEN - 1));
        while (len > 0)
        {
            int         c;

            /* Generate random printable char in ISO8859-1 range */
            /* Bias towards producing a lot of spaces */
            if ((random() % 16) < 3)
                c = ' ';
            else
            {
                do
                {
                    c = random() & 0xFF;
                } while (!((c >= ' ' && c <= 127) || (c >= 0xA0 && c <= 0xFF)));
            }

            if (c <= 127 || !is_utf8)
            {
                *p++ = c;
                len--;
            }
            else
            {
                if (len < 2)
                    break;
                /* Poor man's utf8-ification */
                *p++ = 0xC0 + (c >> 6);
                len--;
                *p++ = 0x80 + (c & 0x3F);
                len--;
            }
        }
        *p = '\0';

        /* strxfrm each string as we produce it */
        if (strxfrm(data[i].xfrmval, data[i].strval, MAXXFRMLEN) >= MAXXFRMLEN)
        {
            fprintf(stderr, "strxfrm() result for %d-length string exceeded %d bytes\n",
                    (int) strlen(data[i].strval), MAXXFRMLEN);
            exit(1);
        }

#if 0
        printf("%d %s\n", i, data[i].strval);
#endif
    }

    /* Sort per strcoll(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strcoll_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcoll(data[i].strval, data[i-1].strval) != 0)
            j++;
        data[i].strsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcoll(data[i].strval, data[j].strval) > 0)
            {
                fprintf(stdout, "strcoll sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Sort per strxfrm(), and label, being careful in case some are equal */
    qsort(data, NSTRINGS, sizeof(OneString), strxfrm_compare);
    j = 0;
    for (i = 0; i < NSTRINGS; i++)
    {
        if (i > 0 && strcmp(data[i].xfrmval, data[i-1].xfrmval) != 0)
            j++;
        data[i].xfrmsortpos = j;
    }

    /* Sanity-check: is each string <= those after it? */
    for (i = 0; i < NSTRINGS; i++)
    {
        for (j = i + 1; j < NSTRINGS; j++)
        {
            if (strcmp(data[i].xfrmval, data[j].xfrmval) > 0)
            {
                fprintf(stdout, "strxfrm sort inconsistency between positions %d and %d\n",
                        i, j);
                ok = 0;
            }
        }
    }

    /* Compare */
    for (i = 0; i < NSTRINGS; i++)
    {
        if (data[i].strsortpos != data[i].xfrmsortpos)
        {
            fprintf(stdout, "inconsistency between strcoll (%d) and strxfrm (%d) orders\n",
                    data[i].strsortpos, data[i].xfrmsortpos);
            ok = 0;
        }
    }

    return ok;
}

int
main(int argc, char **argv)
{
    const char *lc;
    const char *cset;
    int         is_utf8;
    int         ntries;
    int         result = 0;

    /* Absorb locale from environment, and report what we're using */
    if (setlocale(LC_ALL, "") == NULL)
    {
        perror("setlocale(LC_ALL) failed");
        exit(1);
    }
    lc = setlocale(LC_COLLATE, NULL);
    if (lc)
    {
        printf("Using LC_COLLATE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_COLLATE) failed");
        exit(1);
    }
    lc = setlocale(LC_CTYPE, NULL);
    if (lc)
    {
        printf("Using LC_CTYPE = \"%s\"\n", lc);
    }
    else
    {
        perror("setlocale(LC_CTYPE) failed");
        exit(1);
    }

    /* Identify encoding */
    cset = nl_langinfo(CODESET);
    if (!cset)
    {
        perror("nl_langinfo(CODESET) failed");
        exit(1);
    }
    if (strstr(cset, "utf") || strstr(cset, "UTF"))
        is_utf8 = 1;
    else if (strstr(cset, "iso-8859") || strstr(cset, "ISO-8859") ||
             strstr(cset, "iso8859") || strstr(cset, "ISO8859"))
        is_utf8 = 0;
    else
    {
        fprintf(stderr, "unrecognized codeset name \"%s\"\n", cset);
        exit(1);
    }

    /* Ensure new random() values on every run */
    srandom((unsigned int) time(NULL));

    /* argv[1] can be the max number of tries to run */
    if (argc > 1)
        ntries = atoi(argv[1]);
    else
        ntries = 1;

    /* Run one test instance per loop */
    while (ntries-- > 0)
    {
        if (!run_test_case(is_utf8))
            result = 1;
    }

    return result;
}

az_AZ.utf8 BAD
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
crh_UA.utf8 BAD
csb_PL.utf8 BAD
cv_RU.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fil_PH.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
fur_IT.utf8 BAD
hu_HU.utf8 BAD
ig_NG.utf8 BAD ik_CA.utf8 BAD iu_CA.utf8 BAD kl_GL.utf8 BAD ku_TR.utf8 BAD nb_NO.utf8 BAD nn_NO.utf8 BAD no_NO.utf8 BAD ro_RO.utf8 BAD sc_IT.utf8 BAD se_NO.utf8 BAD shs_CA.utf8 BAD sq_AL.utf8 BAD sq_MK.utf8 BAD sv_FI.utf8 BAD sv_SE.utf8 BAD tk_TM.utf8 BAD tt_RU.utf8 BAD tt_RU.utf8@iqtelif BAD ug_CN.utf8 BAD vi_VN.utf8 BAD yo_NG.utf8 BAD ar_AE.iso88596 BAD ar_BH.iso88596 BAD ar_DZ.iso88596 BAD ar_EG.iso88596 BAD ar_IQ.iso88596 BAD ar_JO.iso88596 BAD ar_KW.iso88596 BAD ar_LB.iso88596 BAD ar_LY.iso88596 BAD ar_MA.iso88596 BAD ar_OM.iso88596 BAD ar_QA.iso88596 BAD ar_SD.iso88596 BAD ar_SY.iso88596 BAD ar_TN.iso88596 BAD ar_YE.iso88596 BAD bs_BA.iso88592 BAD ca_AD.iso885915 BAD ca_ES.iso88591 BAD ca_ES.iso885915@euro BAD ca_FR.iso885915 BAD ca_IT.iso885915 BAD da_DK.iso88591 BAD da_DK.iso885915 BAD de_DE.iso88591 BAD es_EC.iso88591 BAD es_US.iso88591 BAD fi_FI.iso88591 BAD fi_FI.iso885915@euro BAD fo_FO.iso88591 BAD fr_CA.iso88591 BAD he_IL.iso88598 BAD hu_HU.iso88592 BAD iw_IL.iso88598 BAD kl_GL.iso88591 BAD ku_TR.iso88599 BAD mk_MK.iso88595 BAD mt_MT.iso88593 BAD nb_NO.iso88591 BAD nn_NO.iso88591 BAD no_NO.iso88591 BAD ro_RO.iso88592 BAD ru_RU.iso88595 BAD sq_AL.iso88591 BAD sv_FI.iso88591 BAD sv_FI.iso885915@euro BAD sv_SE.iso88591 BAD sv_SE.iso885915 BAD
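The test program's "poor man's utf8-ification" turns each ISO-8859-1 character in the 0xA0..0xFF range into two UTF-8 bytes: a lead byte 0xC0 + (c >> 6) and a continuation byte 0x80 + (c & 0x3F). A minimal standalone sketch of that step follows; the function name `latin1_to_utf8` is hypothetical and not part of the original program. As a check, 'í' (U+00ED) from the original report encodes as 0xC3 0xAD, which matches the `=C3=AD` bytes in the quoted-printable bug report.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the test program's "poor man's
 * utf8-ification": encode one ISO-8859-1 code point (0x00..0xFF) as UTF-8.
 * Writes 1 or 2 bytes into out and returns the number written, or 0 if c
 * is outside the ISO-8859-1 range.
 */
static size_t
latin1_to_utf8(unsigned int c, unsigned char out[2])
{
	if (c > 0xFF)
		return 0;				/* not an ISO-8859-1 code point */
	if (c <= 0x7F)
	{
		out[0] = (unsigned char) c;	/* ASCII passes through unchanged */
		return 1;
	}
	/* 0x80..0xFF: lead byte 110xxxxx, then continuation byte 10xxxxxx */
	out[0] = (unsigned char) (0xC0 + (c >> 6));
	out[1] = (unsigned char) (0x80 + (c & 0x3F));
	return 2;
}
```

Because the input is limited to 0xFF, the lead byte is always 0xC2 or 0xC3, which is why the program only needs to reserve two bytes per non-ASCII character.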
I wrote:
> I extended my test program to be able to check locales using ISO-8859-x
> encodings. RHEL6 shows me failures in a set of locales that is remarkably
> unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
> in both encodings, as do a few others). I'm not sure what that implies
> for the underlying bug(s).

Closer analysis says that all of the cases where only utf8 is reported to fail are in fact because there is no iso8859 equivalent locale on my machine. Many of the cases where only iso8859 is reported to fail are just chance passes due to not having randomly generated a failure case; you can reduce the odds of that by passing strcolltest a repeat count larger than 1. There remain, however, a few locales in which it seems that indeed iso8859 is broken and utf8 is not; ru_RU being the most prominent example.

In short, the problem is actually worse in non-UTF8 locales.

regards, tom lane
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
"David G. Johnston"
Date:
On Wed, Mar 23, 2016 at 9:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I extended my test program to be able to check locales using ISO-8859-x
> > encodings. RHEL6 shows me failures in a set of locales that is remarkably
> > unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
> > in both encodings, as do a few others). I'm not sure what that implies
> > for the underlying bug(s).
>
> Closer analysis says that all of the cases where only utf8 is reported to
> fail are in fact because there is no iso8859 equivalent locale on my
> machine. Many of the cases where only iso8859 is reported to fail are
> just chance passes due to not having randomly generated a failure case;
> you can reduce the odds of that by passing strcolltest a repeat count
> larger than 1. There remain, however, a few locales in which it seems
> that indeed iso8859 is broken and utf8 is not; ru_RU being the most
> prominent example.
>
> In short, the problem is actually worse in non-UTF8 locales.
>

Is the POSIX/C (non)-locale affected?

David J.
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Wed, Mar 23, 2016 at 12:19 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
> Is the POSIX/C (non)-locale affected?

We don't use strxfrm() or strcoll() in that case, so I sure hope not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
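Robert's answer rests on the fact that "C"/POSIX collation is plain byte order, so a byte-wise comparison gives the same answer strcoll() would. The small sketch below checks that property directly. The helper names `sign_of` and `c_locale_matches_strcmp` are hypothetical; this is an illustration, not how PostgreSQL implements its C-locale path (the thread only says strxfrm() and strcoll() are not used there).

```c
#include <locale.h>
#include <string.h>

/* Map any comparison result to -1, 0 or +1. */
static int
sign_of(int v)
{
	return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper: switch LC_COLLATE to "C" (note: this changes the
 * whole process's collation setting) and report whether strcoll() orders
 * a and b the same way as a byte-wise strcmp().  Returns 1 if they agree,
 * 0 if not, and -1 if the "C" locale could not be selected.
 */
static int
c_locale_matches_strcmp(const char *a, const char *b)
{
	if (setlocale(LC_COLLATE, "C") == NULL)
		return -1;
	return sign_of(strcoll(a, b)) == sign_of(strcmp(a, b));
}
```

Using the two strings from the original report, `c_locale_matches_strcmp("eai", "e a\xc3\xad")` should return 1 on any conforming C library.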
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Wed, Mar 23, 2016 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I extended my test program to be able to check locales using ISO-8859-x
>> encodings. RHEL6 shows me failures in a set of locales that is remarkably
>> unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
>> in both encodings, as do a few others). I'm not sure what that implies
>> for the underlying bug(s).
>
> Closer analysis says that all of the cases where only utf8 is reported to
> fail are in fact because there is no iso8859 equivalent locale on my
> machine. Many of the cases where only iso8859 is reported to fail are
> just chance passes due to not having randomly generated a failure case;
> you can reduce the odds of that by passing strcolltest a repeat count
> larger than 1. There remain, however, a few locales in which it seems
> that indeed iso8859 is broken and utf8 is not; ru_RU being the most
> prominent example.
>
> In short, the problem is actually worse in non-UTF8 locales.

I guess that's not terribly surprising. If the glibc maintainers haven't managed to get this right for UTF-8 locales, I can't imagine why they would have been more careful for non-UTF-8 locales that - I would guess - get less use.

Are you still in information-gathering mode, or are you going to issue a recommendation on how we should proceed here, or what?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Are you still in information-gathering mode, or are you going to issue
> a recommendation on how we should proceed here, or what?

If I had to make a recommendation right now, I would go for your option #4, ie shut 'em all down Scotty. We do not know the full extent of the problem but it looks pretty bad, and I think our first priority has to be to guarantee data integrity. I do not have a lot of faith in the proposition that glibc's is the only buggy implementation, either.

regards, tom lane
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If I had to make a recommendation right now, I would go for your
> option #4, ie shut 'em all down Scotty. We do not know the full extent
> of the problem but it looks pretty bad, and I think our first priority
> has to be to guarantee data integrity.

+1, but only for glibc, and configurable. The glibc default might later be revisited in the stable 9.5 branch.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Magnus Hagander
Date:
On Mar 23, 2016 18:53, "Peter Geoghegan" <pg@heroku.com> wrote:
>
> On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > If I had to make a recommendation right now, I would go for your
> > option #4, ie shut 'em all down Scotty. We do not know the full extent
> > of the problem but it looks pretty bad, and I think our first priority
> > has to be to guarantee data integrity.
>
> +1, but only for glibc, and configurable. The glibc default might
> later be revisited in the stable 9.5 branch.
>

Are you talking about configurable at ./configure time, or GUC?

Making it a compile-time option makes sense, I think. But turning it into a GUC will expose users to a lot of failure scenarios if they *change* the value, and that seems risky.

Putting it in autoconf and defaulting it to off in the upcoming minor release seems like a good idea. Then once we have more information, we can consider whether we want to turn it back on in the back branches or just in 9.6 (when/if properly fixed).

/Magnus
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Are you talking about configurable at ./configure time, or GUC?

I meant a GUC. I think a ./configure option is overkill.

What about the existing caller of strxfrm(), convert_string_datum()?

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 10:58 AM, Peter Geoghegan <pg@heroku.com> wrote:
> What about the existing caller of strxfrm(), convert_string_datum()?

I mean, the caller exists in all back-branches, not just 9.5.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I guess that's not terribly surprising. If the glibc maintainers
> haven't managed to get this right for UTF-8 locales, I can't imagine
> why they would have been more careful for non-UTF-8 locales that - I
> would guess - get less use.

We don't want to suggest that locales are broken as such. My inability to reproduce the original complaint on alternative German locales (e.g. Austrian) suggests to me that it just "accidentally fails to fail" for whatever reason (maybe they fail in other ways).

I should say "accidentally fails to not fail", because this is a failure of strxfrm() to be bug-compatible with strcoll(), which I think should not be forgotten.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Magnus Hagander
Date:
On Wed, Mar 23, 2016 at 6:58 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > Are you talking about configurable at ./configure time, or guc?
>
> I meant a GUC. I think a ./configure option is overkill.
>

We clearly have different views of the amount of kill effort required for the different options :) I would've said that a ./configure option is the easier way, and that doing a GUC is the one that's overkill (being significantly more effort).

That said, my main point is that I do not think the knob is something that should be tuned by the average end user. For most people, that should be left to the packagers for the platform, who can make an informed choice about if it's safe to turn it on.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
> That said, my main point is that I do not think the knob is something that
> should be tuned by the average end user. For most people, that should be
> left to the packagers for the platform, who can make an informed choice
> about if it's safe to turn it on.

I could get behind that if we really make an effort to help them make an informed choice. The abbreviated keys optimization is highly valuable, and I put a lot of work into it, as did Robert.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Magnus Hagander
Date:
On Wed, Mar 23, 2016 at 7:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > That said, my main point is that I do not think the knob is something
> > that should be tuned by the average end user. For most people, that
> > should be left to the packagers for the platform, who can make an
> > informed choice about if it's safe to turn it on.
>
> I could get behind that if we really make an effort to help them make
> an informed choice. The abbreviated keys optimization is highly
> valuable, and I put a lot of work into it, as did Robert.
>

Oh, I totally appreciate that. It's one of the great improvements in 9.5, and one of the best things is that it's an "automatic improvement" that doesn't require the users to change their applications to benefit from it.

But it's also currently badly broken on some of our most common platforms.

We want to get it back to working. But short-term, it's more important to limit the scope of the brokenness, since this is a version that people are putting in production. Once we have enough info to safely say we've put a workaround in place, we turn it back on.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Peter Geoghegan <pg@heroku.com> writes:
> What about the existing caller of strxfrm(), convert_string_datum()?

convert_string_datum is, and always has been, used only for planner estimation purposes. We do not care if it sometimes gets inaccurate answers. Even if it's as wrong as it can possibly be, that will only affect planner estimates to the extent of wrongly interpolating between the endpoints of a histogram bin, so that the effects are no worse than about 1/statistics_target. And there are bigger limitations on the accuracy of those estimates anyway, notably that we use the same stats regardless of the collation that applies to a particular WHERE-clause operator.

regards, tom lane
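Tom's bound can be seen with a deliberately simplified model, which is not PostgreSQL's actual selectivity code. Assume the histogram has `nbins` equal-population bins (roughly statistics_target of them). A value that falls in bin `i` at relative position `frac` within that bin is estimated to have `(i + frac) / nbins` of the rows below it. A bogus string-to-number conversion can only move `frac` somewhere within [0, 1], so the estimate shifts by at most `1 / nbins`. The function name `fraction_below` is hypothetical.

```c
/*
 * Hypothetical, simplified histogram interpolation: estimated fraction of
 * rows below a value that lies in bin `bin` (0-based) at relative position
 * `frac` inside that bin, for a histogram of `nbins` equal-population bins.
 * frac is clamped to [0, 1], so a wrong frac can move the result by at
 * most 1/nbins.
 */
static double
fraction_below(int bin, double frac, int nbins)
{
	if (frac < 0.0)
		frac = 0.0;
	else if (frac > 1.0)
		frac = 1.0;
	return (bin + frac) / nbins;
}
```

With `nbins = 100`, the worst case (frac 0.0 versus 1.0 in the same bin) differs by 0.01, that is, 1%.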
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
> We want to get it back to working. But short-term, it's more important to
> limit the scope of the brokenness, since this is a version that people are
> putting in production. Once we have enough info to safely say we've put a
> workaround in place, we turn it back on.

Do you think it's possible that my amcheck tool might have a role to play here? I wrote it for exactly this kind of scenario. If we could get it reviewed, then a pre-release version compatible with 9.5 could be made available. I'd be willing to work on that side of things if core are receptive. Early prototypes of the tool were used to detect collation incompatibility issues in production.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> That said, my main point is that I do not think the knob is something that
>> should be tuned by the average end user. For most people, that should be
>> left to the packagers for the platform, who can make an informed choice
>> about if it's safe to turn it on.

> I could get behind that if we really make an effort to help them make
> an informed choice. The abbreviated keys optimization is highly
> valuable, and I put a lot of work into it, as did Robert.

I realize that, and I'm sympathetic, but I'm afraid it also means that your judgment in this matter is rather biased.

I do not think that end users can be expected to know whether this is safe to turn on, and TBH I do not think that most packagers will either. My opinion is that our only guaranteed-safe option is to turn it off, period, no exceptions for platforms that we've not yet found a failure case for. We can consider turning it back on later, once we've done vastly more study and testing than has evidently been done to date. One thing I'm going to want to know is what was the root cause of glibc's bug, and what is the reason to think that other implementations are going to be any more reliable. At this point I'm disinclined to trust any implementation that can't point to a structural reason (e.g., sharing code) to believe that strcoll and strxfrm must yield equivalent answers.

(In other words, I want an #ifdef NOT_USED, which is even less effort than either a GUC or a configure option ;-(. As well as being something that we won't need to document and support indefinitely.)

regards, tom lane
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 11:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not think that end users can be expected to know whether this is safe
> to turn on, and TBH I do not think that most packagers will either. My
> opinion is that our only guaranteed-safe option is to turn it off, period,
> no exceptions for platforms that we've not yet found a failure case for.
> We can consider turning it back on later, once we've done vastly more
> study and testing than has evidently been done to date. One thing I'm
> going to want to know is what was the root cause of glibc's bug, and what
> is the reason to think that other implementations are going to be any more
> reliable. At this point I'm disinclined to trust any implementation that
> can't point to a structural reason (e.g., sharing code) to believe that
> strcoll and strxfrm must yield equivalent answers.

The more I think about it, the more I agree that not trusting strxfrm() across the board is the right move short-term. So, I'm not going to be upset, provided we do actually follow through later with an effort to turn it back on in 9.5 as and when it is known to be reliable. All I'm asking for is that we actively work towards making it safe, which evidently requires leg-work that I can only do part of. (For example, I'm not on the -packagers list, so cannot really coordinate with packagers.) I think that that's a reasonable thing for me to expect.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Wed, Mar 23, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> That said, my main point is that I do not think the knob is something that
>>> should be tuned by the average end user. For most people, that should be
>>> left to the packagers for the platform, who can make an informed choice
>>> about if it's safe to turn it on.
>
>> I could get behind that if we really make an effort to help them make
>> an informed choice. The abbreviated keys optimization is highly
>> valuable, and I put a lot of work into it, as did Robert.
>
> I realize that, and I'm sympathetic, but I'm afraid it also means that
> your judgment in this matter is rather biased.
>
> I do not think that end users can be expected to know whether this is safe
> to turn on, and TBH I do not think that most packagers will either. My
> opinion is that our only guaranteed-safe option is to turn it off, period,
> no exceptions for platforms that we've not yet found a failure case for.
> We can consider turning it back on later, once we've done vastly more
> study and testing than has evidently been done to date. One thing I'm
> going to want to know is what was the root cause of glibc's bug, and what
> is the reason to think that other implementations are going to be any more
> reliable. At this point I'm disinclined to trust any implementation that
> can't point to a structural reason (e.g., sharing code) to believe that
> strcoll and strxfrm must yield equivalent answers.
>
> (In other words, I want an #ifdef NOT_USED, which is even less effort
> than either a GUC or a configure option ;-(. As well as being something
> that we won't need to document and support indefinitely.)

I think that something like the attached would be a reasonable approach to the problem. If we later decide this is altogether hopeless, we can do a more thorough job removing the code that can be reached when collate_c && abbreviate, but let's not do that right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that something like the attached would be a reasonable
> approach to the problem. If we later decide this is altogether
> hopeless, we can do a more thorough job removing the code that can be
> reached when collate_c && abbreviate, but let's not do that right now.

This patch looks good to me.

I think that disabling abbreviation when the C collation is in use makes no sense, though. This has nothing to do with abbreviation as such, and everything to do with glibc.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Wed, Mar 23, 2016 at 3:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that something like the attached would be a reasonable
>> approach to the problem. If we later decide this is altogether
>> hopeless, we can do a more thorough job removing the code that can be
>> reached when collate_c && abbreviate, but let's not do that right now.
>
> This patch looks good to me.
>
> I think that disabling abbreviation when the C collation is in use makes
> no sense, though.

But the patch doesn't do that, right?

> This has nothing to do with abbreviation as such,
> and everything to do with glibc.

Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that disabling abbreviation when the C collation is in use makes
>> no sense, though.
>
> But the patch doesn't do that, right?

Right, it doesn't. But I was surprised that you even mentioned it as a possibility. That's all.

--
Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes:
> +#ifndef TRUST_STRXFRM
> +	if (!collate_c)
> +		abbreviate = false;
> +#endif

Ah, I did not realize that abbreviation would be of any value in C locale. If it is, then +1 for something like the above.

regards, tom lane
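The guard quoted above can be exercised on its own by wrapping it in a function. The name `choose_abbreviation` is hypothetical; only the `#ifndef TRUST_STRXFRM` block comes from the patch. Unless the build defines `TRUST_STRXFRM`, abbreviated keys remain enabled only for the C collation, where no strxfrm() call is involved.

```c
#include <stdbool.h>

/*
 * Hypothetical wrapper around the guard from the patch: decide whether
 * abbreviated keys may be used for a sort.  Without TRUST_STRXFRM defined
 * at compile time, only the C collation keeps abbreviation.
 */
static bool
choose_abbreviation(bool collate_c, bool abbreviate)
{
#ifndef TRUST_STRXFRM
	if (!collate_c)
		abbreviate = false;		/* don't rely on strxfrm() matching strcoll() */
#endif
	return abbreviate;
}
```

Keeping this as a compile-time switch rather than a GUC follows Tom's preference above: an ordinary user cannot flip it at run time, so indexes built under one setting are not silently compared under the other.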
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> +#ifndef TRUST_STRXFRM
>> +	if (!collate_c)
>> +		abbreviate = false;
>> +#endif
>
> Ah, I did not realize that abbreviation would be of any value in C locale.
> If it is, then +1 for something like the above.

It's actually more likely to help for a C locale than for a non-C locale.

I have committed this and back-patched it to 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
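A plausible reason abbreviation pays off in the C collation is that the abbreviated key can be built directly from a string's leading bytes, with no strxfrm() call. The sketch below only loosely follows that idea and is not PostgreSQL's actual code. It packs the first 8 bytes into a `uint64_t`, big-endian and zero-padded, so unsigned integer order matches strcmp() order on that prefix, and it falls back to a full strcmp() only on ties. The names `abbrev_key_c` and `cmp_c_abbrev` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Pack up to the first 8 bytes of s, big-endian and zero-padded, so that
 * unsigned integer order matches strcmp() order on those bytes.
 */
static uint64_t
abbrev_key_c(const char *s)
{
	uint64_t	key = 0;
	int			done = 0;		/* set once the terminating NUL is reached */

	for (size_t i = 0; i < sizeof(key); i++)
	{
		unsigned char byte = 0;

		if (!done)
		{
			byte = (unsigned char) s[i];
			if (byte == 0)
				done = 1;		/* don't read past the end of s */
		}
		key = (key << 8) | byte;
	}
	return key;
}

/*
 * Compare in C-collation (byte) order: cheap integer comparison first,
 * full strcmp() only when the abbreviated keys are equal.
 */
static int
cmp_c_abbrev(const char *a, const char *b)
{
	uint64_t	ka = abbrev_key_c(a);
	uint64_t	kb = abbrev_key_c(b);
	int			r;

	if (ka != kb)
		return ka < kb ? -1 : 1;
	r = strcmp(a, b);
	return (r > 0) - (r < 0);
}
```

For example, "abcdefghX" and "abcdefghY" share the same abbreviated key, so only that pair needs the full comparison. "apple" and "banana" are ordered by their keys alone.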
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Tue, Mar 22, 2016 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> My concern was not merely "academic" (i.e. it was not limited in scope
>> to things that don't make B-Tree indexes corrupt). Pretty sure that we
>> need to start thinking of this as a problem with strcoll() that
>> strxfrm() does not have for more fundamental reasons, because
>> strcoll() says that the first string in the de_DE sorted list is
>> *greater* than the third string.
>
> [ squint... ] I was looking specifically for that sort of misbehavior
> in my test program, and I haven't seen it.

Sorry, I was in too much of a hurry to get to the bottom of this with that example. I failed to notice that LC_COLLATE for sort was "de_DE", not "de_DE.UTF-8". For my simple case it would not have mattered if "de_DE" was specified instead of "de_DE.UTF-8" on a non-broken system. But this was a broken system.

Anyway, what prompted the misguided example was this:

[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xx"' 'xxx"'
"x xx"" -> 2323230108080801020202010235034b (16 bytes)
"xxx"" -> 232323010808080102020201044b (14 bytes)
strcmp(arg1, arg2) result: -2
strcoll(arg1, arg2) result: -6
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xxf' 'xxxf'
"x xxf" -> 2323231101080808080102020202010235 (17 bytes)
"xxxf" -> 2323231101080808080102020202 (14 bytes)
strcmp(arg1, arg2) result: 1
strcoll(arg1, arg2) result: -6

Notice that in the case where a double-quote is used, strxfrm() and strcoll() agree, whereas if that character is a letter from the Latin alphabet instead, they disagree. My intuition is that this is significant from the point of view of fixing the glibc strcoll() bug. It feels like there is an incorrectly applied optimization here, one that occurs for strcoll() but not in the separate transformation process that strxfrm() does.

There seem to be at least a few instances of over-optimizing strcoll() in the past few years.
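The `strxfrm-binary` tool used above is not included in the thread. The sketch below is a hypothetical stand-in that prints a string's strxfrm() blob in hex and reports whether strcmp() on the two blobs has the same sign as strcoll() on the original strings, which is the invariant that failed for 'x xxf' and 'xxxf' under de_DE.UTF-8. The names `xfrm_dup`, `print_xfrm`, `xfrm_matches_strcoll` and `check_pair` are all invented for illustration. It relies on the standard rule that strxfrm(NULL, s, 0) returns the transformed length.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() blob for s, or NULL if out of memory. */
static char *
xfrm_dup(const char *s)
{
	size_t		n = strxfrm(NULL, s, 0) + 1;	/* room for the terminator */
	char	   *buf = malloc(n);

	if (buf != NULL)
		strxfrm(buf, s, n);
	return buf;
}

/* Print the blob as hex, in the same style as the output above. */
static void
print_xfrm(const char *s)
{
	char	   *x = xfrm_dup(s);
	size_t		len;

	if (x == NULL)
		return;
	len = strlen(x);
	printf("\"%s\" -> ", s);
	for (size_t i = 0; i < len; i++)
		printf("%02x", (unsigned char) x[i]);
	printf(" (%zu bytes)\n", len);
	free(x);
}

static int
sign_of(int v)
{
	return (v > 0) - (v < 0);
}

/* 1 if strxfrm()+strcmp() and strcoll() agree, 0 if not, -1 on OOM. */
static int
xfrm_matches_strcoll(const char *a, const char *b)
{
	char	   *xa = xfrm_dup(a);
	char	   *xb = xfrm_dup(b);
	int			result = -1;

	if (xa != NULL && xb != NULL)
		result = sign_of(strcmp(xa, xb)) == sign_of(strcoll(a, b));
	free(xa);
	free(xb);
	return result;
}

/* Select LC_COLLATE, dump both blobs, and check the pair; -2 if the
 * locale is not installed. */
static int
check_pair(const char *locale, const char *a, const char *b)
{
	if (setlocale(LC_COLLATE, locale) == NULL)
		return -2;
	print_xfrm(a);
	print_xfrm(b);
	return xfrm_matches_strcoll(a, b);
}
```

On an affected system, `check_pair("de_DE.UTF-8", "x xxf", "xxxf")` would be expected to return 0, based on the output above. In the "C" locale, it should always return 1.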
For example: https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e This bug looks like a possible candidate, given that complaints were about de_DE: https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f Is this bug of the right vintage? Seems like it might be a bit too early for RHEL 6 to be affected, but I'm no expert. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> There seem to be at least a few instances of over-optimizing
> strcoll() in the past few years. For example:
> https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e
> This bug looks like a possible candidate, given that complaints were
> about de_DE:
> https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f
> Is this bug of the right vintage? Seems like it might be a bit too
> early for RHEL 6 to be affected, but I'm no expert.

It is too early. RHEL6 seems to be based off glibc 2.12, released 2010. (By the same token, it's not got the other bug you mention ;-))

regards, tom lane
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Wed, Mar 23, 2016 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It is too early. RHEL6 seems to be based off glibc 2.12, released 2010.
> (By the same token, it's not got the other bug you mention ;-))

Well, it looked like everything was fine for "debian testing, glibc 2.22-3", including de_DE.UTF-8. In theory, it's only a matter of using git-bisect to find what the fix was. That's just leg-work. I will find time for it after the ongoing CF.

Mercifully, the situation with Glibc 2.22 suggests that the Glibc people *aren't* fixing the strcoll() bugs in stable branches. But that also means that it will take a long time to make non-C collation text sorting use abbreviation on most systems, which is certainly disappointing.

--
Peter Geoghegan
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Magnus Hagander
Date:
On Wed, Mar 23, 2016 at 7:14 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net>
> wrote:
> > We want to get it back to working. But short-term, it's more important to
> > limit the scope of the brokenness, since this is a version that people are
> > putting in production. Once we have enough info to safely say we've put a
> > workaround in place, we turn it back on.
>
> Do you think it's possible that my amcheck tool might have a role to
> play here? I wrote it for exactly this kind of scenario. If we could
> get it reviewed, then a pre-release version compatible with 9.5 could
> be made available. I'd be willing to work on that side of things if
> core are receptive. Early prototypes of the tool were used to detect
> collation incompatibility issues in production.
>

That's a good question. Would it catch corruption like this? I haven't actually tested it :) My understanding is that what can happen with this bug is that, while we don't actually store incorrect values in the indexes, we can end up with index pointers in the wrong order. That does sound like one of those things that the amcheck tool is designed to find?

And if not that one, can we find some other way for people to find out if they need to REINDEX after the upgrade? It would be very nice not to have to tell everybody to reindex everything, but to actually detect the cases where it's needed. Or at least provide a supported way to do that, for those where a cluster-wide reindex is really expensive.

Even if we can't sneak amcheck into 9.5, if we can show that it detects the problem, then just being able to direct people to "get amcheck from 9.6 if you want to check if the reindex is necessary" would still be a strong improvement over nothing.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Robert Haas
Date:
On Thu, Mar 24, 2016 at 9:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Even if we can't sneak amcheck into 9.5, if we can show that it detects the
> problem, then just being able to direct people to "get amcheck from 9.6 if
> you want to check if the reindex is necessary" would still be a strong
> improvement over nothing.

I agree that back-patching amcheck into 9.5 would be unprecedented, but it wouldn't be crazy: shipping an extra contrib module with no additional dependencies shouldn't break anything for existing users. However, the fact that the patch is not "Ready for Committer" at this point means that it is not going to be available in time for next week's maintenance releases, or very possibly, for 9.6. Time grows very short.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Bernd Helmle
Date:
--On 24 March 2016 14:04:22 +0100 Magnus Hagander <magnus@hagander.net> wrote: > That's a good question? Would it catch corruption like this? I haven't > actually tested it :) My understanding is that the thing that can happen > is that while we don't actually store incorrect values in the indexes, we > can end up with index pointers in the wrong order in the indexes with > this bug? That does sound like one of those things that the amcheck tool > is designed to find? This is exactly where the prototype btreecheck helped a lot. The last time I used it to track down problems, we got > WARNING: page order invariant violated for index which nailed down the collation problems on that specific machine and identified the indexes where we had the problem. For example, if you take the bug report from Marc-Olaf and check the affected table/index with the current amcheck patch, you get: bernd@localhost:test #= SELECT bt_index_check('foo_val_idx'); ERROR: XX002: page order invariant violated for index "foo_val_idx" DETAIL: Lower index tid=(1,1) (points to heap tid=(0,1)) higher index tid=(1,2) (points to heap tid=(0,2)) page lsn=0/0. LOCATION: bt_target_page_check, amcheck.c:687 STATEMENT: SELECT bt_index_check('foo_val_idx'); So if you ask me, this absolutely is a "must-have". -- Thanks Bernd
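The "page order invariant" in the error above boils down to checking that each key on a page sorts no lower than its predecessor. A minimal C sketch of that single-page check, assuming the page's keys have already been copied out as an array of NUL-terminated strings and are compared with strcoll() under the current LC_COLLATE; check_page_order() is a hypothetical name, not amcheck's actual code:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch: verify that the keys copied from one index page
 * are in non-decreasing order under the current LC_COLLATE.
 * Returns -1 if the page is ordered, otherwise the offset of the first
 * key that sorts before its predecessor.
 */
static int
check_page_order(const char *const keys[], int nkeys)
{
    assert(nkeys >= 0);

    /* Comparing adjacent pairs is enough as long as strcoll() is transitive. */
    for (int i = 1; i < nkeys; i++)
    {
        if (strcoll(keys[i - 1], keys[i]) > 0)
            return i;
    }
    return -1;
}
```

In the default "C" locale, check_page_order() on {"apple", "cherry", "banana"} returns 2, the offset of "banana", while an ordered page returns -1.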
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: > However, the fact that the patch is not "Ready for Committer" at this > point means that it is not going to be available in time for next > week's maintenance releases, or very possibly, for 9.6. Time grows > very short. The only people that are likely comfortable giving final sign-off on it that are active this CF are Tom and Kevin. That is an awkward situation. I could produce a 9.5 variant that had even more limited scope than what's in the CF. That would be strictly limited to checking page order, and the high key invariant. It wouldn't check relationships spanning multiple pages, either on the same level, or through parent/child relationships. Then, I think significantly less expertise is required for review, because locking protocols and so on don't enter into it. I think that the risk of getting something wrong with amcheck as things stand is acceptable for 9.6, and maybe even 9.5. About the worst case scenario is a false positive report of corruption. But with the tool scoped at only looking at really obvious invariants at the level of a single page, which is what I'd propose for 9.5, it seems like the risk of bugs would be very well managed. That would still catch issues caused by this glibc bug very reliably. Keep in mind that in general, amcheck does nothing special with buffer locks + pins -- it just acquires a pin and a shared buffer lock on one buffer/page at a time, copies the page into local memory, and then releases the lock and drops the pin. So, all processing by amcheck happens outside any critical path. I could work hard to get that stripped down amcheck into 9.5. I'm already behind on my CF reviews, and time is short, so it would be good if we moved quickly on this, either way.... -- Peter Geoghegan
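The high key invariant can be sketched the same way. On a page that is not the rightmost on its level, no item may sort after the page's high key, which acts as the upper bound for keys stored on that page. This hypothetical C sketch assumes the items and high key were already copied into local memory as plain strings (mirroring the copy-then-check approach described above) and compared with strcoll(); it is an illustration, not amcheck's code:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the high key invariant for a single,
 * non-rightmost B-tree page: no item may sort after the page's high key.
 * Returns -1 if the invariant holds, otherwise the offset of the first
 * offending item. Comparison uses strcoll(), i.e. the current LC_COLLATE.
 */
static int
check_high_key(const char *high_key, const char *const items[], int nitems)
{
    assert(high_key != NULL && nitems >= 0);

    for (int i = 0; i < nitems; i++)
    {
        /* An item equal to the high key is allowed; only "greater" is a violation. */
        if (strcoll(items[i], high_key) > 0)
            return i;
    }
    return -1;
}
```

Because both this check and the page order check look at one page at a time, neither needs to reason about concurrent page splits or lock coupling, which is the reduced review burden described in the message.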
Peter Geoghegan <pg@heroku.com> writes: > On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> However, the fact that the patch is not "Ready for Committer" at this >> point means that it is not going to be available in time for next >> week's maintenance releases, or very possibly, for 9.6. Time grows >> very short. > The only people that are likely comfortable giving final sign-off on > it that are active this CF are Tom and Kevin. That is an awkward > situation. I would not be comfortable with reviewing an entire module with the intention of shipping it in a stable branch on Monday, even if I had nothing else to do between now and then. I think the only sane way to get this into 9.5.2 would be to slip the release date, and that seems rather counterproductive. We need to get this fix into the hands of users ASAP. I fear our only realistic course of action is to publish release notes along the lines of "if you use any of list-of-affected-locales, you should REINDEX btree indexes on text/varchar/bpchar columns". regards, tom lane
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Thu, Mar 24, 2016 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The only people that are likely comfortable giving final sign-off on >> it that are active this CF are Tom and Kevin. That is an awkward >> situation. > > I would not be comfortable with reviewing an entire module with the > intention of shipping it in a stable branch on Monday, even if I had > nothing else to do between now and then. I think the only sane way > to get this into 9.5.2 would be to slip the release date, and that > seems rather counterproductive. We need to get this fix into the > hands of users ASAP. That's fair. I didn't really imagine that we'd want to put the tool into 9.5 myself. Still, I think that amcheck could have some role to play in managing the problem. Even the near-term availability of amcheck for 9.5 as a satellite project would count. That could happen without blocking the point release. I just don't want to go over anyone's head with that. "REINDEX everything" isn't a realistic plan for a lot of users. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > That's fair. I didn't really imagine that we'd want to put the tool > into 9.5 myself. Still, I think that amcheck could have some role to > play in managing the problem. Even the near-term availability of > amcheck for 9.5 as a satellite project would count. That could happen > without blocking the point release. I just don't want to go over > anyone's head with that. I have no objection to something like that happening. regards, tom lane
Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Peter Geoghegan
Date:
On Thu, Mar 24, 2016 at 6:04 AM, Magnus Hagander <magnus@hagander.net> wrote: > And if not that one, can we find some other way for people to find out if > they need to REINDEX after the upgrade? It would be very nice not to have to > tell everybody to reindex everything, but to actually detect the cases where > it's needed. Or at least provide a supported way to do that, for those where > a cluster-wide reindex is really expensive. If amcheck was made to only verify pages in isolation, then it would have a very strong chance of finding any issues, but not an iron-clad guarantee -- it might be that the ordering was wrong across pages (although that seems like a very small space for problems to hide). Because we know that there is a sane total ordering for both strcoll() and strxfrm() cases on affected systems, I'm pretty sure that the version of amcheck in the ongoing CF (that checks child/parent, as well as sibling relationships) would actually catch any problems of that kind *reliably*. In other words, it would be okay that it didn't check every item against every other item, because per Tom's analysis the transitive law is not broken in either case, even if strcoll() is buggy. > Even if we can't sneak amcheck into 9.5, if we can show that it detects the > problem, then just being able to direct people to "get amcheck from 9.6 if > you want to check if the reindex is necessary" would still be a strong > improvement over nothing. Agreed. -- Peter Geoghegan
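The disagreement behind this bug can also be probed one pair at a time. The C standard requires that strcmp() on two strxfrm() outputs give the same ordering as strcoll() on the original strings; the glibc bug is a case where the two orderings, each internally consistent, did not match. The following hypothetical C helper checks that property for a single pair under whatever LC_COLLATE the caller has set with setlocale(). It only illustrates the property: a clean result for the pairs you happen to test says nothing about other pairs, so it is no substitute for checking the indexes themselves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return -1, 0, or 1 according to the sign of x. */
static int
sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Hypothetical sketch: returns 1 if strcoll() and strxfrm()+strcmp()
 * agree on the relative order of a and b under the current LC_COLLATE,
 * 0 if they disagree, and -1 on allocation failure.
 */
static int
strxfrm_agrees_with_strcoll(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* strxfrm(NULL, s, 0) returns the transformed length, excluding the NUL. */
    size_t la = strxfrm(NULL, a, 0) + 1;
    size_t lb = strxfrm(NULL, b, 0) + 1;
    char *xa = malloc(la);
    char *xb = malloc(lb);
    int result = -1;

    if (xa != NULL && xb != NULL)
    {
        strxfrm(xa, a, la);
        strxfrm(xb, b, lb);
        result = sign(strcoll(a, b)) == sign(strcmp(xa, xb));
    }
    free(xa);
    free(xb);
    return result;
}
```

On a system suspected to be affected, one could call setlocale(LC_COLLATE, "de_DE.UTF-8") first and feed it pairs such as the two values from the original report; whether it reports a disagreement depends on the glibc version in use.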
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From
Marc-Olaf Jaschke
Date:
Thanks for the quick bug fix! I've seen that a wiki page on the subject has been created. Maybe it is useful to mention explicitly that 9.5.1 performance can be partly maintained by changing the collation of text columns to "C" when there is no need for special collation handling. Best regards, Marc-Olaf Jaschke > On 23.03.2016 at 21:07, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> +#ifndef TRUST_STRXFRM >>> + if (!collate_c) >>> + abbreviate = false; >>> +#endif >> >> Ah, I did not realize that abbreviation would be of any value in C locale. >> If it is, then +1 for something like the above. > > It's actually more likely to help for a C locale than for a non-C locale. > > I have committed this and back-patched it to 9.5. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
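One way to see why abbreviation is still worthwhile under the "C" collation: an abbreviated-key sort first compares a fixed-size integer built from the leading bytes of each key, and only falls back to a full comparison on ties. Under "C" those bytes are simply the string's own bytes, so no strxfrm() call is involved, which is why the patch above can keep abbreviation enabled for collate_c. The following is a simplified, hypothetical C sketch of that idea (an 8-byte key packed most-significant-byte first), not PostgreSQL's actual implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of an abbreviated key under the "C" collation:
 * pack the first 8 bytes of the string, most significant byte first,
 * into a uint64_t, zero-padding shorter strings. Since C strings contain
 * no NUL bytes, the padding sorts below any real byte, so comparing keys
 * as unsigned integers matches strcmp() on those leading bytes.
 */
static uint64_t
abbrev_key_c(const char *s)
{
    assert(s != NULL);

    uint64_t key = 0;
    size_t len = strlen(s);

    for (size_t i = 0; i < 8; i++)
    {
        unsigned char byte = i < len ? (unsigned char) s[i] : 0;
        key = (key << 8) | byte;
    }
    return key;
}

/* Compare by abbreviated key first; only equal keys need a full strcmp(). */
static int
compare_with_abbrev(const char *a, const char *b)
{
    uint64_t ka = abbrev_key_c(a);
    uint64_t kb = abbrev_key_c(b);

    if (ka != kb)
        return ka < kb ? -1 : 1;
    return strcmp(a, b);
}
```

For example, "abcdefghX" and "abcdefghY" produce equal abbreviated keys, so only that pair pays for the full strcmp(). Under any other collation the packed bytes would have to come from strxfrm() output instead, which is exactly what the patch stops trusting unless TRUST_STRXFRM is defined.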