Thread: BUG #14112: sorting v and w is broken with et_EE locale
The following bug has been logged on the website:

Bug reference:      14112
Logged by:          Georg Kahest
Email address:      georg.kahest@internet.ee
PostgreSQL version: 9.4.7
Operating system:   Debian Jessie
Description:

It seems that sorting v and w with the et_EE locale is broken (other characters seem to be okay):

select name COLLATE "et_EE" from test order by name;
        name
--------------------
 a1.ee
 vvbwjbln7.ee
 wwvl8.ee
 wxxezi6lkaq7eoi.ee
 vyz.ee
(5 rows)

select name COLLATE "en_US" from test order by name;
        name
--------------------
 a1.ee
 vvbwjbln7.ee
 vyz.ee
 wwvl8.ee
 wxxezi6lkaq7eoi.ee
(5 rows)

On Tue, Apr 26, 2016 at 2:37 AM, <georg.kahest@internet.ee> wrote:
> It seems that sorting v and w with the et_EE locale is broken (other
> characters seem to be okay):
> [...]

That does look odd. If that's not the correct way to sort Estonian, then that should probably be reported to the Debian glibc maintainers (or maybe the glibc project). Here's a Debian Jessie box demonstrating that behaviour without any help from PostgreSQL:

munro@yoga:~/junk$ locale -a | grep et_EE
et_EE
et_EE.iso885915
et_EE.utf8
munro@yoga:~/junk$ cat input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=en_US.utf8 sort < input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee

--
Thomas Munro
http://www.enterprisedb.com

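The same comparison can be tried from a script instead of sort(1). The following is only a minimal sketch: the helper name `sort_with_collation` is made up for illustration, and it assumes the locales named in the thread (`et_EE.utf8`, `en_US.utf8`) are installed. If they are not, `locale.setlocale` raises `locale.Error`.

```python
import functools
import locale

def sort_with_collation(lines, locale_name):
    """Sort lines using the C library's strcoll() rules for locale_name.

    Hypothetical helper for illustration; raises locale.Error when the
    requested locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(lines, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore

# The five strings from the bug report.
names = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]

if __name__ == "__main__":
    for name in ("et_EE.utf8", "en_US.utf8"):
        try:
            print(name, sort_with_collation(names, name))
        except locale.Error:
            print(name, "is not installed on this system")
```

Because this goes straight to the C library, any difference between the two locales seen here has nothing to do with PostgreSQL.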
On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> That does look odd.

What happens if you replace the dot in each string with a single 'x' character, Georg? Does the sort order look correct to you then?

--
Peter Geoghegan

On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
> What happens if you replace the dot in each string with a single 'x'
> character, Georg? Does the sort order look correct to you then?

I ask because I suspect that this might be the same strcoll() bug I describe here:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356

(In particular, see my remarks on Austria and Germany.)

--
Peter Geoghegan

Peter Geoghegan <pg@heroku.com> writes:
> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356

The report is against 9.4, though, so strcoll shouldn't matter.

regards, tom lane

On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
> What happens if you replace the dot in each string with a single 'x'
> character, Georg? Does the sort order look correct to you then?
>
> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>
> (In particular, see my remarks on Austria and Germany.)

No change here. This system has locales-all ("GNU C Library: Precompiled locale data") package version 2.19-18+deb8u4 (and same libc6).

munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
a1xee
vvbwjbln7xee
wwvl8xee
wxxezi6lkaq7eoixee
vyzxee

--
Thomas Munro
http://www.enterprisedb.com

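The thread does not show how `input2` was produced. One plausible way to build it from the original strings, shown only as a sketch with a made-up helper name, is a plain character substitution:

```python
def replace_dots(names, replacement="x"):
    """Return a copy of names with every '.' replaced by `replacement`.

    Hypothetical helper; mirrors the dot -> 'x' substitution that was
    asked for in the thread.
    """
    return [name.replace(".", replacement) for name in names]

names = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]
input2 = replace_dots(names)

if __name__ == "__main__":
    print("\n".join(input2))
```

Removing the punctuation rules out the dot as the cause: if the unexpected order survives the substitution, as it does above, the v/w ordering itself is responsible.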
On Wed, Apr 27, 2016 at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The report is against 9.4, though, so strcoll shouldn't matter.

It shouldn't matter that it doesn't agree with strxfrm(), which is the most important thing, but not the only thing. I think that it would be interesting to know if this is a strcoll() problem. I have no intention of pursuing a "fix" from the glibc people.

--
Peter Geoghegan

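Whether strcoll() and strxfrm() agree for a given set of strings can be checked directly. The sketch below is an illustration under stated assumptions, not the test used in the linked bug report: `collation_disagreements` is a hypothetical helper, and the locale passed to it must be installed.

```python
import itertools
import locale

def sign(n):
    """Reduce any integer to -1, 0, or 1."""
    return (n > 0) - (n < 0)

def collation_disagreements(strings, locale_name):
    """Return the pairs for which strcoll() and strxfrm() order differently.

    Hypothetical helper; raises locale.Error if locale_name is missing.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        mismatches = []
        for a, b in itertools.combinations(strings, 2):
            by_strcoll = sign(locale.strcoll(a, b))
            key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
            by_strxfrm = (key_a > key_b) - (key_a < key_b)
            if by_strcoll != by_strxfrm:
                mismatches.append((a, b))
        return mismatches
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

if __name__ == "__main__":
    names = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]
    try:
        print(collation_disagreements(names, "et_EE.utf8"))
    except locale.Error:
        print("et_EE.utf8 is not installed on this system")
```

An empty list means the two functions are consistent for these inputs; it says nothing about whether the ordering they agree on is the intended one.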
On Thu, Apr 28, 2016 at 4:43 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> No change here. This system has locales-all ("GNU C Library:
> Precompiled locale data") package version 2.19-18+deb8u4 (and same
> libc6).
> [...]

Same result on a CentOS box. I think the OP should probably write to bug-glibc-locales@gnu.org.

--
Thomas Munro
http://www.enterprisedb.com

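One observation that may help when writing up the report: the et_EE output is exactly what results if 'w' is compared as though it were 'v'. That is an assumption about what the locale data does, not something established in this thread, and whether it is the intended Estonian ordering is a question for the locale maintainers. A small sketch (the helper `fold_w_to_v` is hypothetical) that checks the observation against the reported output:

```python
def fold_w_to_v(text):
    """Sort key that treats 'w' as 'v' (ASSUMPTION, for illustration only)."""
    return text.replace("w", "v").replace("W", "V")

names = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]

# Order reported in the thread for et_EE (PostgreSQL and sort(1) agree).
observed_et_ee = ["a1.ee", "vvbwjbln7.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee", "vyz.ee"]

folded_order = sorted(names, key=fold_w_to_v)

if __name__ == "__main__":
    print(folded_order == observed_et_ee)  # prints True
```

This only shows that the reported order is consistent with such a rule for these five strings; it does not prove that the locale implements it.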
On 04/28/2016 08:09 AM, Thomas Munro wrote:
> Same result on a CentOS box. I think the OP should probably write
> to bug-glibc-locales@gnu.org.

Hello,

Indeed, the problem seems to be related to glibc itself handling it incorrectly. Thank you for your time; I'll report the bug to glibc.