Thread: BUG #14112: sorting v and w is broken with et_EE locate

BUG #14112: sorting v and w is broken with et_EE locate

From
georg.kahest@internet.ee
Date:
The following bug has been logged on the website:

Bug reference:      14112
Logged by:          Georg Kahest
Email address:      georg.kahest@internet.ee
PostgreSQL version: 9.4.7
Operating system:   Debian Jessie
Description:

It seems that sorting v and w with the et_EE locale is broken (other chars
seem to be okay):

select name COLLATE "et_EE" from test order by name;
        name
--------------------
 a1.ee
 vvbwjbln7.ee
 wwvl8.ee
 wxxezi6lkaq7eoi.ee
 vyz.ee
(5 rows)


select name COLLATE "en_US" from test order by name;
        name
--------------------
 a1.ee
 vvbwjbln7.ee
 vyz.ee
 wwvl8.ee
 wxxezi6lkaq7eoi.ee
(5 rows)

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Thomas Munro
Date:
On Tue, Apr 26, 2016 at 2:37 AM,  <georg.kahest@internet.ee> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14112
> Logged by:          Georg Kahest
> Email address:      georg.kahest@internet.ee
> PostgreSQL version: 9.4.7
> Operating system:   Debian Jessie
> Description:
>
> It seems that sorting v and w with the et_EE locale is broken (other chars
> seem to be okay):
>
> select name COLLATE "et_EE" from test order by name;
>         name
> --------------------
>  a1.ee
>  vvbwjbln7.ee
>  wwvl8.ee
>  wxxezi6lkaq7eoi.ee
>  vyz.ee
> (5 rows)
>
>
> select name COLLATE "en_US" from test order by name;
>         name
> --------------------
>  a1.ee
>  vvbwjbln7.ee
>  vyz.ee
>  wwvl8.ee
>  wxxezi6lkaq7eoi.ee
> (5 rows)

That does look odd.  If that's not the correct way to sort Estonian,
then that should probably be reported to the Debian glibc maintainers
(or maybe the glibc project).  Here's a Debian Jessie box
demonstrating that behaviour without any help from PostgreSQL:

munro@yoga:~/junk$ locale -a | grep et_EE
et_EE
et_EE.iso885915
et_EE.utf8
munro@yoga:~/junk$ cat input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=en_US.utf8 sort < input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
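
The same check can be made without sort(1). What follows is only a
minimal sketch, assuming Python 3 on a glibc system: it sorts the five
names with Python's locale.strcoll(), which wraps the C library's
collation function. The helper name sort_with_locale is made up for
this illustration, and the et_EE.utf8 and en_US.utf8 locales may not be
installed everywhere, so the function returns None in that case.

```python
import functools
import locale

# The five names from the original report.
NAMES = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]


def sort_with_locale(names, locale_name):
    """Sort names by the LC_COLLATE rules of locale_name.

    Hypothetical helper for illustration. Returns None when the locale
    is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(names, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    for name in ("et_EE.utf8", "en_US.utf8", "C"):
        result = sort_with_locale(NAMES, name)
        print(name, "->", result if result is not None else "locale not installed")
```

If the output for et_EE.utf8 matches the sort(1) output above, that is
one more indication that the ordering comes from the C library's locale
data rather than from PostgreSQL.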

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Peter Geoghegan
Date:
On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> That does look odd.

What happens if you replace the dot in each string with a single 'x'
character, Georg? Does the sort order look correct to you then?


--
Peter Geoghegan

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Peter Geoghegan
Date:
On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> That does look odd.
>
> What happens if you replace the dot in each string with a single 'x'
> character, Georg? Does the sort order look correct to you then?

I ask because I suspect that this might be the same strcoll() bug I
describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356

(In particular, see my remarks on Austria and Germany.)
--
Peter Geoghegan

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356

The report is against 9.4, though, so strcoll shouldn't matter.

            regards, tom lane

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Thomas Munro
Date:
On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> That does look odd.
>>
>> What happens if you replace the dot in each string with a single 'x'
>> character, Georg? Does the sort order look correct to you then?
>
> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>
> (In particular, see my remarks on Austria and Germany.)

No change here.  This system has locales-all ("GNU C Library:
Precompiled locale data") package version 2.19-18+deb8u4 (and same
libc6).

munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
a1xee
vvbwjbln7xee
wwvl8xee
wxxezi6lkaq7eoixee
vyzxee
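
For anyone who wants to repeat the dot-versus-'x' experiment on another
system, here is a rough sketch, assuming Python 3. It builds the input2
strings by replacing the dot in each name and reports whether the
collation order changes. The function names collation_order and
dot_changes_order are invented for this example, and the result is None
when the requested locale is not installed.

```python
import functools
import locale

NAMES = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]


def collation_order(names, locale_name):
    """Return the indices of names in collation order under locale_name,
    or None if that locale is not installed (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        def compare(i, j):
            return locale.strcoll(names[i], names[j])
        return sorted(range(len(names)), key=functools.cmp_to_key(compare))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def dot_changes_order(names, locale_name):
    """True if replacing the dot with 'x' changes the relative order."""
    replaced = [n.replace(".", "x", 1) for n in names]
    before = collation_order(names, locale_name)
    after = collation_order(replaced, locale_name)
    if before is None or after is None:
        return None
    return before != after


if __name__ == "__main__":
    for name in ("et_EE.utf8", "en_US.utf8", "C"):
        print(name, "->", dot_changes_order(NAMES, name))
```

A result of False for et_EE.utf8 would correspond to the "no change"
observation above.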

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Peter Geoghegan
Date:
On Wed, Apr 27, 2016 at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I ask because I suspect that this might be the same strcoll() bug I
>> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>
> The report is against 9.4, though, so strcoll shouldn't matter.

It shouldn't matter that it doesn't agree with strxfrm(), which is the
most important thing, but not the only thing. I think that it would be
interesting to know if this is a strcoll() problem. I have no
intention of pursuing a "fix" from the glibc people.
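
To make the strcoll()/strxfrm() question concrete, here is a small
sketch, assuming Python 3, whose locale.strcoll() and locale.strxfrm()
wrap the C library functions. It lists every pair of strings that the
two functions order differently under a given locale. The name
collation_disagreements is made up for this example, and the function
returns None if the locale is not installed.

```python
import itertools
import locale

NAMES = ["a1.ee", "vvbwjbln7.ee", "vyz.ee", "wwvl8.ee", "wxxezi6lkaq7eoi.ee"]


def sign(n):
    """Reduce a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def collation_disagreements(strings, locale_name):
    """Return the pairs (a, b) that strcoll() and strxfrm() order
    differently under locale_name, or None if the locale is missing.
    Hypothetical helper for illustration."""
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        bad = []
        for a, b in itertools.combinations(strings, 2):
            by_strcoll = sign(locale.strcoll(a, b))
            key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
            by_strxfrm = (key_a > key_b) - (key_a < key_b)
            if by_strcoll != by_strxfrm:
                bad.append((a, b))
        return bad
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    for name in ("et_EE.utf8", "en_US.utf8", "C"):
        print(name, "->", collation_disagreements(NAMES, name))
```

An empty list means the two functions agree on these strings, which
would suggest the odd ordering comes from the locale definition rather
than from a strcoll()-only problem.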


--
Peter Geoghegan

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Thomas Munro
Date:
On Thu, Apr 28, 2016 at 4:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> That does look odd.
>>>
>>> What happens if you replace the dot in each string with a single 'x'
>>> character, Georg? Does the sort order look correct to you then?
>>
>> I ask because I suspect that this might be the same strcoll() bug I
>> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>>
>> (In particular, see my remarks on Austria and Germany.)
>
> No change here.  This system has locales-all ("GNU C Library:
> Precompiled locale data") package version 2.19-18+deb8u4 (and same
> libc6).
>
> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
> a1.ee
> vvbwjbln7.ee
> wwvl8.ee
> wxxezi6lkaq7eoi.ee
> vyz.ee
> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
> a1xee
> vvbwjbln7xee
> wwvl8xee
> wxxezi6lkaq7eoixee
> vyzxee

Same result on a CentOS box.  I think the OP should probably write to
bug-glibc-locales@gnu.org.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #14112: sorting v and w is broken with et_EE locate

From
Georg Kahest
Date:
On 04/28/2016 08:09 AM, Thomas Munro wrote:
> On Thu, Apr 28, 2016 at 4:43 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com>
>> wrote:
>>> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan
>>> <pg@heroku.com> wrote:
>>>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>>>> <thomas.munro@enterprisedb.com> wrote:
>>>>> That does look odd.
>>>>
>>>> What happens if you replace the dot in each string with a
>>>> single 'x' character, Georg? Does the sort order look correct
>>>> to you then?
>>>
>>> I ask because I suspect that this might be the same strcoll()
>>> bug I describe here:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>>>
>>> (In particular, see my remarks on Austria and Germany.)
>>
>> No change here.  This system has locales-all ("GNU C Library:
>> Precompiled locale data") package version 2.19-18+deb8u4 (and
>> same libc6).
>>
>> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
>> a1.ee
>> vvbwjbln7.ee
>> wwvl8.ee
>> wxxezi6lkaq7eoi.ee
>> vyz.ee
>> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
>> a1xee
>> vvbwjbln7xee
>> wwvl8xee
>> wxxezi6lkaq7eoixee
>> vyzxee
>
> Same result on a CentOS box.  I think the OP should probably write
> to bug-glibc-locales@gnu.org.
>


Hello,

Indeed, the problem seems to be related to glibc itself handling it
incorrectly.

Thank you for your time; I'll report the bug to glibc.