Thread: incorrect collation order in at least some non-C locales

From: Martin Pitt

Hi PostgreSQL developers!

I recently got the email below and confirmed that I get the same
broken collation order in de_DE.UTF-8 as the original reporter (who
probably uses pt_PT.something). It works fine in C, though.

System: Debian unstable, according to the reporter the bug does not
happen under Windows.

Any idea?

Thank you!

Martin

----- Forwarded message from fgaroso <fgaroso@ig.com.br> -----

To: mpitt@debian.org
From: fgaroso <fgaroso@ig.com.br>
Subject: Bug PostgreSQL
Date: Wed, 1 Feb 2006 14:12:04 -0200

Version: Unstable package postgresql-8.1
Date Download: 2006-02-01 (apt-get)

Description:

create table tmp ( name char(30) );
create index tmp_idx on tmp (name);
insert into tmp values ( 'SUEKO' );
insert into tmp values ( 'SUE E' );
insert into tmp values ( 'SUE T' );

select * from tmp order by name;

Returns incorrect order:

teste=# select * from tmp order by name desc;
              name
--------------------------------
 SUE T
 SUEKO
 SUE E
(3 registros)


Note:
The Windows version was tested and works OK ;)

----- End forwarded message -----

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

Re: incorrect collation order in at least some non-C locales

From: Stephan Szabo

On Sun, 5 Feb 2006, Martin Pitt wrote:

> Hi PostgreSQL developers!
>
> I recently got the email below and confirmed that I get the same
> broken collation order in de_DE.UTF-8 as the original reporter (who
> probably uses pt_PT.something). It works fine in C, though.

This is probably not "broken collation order" but instead how those
locales are defined. Many natural language locales are defined to ignore
spaces in the first-pass ordering (like the ordering in a dictionary).
Thus the first-pass strings are effectively "SUEKO", "SUEE" and "SUET".
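
The first-pass behaviour described above can be sketched in a few lines
of Python. This is only a rough illustration, not how any C library
actually implements collation: real locales use multi-level weights, and
the helper name first_pass_key is hypothetical. It shows why dropping
spaces moves "SUEKO" between "SUE E" and "SUE T".

```python
# Illustrative sketch only: real locale collation uses multi-level weights.
# Here the "first pass" is approximated by simply dropping spaces.

def first_pass_key(s: str) -> str:
    """Hypothetical helper: the string as a space-ignoring first pass sees it."""
    return s.replace(" ", "")

names = ["SUEKO", "SUE E", "SUE T"]

# Plain code-point comparison, as in the "C" locale:
# the space character sorts before every letter.
c_order = sorted(names)

# Dictionary-style ordering: compare without spaces first,
# and use the original string only as a tie-breaker.
dict_order = sorted(names, key=lambda s: (first_pass_key(s), s))

print(c_order)     # ['SUE E', 'SUE T', 'SUEKO']
print(dict_order)  # ['SUE E', 'SUEKO', 'SUE T']
```

The second list matches the ascending order reported for the non-C
locales in this thread; the reporter's output is the same order reversed
because his query used "desc".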

> System: Debian unstable, according to the reporter the bug does not
> happen under Windows.

The reason for that is that he has "C" locale under Windows.

Re: incorrect collation order in at least some non-C locales

From: Tom Lane

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> This is probably not "broken collation order" but instead how those
> locales are defined.

I'd only consider it "broken" if you get different sort order from
sort(1) under the same locale.  Here is an example from Fedora 4
showing that sort(1) has the same ideas about what de_DE sort order
is:

[tgl ~]$ cat zzz
SUEKO
SUE E
SUE T
[tgl ~]$ sort zzz
SUE E
SUE T
SUEKO
[tgl ~]$ LANG=de_DE.utf8 sort zzz
SUE E
SUEKO
SUE T
[tgl ~]$

            regards, tom lane
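
The same comparison can be made from a script instead of sort(1). The
sketch below is an assumption-laden example rather than anything from
the thread: it uses Python's standard locale module, the helper name
sort_under_locale is hypothetical, and "de_DE.UTF-8" must be installed
on the system for the second call to return a result. The order produced
for a non-C locale depends on the C library's locale data, so no
particular output is claimed for it here.

```python
import locale

def sort_under_locale(words, locale_name):
    """Sort words using the C library's collation rules for locale_name.

    Returns None if that locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed
    try:
        # strxfrm maps each string to a key whose ordinary comparison
        # follows the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore previous setting

words = ["SUEKO", "SUE E", "SUE T"]
for name in ("C", "de_DE.UTF-8"):
    print(name, sort_under_locale(words, name))
```

If the database and this script agree under the same locale, the order
comes from the locale definition rather than from PostgreSQL.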