Thread: incorrect collation order in at least some non-C locales
Hi PostgreSQL developers!

I recently got the email below and confirmed that I get the same broken collation order in de_DE.UTF-8 as the original reporter (who probably uses pt_PT.something). It works fine in C, though.

System: Debian unstable, according to the reporter the bug does not happen under Windows.

Any idea?

Thank you!

Martin

----- Forwarded message from fgaroso <fgaroso@ig.com.br> -----

To: mpitt@debian.org
From: fgaroso <fgaroso@ig.com.br>
Subject: Bug PostgreSQL
Date: Wed, 1 Feb 2006 14:12:04 -0200

Version: Unstable package postgresql-8.1
Date Download: 2006-02-01 (apt-get)

Description:

    create table tmp ( name char(30) );
    create index tmp_idx on tmp (name);
    insert into tmp values ( 'SUEKO' );
    insert into tmp values ( 'SUE E' );
    insert into tmp values ( 'SUE T' );
    select * from tmp order by name;

Returns incorrect order:

    teste=# select * from tmp order by name desc;
                  name
    --------------------------------
     SUE T
     SUEKO
     SUE E
    (3 registros)

Note: Version for windows tested and OK ;)

----- End forwarded message -----

-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org
On Sun, 5 Feb 2006, Martin Pitt wrote:

> Hi PostgreSQL developers!
>
> I recently got the email below and confirmed that I get the same
> broken collation order in de_DE.UTF-8 as the original reporter (who
> probably uses pt_PT.something). It works fine in C, though.

This is probably not "broken collation order" but instead how those locales are defined. Many natural-language locales are defined to ignore spaces in first-pass ordering (like the ordering in a dictionary). Thus the first-pass strings are effectively "SUEKO", "SUEE" and "SUET".

> System: Debian unstable, according to the reporter the bug does not
> happen under Windows.

The reason for that is that he has the "C" locale under Windows.
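The first-pass idea described above can be illustrated with a small Python sketch. It is only a simplified model, assuming that the first pass does nothing more than drop spaces; real locale definitions use several weighted passes, and the helper names `first_pass_key` and `dictionary_style_sort` are made up for this illustration.

```python
def first_pass_key(s: str) -> str:
    """Simplified first-pass key: drop spaces, as a dictionary ordering would.

    Assumption: this ignores every other rule a real locale applies.
    """
    return s.replace(" ", "")


def dictionary_style_sort(words):
    """Sort by the space-free key; the original string only breaks ties."""
    return sorted(words, key=lambda w: (first_pass_key(w), w))


names = ["SUEKO", "SUE E", "SUE T"]

print([first_pass_key(n) for n in names])  # ['SUEKO', 'SUEE', 'SUET']
print(dictionary_style_sort(names))        # ['SUE E', 'SUEKO', 'SUE T']
print(sorted(names))                       # ['SUE E', 'SUE T', 'SUEKO']
```

Under this model "SUEKO" falls between "SUE E" and "SUE T" because "SUEE" < "SUEKO" < "SUET", which is the reported order read ascending. The plain `sorted(names)` call compares code points, where a space sorts before any letter, so it behaves like the "C" locale.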
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> This is probably not "broken collation order" but instead how those
> locales are defined.

I'd only consider it "broken" if you get different sort order from sort(1) under the same locale. Here is an example from Fedora 4 showing that sort(1) has the same ideas about what de_DE sort order is:

    [tgl ~]$ cat zzz
    SUEKO
    SUE E
    SUE T
    [tgl ~]$ sort zzz
    SUE E
    SUE T
    SUEKO
    [tgl ~]$ LANG=de_DE.utf8 sort zzz
    SUE E
    SUEKO
    SUE T
    [tgl ~]$

regards, tom lane
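The same cross-check against the C library's collation can be sketched in Python, for systems where comparing with sort(1) is inconvenient. This is a rough sketch, not a definitive test: `sort_under` is a hypothetical helper, the locale name "de_DE.UTF-8" is assumed to be installed (the helper returns None if it is not), and the result for any non-C locale depends on the platform's locale data.

```python
import locale


def sort_under(locale_name, words):
    """Sort words using the C library's collation for locale_name.

    Returns None if the locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares like strcoll
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


names = ["SUEKO", "SUE E", "SUE T"]
print("C:          ", sort_under("C", names))
print("de_DE.UTF-8:", sort_under("de_DE.UTF-8", names))
```

If the database's ORDER BY and this output agree for the same locale, the ordering comes from the locale definition rather than from PostgreSQL.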