Problem while ordering Turkish chars - Mailing list pgsql-hackers

From Devrim GUNDUZ
Subject Problem while ordering Turkish chars
Date
Msg-id Pine.LNX.4.44.0304191307230.12333-100000@emo.org.tr
Whole thread Raw
Responses Re: Problem while ordering Turkish chars  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

I'm experiencing some problems while ordering Turkish characters.

Here is that I mean:

Let's say we have two records:

Onder <- There are two dots on O. (ASCII CODE: 153)
Ozan

In Turkish alphabet, O (dotless) comes before O (with dots). So, Ozan
should be listed before Onder, but it's listed after Onder; since
PostgreSQL thinks that those letters are the same...

To begin with:
operdb=# SELECT version();                                                version
---------------------------------------------------------------------------------------------------------PostgreSQL
7.3.2on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2  
20020903 (Red Hat Linux 8.0 3.2-7)

(installed from rpm)

Here is an simple example: Lines 1, 10,11,12 have O with dots as the first
letter.
==========SELECT oper_uzun_adi FROM operler ORDER BY oper_uzun_adi;
...
Ömer KORKMAZOner BINBASOnur BAŞOnur TURHANOsman Selçuk SARIOĞLUOsman TOPANOzan GÜLENOzan SARIOzge YAZICIOGLUÖzgür
ÇAKICIÖzgürÖZDEMİRÖzlem BAL 
...
==========

But if I apply a select query and use LIKE, PostgreSQL gives me correct
solutions:

operdb=# SELECT oper_uzun_adi FROM operler WHERE oper_uzun_adi ILIKE 'O%';             oper_uzun_adi
------------------------------------------Osman TOPANOner BINBASOzge YAZICIOGLUOnur BAŞOnur TURHANOzan GÜLENOsman
SelçukSARIOĞLUOzan SARI 
(8 rows)

Now for O with dots:
operdb=# SELECT oper_uzun_adi FROM operler WHERE oper_uzun_adi ILIKE 'Ö%';             oper_uzun_adi
------------------------------------------Özgür ÇAKICIÖzlem BALÖmer KORKMAZÖzgür ÖZDEMİR


So, LIKE understands that they are different chars; but if I simply order
them; I do not get the correct result.

Here is the locale setting:

#       Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'

If I change it to tr_TR, nothing changes.

Could anyone help me?

Best regards,
--
Devrim GUNDUZ
devrim@gunduz.org                devrim.gunduz@linux.org.tr         http://www.gunduz.org



pgsql-hackers by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Next
From: mlw
Date:
Subject: Re: Note about upcoming instability in FE/BE protocol