Re: collation & UTF-8 - Mailing list pgsql-general

From Tomi NA
Subject Re: collation & UTF-8
Msg-id d487eb8e0602240945g723a9900l84b01e9abcedd493@mail.gmail.com
In response to Re: collation & UTF-8  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: collation & UTF-8  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

On 2/24/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Fri, Feb 24, 2006 at 06:23:07PM +0100, Tomi NA wrote:
> I'm using PostgreSQL 8.1.2 on Linux and want to load UTF-8 encoded varchars.
> While I can store and get at stored text correctly, the ORDER BY places all
> accented characters (Croatian, in this case - probably marked hr_HR) after
> non-accented characters.
> This is no showstopper, but it does affect the general perception of
> application quality.

Collation is a function of the OS. Basically, is the locale of your
database set up for UTF-8 collation? It would probably be called
hr_HR.UTF-8.

You were right about this:
LC_ALL=hr_HR.UTF-8 sort < test.txt
(seemingly) collates the same way that pgsql does: accented letters end up at the end of the alphabet. I've tried hr_HR.UTF8 as well, and it made no difference.
Btw, my database was created with:
CREATE DATABASE mydb
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default;

Yes, set up the locale correctly. In general, postgresql should give the
same results as sort(1) on the command-line. Use that to experiment.

LC_ALL=hr_HR.UTF-8 sort < input > output
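
The sort(1) experiment suggested here can be sketched roughly as follows. The four-letter sample is made up for illustration, and whether hr_HR.UTF-8 is installed depends on the machine, so the script checks before using it. The C-locale run shows the symptom described in this thread: plain byte order puts the multi-byte letter č after z.

```shell
# Hypothetical sample data: four letters, one per line.
sample=$(printf 'č\nd\nc\nz\n')

# Byte-order (C locale) sort: UTF-8 encoded letters such as č land after z.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Only try the Croatian locale if the OS actually has it installed.
if locale -a 2>/dev/null | grep -qiE '^hr_HR\.utf-?8$'; then
    printf '%s\n' "$sample" | LC_ALL=hr_HR.UTF-8 sort
else
    echo "hr_HR.UTF-8 is not installed on this system" >&2
fi
```

If the second sort prints the same order as the first, the OS locale itself is not providing Croatian collation, and the problem is outside the database.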

I'm very sorry to report it does not work. :(
Btw,
set | grep LC_
returns nothing...is this a possible source of the problem?
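
A small sketch of why an empty `set | grep LC_` does not settle the question for the shell: under the POSIX rules, collation is taken from LC_ALL, then LC_COLLATE, then LANG, and only falls back to "C" when none of them is set. The helper name effective_collate is made up for this example.

```shell
# Hypothetical helper: print the locale name that decides collation
# for the current shell environment.
# POSIX precedence: LC_ALL, then LC_COLLATE, then LANG, else "C".
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        echo "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"
    fi
}

effective_collate
```

This describes the shell only. As far as I know, the server sorts according to the locale the database cluster was initialized with, which `SHOW lc_collate;` in psql should report.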

Tomislav
