Re: Unicode + LC_COLLATE - Mailing list pgsql-general

From	John Sidney-Woollett
Subject	Re: Unicode + LC_COLLATE
Date	April 22, 2004 11:52:48
Msg-id	3487.192.168.0.64.1082640418.squirrel@mercury.wardbrook.com
In response to	Re: Unicode + LC_COLLATE (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Unicode + LC_COLLATE
List	pgsql-general

Tom Lane said:
> C locale basically means "sort by the byte sequence values".  It'll do
> something self-consistent, but maybe not what you'd like for UTF8
> characters.

OK, that explains that. I guess I will need to try it out to see what the
effect is on extended character sets.
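As a rough illustration of what "sort by the byte sequence values" could mean for UTF-8 data, here is a minimal Python sketch. It is not PostgreSQL code; the word list and the helper name byte_order_sort are made up for the example, and it only imitates a byte-wise comparison by sorting on the UTF-8 encoded bytes.

```python
# Illustrative data only; these words do not come from the thread.
words = ["zebra", "éclair", "apple", "Zebra", "eclair"]


def byte_order_sort(items):
    """Sort strings by their UTF-8 byte sequences.

    This imitates a plain byte-wise comparison, which is how the
    C locale is described above. It is a hypothetical helper, not
    anything PostgreSQL itself exposes.
    """
    return sorted(items, key=lambda s: s.encode("utf-8"))


result = byte_order_sort(words)
print(result)
# Uppercase ASCII sorts before lowercase ASCII, and any character
# outside ASCII (here the accented e) sorts after all ASCII letters,
# because its UTF-8 encoding starts with a byte above 0x7F.
```

The ordering is self-consistent, but an accented word ends up after "zebra" rather than next to its unaccented spelling, which is probably not what a user browsing a sorted list would expect.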

>> Our database is UNICODE with LC_COLLATE=en_US.iso885915.
> Does that sort rationally at all?  I should think you'd need to specify
> an LC_COLLATE setting that's designed for UTF8 encoding, not 8859-15.

Er... actually, the LC_COLLATE for the DB in question is C. I was looking
at the wrong database (wrong telnet session)! So your comments above apply
in this case.

> If you only ever store characters that are in 7-bit ASCII then none of
> this will affect you, and you can get away with broken combinations of
> encoding and locale.  But if you'd like to sort characters outside the
> minimal ASCII set then you need to get it right ...

Tom, thanks for the answers above.
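For anyone following along, the point about 7-bit ASCII can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample strings and both helper names are hypothetical, and the code compares encodings directly rather than touching a database. It shows that ASCII text has the same bytes in UTF-8 and ISO-8859-15, while an accented character does not, so a locale expecting 8859-15 would misread UTF-8 data.

```python
def same_bytes_in_both(text):
    """Return True if text encodes to identical bytes in UTF-8 and ISO-8859-15.

    Hypothetical helper; only call it with characters that exist in
    ISO-8859-15, otherwise the second encode raises UnicodeEncodeError.
    """
    return text.encode("utf-8") == text.encode("iso8859_15")


def misread_as_8859_15(text):
    """Encode text as UTF-8, then decode those bytes as ISO-8859-15.

    This imitates what a collation built for 8859-15 would "see"
    when handed UTF-8 data.
    """
    return text.encode("utf-8").decode("iso8859_15")


ascii_sample = "plain ascii"   # illustrative value
accented_sample = "é"          # illustrative value

print(ascii_sample.isascii(), same_bytes_in_both(ascii_sample))
print(accented_sample.isascii(), same_bytes_in_both(accented_sample))
print(len(misread_as_8859_15(accented_sample)))
```

The single accented character turns into two unrelated characters when its UTF-8 bytes are read as 8859-15, so any locale-based ordering applied to it would be working on the wrong characters.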

I guess if I have some time I should build some different DBs with
different combinations of encoding and collations and summarise my
findings using different types of data and sort/search commands, in case
anyone else has the same level of confusion that I do...

John Sidney-Woollett
