Re: Mixing different LC_COLLATE and database encodings - Mailing list pgsql-general

From	Bill Moseley
Subject	Re: Mixing different LC_COLLATE and database encodings
Date	February 19, 2006 00:48:31
Msg-id	20060219014830.GA6918@hank.org
In response to	Re: Mixing different LC_COLLATE and database encodings (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Mixing different LC_COLLATE and database encodings (Peter Eisentraut <peter_e@gmx.net>) Re: Mixing different LC_COLLATE and database encodings (Greg Stark <gsstark@mit.edu>)
List	pgsql-general

On Sat, Feb 18, 2006 at 01:40:09PM -0500, Tom Lane wrote:
> Bill Moseley <moseley@hank.org> writes:
> > - To clarify the first point, if the database is encoded utf-8 and
> > lc_collate is en_US then Postgresql does NOT try to convert utf-8 to
> > 8859-1 before sorting.
>
> Basically, this is a horribly bad idea and you should never do it.
> The database encoding should always match what the locale assumes
> for its character set (unless the locale is "C", which doesn't care).

What's a bad idea?  Having a lc_collate on the cluster that doesn't
support the encodings in the databases?

> We'd enforce that you never do it if we knew a portable way to determine
> the character set assumed by an LC_COLLATE setting.

Again, not sure what "it" is, but I do find it confusing when the
cluster can have only one lc_collate, but the databases on that
cluster can have more than one encoding.  That's why I was asking
how PostgreSQL handles (possibly) different encodings.

Are you saying that if a database is encoded as utf8 then the cluster
should be initialized with something like en_US.utf8?  And then all
databases on that cluster should be encoded the same?

I suspect I don't understand how LC_COLLATE works that well.

I thought the locale defines the order of the characters, but not the
encoding of those characters.  Maybe that's not correct. I assumed the
same locale should sort the same chars represented in different
encodings the same way.  Maybe that's not the case:

    $ LC_ALL=en_US.UTF-8 locale charmap
    UTF-8

    $ LC_ALL=en_US locale charmap
    ISO-8859-1

    $ LC_ALL=C locale charmap
    ANSI_X3.4-1968
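
The charmap output above suggests each locale name carries an assumed
character set. A minimal Python sketch of what could go wrong on a
mismatch, assuming the collation routine simply interprets the bytes it
is handed in the locale's own charset (an assumption for illustration,
not a statement about PostgreSQL internals; the helper name
misread_as_latin1 is hypothetical):

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as ISO-8859-1.

    This mimics a Latin-1 locale being handed UTF-8 data: every byte is
    treated as one character, so multi-byte sequences fall apart.
    """
    return text.encode("utf-8").decode("iso-8859-1")


original = "é"                       # one character, U+00E9
seen = misread_as_latin1(original)   # what the Latin-1 locale would see

print(repr(original.encode("utf-8")))        # the two UTF-8 bytes
print(repr(original.encode("iso-8859-1")))   # the single Latin-1 byte
print(len(original), len(seen))              # character counts differ
```

Under that assumption, a rule written for "é" would never match, because
the locale would be comparing two unrelated Latin-1 characters instead.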

--
Bill Moseley
moseley@hank.org
