Re: locale - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: locale
Date	April 8, 2004 12:47:29
Msg-id	14589.1081439245@sss.pgh.pa.us
In response to	Re: locale (Dennis Bjorklund <db@zigo.dhs.org>)
Responses	Re: locale
List	pgsql-hackers

Dennis Bjorklund <db@zigo.dhs.org> writes:
> On Thu, 8 Apr 2004, Tom Lane wrote:
>> No, the ordering *will* be the same as it was before, because strcoll()
>> is still functioning the same.  You'd get the same answer from a sort
>> operation since it depends on the same operators.

> But, now when we compare these strings as latin1 strings
> it's no longer the case that c3 84 72 61 > c3 85 6b 65. As latin1 strings
> we compare each character and c3 = c3, and then 84 < 85 (in latin1 84
> and 85 are some control characters).
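Decoding the two byte strings from the quoted example makes the disagreement concrete. The Python sketch below uses only standard-library codecs. Read as UTF-8, each `c3 xx` pair is one accented letter, giving "Ära" and "Åke". Read as latin1, every byte is its own character, and 0x84 and 0x85 fall in the C1 control range. A plain byte-by-byte comparison then puts the first string first.

```python
# The two byte sequences from the quoted message.
ara = bytes.fromhex("c3847261")
ake = bytes.fromhex("c3856b65")

# Interpreted as UTF-8: two-byte sequences c3 84 and c3 85 are single letters.
print(ara.decode("utf-8"), ake.decode("utf-8"))   # Ära Åke

# Interpreted as latin1: one character per byte; 0x84 is a C1 control code.
print([hex(ord(c)) for c in ara.decode("latin-1")])   # ['0xc3', '0x84', '0x72', '0x61']

# Byte-wise comparison: c3 == c3, then 0x84 < 0x85.
print(ara < ake)   # True
```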

You're missing the point: strcoll() is not going to compare them as
latin1 strings.  It's going to interpret the bytes as utf-8 strings,
because that's what LC_CTYPE will tell it to do.  So the sort ordering
of any particular byte string remains the same as it was before, and
the index does not become corrupt.
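The argument can be modelled in a few lines. This is a toy sketch, not PostgreSQL code. The hypothetical `utf8_collation_key` stands in for strcoll() under a UTF-8 LC_CTYPE. It uses a simplified lowercase Swedish alphabet rather than a real system locale, so the result does not depend on which locales are installed. Because the comparator only ever sees the stored bytes and always decodes them as UTF-8, relabelling the database as latin1 changes neither the bytes nor the comparison. An index built earlier therefore stays correctly ordered, even though a byte-wise ordering would differ.

```python
# Toy model only: a simplified Swedish alphabet (å, ä, ö after z) stands in
# for strcoll() under a UTF-8 locale. It is not the real glibc collation.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"


def utf8_collation_key(raw: bytes) -> list[int]:
    """Hypothetical strcoll() stand-in: decode as UTF-8 (what LC_CTYPE says),
    then rank each letter by its position in the alphabet above."""
    return [SWEDISH_ORDER.index(ch) for ch in raw.decode("utf-8").lower()]


def build_index(rows: list[bytes]) -> list[bytes]:
    """An 'index' here is simply the rows sorted by the collation key."""
    return sorted(rows, key=utf8_collation_key)


rows = [s.encode("utf-8") for s in ["Ära", "Åke", "Bo"]]
index = build_index(rows)

# Copying the table into a latin1-labelled database copies the same bytes,
# and the comparator still decodes them as UTF-8, so the order is unchanged.
copied = list(rows)
assert build_index(copied) == index

print([r.decode("utf-8") for r in index])         # ['Bo', 'Åke', 'Ära']
# Only a byte-wise comparison would order them differently:
print([r.decode("utf-8") for r in sorted(rows)])  # ['Bo', 'Ära', 'Åke']
```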

Whether the index is delivering answers that you find useful is a whole
different question ;-).  For example, if you do a "WHERE col = 'foo'"
type of query, you'll be presenting the latin1 encoding of 'foo', which
may well not equal the utf-8 encoding of 'foo', meaning you won't find
that row even if it exists.  However this would be true whether you used
the index or not --- it's really a data failure and not an index failure.
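A pure-ASCII literal such as 'foo' has identical bytes in latin1 and UTF-8, so the mismatch shows up with non-ASCII text. The sketch below simplifies equality to a raw byte comparison, and `seq_scan` is a hypothetical helper, not a PostgreSQL function. The row is missed with no index involved at all, which is the "data failure" described above.

```python
def seq_scan(stored: list[bytes], literal: str, client_encoding: str) -> list[bytes]:
    """Hypothetical full-table scan: encode the query literal as the client
    would, then compare raw bytes against every stored value."""
    probe = literal.encode(client_encoding)
    return [value for value in stored if value == probe]


# The column contents were written as UTF-8 before the copy.
stored = [s.encode("utf-8") for s in ["foo", "Åke"]]

# ASCII text has the same bytes in both encodings, so the row is found.
print(seq_scan(stored, "foo", "latin-1"))   # [b'foo']

# 'Å' is c5 in latin1 but c3 85 in UTF-8, so the row is not found.
print("Åke".encode("latin-1"), "Åke".encode("utf-8"))   # b'\xc5ke' b'\xc3\x85ke'
print(seq_scan(stored, "Åke", "latin-1"))   # []
```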

> a) What have we gained by copying this table into the latin1 database.
>    It looks broken to me.

It looks broken to me too, in terms of user functionality.  I was simply
responding to your assertion that the indexes will be corrupt.  They
won't be.

AFAICS, to support per-database encoding and locale correctly, CREATE
DATABASE would have to be prepared to re-encode *and* re-index every
textual column in the copied database.  I don't really foresee us going
to that much work in order to have a solution that's still half-baked
and non-spec-compliant.  It's much more likely that per-column locale
and encoding will get done instead.
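The two steps named here, re-encoding and re-indexing, can be sketched with the same toy model. Everything in it is illustration: `copy_column_to_latin1` is a hypothetical helper, not a PostgreSQL routine, and the simplified Swedish alphabet again stands in for the target locale's collation. Re-encoding changes the stored bytes, so the sort order must be rebuilt with a comparator that reads them as latin1. Any character latin1 cannot represent, such as the euro sign, makes the copy fail outright.

```python
# Toy model of "re-encode *and* re-index" for a single text column.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"


def latin1_collation_key(raw: bytes) -> list[int]:
    """Stand-in comparator for the target database: decode as latin1."""
    return [SWEDISH_ORDER.index(ch) for ch in raw.decode("latin-1").lower()]


def copy_column_to_latin1(values_utf8: list[bytes]) -> list[bytes]:
    # Step 1: re-encode every value. This raises UnicodeEncodeError for any
    # character outside latin1 (e.g. U+20AC, the euro sign).
    reencoded = [v.decode("utf-8").encode("latin-1") for v in values_utf8]
    # Step 2: re-index, i.e. rebuild the sort order with the new comparator.
    return sorted(reencoded, key=latin1_collation_key)


source = [s.encode("utf-8") for s in ["Ära", "Åke", "Bo"]]
new_index = copy_column_to_latin1(source)
print(new_index)   # [b'Bo', b'\xc5ke', b'\xc4ra']

try:
    copy_column_to_latin1(["5 \u20ac".encode("utf-8")])
except UnicodeEncodeError as err:
    print("cannot re-encode:", err.reason)
```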
        regards, tom lane
