Re: 8.3 to 8.4 Upgrade issues - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: 8.3 to 8.4 Upgrade issues
Date	August 10, 2010 20:56:35
Msg-id	21669.1281484586@sss.pgh.pa.us
In response to	Re: 8.3 to 8.4 Upgrade issues (Rod Taylor <rod.taylor@gmail.com>)
List	pgsql-hackers

Rod Taylor <rod.taylor@gmail.com> writes:
> Agreed with it being an interesting choice of settings. Nearly all of
> the data is 7-bit ASCII and what isn't seems to be a mix of UTF8,
> LATIN1, and LATIN15.

> I'm pretty sure it interpreted en_US to be LATIN1. There haven't been
> any noticeable changes in sorting order that I know of.

Well, if you've got non-ASCII data that you know is not UTF8, then
setting a UTF8-dependent locale setting is a really really bad idea :-(.
You are risking not just bad performance but seriously bad misbehavior.
If you use a LATIN-n (or other single-byte-encoding) locale, the worst
that data in other encodings can do to you is sort into odd positions.
If you use a UTF8 locale and have data of other encodings, then
strcoll() can tell that you are violating the encoding spec, and on
many platforms it goes entirely berserk when you do that.  glibc in
particular does not play nice with that.  You didn't say what platform
this is, but if it's glibc based then you are sitting on a ticking time
bomb, and you had better dump and reinitialize in a safer locale setting
before your data gets eaten.
        regards, tom lane
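
[Editor's illustration, not part of the original message.] One way to gauge the exposure described above is to check which stored byte strings are not valid UTF-8 before deciding on a locale. The sketch below is only a minimal example under stated assumptions: the helper names `is_valid_utf8` and `find_non_utf8` and the sample values are hypothetical, and the values are assumed to have already been pulled out of the database as raw bytes. It says nothing about how strcoll() will behave on a given platform.

```python
from typing import Iterable, List, Tuple


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(values: Iterable[bytes]) -> List[Tuple[int, bytes]]:
    """Return (index, value) pairs for every value that is not valid UTF-8."""
    return [(i, v) for i, v in enumerate(values) if not is_valid_utf8(v)]


# Made-up sample data: plain ASCII, a UTF-8 string, and the same text in LATIN1.
samples = [
    b"plain ascii",
    "caf\u00e9".encode("utf-8"),
    "caf\u00e9".encode("latin-1"),
]

bad = find_non_utf8(samples)
for index, value in bad:
    print(f"value {index} is not valid UTF-8: {value!r}")
```

Pure 7-bit ASCII always passes this check, so only the mixed LATIN-n values would be flagged. Any value it reports is data that a UTF8-dependent locale would be asked to collate despite violating the encoding.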
