
Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL) - Mailing list pgsql-general

From	Dmitriy Igrishin
Subject	Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)
Date	August 29, 2012 19:14:11
Msg-id	CAAfz9KMpYDtKeQa8EfW8ixh0GiVZdJ4Waj5zN36A05cXd5tSUQ@mail.gmail.com
In response to	Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL) (Merlin Moncure <mmoncure@gmail.com>)
List	pgsql-general


2012/8/29 Merlin Moncure <mmoncure@gmail.com>

On Wed, Aug 29, 2012 at 12:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug 29, 2012 at 10:31:21AM -0700, Aleksey Tsalolikhin wrote:
>> On Wed, Aug 29, 2012 at 9:45 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> > citext unfortunately doesn't allow for index optimization of LIKE
>> > queries, which IMNSHO defeats the whole purpose. So the best way
>> > remains to use lower() ...
>> > this will be index optimized and fast as long as you specified C
>> > locale for your database.
>>
>> What is the difference between C and en_US.UTF8, please? We see that
>> the same query (that invokes a sort) runs 15% faster under the C
>> locale. The output between C and en_US.UTF8 is identical. We're
>> considering moving our database from en_US.UTF8 to C, but we do deal
>> with internationalized text.
>
> Well, C has reduced overhead for string comparisons, but obviously
> doesn't work well for international characters. The single-byte
> encodings have somewhat less overhead than UTF8. You can try using C
> locales for databases that don't require non-ASCII characters.

To add:
The middle ground I usually choose is to have a database encoding of
UTF8 but with the C (aka POSIX) locale. This gives you the ability to
store any Unicode text, but indexing operations will use the faster C
string comparison operations for a significant performance boost --
especially for partial string searches on an indexed column. This is
an even more attractive option in 9.1 with the ability to specify
collations per column or per expression at runtime.
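
To illustrate why byte-wise (C locale) ordering helps partial string
searches, here is a minimal Python sketch. It is not PostgreSQL code and
does not show how PostgreSQL implements anything; it only assumes that
C-locale comparison is equivalent to comparing the UTF-8 bytes of each
string. The helper names (c_locale_key, prefix_upper_bound,
prefix_search) and the sample words are made up for this example. Under
that assumption, a prefix search becomes a simple range lookup over the
sorted keys:

```python
from bisect import bisect_left
from typing import List, Optional


def c_locale_key(text: str) -> bytes:
    """Return the UTF-8 bytes of text; comparing these mimics byte-wise ordering."""
    return text.encode("utf-8")


def prefix_upper_bound(prefix: bytes) -> Optional[bytes]:
    """Return the smallest byte string greater than every string starting with prefix.

    Returns None when no such bound exists (empty prefix, or only 0xFF bytes).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    # Increment the last byte: b"app" becomes b"apq".
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_search(sorted_keys: List[bytes], prefix: str) -> List[str]:
    """Find all keys that start with prefix using two binary searches."""
    low = c_locale_key(prefix)
    high = prefix_upper_bound(low)
    start = bisect_left(sorted_keys, low)
    end = len(sorted_keys) if high is None else bisect_left(sorted_keys, high)
    return [key.decode("utf-8") for key in sorted_keys[start:end]]


# Hypothetical sample data, including one non-ASCII word.
words = ["apple", "Zebra", "éclair", "apricot", "banana", "application"]

# A sorted list of byte keys stands in for an index built in byte order.
index = sorted(c_locale_key(w) for w in words)

print([key.decode("utf-8") for key in index])
print(prefix_search(index, "app"))
```

Note that in byte order "Zebra" sorts before "apple" and "éclair" sorts
after "banana", which is why results can differ from en_US.UTF8 once the
data contains mixed case or non-ASCII characters, even though the stored
text itself is unaffected.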

Good point! Thanks!

--
// Dmitriy.

