Thread: C locale + unicode

C locale + unicode

From

John Sidney-Woollett

Date:

14 January 2005, 16:28:59

Does anyone know if it's permitted to use the 'C' locale with a UNICODE
encoded database in 7.4.6? And will it work correctly?

Or do you have to use an en_XX.utf8 locale if you want to use unicode
encoding for your databases?

John Sidney-Woollett

Re: C locale + unicode

From

Tom Lane

Date:

14 January 2005, 16:57:18

John Sidney-Woollett <johnsw@wardbrook.com> writes:
> Does anyone know if it's permitted to use the 'C' locale with a UNICODE
> encoded database in 7.4.6?

Yes.

> And will it work correctly?

For suitably small values of "correctly", sure.  Textual sort ordering
would be by byte values, which might be a bit unintuitive for Unicode
characters.  And I don't think upper()/lower() would work very nicely
for characters outside the basic ASCII set.  But AFAIR those are the
only gotchas.  People in the Far East, who tend not to care about either
of those points, use 'C' locale with various multibyte character sets
all the time.
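
[Editor's illustration] A minimal sketch of what "sort ordering by byte values" means in practice. It does not run PostgreSQL; it only assumes that 'C' locale comparison of UTF-8 text behaves like comparing the encoded bytes. The sample words are made up for the example.

```python
# Hypothetical sample data, chosen to mix case and a non-ASCII letter.
words = ["zebra", "Zebra", "apple", "éclair", "Apple"]

# Assumption: 'C' locale ordering is equivalent to comparing UTF-8 bytes.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Uppercase ASCII (0x41-0x5A) sorts before lowercase ASCII (0x61-0x7A),
# and every non-ASCII character (lead byte >= 0xC2) sorts after both.
print(by_bytes)  # ['Apple', 'Zebra', 'apple', 'zebra', 'éclair']
```

A reader expecting dictionary order would probably want "éclair" near "apple", which is the "unintuitive" part described above.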

            regards, tom lane

Re: C locale + unicode

From

John Sidney-Woollett

Date:

14 January 2005, 18:26:05

Tom, thanks for the info.

Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
databases? (They don't seem to work on chars > standard ascii on my
7.4.6 db). Is this a locale- or encoding-specific issue?

Is there likely to be a significant difference in speed between a
database using a UTF-8 locale and the C locale (if you don't care about
the small issues you detailed below)?

Thanks.

John Sidney-Woollett

Tom Lane wrote:

> John Sidney-Woollett <johnsw@wardbrook.com> writes:
>
>>Does anyone know if it's permitted to use the 'C' locale with a UNICODE
>>encoded database in 7.4.6?
>
>
> Yes.
>
>
>>And will it work correctly?
>
>
> For suitably small values of "correctly", sure.  Textual sort ordering
> would be by byte values, which might be a bit unintuitive for Unicode
> characters.  And I don't think upper()/lower() would work very nicely
> for characters outside the basic ASCII set.  But AFAIR those are the
> only gotchas.  People in the Far East, who tend not to care about either
> of those points, use 'C' locale with various multibyte character sets
> all the time.
>
>             regards, tom lane

Re: C locale + unicode

From

Tom Lane

Date:

14 January 2005, 18:34:29

John Sidney-Woollett <johnsw@wardbrook.com> writes:
> Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
> databases? (They don't seem to work on chars > standard ascii on my
> 7.4.6 db). Is this a locale- or encoding-specific issue?

Before 8.0, they don't work on multibyte characters, period.  In 8.0
they work according to your locale setting.
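
[Editor's illustration] A rough sketch of the difference being described, not PostgreSQL's actual code. The helper `ascii_only_upper` is hypothetical: it imitates case conversion that touches only single-byte ASCII letters, which is the behaviour reported in this thread for 7.4.6. Python's own `str.upper()` is Unicode-aware and stands in here, as an assumption, for the result a suitable locale would give in 8.0.

```python
def ascii_only_upper(text: str) -> str:
    """Uppercase ASCII a-z only; bytes of multibyte characters are left as-is."""
    # bytes.upper() maps only ASCII lowercase bytes, so every byte >= 0x80
    # (all bytes of a multibyte UTF-8 character) passes through unchanged.
    return text.encode("utf-8").upper().decode("utf-8")


word = "éclair"

# Imitation of the pre-8.0 result: the multibyte "é" is not converted.
print(ascii_only_upper(word))  # éCLAIR

# Unicode-aware conversion, standing in for a locale-aware upper().
print(word.upper())  # ÉCLAIR
```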

> Is there likely to be a significant difference in speed between a
> database using a UTF-8 locale and the C locale (if you don't care about
> the small issues you detailed below)?

I'd expect the C locale to be materially faster for text sorting.
Don't have a number offhand.

            regards, tom lane

Re: C locale + unicode

From

John Sidney-Woollett

Date:

14 January 2005, 21:02:08

Thanks for the info - to the point and much appreciated!

John Sidney-Woollett

Tom Lane wrote:

> John Sidney-Woollett <johnsw@wardbrook.com> writes:
>
>>Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
>>databases? (They don't seem to work on chars > standard ascii on my
>>7.4.6 db). Is this a locale- or encoding-specific issue?
>
>
> Before 8.0, they don't work on multibyte characters, period.  In 8.0
> they work according to your locale setting.
>
>
>>Is there likely to be a significant difference in speed between a
>>database using a UTF-8 locale and the C locale (if you don't care about
>>the small issues you detailed below)?
>
>
> I'd expect the C locale to be materially faster for text sorting.
> Don't have a number offhand.
>
>             regards, tom lane