Thread: C locale + unicode
Does anyone know if it's permitted to use the 'C' locale with a UNICODE encoded database in 7.4.6? And will it work correctly?

Or do you have to use an en_XX.utf8 locale if you want to use unicode encoding for your databases?

John Sidney-Woollett
John Sidney-Woollett <johnsw@wardbrook.com> writes:
> Does anyone know if it's permitted to use the 'C' locale with a UNICODE
> encoded database in 7.4.6?

Yes.

> And will it work correctly?

For suitably small values of "correctly", sure. Textual sort ordering would be by byte values, which might be a bit unintuitive for Unicode characters. And I don't think upper()/lower() would work very nicely for characters outside the basic ASCII set. But AFAIR those are the only gotchas. People in the Far East, who tend not to care about either of those points, use 'C' locale with various multibyte character sets all the time.

regards, tom lane
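To make "sort ordering by byte values" concrete, here is a minimal Python sketch. It is only an approximation of what the 'C' locale does: it assumes the strings are stored as UTF-8 and compares their encoded bytes. The helper name c_locale_sort and the sample words are made up for illustration and are not part of PostgreSQL.

```python
# Illustration only: approximates 'C'-locale ordering by comparing the raw
# UTF-8 bytes of each string. The function name and sample data are
# hypothetical, not anything PostgreSQL provides.

def c_locale_sort(strings):
    """Return the strings sorted by the byte values of their UTF-8 encoding."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))

words = ["zebra", "Zebra", "apple", "éclair", "Apple"]
byte_ordered = c_locale_sort(words)

# All upper-case ASCII letters sort before all lower-case ones, and the
# accented word sorts last because its first byte (0xC3) is above 0x7F.
print(byte_ordered)
```

A dictionary-style collation would normally place "éclair" near the other "e" words and interleave upper and lower case, which is the "unintuitive" part when the 'C' locale is used instead.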
Tom, thanks for the info.

Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded databases? (They don't seem to work on chars > standard ascii on my 7.4.6 db). Is this a locale- or encoding-specific issue?

Is there likely to be a significant difference in speed between a database using a UTF-8 locale and the C locale (if you don't care about the small issues you detailed)?

Thanks.

John Sidney-Woollett
John Sidney-Woollett <johnsw@wardbrook.com> writes:
> Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
> databases? (They don't seem to work on chars > standard ascii on my
> 7.4.6 db). Is this locale or encoding specific issue?

Before 8.0, they don't work on multibyte characters, period. In 8.0 they work according to your locale setting.

> Is there likely to be a significant difference in speed between a
> database using a UTF-8 locale and the C locale (if you don't care about
> the small issues you detailed below)?

I'd expect the C locale to be materially faster for text sorting. Don't have a number offhand.

regards, tom lane
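As a rough illustration of why case conversion can miss multibyte characters, the Python sketch below upper-cases a UTF-8 string one byte at a time, touching only ASCII a-z. This is an assumed model of byte-oriented case mapping, not PostgreSQL's actual code, and bytewise_ascii_upper is a hypothetical helper. Python's own str.upper() is used only as a stand-in for a character-aware conversion; it follows Unicode rules rather than the database locale.

```python
# Illustration only: a byte-at-a-time upper-casing routine, written to show
# why multibyte characters are left untouched. This is an assumption about
# the general mechanism, not a copy of any PostgreSQL implementation.

def bytewise_ascii_upper(text):
    """Upper-case only the bytes in the ASCII range a-z of a UTF-8 string."""
    raw = bytearray(text.encode("utf-8"))
    for i, b in enumerate(raw):
        if 0x61 <= b <= 0x7A:      # ASCII 'a'..'z'
            raw[i] = b - 0x20      # shift to 'A'..'Z'
    # Bytes of multibyte UTF-8 sequences are all >= 0x80, so they are never
    # modified and the result is still valid UTF-8.
    return raw.decode("utf-8")

sample = "café"
print(bytewise_ascii_upper(sample))  # the accented letter stays lower-case
print(sample.upper())                # character-aware conversion, for contrast
```

The "é" is encoded as two bytes (0xC3 0xA9), neither of which falls in the a-z range, so a byte-oriented routine has nothing to map it to.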
Thanks for the info - to the point and much appreciated!

John Sidney-Woollett