Thread: Default PostgreSQL server encoding - Change to unicode (utf8)
Hello,

Thank you for reading my post.

When I run the command:

I get the following messages:

I would like the cluster (and the databases) encoding to be unicode (UTF8).

What can I do?
Can I set the default encoding I want for the whole PostgreSQL server somewhere?

Thank you for helping and best regards.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Default-PostgreSQL-server-encoding-Change-to-unicode-utf8-tp5505985p5505985.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 02/22/2012 11:20 AM, Léa Massiot wrote:
> Hello,
>
> Thank you for reading my post.
>
> When I run the command:
>
> I get the following messages:

The messages ?

>
> I would like the cluster (and the databases) encoding to be unicode (UTF8).
>
> What can I do?
> Can I set the default encoding I want for the whole PostgreSQL server
> somewhere?

A good place to start for your options is:

http://www.postgresql.org/docs/9.0/interactive/locale.html
http://www.postgresql.org/docs/9.0/interactive/multibyte.html

>
> Thank you for helping and best regards.
>

--
Adrian Klaver
adrian.klaver@gmail.com
Hello.

Thank you for your answer.
I used the <raw> and </raw> tags, this is probably the reason why you couldn't see the messages...

Thank you for the two links.
I read this (in the second one): "On Windows, however, UTF-8 encoding can be used with any locale." yet I still have some questions...

On Unix (Debian GNU Linux Squeeze):
=========================================================================================
psql_cmd> \l
-----------+----------+----------+-------------+-------------
 Name      | Owner    | Encoding | Collation   | Ctype
-----------+----------+----------+-------------+-------------
 template1 | postgres | UTF8     | en_us.UTF-8 | en_us.UTF-8
=========================================================================================

On Windows (XP):
=========================================================================================
psql_cmd> \l
-----------+----------+----------+----------------------------+----------------------------
 Name      | Owner    | Encoding | Collation                  | Ctype
-----------+----------+----------+----------------------------+----------------------------
 template1 | postgres | UTF8     | English_United States.1252 | English_United States.1252
=========================================================================================

Question 1
Focusing on the "Collation" and "Ctype" columns, does "English_United States.1252" have something to do with "Windows-1252" ("CP-1252")?
"CP-1252" is an 8-bit character encoding (so it can map codes to 2^8 characters at most).
How compatible is this with a "UTF8" "Encoding"?
For people testing PostgreSQL under Windows, is there any other more appropriate "Collation" that could be used to set a database collation?
There is no "locale -a" command available under Windows. Is there any workaround?

Question 2
Suppose I have a PostgreSQL table which has a VARCHAR column "text".
Suppose I want to insert the string "Li 李", which contains the Chinese ideograph 李.
How can I do this with an "INSERT INTO" command?
I wish I could do something like:

INSERT INTO t (text) VALUES ('Li U+674E')

or

INSERT INTO t (text) VALUES ('Li \u674E')

How can I do this?

Thanks and best regards.
--
Léa
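For Question 2, the PostgreSQL documentation on string constants describes Unicode escapes inside escape string constants, so a literal such as E'Li \u674E' is one way to write the ideograph in an INSERT when the database encoding is UTF8. Whether that form is accepted depends on the server version and settings, so it is worth checking the documentation for the release in use. The Python sketch below only builds such a literal as text. The helper name pg_unicode_literal is hypothetical and not part of any library, and the table t is the one from the question. Application code would normally pass values as query parameters through a driver rather than formatting SQL by hand.

```python
def pg_unicode_literal(text: str) -> str:
    """Render text as a PostgreSQL escape string constant (E'...').

    Hypothetical helper for illustration only. Non-ASCII characters
    become \\uXXXX escapes (or \\UXXXXXXXX above U+FFFF).
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "'":
            parts.append("''")            # a quote is doubled inside a SQL string
        elif ch == "\\":
            parts.append("\\\\")          # a backslash is doubled inside E'...'
        elif cp < 0x80:
            parts.append(ch)              # other ASCII passes through unchanged
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")  # 4-digit escape for U+0080..U+FFFF
        else:
            parts.append(f"\\U{cp:08X}")  # 8-digit escape above U+FFFF
    return "E'" + "".join(parts) + "'"


literal = pg_unicode_literal("Li \u674e")   # "Li " followed by U+674E
statement = f"INSERT INTO t (text) VALUES ({literal});"
print(statement)
```

The printed statement can be pasted into psql to try it out against a UTF8 database.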
On Monday, February 27, 2012 3:55:43 am Léa Massiot wrote:
> Hello.
> Thank you for your answer.
> Thank you for the two links.
> I read this (in the second one): "On Windows, however, UTF-8 encoding can
> be used with any locale." yet I still have some questions...
>
> Question 1
> Focusing on the "Collation" and "Ctype" columns,
> has "English_United States.1252" something to do with "Windows-1252"
> ("CP-1252")?
> "CP-1252" is an 8 bits character encoding (so, it can map codes to 2^8
> characters at most).
> How compatible is this with an "UTF8" "Encoding"?
> For people testing PostgreSQL under Windows, is there any other more
> appropriate "Collation" that could be used to set a database collation?

This is answered in the first link I sent:

http://www.postgresql.org/docs/9.0/interactive/locale.html

"Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same."

"LC_COLLATE   String sort order
 LC_CTYPE     Character classification (What is a letter? Its upper-case equivalent?)"

So what is appropriate depends on which sorting and character-classification rules you want to follow. By the way, both of these are fixed at database creation and cannot be changed.

> There is no "locale -a" command available under Windows. Is there any
> workaround?

A little Googling found this. I am not a regular Windows user, so there may be better options out there:

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/systeminfo.mspx?mfr=true

>
> Thanks and best regards.
> --
> Léa
>

--
Adrian Klaver
adrian.klaver@gmail.com
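On the 8-bit concern in Question 1: the ".1252" suffix in the Windows locale name refers to a code page, while the "Encoding" column describes how characters are stored in the database. As a small illustration of why the storage encoding is what matters for a string like "Li 李", the Python sketch below (standard library only, and independent of PostgreSQL) shows that UTF-8 can represent the ideograph while Windows-1252 cannot. It says nothing about how PostgreSQL applies the locale internally; the variable names are illustrative.

```python
text = "Li \u674e"                  # "Li " followed by U+674E

# UTF-8 is a variable-width encoding: the ASCII characters take one byte each,
# the ideograph takes three.
utf8_bytes = text.encode("utf-8")
print(len(text), "characters,", len(utf8_bytes), "bytes in UTF-8")

# Windows-1252 is a single-byte code page with no slot for U+674E,
# so Python refuses to encode the string.
try:
    text.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False
print("representable in CP-1252:", fits_cp1252)
```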