Thread: Default PostgreSQL server encoding - Change to unicode (utf8)

Default PostgreSQL server encoding - Change to unicode (utf8)

From
Léa Massiot
Date:
Hello,

Thank you for reading my post.

When I run the command:

I get the following messages:

I would like the cluster (and the databases) encoding to be unicode (UTF8).

What can I do?
Can I set the default encoding I want for the whole PostgreSQL server
somewhere?

Thank you for helping and best regards.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Default-PostgreSQL-server-encoding-Change-to-unicode-utf8-tp5505985p5505985.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Default PostgreSQL server encoding - Change to unicode (utf8)

From
Adrian Klaver
Date:
On 02/22/2012 11:20 AM, Léa Massiot wrote:
> Hello,
>
> Thank you for reading my post.
>
> When I run the command:
>
> I get the following messages:

The messages?

>
> I would like the cluster (and the databases) encoding to be unicode (UTF8).
>
> What can I do?
> Can I set the default encoding I want for the whole PostgreSQL server
> somewhere?

A good place to start for your options is:
http://www.postgresql.org/docs/9.0/interactive/locale.html
http://www.postgresql.org/docs/9.0/interactive/multibyte.html

>
> Thank you for helping and best regards.
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Default PostgreSQL server encoding - Change to unicode (utf8)

From
Léa Massiot
Date:
Hello.
Thank you for your answer.
I used the <raw> and </raw> tags, which is probably why you couldn't
see the messages.
Thank you for the two links.
I read this (in the second one): "On Windows, however, UTF-8 encoding can be
used with any locale." yet I still have some questions...

On Unix (Debian GNU Linux Squeeze):

=========================================================================================
  psql_cmd> \l

  ----------+----------+----------+-------------+------------
  Name      | Owner    | Encoding | Collation   | Ctype
  ----------+----------+----------+-------------+------------
  template1 | postgres | UTF8     | en_us.UTF-8 | en_us.UTF-8

=========================================================================================

On Windows (XP):

=========================================================================================
  psql_cmd> \l

  ----------+----------+----------+----------------------------+---------------------------
  Name      | Owner    | Encoding | Collation                  | Ctype
  ----------+----------+----------+----------------------------+---------------------------
  template1 | postgres | UTF8     | English_United States.1252 | English_United States.1252

=========================================================================================

Question 1
  Focusing on the "Collation" and "Ctype" columns,
  does "English_United States.1252" have something to do with "Windows-1252"
  ("CP-1252")?
  "CP-1252" is an 8-bit character encoding (so it can map codes to at most
  2^8 characters).
  How compatible is this with a "UTF8" "Encoding"?
  For people testing PostgreSQL under Windows, is there a more appropriate
  "Collation" that could be used to set a database collation?
  There is no "locale -a" command available under Windows. Is there any
  workaround?

Question 2
  Suppose I have a PostgreSQL table which has a VARCHAR column "text".
  Suppose I want to insert the string "Li 李", which contains the Chinese ideograph 李.
  How can I do this with an "INSERT INTO" command?
  I wish I could do something like:
  INSERT INTO t (text) VALUES ('Li U+674E')
  or
  INSERT INTO t (text) VALUES ('Li \u674E')
  How can I do this?
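
One possible approach to Question 2, sketched in Python under stated assumptions: PostgreSQL accepts Unicode escape string constants of the form U&'...' (a backslash followed by four hex digits, or \+ followed by six), provided standard_conforming_strings is on. The helper name pg_unicode_literal is hypothetical; the table t and column text come from the question. The documentation also describes \uXXXX escapes inside E'...' strings, so E'Li \u674E' may work as well on servers that support it.

```python
def pg_unicode_literal(text: str) -> str:
    """Build a PostgreSQL U&'...' string constant for `text` (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "'":
            parts.append("''")            # a single quote is doubled inside a literal
        elif ch == "\\":
            parts.append("\\\\")          # a literal backslash is doubled in U&'' strings
        elif cp < 0x80:
            parts.append(ch)              # plain ASCII is passed through unchanged
        elif cp <= 0xFFFF:
            parts.append("\\%04X" % cp)   # 4-hex-digit escape form
        else:
            parts.append("\\+%06X" % cp)  # 6-hex-digit escape form
    return "U&'" + "".join(parts) + "'"


literal = pg_unicode_literal("Li 李")
print("INSERT INTO t (text) VALUES (%s);" % literal)
# prints: INSERT INTO t (text) VALUES (U&'Li \674E');
```

If both the database and the client encoding are UTF8, typing the character directly ('Li 李') should also work. Application code would normally pass the string as a query parameter instead of building a literal like this.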

Thanks and best regards.
--
Léa


Re: Default PostgreSQL server encoding - Change to unicode (utf8)

From
Adrian Klaver
Date:
On Monday, February 27, 2012 3:55:43 am Léa Massiot wrote:
> Hello.
> Thank you for your answer.


> Thank you for the two links.
> I read this (in the second one): "On Windows, however, UTF-8 encoding can
> be used with any locale." yet I still have some questions...
>

> Question 1
>   Focusing on the "Collation" and "Ctype" columns,
>   does "English_United States.1252" have something to do with "Windows-1252"
> ("CP-1252")?
>   "CP-1252" is an 8-bit character encoding (so it can map codes to at most
> 2^8 characters).
>   How compatible is this with a "UTF8" "Encoding"?
>   For people testing PostgreSQL under Windows, is there a more
> appropriate "Collation" that could be used to set a database collation?

This is answered in the first link I sent:

http://www.postgresql.org/docs/9.0/interactive/locale.html

"Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252,
but the principles are the same."

"
LC_COLLATE    String sort order
LC_CTYPE      Character classification (What is a letter? Its upper-case equivalent?)
"

So what is appropriate depends on which sorting and character classification rules
you want to follow. By the way, both of these are fixed at database creation and
cannot be changed afterwards.
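
On the compatibility question, a small Python sketch (not PostgreSQL code) may help separate the two notions. It assumes that the .1252 suffix in the Windows locale name refers to code page 1252, while the Encoding column describes how the database stores text. The character 李 from Question 2 needs three bytes in UTF-8 and has no CP-1252 representation at all:

```python
ch = "李"  # U+674E

# UTF-8 is a variable-length encoding: this character takes three bytes.
utf8_bytes = ch.encode("utf-8")
print(len(utf8_bytes), utf8_bytes.hex())  # prints: 3 e69d8e

# CP-1252 is a single-byte encoding and has no slot for this character.
try:
    ch.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False
print(fits_cp1252)  # prints: False
```

So a UTF8 database can hold the character whatever the locale name says; per the quoted documentation, the locale settings govern sort order and character classification.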

>   There is no "locale -a" command available under Windows. Is there any
> workaround?

A little Googling found this. I am not a regular Windows user, so there may be
better options out there:

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/systeminfo.mspx?mfr=true


>
> Thanks and best regards.
> --
> Léa
>


--
Adrian Klaver
adrian.klaver@gmail.com