=> show client_encoding ;
 client_encoding
-----------------
 UNICODE
(1 row)
=> select char_length('a'), bit_length('a');
 char_length | bit_length
-------------+------------
           1 |          8
(1 row)
# that's an accented "e"
=> select char_length('é'), bit_length('é');
 char_length | bit_length
-------------+------------
           1 |         16   <= two bytes
(1 row)
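PostgreSQL does not simply store UTF-8 data, it also understands it, provided
you set your encodings correctly (i.e. initdb with the UNICODE encoding, and
client_encoding set to UNICODE too, so that data doesn't get mangled on the
way to the db). UTF-8 is itself a multibyte encoding: ASCII characters take
one byte, everything else takes two to four, which is why the accented "e"
above is one character but 16 bits. It will also refuse to eat illegal UTF-8
byte sequences.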
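
If you want to see the same character-versus-byte distinction outside the
database, here is a rough sketch in Python (not PostgreSQL, and not how the
server does it internally). The helper name is_valid_utf8 is made up for this
example; it only mimics the idea of rejecting malformed input.

```python
# Illustrative sketch only: plain Python, not PostgreSQL.
text = "\u00e9"  # the accented "e" (U+00E9)

# Counts characters (code points), like char_length().
char_length = len(text)

# Counts bits of the UTF-8 encoded form, like bit_length() in a UNICODE database.
bit_length = len(text.encode("utf-8")) * 8

print(char_length, bit_length)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xc3\xa9"))  # the two-byte UTF-8 form of the accented "e"
print(is_valid_utf8(b"\xe9"))      # the Latin-1 byte for the same letter
```

The second check is the typical failure when a Latin-1 client talks to a
UNICODE database without setting client_encoding: the lone byte is not a
complete UTF-8 sequence.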
Once you try unicode, all the codepage mess starts to look old...
On Thu, 16 Sep 2004 20:39:48 -0400, Richard Connamacher
<rich.n1@indieimage.com> wrote:
> I'm new to PostgreSQL, and from the looks of it, it's a great database,
> and I'll be using more of it in the future.
>
> I had a quick question if anyone could clear this up. The documentation
> for PostgreSQL (version 7.1, the version this server is using) says that
> it supports multibyte character encodings like Unicode (which implies
> UTF-16 encoding). Later on, the same page says that Unicode is
> represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
> The multibyte version of Unicode is UTF-16.
>
> So, which is it? If I create a database using Unicode as the encoding,
> will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?
>
> Thanks!
> Rich