Thread: Unicode encoding

Unicode encoding

From
"William Sweet"
Date:
Hi Guys,
 
I've been searching the web lately for discussions on PostgreSQL multibyte support and UTF-8 support. The result was similar to the situation of asking 10 doctors the same question and getting 10 different answers. So, I solicit your help on solving this great mystery.
 
As I understand it, multibyte support has been enabled since v7.3. I am running a RH9 installation that had v7.3 as an included pkg, so I installed it as part of the RH9 installation. So I assume multibyte support is enabled. Now, I'd like to only store Unicode chars in my PostgreSQL dbs. I hear there are 3 ways to accomplish this:
 
1) during PostgreSQL configure/build (installation level)
2) during initdb (cluster level)
3) CREATE DATABASE (db level)
 
...but there are some "not-so-happy" stories on the net. For instance, "it's not 'true' Unicode support when implemented at the db level", or "sorting and regex do not work properly with a cluster level implementation", etc. I've read the v7.3 Admin Guide section 7.2 Multibyte support... sounds reasonable. So my question is, what is the official way to enable "true" Unicode storage and retrieval, so that LIKE, sorting, and regex in perl::DBI work properly? I am a tad concerned also that I don't see PostgreSQL mentioned on the Unicode products page; http://www.unicode.org/onlinedat/products.html
 
Any advice would be greatly appreciated.
 
Thanks, Will
 

Re: Unicode encoding

From
Peter Eisentraut
Date:
William Sweet wrote:
> support is enabled. Now, I'd like to only store Unicode chars in my
> PostgreSQL dbs. I hear there are 3 ways to accomplish this:
>
> 1) during PostgreSQL configure/build (installation level)
> 2) during initdb (cluster level)
> 3) CREATE DATABASE (db level)

Each one of these only sets the default for the one below it.
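
To make the "default for the one below it" rule concrete, here is a minimal sketch of the fallback chain. It is a toy model, not PostgreSQL code: the function name, its parameters, and the use of SQL_ASCII as the build-level default are assumptions made for illustration.

```python
from typing import Optional


def effective_encoding(build_default: str,
                       initdb_encoding: Optional[str] = None,
                       createdb_encoding: Optional[str] = None) -> str:
    """Toy model: each level only supplies the default for the level below."""
    # Cluster level: an encoding given to initdb overrides the build default.
    cluster_default = initdb_encoding or build_default
    # Database level: an encoding given to CREATE DATABASE overrides the cluster default.
    return createdb_encoding or cluster_default


# Nothing specified anywhere: the build default is used.
print(effective_encoding("SQL_ASCII"))
# Encoding chosen at initdb: new databases inherit it.
print(effective_encoding("SQL_ASCII", initdb_encoding="UNICODE"))
# Encoding chosen per database: it wins over both defaults.
print(effective_encoding("SQL_ASCII", "UNICODE", "LATIN1"))
```

Under this reading, a database created with an explicit Unicode encoding is no less "Unicode" than one that inherited the same encoding from the cluster or the build.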

> ...but there are some "not-so-happy" stories on the net. For
> instance, "it's not 'true' Unicode support when implemented at the db
> level",

That is bogus.

> or "sorting and regex do not work properly with a cluster
> level implementation",

That is true.

> etc. I've read the v7.3 Admin Guide section
> 7.2 Multibyte support... sounds reasonable. So my question is, what
> is the official way to enable "true" Unicode storage and retrieval,
> so that LIKE, sorting, and regex in perl::DBI work properly?

Sorting will not work correctly with Unicode.
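
The thread does not say why sorting fails. One plausible reading is that strings end up compared by raw byte value rather than by language-aware collation rules. The sketch below is plain Python, not PostgreSQL, and the word list is made up; it only shows what such a byte-wise ordering does to UTF-8 text.

```python
# Sample words; non-ASCII letters are written as escapes so the code points are unambiguous.
words = ["zebra", "Zoo", "apple", "\u00c4pfel", "\u00e9clair"]

# Python compares str values by code point.
by_code_point = sorted(words)

# Comparing the UTF-8 encoded bytes gives the same order,
# because UTF-8 preserves code point order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

In this ordering "Zoo" sorts before "apple" and the accented words land after "zebra", which is not what a dictionary-style sort for most languages would produce.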

> I am a
> tad concerned also that I don't see PostgreSQL mentioned on the
> Unicode products page; http://www.unicode.org/onlinedat/products.html

Well, we're also not listed on the ISO 8859 products page, but I don't
think that matters. :-)