Thread: Encoding and multibyte support

Encoding and multibyte support

From
"Iain"
Date:
Hi All,
 
I recently had a slight problem with a development database because I had used the default encoding, SQL_ASCII. When I tried to load the data into an EUC_JP database there were, of course, some problems with invalid EUC_JP characters. Fortunately, they were easy to find and fix.
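
As an illustration of how such invalid characters can be located before a reload, here is a minimal sketch in Python. It assumes the dumped values are available as raw byte strings; the function name find_invalid_euc_jp and the sample rows are hypothetical and not part of the original report. It relies only on Python's built-in "euc_jp" codec, whose validation may not match the server's check in every detail.

```python
def find_invalid_euc_jp(rows):
    """Return (index, raw bytes) for every row that does not decode as EUC_JP."""
    bad = []
    for i, raw in enumerate(rows):
        try:
            raw.decode("euc_jp")
        except UnicodeDecodeError:
            # The bytes are not a valid EUC_JP sequence.
            bad.append((i, raw))
    return bad


# Hypothetical sample data: two valid rows and one stored in another encoding.
rows = [
    "日本語".encode("euc_jp"),     # valid EUC_JP
    b"plain ascii",                # ASCII is a subset of EUC_JP
    "日本語".encode("shift_jis"),  # Shift_JIS bytes that slipped in
]

for index, raw in find_invalid_euc_jp(rows):
    print(f"row {index} is not valid EUC_JP: {raw!r}")
```

A database created as SQL_ASCII would have accepted all three rows without complaint, which is why the problem only shows up at reload time.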
 
Anyway, my search on "encoding" or "multibyte" turned up nothing in the 7.4 documentation. Eventually I found a page written by Tatsuo Ishii in the 7.2 documentation.
 
I think it's an important area and a potential trap for new players, so I'd like to see the documentation updated.
 
The following came out of a discussion with Tom Lane. I submitted it as a comment in the interactive documentation. I think it would be a good idea to check the details and update the docs:
------
The default encoding, SQL_ASCII, effectively disables any encoding conversion. This means that your db will accept any kind of data. It's a potential problem, as you may end up with different kinds of encoding being used in both your data and metadata.
 
It would seem that, unless you specifically need to store data in various encodings, you should select a specific encoding when creating a new database. Use initdb -E to set the default for all new DBs. This can be overridden when creating a new DB.
------
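
The note above can be sketched as the two command lines involved. This is only an illustration: the helper names, the data directory /tmp/pgdata and the database name mydb are hypothetical. The -E and -D options of initdb and the -E option of createdb are the standard ones, but check them against the reference pages for your version.

```python
import shlex


def initdb_command(data_dir, encoding):
    # initdb -E sets the default encoding for databases created in this cluster.
    return ["initdb", "-E", encoding, "-D", data_dir]


def createdb_command(db_name, encoding):
    # createdb -E overrides the cluster default for this one database.
    return ["createdb", "-E", encoding, db_name]


# Hypothetical data directory and database name, for illustration only.
print(shlex.join(initdb_command("/tmp/pgdata", "EUC_JP")))
print(shlex.join(createdb_command("mydb", "EUC_JP")))
```

The commands are only built and printed here, not executed, so the sketch can be read without a PostgreSQL installation.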
 
Also, the documentation for installation (chapter 14), creating database clusters (16.2) and creating databases (18.2) doesn't mention encoding at all. Maybe it should. In addition, 16.2 should link to the documentation for initdb (Server Applications, section III).
 
regards
Iain

Re: Encoding and multibyte support

From
"Iain"
Date:
Actually, I should say that I eventually found the section in chapter 20 (localization) of the 7.4 docs, but I'd like to see this page linked to from the areas I mentioned, and made easier to find by searching on words like "encode", "encoding", etc.
 
Regards
Iain