Thread: Can I change the character encoding for a DB?
Hi All,
I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP. Can I do this without creating a new database? I didn't find a way in the docs, sorry.
set server_encoding = EUC_JP;
didn't work.
My understanding is that the encoding is purely information and doesn't actually affect the data stored in the DB. Is that right?
The reason I want to do this is that my java app is corrupting Japanese Kanji. I suspect that the JDBC driver uses this information. VB apps using ODBS seem to be fine.
"Iain" <iain@mst.co.jp> writes: > I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP. > Can I do this without creating a new database? If you're certain that the data already in the database is in EUC_JP encoding, it would probably work to do a quick UPDATE on the pg_database.encoding entry for the database (and then start fresh backend sessions, since they won't notice otherwise). If there's data in there that is not correct per EUC_JP then this will create big problems, so don't say I didn't warn you ... regards, tom lane
Hi Tom, Thanks for the info and warning. Fortunately this is just a development database so any problems won't be too big a deal. Does dumping and re-creating the DB avoid such a potential problem? I read the info written by Tasuo, but I have to admit that the whole encoding thing is still full of mystery for me. For example, if the database has a SQL_ASCII encoding and you enter japanese text, what encoding is used? Does it depend on the client used to enter it? I wonder if it is possible that there is some SJIS data in my DB... My understanding is that the DB doesn't ensure that data makes sense in the set encoding, it just allows the db to make conversions when dealing with clients such as the ODBC and JDBC drivers. Anyway, in this case the base data came from a EUC_JP encoded DB, but a VB app (and some psql scripts) has been used to enter and modify data. It uses the ODBC driver provided for postgres, everything had been working smoothly and I only noticed a problem when I tried getting data using java and the jdbc driver. I created a new mini test DB with the EUC_JP encoding and the data was returned correctly. regards, Iain ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Iain" <iain@mst.co.jp> Cc: <pgsql-admin@postgresql.org> Sent: Tuesday, January 27, 2004 2:05 AM Subject: Re: [ADMIN] Can I change the character encoding for a DB? > "Iain" <iain@mst.co.jp> writes: > > I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP. > > Can I do this without creating a new database? > > If you're certain that the data already in the database is in EUC_JP > encoding, it would probably work to do a quick UPDATE on the > pg_database.encoding entry for the database (and then start fresh > backend sessions, since they won't notice otherwise). If there's data in > there that is not correct per EUC_JP then this will create big problems, > so don't say I didn't warn you ... > > regards, tom lane
"Iain" <iain@mst.co.jp> writes: > I read the info written by Tasuo, but I have to admit that the whole > encoding thing is still full of mystery for me. For example, if the database > has a SQL_ASCII encoding and you enter japanese text, what encoding is used? > Does it depend on the client used to enter it? I'm not an expert on this stuff, but my understanding is that setting the encoding to SQL_ASCII disables all encoding-conversion logic --- whatever bytes the client sends are what get stored. So if you want to change the setting from SQL_ASCII to something else, you'd better be sure that everything every client has stored is compatible with the new encoding specification. > My understanding is that the DB > doesn't ensure that data makes sense in the set encoding, When you select a real encoding spec (not SQL_ASCII) then the DB does check that entered strings are valid in that encoding. Also it will attempt to do encoding conversion, if clients specify they are using a client_encoding that is different from the database encoding (and is not SQL_ASCII). Hope that helps --- if not, try Tatsuo ... regards, tom lane
Hi Tom, Thanks for that. It's somewhat clearer now. I went ahead and rebuilt the DB with no problems so I guess that whatever data was in the DB converted OK. Regards Iain > "Iain" <iain@mst.co.jp> writes: > > I read the info written by Tasuo, but I have to admit that the whole > > encoding thing is still full of mystery for me. For example, if the database > > has a SQL_ASCII encoding and you enter japanese text, what encoding is used? > > Does it depend on the client used to enter it? > > I'm not an expert on this stuff, but my understanding is that setting > the encoding to SQL_ASCII disables all encoding-conversion logic --- > whatever bytes the client sends are what get stored. So if you want > to change the setting from SQL_ASCII to something else, you'd better be > sure that everything every client has stored is compatible with the new > encoding specification. > > > My understanding is that the DB > > doesn't ensure that data makes sense in the set encoding, > > When you select a real encoding spec (not SQL_ASCII) then the DB does > check that entered strings are valid in that encoding. Also it will > attempt to do encoding conversion, if clients specify they are using > a client_encoding that is different from the database encoding (and > is not SQL_ASCII). > > Hope that helps --- if not, try Tatsuo ... > > regards, tom lane