Thread: character problem
Hi, I've got a postgres installation which is accessed by users thru a set of perl CGIs. Unlickily, when the user enters some characters in a description (string) field, like "èòéàùì", the string is stored in the database with other characters instead of those digited by the user. It seems as the locale setting of the client and of the server are different, but in the postgresql.conf file I didn't find such setting. My question is: how can I fix the problem? And most important, is it possible to adjust the already inserted strings without having to update the tables? I mean, is it possible to change the locale settings thus the strings in the database are displayed rightly? Thanks, Luca -- Luca Ferrari fluca1978@infinito.it
> like "èòéàùì", the string is stored in the database with other characters instead of those digited by the user. Well, how does `èòéàùì' look like in your database? Like `èòéà ùì'? In that case you most probably have initialized your cluster with an UTF-8 locale while not using a / the correct client encoding. You can check your database encoding with \l in psql. If your perl clients don't handle UTF-8, just set a client encoding for them. SET client_encoding = LATIN1; for example. The client encoding can be set for a user, as default in postgresql.conf, interactive using the command above and maybe other ways. > And most important, is it possible to adjust the already inserted strings without having to update the tables? Your first need to deside what is wrong at all -- the way the servers stores your chars, or the way your client displays them. > is it possible to change the locale settings thus the strings in the database are displayed rightly? The client encoding can be set anytime, to change the cluster LOCALE or database encoding you need to initDB + dump and restore AFAIK. -- Regards, Hannes Dorbath
Hannes Dorbath's cat, on 10/10/2005 9.28, walking on the keyboard wrote: > The client encoding can be set anytime, to change the cluster LOCALE or > database encoding you need to initDB + dump and restore AFAIK. Do you mean dumping the database, destroying it and then recreating it? I'm sorry, but I'm still new in postgresql, thus it is not clear if meant that or another trick I don't know. Thanks, Luca -- Luca Ferrari fluca1978@infinito.it
On 10.10.2005 10:34, Luca Ferrari wrote: > Do you mean dumping the database, destroying it and then recreating it? > I'm sorry, but I'm still new in postgresql, thus it is not clear if > meant that or another trick I don't know. Let us find the problem before jumping to solutions. Start psql <database> <user>, type \l inside it and paste the output for the database in question. Then type "show client_encoding;" and paste the output as well, please. -- Regards, Hannes Dorbath
On 10.10.2005 11:56, Luca Ferrari wrote: > gestione_personale=# \l > List of databases > Name | Owner | Encoding > --------------------+-----------+----------- > gestione_database | dbmanager | SQL_ASCII > gestione_personale | wwwrun | SQL_ASCII > template0 | dbmanager | SQL_ASCII > template1 | dbmanager | SQL_ASCII > (4 rows) > > gestione_personale=# show client_encoding; > client_encoding > ----------------- > SQL_ASCII > (1 row) OK, so you are not using Unicode. I don't know much about the SQL_ASCII charset, but I expect it to slightly differ from LATIN1. Type SET client_encoding = LATIN1; in psql, then SELECT some data containing the chars you have problems with and see if it is fixed. If not, INSERT a new row with such chars from psql, SELECT that row again and if the chars look right now, your other rows are stored invalid in the datebase and you need to convert them somehow. -- Regards, Hannes Dorbath
Hannes Dorbath's cat, on 10/10/2005 12.08, walking on the keyboard wrote: > OK, so you are not using Unicode. I don't know much about the SQL_ASCII > charset, but I expect it to slightly differ from LATIN1. Type > > SET client_encoding = LATIN1; > > in psql, then SELECT some data containing the chars you have problems > with and see if it is fixed. If not, INSERT a new row with such chars > from psql, SELECT that row again and if the chars look right now, your > other rows are stored invalid in the datebase and you need to convert > them somehow. The old rows are still displayed in a bad format, the newly inserted one is instead shown rightly. Also data in pg_dump present wrong strings, thus any suggestion about how to reconvert already inserted data? Moreover, the set client_encoding change is permanent or do I have to configure it in the postgresql.conf file? Thanks, Luca -- Luca Ferrari fluca1978@infinito.it
On 10.10.2005 14:24, Luca Ferrari wrote: > The old rows are still displayed in a bad format, the newly inserted one > is instead shown rightly. Also data in pg_dump present wrong strings, > thus any suggestion about how to reconvert already inserted data? It should be possible to convert it with an intentionally "wrong" client_encoding and one or two reloads, but I don't have the time ATM to think out how to do it exactly, maybe someone else can help in the meantime. > Moreover, the set client_encoding change is permanent or do I have to > configure it in the postgresql.conf file? No it's for the running session only. Set it global in postgresql.conf, for each user or just fire "SET client_encoding = LATIN1;" as first query in your scripts. -- Regards, Hannes Dorbath
Hannes Dorbath <light@theendofthetunnel.de> writes: > On 10.10.2005 11:56, Luca Ferrari wrote: >> gestione_personale=# \l >> List of databases >> Name | Owner | Encoding >> --------------------+-----------+----------- >> gestione_database | dbmanager | SQL_ASCII >> gestione_personale | wwwrun | SQL_ASCII >> template0 | dbmanager | SQL_ASCII >> template1 | dbmanager | SQL_ASCII > OK, so you are not using Unicode. I don't know much about the SQL_ASCII > charset, but I expect it to slightly differ from LATIN1. Type > SET client_encoding = LATIN1; No, SQL_ASCII represents the complete absence of any encoding knowledge. With this database setting, changing client_encoding is a complete no-op. Postgres will just absorb and re-emit strings exactly as they were supplied originally, no matter what client_encoding is. The cause of the unexpected conversions Luca is having trouble with must be somewhere on the client side. It's not going to be done by Postgres with this encoding setting. If you are actively dealing with non-7-bit-ASCII data, using SQL_ASCII for the database encoding is probably a bad idea, exactly because Postgres won't help you out at all with converting encodings or enforcing that data is validly encoded. regards, tom lane
On Mon, Oct 10, 2005 at 10:27:27AM -0400, Tom Lane wrote: > No, SQL_ASCII represents the complete absence of any encoding > knowledge. With this database setting, changing client_encoding is a > complete no-op. Postgres will just absorb and re-emit strings exactly > as they were supplied originally, no matter what client_encoding is. The documents remain pretty confusing about this, assuming I still understand the current state of affairs (always a dangerous assumption). The chart in <http://developer.postgresql.org/docs/postgres/multibyte.html>, for instance, says "SQL_ASCII" supports "ASCII". I'm not sure what to do about this (I've noticed it before, and run into the same quandary). One possibility is to add something like this immediately below the chart in the page above: ---snip--- NOTE: SQL_ASCII _does not_ enforce a 7 bit restriction on insertions. SQL_ASCII does not represent a positive claim that the database knows all the characters to be 7 bit characters. It represents instead the complete absence of any encoding knowledge. Inserting high-bit characters into a database using the SQL_ASCII character set may have unpredictable results. ---snip--- -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Andrew Sullivan <ajs@crankycanuck.ca> writes: > On Mon, Oct 10, 2005 at 10:27:27AM -0400, Tom Lane wrote: >> No, SQL_ASCII represents the complete absence of any encoding >> knowledge. With this database setting, changing client_encoding is a >> complete no-op. Postgres will just absorb and re-emit strings exactly >> as they were supplied originally, no matter what client_encoding is. > The documents remain pretty confusing about this, assuming I still > understand the current state of affairs (always a dangerous > assumption). The chart in > <http://developer.postgresql.org/docs/postgres/multibyte.html>, for > instance, says "SQL_ASCII" supports "ASCII". I'm not sure what to do > about this (I've noticed it before, and run into the same quandary). You're right, the documentation basically doesn't explain this at all :-( I'll try to add something. regards, tom lane