Re: Encoding-related errors when moving from 7.3 to 8.0.1 - Mailing list pgsql-general
From | Carlos Moreno |
---|---|
Subject | Re: Encoding-related errors when moving from 7.3 to 8.0.1 |
Date | |
Msg-id | 423F918F.3030505@mochima.com Whole thread Raw |
In response to | Re: Encoding-related errors when moving from 7.3 to 8.0.1 (Alvaro Herrera <alvherre@dcc.uchile.cl>) |
Responses |
Re: Encoding-related errors when moving from 7.3 to 8.0.1
|
List | pgsql-general |
Thanks again, Alvaro! Alvaro Herrera wrote: >>So, our system (CGI's written in C++ running on a Linux server) >>simply takes whatever the user gives (properly validated and >>escaped) and throws it in the database. We've never encountered >>any problem (well, or perhaps it's the opposite? Perhaps we've >>always been living with the problem without realizing it?) > > The latter, I think. The problem is character recoding. If your old > system has been running with encoding SQL_ASCII, then no recoding ever > takes place. If you are now using UTF8 or latin1 (say) as server > encoding, then as soon as the client is using a different encoding, > there should be conversion in order to make the new data correct w.r.t. > the server encoding. If the wrong conversion takes place, or if no > conversion takes place, you may either end up with invalid data, or > have the server reject your input (as was this case.) This makes sense to me, yes. The reason why I'm a bit lost is that we never did anything whatsoever with respect to encoding. Oddly enough, I couldn't find much about this in the docs. I see references to it in the runtime configuration docs (the part where they describe the postgres.conf file). There's one line, commented out, where they set (as an example), the client_encoding to sql_ascii, and a comment to the end of that line says "actually, it defaults to the server encoding"). I just found out that in the create database statement, one of the options specifies the encoding using for that database. I guess what changed from version 7.4.x to 8.0 is that the default server_encoding changed? This means that a temporary solution (or rather, a temporary patch) would be to create the database specifying the right server_encoding to match what I currently have on my system? (I wouldn't want to do that if it is nothing more than a patch to keep hiding the problem) > So the moral of the story seems to be that yes, you need to make each > application issue the correct client_encoding before entering any data. > You can attach it to the user or database, by issuing ALTER USER (resp. > DATABASE). But if you are using a web interface, where the user can > enter data in either win1252 or latin1 encoding (or whatever) depending > on the environment, then I'm not sure what you should do. This is indeed the case; and "do nothing" is what we have always done with respect to this issue... Why has it been so long without us realizing that there was a hidden problem, I really don't know. (and we do have users with plenty of "weird" characters -- accent aigue, grave, circumflex, tilde, dieresis, etc. -- and they have always worked). I'm so lost! :-( BTW, the correct e-mail to pass through the anti-spam filter is my first name, followed by a dot, followed by my last name (the rest after the @ is the same) Thanks again for your message! Carlos --
pgsql-general by date: