Re: Encoding-related errors when moving from 7.3 to 8.0.1 - Mailing list pgsql-general

From Carlos Moreno
Subject Re: Encoding-related errors when moving from 7.3 to 8.0.1
Date
Msg-id 423F918F.3030505@mochima.com
Whole thread Raw
In response to Re: Encoding-related errors when moving from 7.3 to 8.0.1  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses Re: Encoding-related errors when moving from 7.3 to 8.0.1
List pgsql-general
Thanks again, Alvaro!

Alvaro Herrera wrote:

>>So, our system (CGI's written in C++ running on a Linux server)
>>simply takes whatever the user gives (properly validated and
>>escaped) and throws it in the database.  We've never encountered
>>any problem  (well, or perhaps it's the opposite?  Perhaps we've
>>always been living with the problem without realizing it?)
>
> The latter, I think.  The problem is character recoding.  If your old
> system has been running with encoding SQL_ASCII, then no recoding ever
> takes place.  If you are now using UTF8 or latin1 (say) as server
> encoding, then as soon as the client is using a different encoding,
> there should be conversion in order to make the new data correct w.r.t.
> the server encoding.  If the wrong conversion takes place, or if no
> conversion takes place, you may either end up with invalid data, or
> have the server reject your input (as was this case.)

This makes sense to me, yes.  The reason why I'm a bit lost is that
we never did anything whatsoever with respect to encoding.  Oddly
enough, I couldn't find much about this in the docs.

I see references to it in the runtime configuration docs (the part
where they describe the postgres.conf file).  There's one line,
commented out, where they set (as an example), the client_encoding
to sql_ascii, and a comment to the end of that line says "actually,
it defaults to the server encoding").

I just found out that in the create database statement, one of the
options specifies the encoding using for that database.

I guess what changed from version 7.4.x to 8.0 is that the default
server_encoding changed?

This means that a temporary solution (or rather, a temporary patch)
would be to create the database specifying the right server_encoding
to match what I currently have on my system?  (I wouldn't want to do
that if it is nothing more than a patch to keep hiding the problem)

> So the moral of the story seems to be that yes, you need to make each
> application issue the correct client_encoding before entering any data.
> You can attach it to the user or database, by issuing ALTER USER (resp.
> DATABASE).  But if you are using a web interface, where the user can
> enter data in either win1252 or latin1 encoding (or whatever) depending
> on the environment, then I'm not sure what you should do.

This is indeed the case;  and "do nothing" is what we have always
done with respect to this issue...  Why has it been so long without
us realizing that there was a hidden problem, I really don't know.
(and we do have users with plenty of "weird" characters -- accent
aigue, grave, circumflex, tilde, dieresis, etc. -- and they have
always worked).

I'm so lost!  :-(

BTW, the correct e-mail to pass through the anti-spam filter is
my first name, followed by a dot, followed by my last name (the
rest after the @ is the same)

Thanks again for your message!

Carlos
--

pgsql-general by date:

Previous
From: "Ed L."
Date:
Subject: Measuring Query Performance
Next
From: Edwin New
Date:
Subject: Clearing locks