Re: Encoding-related errors when moving from 7.3 to 8.0.1 - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Encoding-related errors when moving from 7.3 to 8.0.1
Msg-id 20050321203832.GA20621@dcc.uchile.cl
In response to Re: Encoding-related errors when moving from 7.3 to 8.0.1  (Carlos Moreno <moreno@mochima.com>)
Responses Re: Encoding-related errors when moving from 7.3 to 8.0.1
List pgsql-general
On Sun, Mar 20, 2005 at 10:02:24AM -0500, Carlos Moreno wrote:

Carlos,

> So, our system (CGI's written in C++ running on a Linux server)
> simply takes whatever the user gives (properly validated and
> escaped) and throws it in the database.  We've never encountered
> any problem  (well, or perhaps it's the opposite?  Perhaps we've
> always been living with the problem without realizing it?)

The latter, I think.  The problem is character recoding.  If your old
system has been running with encoding SQL_ASCII, then no recoding ever
took place.  If you are now using UTF8 or latin1 (say) as the server
encoding, then whenever the client uses a different encoding, the data
must be converted so that it is correct w.r.t. the server encoding.  If
the wrong conversion takes place, or if no conversion takes place, you
may either end up with invalid data or have the server reject your
input (as happened in this case).
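
To make the three outcomes concrete, here is a minimal Python sketch of
what happens at the byte level.  It does not talk to PostgreSQL at all;
the helper names (is_valid_utf8, recode) and the sample string are made
up for illustration, and the "server" is only simulated by a UTF-8
validity check.

```python
# Sketch only: simulates the byte-level effect of recoding, without PostgreSQL.
# is_valid_utf8() and recode() are hypothetical helpers, not library calls.

def is_valid_utf8(data: bytes) -> bool:
    """Rough stand-in for a UTF8 server checking its input."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def recode(data: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Convert bytes from the client's encoding to the server's encoding."""
    return data.decode(client_encoding).encode(server_encoding)

# A latin1 client sends "café": the e-acute is the single byte 0xE9.
raw = "café".encode("latin-1")

# No conversion: the raw bytes are not valid UTF-8, so a UTF8 server
# would have to reject them.
unconverted_ok = is_valid_utf8(raw)

# Correct conversion: latin1 -> UTF-8.  In PostgreSQL, declaring the
# client side (e.g. SET client_encoding TO 'LATIN1') asks the server to
# perform this step.
converted = recode(raw, "latin-1", "utf-8")

# Wrong conversion: bytes that are already UTF-8 get treated as latin1.
# The result is valid UTF-8, so it is accepted, but the text is garbled.
wrong = recode(converted, "latin-1", "utf-8")

print(unconverted_ok, converted, wrong.decode("utf-8"))
```

The last case is the nasty one: nothing fails, and the damage only
shows up when someone reads the data back.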

So the moral of the story seems to be that yes, you need to make each
application set the correct client_encoding before entering any data.
You can also make it the default for a user or a database by issuing
ALTER USER (resp. ALTER DATABASE).  But if you are using a web
interface, where the user can enter data in either win1252 or latin1
encoding (or whatever) depending on the environment, then I'm not sure
what you should do.  One idea would be "do nothing," but that seems
very prone to invalid data.  Another idea would be to have the user
select an encoding (and maybe display the data to them after the
recoding has taken place, so they can correct it in case they got it
wrong).  This seems messy and likely to upset your users.
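
For what it's worth, a third option that is not one of the two above is
to guess the encoding on the application side before the data reaches
the database.  The Python sketch below shows one such heuristic, under
the assumption that input is either UTF-8, win1252, or latin1:
try strict UTF-8 first, then win1252 (Python calls it "cp1252"), then
latin1, which accepts any byte.  The function name guess_decode is made
up for illustration, and the guess can be wrong, so treat it as a
starting point only.

```python
# Sketch only: a guessing heuristic, not a reliable encoding detector.
# guess_decode() is a hypothetical helper name.
from typing import Tuple

def guess_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for bytes of unknown encoding."""
    # Non-ASCII latin1/win1252 text is rarely valid UTF-8 by accident,
    # so a successful strict UTF-8 decode is a fairly strong signal.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # win1252 and latin1 differ only for bytes 0x80-0x9F: win1252 has
    # printable characters there (curly quotes, euro sign), while latin1
    # has control characters.  Python's cp1252 codec leaves a few of
    # those bytes undefined and raises on them.
    try:
        return data.decode("cp1252"), "cp1252"
    except UnicodeDecodeError:
        # latin1 maps every byte to a character, so this cannot fail.
        return data.decode("latin-1"), "latin-1"

# Bytes 0x93/0x94 are curly double quotes in win1252.
text, used = guess_decode(b"\x93hi\x94")
print(repr(text), used)
```

Once the text is decoded, the application can re-encode everything to a
single encoding and declare that one as client_encoding, so the server
only ever sees one kind of input.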

Someone else may have better advice for you on this.  I haven't really
worked with these things.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)
