Thread: Weird encoding behavior

Weird encoding behavior

From

"Leonardo M. Ramé"

Date:

25 March 2009, 17:43:40

Hi, I'm experiencing a weird behavior when storing latin characters to a
PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
working since September 2008, it wasn't updated nor replaced since its
first installation.

The weirdness of the problem is that sometimes the characters are stored
correctly, but sometimes doesn't (allways by the same program), the
field type is Varchar(30), and for example the text "NUÑEZ" is stored as
"NU?EZ".

The data comes from an external application in an XML file (also
Latin1), then, a Delphi service parses the XML and create the
Insert/Update statements to store the data in the database. I'd try to
reproduce the bug by sending XML files with 'Ñ' to the service, but it
is stored correctly.

Also, there's a front end that allows users to see/edit the data in a
user friendlier way. Again, I checked by inserting records with 'Ñ'
using this front-end, and also are stored correctly.

Does anyone faced the same problem? any workaround?

Thanks in advance,
Leonardo

Re: Weird encoding behavior

From

Martin Gainty

Date:

25 March 2009, 21:53:30

Hola

use Ñ for spanish N
http://webdesign.about.com/od/localization/l/blhtmlcodes-sp.htm

Saludos Cordiales desde EEUU!
Martin
______________________________________________
Disclaimer and confidentiality note

This message is confidential and may be privileged. If you are not the intended recipient, we kindly ask you to  please inform the sender. Any unauthorised dissemination or copying hereof is prohibited. This message serves for information purposes only and shall not have any legally binding effect. Given that e-mails can easily be subject to manipulation, we can not accept any liability for the content provided.

> From: martinrame@yahoo.com
> Subject: [GENERAL] Weird encoding behavior
> Date: Wed, 25 Mar 2009 15:52:55 -0300
> To: pgsql-general@postgresql.org
>
> Hi, I'm experiencing a weird behavior when storing latin characters to a
> PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
> working since September 2008, it wasn't updated nor replaced since its
> first installation.
>
> The weirdness of the problem is that sometimes the characters are stored
> correctly, but sometimes doesn't (allways by the same program), the
> field type is Varchar(30), and for example the text "NUÑEZ" is stored as
> "NU?EZ".
>
> The data comes from an external application in an XML file (also
> Latin1), then, a Delphi service parses the XML and create the
> Insert/Update statements to store the data in the database. I'd try to
> reproduce the bug by sending XML files with 'Ñ' to the service, but it
> is stored correctly.
>
> Also, there's a front end that allows users to see/edit the data in a
> user friendlier way. Again, I checked by inserting records with 'Ñ'
> using this front-end, and also are stored correctly.
>
> Does anyone faced the same problem? any workaround?
>
> Thanks in advance,
> Leonardo
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Internet Explorer 8 – Now Available. Faster, safer, easier. Download FREE now!

Re: Weird encoding behavior

From

"Albe Laurenz"

Date:

26 March 2009, 06:54:13

Leonardo M. Ramé wrote:
> Hi, I'm experiencing a weird behavior when storing latin characters to a
> PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
> working since September 2008, it wasn't updated nor replaced since its
> first installation.
>
> The weirdness of the problem is that sometimes the characters are stored
> correctly, but sometimes doesn't (allways by the same program), the
> field type is Varchar(30), and for example the text "NUÑEZ" is stored as
> "NU?EZ".
>
> The data comes from an external application in an XML file (also
> Latin1), then, a Delphi service parses the XML and create the
> Insert/Update statements to store the data in the database. I'd try to
> reproduce the bug by sending XML files with 'Ñ' to the service, but it
> is stored correctly.
>
> Also, there's a front end that allows users to see/edit the data in a
> user friendlier way. Again, I checked by inserting records with 'Ñ'
> using this front-end, and also are stored correctly.
>
> Does anyone faced the same problem? any workaround?

Well, there *must* be one client that stores wrong data...

As a first step, can you find out the code point of the character that
is represented as "?" in your E-Mail?

Something like
SELECT ascii(substr('NU?EZ', 3, 1));
except that instead of the string literal you substitute the column
containing the bad value.

Yours,
Laurenz Albe

Re: Weird encoding behavior

From

"Leonardo M. Ramé"

Date:

26 March 2009, 08:52:25

> Well, there *must* be one client that stores wrong data...

You are right, I'll ask someone in site to look at *each* client hunting
for the root of the problem. It must be a Windows Regional Settings or
something similar.

>
> As a first step, can you find out the code point of the character that
> is represented as "?" in your E-Mail?
>
> Something like
> SELECT ascii(substr('NU?EZ', 3, 1));
> except that instead of the string literal you substitute the column
> containing the bad value.
>
> Yours,
> Laurenz Albe
>

I did what you suggested, and it responds with a 63 when the string is
"NU?NEZ" and 209 when it's "NUÑEZ".

Thanks,
Leonardo.

Re: Weird encoding behavior

From

Alvaro Herrera

Date:

26 March 2009, 10:56:11

"Leonardo M. Ramé" wrote:
> Hi, I'm experiencing a weird behavior when storing latin characters to a
> PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
> working since September 2008, it wasn't updated nor replaced since its
> first installation.
>
> The weirdness of the problem is that sometimes the characters are stored
> correctly, but sometimes doesn't (allways by the same program), the
> field type is Varchar(30), and for example the text "NUÑEZ" is stored as
> "NU?EZ".

You need to be sure that the client_encoding variable is set to whatever
the client is actually using.  Some client programs may be using UTF8,
and thus they should do "SET client_encoding TO utf8" on connection
start; others may be using Win-1252, Latin1, or other encodings.  If you
declare each correctly when the connection is established, the server
will do the right conversion for you automatically.  If you fail to do
that you will have a mess of characters, just like you do now.

BTW it is very advisable to update to 8.3.7.  It has a lot of bug fixes
since 8.3.1.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Weird encoding behavior

From

"Albe Laurenz"

Date:

26 March 2009, 12:07:42

Leonardo M. Ramé wrote:
> > As a first step, can you find out the code point of the character that
> > is represented as "?" in your E-Mail?
> >
> > Something like
> > SELECT ascii(substr('NU?EZ', 3, 1));
> > except that instead of the string literal you substitute the column
> > containing the bad value.
>
> I did what you suggested, and it responds with a 63 when the string is
> "NU?NEZ" and 209 when it's "NUÑEZ".

63 is indeed a question mark. Since such a conversion would not be
done by PostgreSQL, "something else" must convert Ñ to ?N *before*
it gets into PostgreSQL...

Yours,
Laurenz Albe

Re: Weird encoding behavior

From

Tom Lane

Date:

26 March 2009, 12:34:55

"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
> Leonardo M. Ram� wrote:
>> I did what you suggested, and it responds with a 63 when the string is
>> "NU?NEZ" and 209 when it's "NU�EZ".

> 63 is indeed a question mark. Since such a conversion would not be
> done by PostgreSQL, "something else" must convert � to ?N *before*
> it gets into PostgreSQL...

Yeah, I think this destroys the theory that it's due to a wrong choice
of client_encoding setting.  What you'd be likely to get from that is
a "character can't be translated" kind of error, not silent substitution
of a question mark.  The damage must be getting done on the client side.

            regards, tom lane