Re: encoding question - Mailing list pgsql-admin

From	Ivo Rossacher
Subject	Re: encoding question
Date	March 21, 2006 17:38:21
Msg-id	200603212238.04111.rossacher@bluewin.ch
In response to	Re: encoding question ("Ben K." <bkim@coe.tamu.edu>)
Responses	Re: encoding question
List	pgsql-admin

On Tuesday, 21 March 2006 at 21:14, Ben K. wrote:
> I just wanted to add that when I created the same database with -E
> SQL_ASCII on my linux box, the dump was loaded fine. I created another
> database without -E and observed the same invalid encoding problem.

This is not really surprising, since SQL_ASCII, unlike all other encodings,
does not validate the encoding of the data it stores.
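
To illustrate why a dump can load under SQL_ASCII but fail elsewhere, here is a minimal Python sketch of the kind of check a validating encoding performs. It is only an analogy, not PostgreSQL's own code; the sample string and the helper name `is_valid_utf8` are assumptions made up for this example.

```python
# Hypothetical sample: "Zürich" as a Latin-1 client would send it.
latin1_bytes = "Z\u00fcrich".encode("latin-1")   # b'Z\xfcrich'
utf8_bytes = "Z\u00fcrich".encode("utf-8")       # b'Z\xc3\xbcrich'


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A validating encoding would reject the first input and accept the second.
# A non-validating setup (like SQL_ASCII) would store both unchanged.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```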

>
> On the face value this seems to solve the problem at least superficially.

The more interesting question is what your application is doing with the
non-ASCII characters in your database. The answer to this question will tell
you what the correct contents would be.

>
> I'd like to check the data validity, and the easiest way seems to be to
> dump the data again from the linux box and compare with the original.

Your application defines what is valid. Even if you knew that the dump would
be the same, it would not tell you anything about the validity of the data. So
the better check is to connect the application(s) to both servers and work
with some records that contain non-ASCII characters.
If both servers give the same results with your application(s), you have most
probably got the encoding right.

>
> Is there a way to compare between any two databases online? (like running
> a script checking row counts and schema) If I run crc on the concat of all
> fields in a row, and if the crc matches, would it be reasonably
> sufficient? Is there a stronger validation method?

Since any general tool for comparing database contents (I don't know of one)
would use its own drivers and setup, it will probably not get the same result
as the test with your client applications.
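
For the CRC idea from the question, a rough Python sketch of a per-row checksum follows. The function name `row_crc`, the NULL marker, and the length prefix are assumptions of this example, not an existing tool. Note that the result depends on the encoding in which the fields arrive, so a match only says that both sides delivered the same bytes to this particular client. CRC32 also catches accidental differences only; it is not a strong validation.

```python
import zlib


def row_crc(fields, encoding="utf-8"):
    """CRC32 over one row; every field is tagged and length-prefixed."""
    crc = 0
    for field in fields:
        if field is None:
            chunk = b"\x00"  # marker so NULL differs from an empty string
        else:
            data = str(field).encode(encoding)
            # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
            chunk = b"\x01" + len(data).to_bytes(4, "big") + data
        crc = zlib.crc32(chunk, crc)
    return crc


row = (1, "Z\u00fcrich", None)
same_on_both = row_crc(row) == row_crc(row)
differs_by_encoding = row_crc(row, "utf-8") != row_crc(row, "latin-1")
```

Plain concatenation of all fields, without a separator or length prefix, would let different rows produce the same input to the checksum.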

The bottom line is that only an encoding set at the server level makes
clear what the non-ASCII characters mean. The server can then handle
the conversion between the server and the client encoding, so that
different clients can work even with different internal encodings.
With SQL_ASCII, only the client application knows what the bytes mean. This
kind of setup needs a lot of care to keep the data consistent, especially
when several different applications are used.
The drawback of selecting an encoding is a small performance penalty. However,
in my databases I could not measure any difference. I have to say that
my data does not contain many strings, so it is definitely not a good test
case for this. Since several different clients with different
languages use my databases, I use Unicode as the encoding. This works
without any problem for me.
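
The difference between a declared server encoding and SQL_ASCII can be modelled in a few lines of Python. This is a simplified sketch, not how the server is implemented; `SERVER_ENCODING`, `store`, and `fetch` are hypothetical names introduced only for the illustration.

```python
SERVER_ENCODING = "utf-8"  # assumed server encoding for this sketch


def store(client_bytes: bytes, client_encoding: str) -> bytes:
    """Convert bytes sent by a client into the server encoding."""
    return client_bytes.decode(client_encoding).encode(SERVER_ENCODING)


def fetch(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored bytes into the encoding the reading client expects."""
    return stored.decode(SERVER_ENCODING).encode(client_encoding)


# A Latin-1 client writes, a UTF-8 client reads: the text survives.
written = "Z\u00fcrich".encode("latin-1")
read_back = fetch(store(written, "latin-1"), "utf-8").decode("utf-8")

# Without conversion (as with SQL_ASCII) the raw bytes pass through, and
# the UTF-8 client misreads what the Latin-1 client wrote.
misread = written.decode("utf-8", errors="replace")
```

Here `read_back` equals the original text, while `misread` contains a replacement character where the "ü" used to be.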

Best regards
Ivo

>
>
> Thanks.
>
> Ben K.
> Developer
> http://benix.tamu.edu