Re: encoding question - Mailing list pgsql-admin

From Aftab Alam
Subject Re: encoding question
Date
Msg-id 003501c64d62$1dee5450$ec1010ac@aftabn463
In response to Re: encoding question  (Ivo Rossacher <rossacher@bluewin.ch>)
List pgsql-admin
unsubscribe

Regards,





-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Ivo Rossacher
Sent: Wednesday, March 22, 2006 3:08 AM
To: pgsql-admin@postgresql.org
Cc: Ben K.
Subject: Re: [ADMIN] encoding question


On Tuesday, 21 March 2006 21:14, Ben K. wrote:
> I just wanted to add that when I created the same database with -E
> SQL_ASCII on my linux box, the dump was loaded fine. I created another
> database without -E and observed the same invalid encoding problem.

This is not really surprising, since SQL_ASCII, unlike all other encodings,
does not check the encoding of the data.
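
A minimal sketch of the difference, assuming the dump contains bytes written
by a LATIN1 client and the target database uses UTF-8 (both are assumptions
for illustration; the sample string is hypothetical):

```python
# Hypothetical sample: text as a LATIN1 client would have written it.
latin1_bytes = "März".encode("latin-1")  # b'M\xe4rz'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A validating encoding (UTF-8 here) rejects this byte sequence ...
utf8_ok = is_valid_utf8(latin1_bytes)

# ... while an SQL_ASCII-style store keeps the bytes exactly as received,
# without ever asking what characters they are meant to represent.
stored_as_is = bytes(latin1_bytes)
```

The same bytes load without complaint in the second case, but nothing has
been learned about whether they are correct.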

>
> On the face value this seems to solve the problem at least superficially.

The more interesting question is what your application is doing with the
non-ASCII characters in your database. The answer to this question will tell
you what the correct contents would be.

>
> I'd like to check the data validity, and the easiest way seems to be to
> dump the data again from the linux box and compare with the original.

Your application defines what is valid. Even if you knew that the dump would
be the same, it would not tell you anything about the validity of the data.
So the better check would be to connect the application(s) to both servers
and work with some records that contain non-ASCII characters. If both
servers give the same results with your application(s), you have most
likely got the encoding right.

>
> Is there a way to compare between any two databases online? (like running
> a script checking row counts and schema) If I run crc on the concat of all
> fields in a row, and if the crc matches, would it be reasonably
> sufficient? Is there a stronger validation method?

Since any general method for comparing database contents (I don't know of
such a tool) would use its own drivers and setup, it will probably not get
the same result as the test with your client applications.
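
For the per-row CRC idea from the question, a rough sketch could look like
the following. The function name row_crc and the length-prefix layout are
assumptions made for illustration, not an existing tool. Plain concatenation
would make the rows ("ab", "c") and ("a", "bc") look identical, so each
field is length-prefixed before it is fed to the checksum:

```python
import zlib


def row_crc(row, encoding="utf-8"):
    """CRC32 over one row; fields are length-prefixed so that field
    boundaries are part of the checksum (hypothetical helper)."""
    crc = 0
    for field in row:
        if field is None:
            data = b"\x00"  # marker for NULL
        else:
            raw = str(field).encode(encoding)
            # marker for a value, then a 4-byte length, then the bytes
            data = b"\x01" + len(raw).to_bytes(4, "big") + raw
        crc = zlib.crc32(data, crc)  # continue the running CRC
    return crc


# Rows fetched from each server would be compared pairwise, for example:
same = row_crc(("Ben", 42, None)) == row_crc(("Ben", 42, None))
```

A 32-bit CRC can collide; hashlib.sha256 from the standard library would be
a stronger choice at some extra cost. Either way, matching checksums only
show that both servers hand the same bytes to this one script, under its own
client encoding. They do not show that the data is valid for the
application.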

The bottom line is that only an encoding set at the server level will make
clear what the non-ASCII characters mean. The server can then handle the
conversion between the server and the client encoding, so that different
clients can work even with different internal encodings.
With SQL_ASCII only the client application knows. This kind of setup needs a
lot of care to get consistent data, especially when several different
applications are used.
The drawback of selecting an encoding is a small performance penalty.
However, in my databases I could not measure any difference. I have to say
here that my data does not contain a lot of strings, so it is definitely not
a good test case for this. Since there are several different clients with
different languages using my databases, I use Unicode as the encoding. This
works without any problem for me.
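
The server-side conversion mentioned above can be pictured with a small
sketch. It only imitates the idea in Python; the function to_client and the
sample string are hypothetical and say nothing about how the server
implements it:

```python
def to_client(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Re-encode stored bytes for one client. This is only possible because
    the encoding of the stored bytes is known."""
    return stored.decode(server_encoding).encode(client_encoding)


# Assumption: the database encoding is Unicode (UTF-8).
stored = "Zürich".encode("utf-8")

# Two clients with different encodings receive different bytes ...
for_latin1_client = to_client(stored, "utf-8", "latin-1")
for_utf8_client = to_client(stored, "utf-8", "utf-8")

# ... which decode to the same characters on each side.
same_text = for_latin1_client.decode("latin-1") == for_utf8_client.decode("utf-8")
```

With SQL_ASCII the first argument to such a conversion is unknown, which is
why every client has to agree on the encoding by convention.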

Best regards
Ivo

>
>
> Thanks.
>
> Ben K.
> Developer
> http://benix.tamu.edu
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org


