Re: encoding question - Mailing list pgsql-admin
From | Ivo Rossacher |
---|---|
Subject | Re: encoding question |
Date | |
Msg-id | 200603212238.04111.rossacher@bluewin.ch |
In response to | Re: encoding question ("Ben K." <bkim@coe.tamu.edu>) |
Responses | Re: encoding question |
List | pgsql-admin |
On Tuesday, 21 March 2006 at 21:14, Ben K. wrote:
> I just wanted to add that when I created the same database with -E
> SQL_ASCII on my linux box, the dump was loaded fine. I created another
> database without -E and observed the same invalid encoding problem.

This is not really surprising, since SQL_ASCII, unlike all other encodings, does not check the encoding of the data.

> On the face value this seems to solve the problem at least superficially.

The more interesting question is what your application is doing with the non-ASCII characters in your database. The answer to this question will tell you what the correct contents would be.

> I'd like to check the data validity, and the easiest way seems to be to
> dump the data again from the linux box and compare with the original.

Your application defines what is valid. Even if you knew that the dump was the same, it would not tell you anything about the validity of the data. A better check would be to connect the application(s) to both servers and work with some records that contain non-ASCII characters. If both servers give the same results with your application(s), you most likely got the encoding right.

> Is there a way to compare between any two databases online? (like running
> a script checking row counts and schema) If I run crc on the concat of all
> fields in a row, and if the crc matches, would it be reasonably
> sufficient? Is there a stronger validation method?

I don't know of a general tool for comparing database contents. Any such tool would use its own drivers and setup, so it would probably not get the same result as a test with your client applications.

The bottom line is that only an encoding set at the server level makes clear what the non-ASCII characters mean. The server can then handle the conversion between the server and the client encoding, so that different clients can work even with different internal encodings.

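To illustrate why the declared encoding matters, here is a minimal Python sketch. It is not PostgreSQL code, and the sample string and the helper name `try_decode` are made up for illustration. The same stored bytes read as different text depending on which encoding the reader assumes, and a checking encoding such as UTF-8 rejects bytes that are not valid for it:

```python
from typing import Optional

# Hypothetical sample: the bytes a Latin-1 client would send for "Zürich".
raw = "Zürich".encode("latin-1")

def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_latin1 = try_decode(raw, "latin-1")  # what the original client meant
as_cp437 = try_decode(raw, "cp437")     # another client assuming a different encoding
as_utf8 = try_decode(raw, "utf-8")      # None: the bytes are not valid UTF-8

print(repr(as_latin1), repr(as_cp437), repr(as_utf8))
```

Without a declared encoding nothing in the stored bytes says which of these readings is the intended one.
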
With SQL_ASCII only the client application knows. This kind of setup needs a lot of care to keep the data consistent, especially when several different applications are used.

The drawback of selecting an encoding is a small performance penalty. However, in my databases I could not measure any difference. I have to say that my data does not contain many strings, so it is definitely not a good test case for this. Since several different clients with different languages use my databases, I use Unicode as the encoding. This works without any problems for me.

Best regards
Ivo

> Thanks.
>
> Ben K.
> Developer
> http://benix.tamu.edu
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>        http://archives.postgresql.org
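
The row-checksum idea from the quoted question could be sketched roughly as follows. This is a minimal illustration in Python, not an existing tool: the function names `row_digest` and `diff_tables`, the sample rows, and the assumption that the first column of each row is the primary key are all hypothetical. Each field is length-prefixed before hashing, because a plain concatenation would make ('ab', 'c') and ('a', 'bc') look identical, and SHA-256 is used in place of a CRC. The rows would have to be fetched from both servers with the same client encoding for the comparison to mean anything.

```python
import hashlib

def row_digest(row):
    """Hash one row; each field is length-prefixed so field boundaries stay unambiguous."""
    h = hashlib.sha256()
    for field in row:
        # NULL (None) gets its own marker so that it differs from an empty string.
        data = b"\x00" if field is None else b"\x01" + str(field).encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

def diff_tables(rows_a, rows_b):
    """Return the sorted keys whose rows are missing or different between two row sets.

    Assumes the first column of every row is the primary key.
    """
    digests_a = {row[0]: row_digest(row) for row in rows_a}
    digests_b = {row[0]: row_digest(row) for row in rows_b}
    keys = set(digests_a) | set(digests_b)
    return sorted(k for k in keys if digests_a.get(k) != digests_b.get(k))

# Hypothetical rows as they might be fetched from each server.
server_a = [(1, "Zürich", None), (2, "Bern", "capital")]
server_b = [(1, "Zurich", None), (2, "Bern", "capital")]
print(diff_tables(server_a, server_b))  # [1]
```

Matching digests only show that the two copies agree with each other. Whether the data is valid is still a question for the applications that use it.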