Thread: From ASCII to UTF-8
As part of a migration from 8.0 to 8.1 I want to convert the data from
ASCII to UTF-8.

I dumped the database with pg_dump (8.0) and tried to convert it with
iconv, but it shows an error:

$ iconv -t ASCII -t UTF-8 fahstats_data.dump -o fahstats_data_utf-8.dump
iconv: illegal input sequence at position 71407864

That position contains the decimal value 233:

$ od -A d -j 71407864 -N 1 -t u1 fahstats_data.dump
71407864 233
71407865

I could use pg_dump -E in 8.1, but it is on another machine with an ADSL
connection and the dump size is 1.8GB. It would take more than 12
hours.

How can I install only pg_dump 8.1? I tried to copy the executable and
the libs but it did not work.

Regards,
Clodoaldo Pinto

Clodoaldo Pinto wrote:
> As part of a migration from 8.0 to 8.1 i want to convert the data
> from ASCII to UTF-8.

ASCII is a subset of UTF-8, so if you really wanted to do that you
wouldn't need to do anything.

> I dumped the database with pg_dump (8.0) and tried to convert it with
> iconv, but it shows an error:
>
> $ iconv -t ASCII -t UTF-8 fahstats_data.dump -o fahstats_data_utf-8.dump
          ^^       ^^
Mistake?

> iconv: illegal input sequence at position 71407864
>
> That position contains the decimal value 233:

Well, that is not an ASCII character, so you need to use a different
source encoding for iconv.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

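To illustrate the point about byte 233, here is a minimal Python sketch (not part of the original exchange). The helper name `decodes_as` is hypothetical; it only checks whether a byte string is valid in a given encoding. It suggests why a lone byte 233 is rejected as ASCII, and why it is also not valid UTF-8 on its own, while pure ASCII text is already valid UTF-8.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The byte value that od reported at position 71407864.
suspect = bytes([233])

# ASCII only covers 0..127, so 233 is rejected.
is_ascii = decodes_as(suspect, "ascii")

# A lone 0xE9 is the start of a multi-byte UTF-8 sequence with nothing
# following it, so it is not valid UTF-8 either.
is_utf8 = decodes_as(suspect, "utf-8")

# Pure ASCII text has the same bytes in ASCII and in UTF-8.
same_bytes = "fahstats".encode("ascii") == "fahstats".encode("utf-8")

print(is_ascii, is_utf8, same_bytes)
```

So the dump is neither pure ASCII nor already UTF-8 at that position, which is consistent with the data being stored in some single-byte encoding.
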
Clodoaldo Pinto wrote:
> As part of a migration from 8.0 to 8.1 i want to convert the data from
> ASCII to UTF-8.
>
> I dumped the database with pg_dump (8.0) and tried to convert it with
> iconv, but it shows an error:
>
> $ iconv -t ASCII -t UTF-8 fahstats_data.dump -o fahstats_data_utf-8.dump
> iconv: illegal input sequence at position 71407864
>
> That position contains the decimal value 233:
>
> $ od -A d -j 71407864 -N 1 -t u1 fahstats_data.dump
> 71407864 233
> 71407865
>
> I could use pg_dump -E in 8.1 but it is in another machine with ADSL
> connection and the dump size is 1.8GB. It would take more than 12
> hours.
>
> How to install pg_dump 8.1 only? I tried to copy the executable and
> the libs but it did not work.
>

From what you wrote, it seems that your dump contains non-ASCII
characters. Probably non-ASCII data somehow got into your database,
in an encoding like ISO-8859-1, ISO-8859-15 or CP-1252 (if you are
using western-european stuff). In those encodings, 233 = é.

Maybe you could try something like:

iconv -f ISO-8859-1 -t UTF-8 ....

Please note that a conversion FROM these single-byte encodings
practically always succeeds (in ISO-8859-1 every byte value is valid),
so a success does not mean that you guessed the charset correctly. You
will still have to check manually whether the resulting document
contains the correct data.

gabor
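As a rough illustration of the advice above (not something posted in the thread), the following Python sketch re-encodes a dump to UTF-8 in chunks, so a 1.8GB file does not need to fit in memory. The function name `convert_to_utf8` and the sample bytes are assumptions made for the example; the source encoding is a guess that still has to be checked by looking at the output.

```python
import codecs
import io


def convert_to_utf8(src, dst, source_encoding="iso-8859-1", chunk_size=1 << 20):
    """Read bytes from `src`, decode them as `source_encoding`, and write
    the UTF-8 re-encoding to `dst`. Both are binary file objects."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    while True:
        raw = src.read(chunk_size)
        if not raw:
            break
        dst.write(decoder.decode(raw).encode("utf-8"))
    # Flush anything the decoder is still holding (matters for multi-byte
    # source encodings; a no-op for single-byte ones).
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Hypothetical sample: "caf" + byte 233, then a space and byte 0x80.
sample = b"caf\xe9 \x80"

as_latin1 = io.BytesIO()
convert_to_utf8(io.BytesIO(sample), as_latin1, "iso-8859-1")

as_cp1252 = io.BytesIO()
convert_to_utf8(io.BytesIO(sample), as_cp1252, "cp1252")

# Both conversions succeed, but they disagree on byte 0x80:
# ISO-8859-1 yields the control character U+0080, CP-1252 yields U+20AC.
print(repr(as_latin1.getvalue().decode("utf-8")))
print(repr(as_cp1252.getvalue().decode("utf-8")))
```

With real files, `src` and `dst` would be opened in binary mode, for example `open("fahstats_data.dump", "rb")` and `open("fahstats_data_utf-8.dump", "wb")`. Both runs finish without error even though they produce different text, which is why a successful conversion alone does not confirm the charset guess.
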