Re: Significance of Database Encoding - Mailing list pgsql-sql

From	PFC
Subject	Re: Significance of Database Encoding
Date	May 15, 2005 16:49:04
Msg-id	op.sqt1blpqth1vuj@localhost
In response to	Re: Significance of Database Encoding (Rajesh Mallah <mallah_rajesh@yahoo.com>)
List	pgsql-sql

> $ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
> iconv: illegal input sequence at position 114500
>
> Any ideas how the job can be accomplished reliably?
>
> Also my database may contain data in multiple encodings
> like WINDOWS-1251 and WINDOWS-1256 in various places
> as data has been inserted by different people using
> different sources and client software.

You could use a simple program like this (in Python):

output = open( "unidump", "w" )
for line in open( "your dump" ):
    for encoding in "utf-8", "iso-8859-15", "whatever":
        try:
            output.write( unicode(line, encoding ).encode( "utf-8" ))
            break
        except UnicodeError:
            pass
    else:
        print "No suitable encoding for line..."
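
The same idea can be written for Python 3, where the dump has to be read as bytes and decoded explicitly. This is only a sketch: the names to_utf8, convert_dump and CANDIDATE_ENCODINGS are made up for illustration, and the encoding list is an assumption that should be replaced by whatever the dump is believed to contain. Note that a single-byte encoding such as iso-8859-15 accepts essentially any byte sequence, so it acts as a catch-all and the guess may be wrong for a given line; the order of the list matters.

```python
import sys

# Assumed candidates, tried in order; adjust to the encodings you expect.
CANDIDATE_ENCODINGS = ("utf-8", "iso-8859-15")


def to_utf8(raw_line, encodings=CANDIDATE_ENCODINGS):
    """Return raw_line (bytes) re-encoded as UTF-8, or None if no encoding decodes it."""
    for encoding in encodings:
        try:
            return raw_line.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue  # try the next candidate
    return None


def convert_dump(src, dst, encodings=CANDIDATE_ENCODINGS):
    """Copy binary stream src to dst line by line, converting each line to UTF-8.

    Lines that no candidate can decode are reported and skipped.
    Returns the list of skipped line numbers (1-based).
    """
    failed = []
    for number, raw_line in enumerate(src, start=1):
        converted = to_utf8(raw_line, encodings)
        if converted is None:
            print(f"No suitable encoding for line {number}", file=sys.stderr)
            failed.append(number)
        else:
            dst.write(converted)
    return failed
```

To use it, open the dump with mode "rb" and the output file with mode "wb", then pass both file objects to convert_dump.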
 
I'd say this should work, provided UTF-8 can never contain an apostrophe
byte inside a multibyte character. Can it?
Or you could do the same for all your tables using SELECTs, but that's
going to be painful...
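
On the apostrophe question: as far as I can tell the answer is no. In UTF-8 every byte of a multibyte sequence has its high bit set (0x80 or above), while the apostrophe is 0x27, so that byte can only ever stand for the apostrophe itself. A brute-force check in Python 3, as a sketch (the function name is made up for illustration):

```python
APOSTROPHE = 0x27  # byte value of "'"


def apostrophe_inside_multibyte():
    """Return every code point whose multibyte UTF-8 encoding contains byte 0x27."""
    offenders = []
    for code_point in range(0x80, 0x110000):  # everything above ASCII
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if APOSTROPHE in chr(code_point).encode("utf-8"):
            offenders.append(code_point)
    return offenders
```

An empty result means no multibyte character contains the apostrophe byte. It also follows that a strict decoder cannot swallow a quote after a stray lead byte: in Python 3, b"\xe9'".decode("utf-8") raises UnicodeDecodeError instead of consuming the apostrophe as a continuation byte.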
