Hello.
Thanks a lot! Using the script, I found the entries with invalid UTF-8
and fixed them in the database.
- Tore.
On Sep 19, 2008, at 15:37 , Mikel Lindsaar wrote:
> On Fri, Sep 19, 2008 at 6:29 PM, Tore Halset <halset@pvv.ntnu.no>
> wrote:
>> Looks like I have managed to insert an illegal character into the
>> main
>> system that does not conform to UTF-8. Anything I can and should do
>> to work
>> around this issue?
>
> I have had the same problem previously and after a lot of help from
> Tom Lane basically came up to the following...
>
> You need to basically dump your table out (or a subset containing the
> row ID and column that would have the bad data) in plain text and then
> parse it with a script to detect invalid UTF-8 sequences, then find
> what rows the bad data is in and go and fix it.
>
> It is either that or you drop the data inserting some other character.
> But this has obvious drawbacks.
>
> I wrote a short ruby script that goes through a dumped file line by
> line and puts each line through Iconv to parse it from UTF-8 to UTF-8,
> if it fails it dumps the offending line to a log file.
>
> A ruby script that would just print the offending row would go
> something like this:
>
> require 'iconv'
> File.read(ARGV[0]).each do |line|
> begin
> Iconv.iconv('UTF-8', 'UTF-8', line)
> rescue
> puts "Failed: #{line}"
> end
> end
>
> Save that in a file (find_invalid_utf8.rb) then run it with:
>
> $ ruby find_invalid_utf8.rb my_dumped_table.csv
>
> It's not pretty, and just dumps the raw output to the screen, but it
> might do for you.
>
> --
> http://lindsaar.net/
> Rails, RSpec and Life blog....
>