Re: SQL-ASCII database cleanup - Mailing list pgsql-general

From: Susan Cassidy
Subject: Re: SQL-ASCII database cleanup
Date:
Msg-id: 3A51F387FE0CC74D80FA60C146987F2501C3D4E01298@oc-exchange1.stbernard.com
In response to: SQL-ASCII database cleanup (Mike Blackwell <mike.blackwell@rrd.com>)
List: pgsql-general

Use the Encode module to test and convert back and forth between UTF-8 characters and bytes for the SQL_ASCII database.  Assuming the input is already UTF-8:

 

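use Encode qw(:all);

# connect to db, prepare insert statement, etc.

# $utf8_text holds Perl characters; encode() turns it into UTF-8 bytes,
# which the SQL_ASCII database stores unchanged.
my $bytes = encode('utf8', $utf8_text);

# errexit() is a local error-handling routine (not shown).
$sth->execute($bytes, $i) or errexit("execute of insert into public_suffixes tbl failed: ", $DBI::errstr);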

 

If your input is not already UTF-8, you will have to call decode inside an eval block (with a CHECK argument such as FB_CROAK, so that bad input dies) to convert it to UTF-8, then check for failure before re-encoding and inserting into the database.  Or something similar.
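
A minimal sketch of that decode-then-check step, under some loud assumptions: the helper name to_utf8_bytes is made up for illustration, and the fallback guess that non-UTF-8 input is really cp1252 (Windows cut and paste) is only an assumption that may not hold for your data.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: take raw bytes from the web form and return
# UTF-8 bytes that can be stored in the SQL_ASCII database.
sub to_utf8_bytes {
    my ($raw) = @_;

    # Try strict UTF-8 first. FB_CROAK makes decode() die on malformed
    # input, and LEAVE_SRC keeps decode() from modifying $raw.
    my $chars = eval { decode('UTF-8', $raw, FB_CROAK | LEAVE_SRC) };

    # Assumption: anything that is not valid UTF-8 was pasted from a
    # Windows document and is really cp1252. Adjust to your own data.
    $chars = decode('cp1252', $raw) unless defined $chars;

    # Re-encode the characters as UTF-8 bytes for the insert.
    return encode('UTF-8', $chars);
}
```

The result can then be passed to $sth->execute() in place of $bytes above. If you would rather reject bad input than guess at its encoding, return an error to the user where the cp1252 fallback is.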

 

This seems to work for me.  When I need to pull the data back out of the database, I have to decode the byte string back into UTF-8 characters before displaying the output.
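
Roughly, the read side looks like the sketch below. The helper name from_db_bytes is hypothetical, and it assumes the driver hands back the stored value as raw bytes, which is worth checking against your DBD::Pg settings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a byte string fetched from the SQL_ASCII
# database back into Perl characters.
sub from_db_bytes {
    my ($bytes) = @_;

    # With no CHECK argument, decode() substitutes U+FFFD for any
    # malformed sequence instead of dying.
    return decode('UTF-8', $bytes);
}

# Usage sketch: set an output layer so the characters are written as UTF-8.
#   binmode(STDOUT, ':encoding(UTF-8)');
#   print from_db_bytes($row_value);
```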

 

Susan


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Mike Blackwell
Sent: Thursday, July 21, 2011 7:49 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] SQL-ASCII database cleanup

 

I have an older database that was created with SQL-ASCII encoding.  Over time users have managed to enter all manner of interesting characters, mostly via cut and paste from Windows documents.  I'm attempting to clean up and eventually convert the database to UTF8.  I've managed to find most of the data that won't nicely convert from some-random-encoding to UTF8, but it seems the users are entering it as fast as I can find it.  Is there a way the incoming data from a Perl CGI web application can be automatically limited to UTF8 even though the database is SQL-ASCII?

 

 

Mike
