Re: finding bogus UTF-8 - Mailing list pgsql-general

From dennis jenkins
Subject Re: finding bogus UTF-8
Date
Msg-id AANLkTikW0Yb8ZyULMVLgdLsv-A+6O_TF3rHmtdxX8mUG@mail.gmail.com
Whole thread Raw
In response to finding bogus UTF-8  (Scott Ribe <scott_ribe@elevated-dev.com>)
List pgsql-general
On Thu, Feb 10, 2011 at 1:02 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> I know that I have at least one instance of a varchar that is not valid UTF-8, imported from a source with errors
(AMACPT files, actually) before PG's checking was as stringent as it is today. Can anybody suggest a query to find such
values?

If you know which table and column the data is in, you can also do
something like this:

(I typed this up without checking the syntax of it.  The basic idea is
to cast the column as bytea, encode with the 'escape' method, then
grep for back-slashes).

select * from bad_table where regexp_match (encode (bad_column::bytea,
'escape'), '\\\\'));

pgsql-general by date:

Previous
From: dennis jenkins
Date:
Subject: Re: finding bogus UTF-8
Next
From: Andy Colson
Date:
Subject: Re: Schema version control