Re: Best practices for moving UTF8 databases - Mailing list pgsql-general

From Justin Pasher
Subject Re: Best practices for moving UTF8 databases
Msg-id 4A673D49.7000905@newmediagateway.com
In response to Re: Best practices for moving UTF8 databases  (Phoenix Kiula <phoenix.kiula@gmail.com>)
List pgsql-general
Phoenix Kiula wrote:
> I tried this. Get an error.
>
>
> mypg=# select * from interesting WHERE NOT description ~ ( '^('||
> mypg(#    $$[\09\0A\0D\x20-\x7E]|$$||               -- ASCII
> mypg(#    $$[\xC2-\xDF][\x80-\xBF]|$$||             -- non-overlong 2-byte
> mypg(#     $$\xE0[\xA0-\xBF][\x80-\xBF]|$$||        -- excluding overlongs
> mypg(#    $$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$||  -- straight 3-byte
> mypg(#     $$\xED[\x80-\x9F][\x80-\xBF]|$$||        -- excluding surrogates
> mypg(#     $$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$||     -- planes 1-3
> mypg(#    $$[\xF1-\xF3][\x80-\xBF]{3}|$$||          -- planes 4-15
> mypg(#     $$\xF4[\x80-\x8F][\x80-\xBF]{2}$$||      -- plane 16
> mypg(#   '*)$' )
> mypg-#
> mypg-#   ;
> ERROR:  invalid regular expression: quantifier operand invalid
>
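
For what it's worth, that error looks like it comes from where the final * ended up. In the query above it sits inside the closing parenthesis ('*)$'), so it lands directly after the {2} of the last alternative, and a quantifier applied to another quantifier is rejected. The pattern presumably wants ')*$' so that the star repeats the whole group. The first line also has \09\0A\0D where \x09\x0A\x0D was probably intended. This is untested against the server, so treat it as a guess. To show the structure I mean, here is the same pattern written as a Python bytes regex; the names WELL_FORMED_UTF8 and is_well_formed_utf8 are mine, purely for illustration:

```python
import re

# Same alternatives as the quoted query, with the star OUTSIDE the group
# so it repeats the whole alternation rather than following {2}.
WELL_FORMED_UTF8 = re.compile(
    rb"(?:"
    rb"[\x09\x0A\x0D\x20-\x7E]"             # ASCII (tab, LF, CR, printable)
    rb"|[\xC2-\xDF][\x80-\xBF]"             # non-overlong 2-byte
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"         # excluding overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  # straight 3-byte
    rb"|\xED[\x80-\x9F][\x80-\xBF]"         # excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"      # planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"          # planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"      # plane 16
    rb")*"
)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of `data` is covered by the pattern."""
    # fullmatch anchors both ends, so no explicit ^ or $ is needed.
    return WELL_FORMED_UTF8.fullmatch(data) is not None
```

With the star moved, a Latin-1 byte such as b"caf\xe9" should be rejected while the UTF-8 encoding of the same word is accepted.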

If you really don't want to go the "pg_dump -> iconv (remove invalid
characters) -> diff the dump files" route, there is a stored procedure,
posted to pgsql-hackers a few years back, that attempts to find the
invalid characters:

http://archives.postgresql.org/pgsql-hackers/2005-12/msg00511.php

http://svana.org/kleptog/pgsql/utf8_verify.sql
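
If the dump route turns out to be the practical one after all, the "find the bad bytes" step doesn't have to be iconv plus diff. As a rough sketch (this is not what the stored procedure above does), the Python below reads a plain-text dump in binary mode and reports each line that fails strict UTF-8 decoding. The function names are made up for illustration, and it assumes the dump is plain SQL text rather than the custom or tar format. Unlike the regex in the quoted query, it does not reject control characters.

```python
def find_invalid_utf8_lines(lines):
    """Yield (line_number, offending_bytes, reason) for lines that are not valid UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in binary mode.
    Only the first problem on each line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")  # strict error handling is the default
        except UnicodeDecodeError as exc:
            yield lineno, raw[exc.start:exc.end], exc.reason


def scan_dump(path):
    """Print a one-line report for each bad line in the dump file at `path`."""
    with open(path, "rb") as dump:
        for lineno, bad, reason in find_invalid_utf8_lines(dump):
            print(f"line {lineno}: {bad!r} ({reason})")
```

Something like scan_dump("mydb.sql"), where mydb.sql is a hypothetical pg_dump output file, would then list the lines to look at before reloading.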

--
Justin Pasher
