Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT - Mailing list pgsql-general

From Vlad Romascanu
Subject Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT
Date
Msg-id AANLkTinZCsMhbjCgRnEX=-pVf_YwMncOWDbFMu-==Or9@mail.gmail.com
Whole thread Raw
In response to Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
Aha! :)  Why haven't I used convert_from() -- my ultimate abominable
goal is to do an in-place migration of an SQL_ASCII database
(LC_TYPE/COLLATION="C") to UTF8 (LC_TYPE/COLLATION="C"), where the
string data in the SQL_ASCII database is in some encoding, say LATIN1,
and where the bulk of the data is ASCII-only (i.e. don't want to
"pg_dump -E sqlasciidb | pg_restore -d utf8db" when I can potentially
get away with a fraction of the I/O and disk space requirements.)  So
the plan was to:

UPDATE tbl SET str=convert(str::bytea, 'LATIN1',
'UTF8')::my_varlena::text WHERE str::bytea<>convert(str::bytea,
'LATIN1', 'UTF8');
UPDATE pg_database SET encoding=pg_char_to_encoding('UTF8') WHERE
datname='sqlasciidb';
-- close all connections/backends connected to sqlasciidb
-- reopen connection
-- reindex

Of course, I could theoretically UPDATE pg_database first, restart the
backends, and then use convert_from(), but even though LC_TYPE="C" I
sort of preferred the strings to be UTF8-consistent prior to
restarting a backend on the UTF8'ed database.

So I guess the ultimate question therefore is: what is wrong, if
anything, with the above hack? :)

Thanks!
V.

On Wed, Mar 9, 2011 at 5:23 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I think convert_from is a little more direct:
> convert_from(string bytea, src_encoding name)

pgsql-general by date:

Previous
From: "mark"
Date:
Subject: sort mem: size in RAM vs size on Disk
Next
From: Merlin Moncure
Date:
Subject: Re: Using bytea field...