Re: One-off attempt at catalog hacking to turn bytea column into text - Mailing list pgsql-general

From Noah Misch
Subject Re: One-off attempt at catalog hacking to turn bytea column into text
Date
Msg-id 20110511081108.GA26552@tornado.gateway.2wire.net
Whole thread Raw
In response to One-off attempt at catalog hacking to turn bytea column into text  (Vlad Romascanu <vromascanu@accurev.com>)
List pgsql-general
On Tue, May 10, 2011 at 04:31:37PM -0400, Vlad Romascanu wrote:
> As a one-off attempt to change a large table's 'bytea' column to
> 'text' with minimal I/O (where the 'bytea' contents is already valid
> UTF8 and the database encoding is also UTF8, and the column is not
> part of any index or anything involving collation), how unsafe is the
> following?
>
> UPDATE pg_attribute SET atttypid='text'::regtype::oid WHERE
> attrelid=('schema_name.table_name')::regclass AND attname='col_name'
> AND atttypid='bytea'::regtype::oid;

Do a "LOCK TABLE schema_name.table_name" earlier in the same transaction.

Any of the following potentially complicates things:
- data is not valid text in the server encoding, including NUL bytes
- use in an index (you mentioned this does not apply)
- use in a CHECK or FOREIGN KEY constraint
- default expression on the column
- views or rules referencing the column
- referenced directly in a trigger definition (UPDATE OF col_name, WHEN ...)

A useful, though not foolproof, way to check whether you've snagged on any of
those is to take schema dumps of a test database after (a) doing the change
normally and (b) doing it this way, then compare.

> Additionally, if the 'bytea' happenned to also explicitly contain a
> trailing NUL prior to the "conversion" (i.e. after the hack, the last
> byte in the 'text' value would be NUL), would there be any obvious
> problems with the above hack?

Yes; this will break things in the general case.  Two text datums that differ
only in the presence of this trailing NUL will compare as unequal by texteq().
The octet_length() function will include the NUL byte in its count.  If you use
the column in very restricted ways, you might end up okay.  Note that you could
avoid these problems by updating the bytea columns to remove these NUL bytes
before making the catalog update.  It could still be much I/O, but at least you
wouldn't have a table lock while doing it.

nm

pgsql-general by date:

Previous
From: mephysto
Date:
Subject: Custom types and JDBC
Next
From: Tim Uckun
Date:
Subject: Postgres federation