On Thu, Jan 25, 2007 at 02:28:38PM -0500, Michael Artz wrote:
> Perhaps my understanding of the 'encode' function is incorrect, but I
> was under the impression that I could do something like:
>
> SELECT lower(encode(bytes, 'escape')) FROM mytable;
>
> as it sounded like (from the manual) that 'encode' would return valid
> ASCII, with all the non-ascii bytes hex escaped.
The documentation for encode() does give that impression: "Encode
binary string to ASCII-only representation. Supported types are:
base64, hex, escape." However, the source code for esc_encode()
in src/backend/utils/adt/encode.c says and does otherwise:
* Only two characters are escaped:
* \0 (null) and \\ (backslash)
> When I have the byte 0x8a, however, I get the error:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x8a
Since encode() returns text and doesn't escape non-ASCII characters,
all of the original binary data will be treated as though it's text
in the database's encoding. If the data contains byte sequences
that aren't valid in that encoding then you get the above error.
> I have the sneaking suspicion that I am missing something, so please
> correct me if I am wrong. If I am wrong, is there a better way to
> lowercase all the ascii characters in a bytea string?
What are you trying to do? What is the binary data and why are you
treating it (or part of it) as though it's text? Do you want the
end result to be text with escape sequences or do you want to convert
it back to bytea?
Something like this might work:
SELECT lower(textin(byteaout(bytes))) FROM mytable;
To turn the result back into bytea:
SELECT decode(lower(textin(byteaout(bytes))), 'escape') FROM mytable;
--
Michael Fuhr