> > I have found (and confirmed by studying the PostgreSQL
> > source) that to reliably insert arbitrary binary data into a
> > bytea column there are only 3 characters which need to be
> > escaped: \000, \047 ( ' ), \134 ( \ ). Here's the PHP
> > function that I've been using:
>
>
> Postgresl, in treating things as strings, handles \000 as NULL as an end
> of string.
>
> select 'abc\000def' as hehehe;
> hehehe
> --------
> abc
> (1 row)
>
>
> How do you get the data back from a query, with all the nulls in it?
Try this:
test=# select 'abc\\000def' as hehehe;
hehehe
------------
abc\000def
(1 row)
Notice that in the PHP function:
{
if (ord($ct[$i]) == 0)
$buf .= "\\\\000";
The reason for 4 '\'s (\\\\) in the function is that PHP interprets '\\' as
an escaped '\', and turns the result into '\\000'. Postgres then sees the
'\\' as an escaped '\' and converts this into '\000' ( I think this happens
in the backend, not sure exactly which stage, but prior to byteain()).
Finally, byteain() looks for '\###', and interprets it as an escaped octal
value, and converts it into a single binary character.
On the way back to psql (or whatever your client app is), the binary data
stored in the heap is run through byteaout() which converts "non-printable"
characters back to escaped octal equivilents before sending the data to the
client (i.e. this all happens in the backend). The client receives only the
escaped version of the data, hence the '\000' in the example above.
In 7.2devel there are two new functions (actually, I think they exist in
release versions as part of pgcrypto) called encode and decode. So you can
use encode to convert the binary data to either hex or base64 in the
backend, before it gets sent to the client. This may be more convenient for
you than the octal escaped form if your client already has hex-to-binary or
base64-to-binary functions available to it.
As I said in the previous post, I think if your client uses a binary cursor
to retrieve data, the bytea data will be returned to the client in its
original binary form (i.e. unescaped and unencoded), which avoids
unnecessary (and cpu cycle wasting) conversions. I'll be trying this myself
soon, so I guess I'll find out ;)
-- Joe