> the problem is: you'll get this four byte sequence '\000' _instead_ of
> NUL-byte anyway.
http://www.postgresql.org/docs/8.1/interactive/datatype-binary.html says :
"A binary string is a sequence of octets (or bytes). Binary strings are
distinguished from character strings by two characteristics: First, binary
strings specifically allow storing octets of value zero and other
"non-printable" octets (usually, octets outside the range 32 to 126).
Character strings disallow zero octets, and also disallow any other octet
values and sequences of octet values that are invalid according to the
database's selected character set encoding. Second, operations on binary
strings process the actual bytes, whereas the processing of character
strings depends on locale settings. In short, binary strings are
appropriate for storing data that the programmer thinks of as "raw bytes",
whereas character strings are appropriate for storing text."
That's the whole point of escaping, so that data generated by COPY, for
instance, does not include any funky characters, including the \0 (zero)
character, so that you can use any standard tool on it, including grep...
I LIKE the fact that TEXT refuses invalid characters. It helps find bugs
in my applications, like when I forget to process some 8-bit string before
inserting it in my database which uses UNICODE. I definitely prefer an
error than finding a month later that half my data has had all its
accented characters screwed up.
in psql, you have to use the escaped syntax :
SELECT length('\\000'::BYTEA), length('\\001'::BYTEA),
length('\\000'::TEXT), length('\\001'::TEXT); length | length | length | length
--------+--------+--------+-------- 1 | 1 | 4 | 4
Your client library should take care of escaping and de-escaping. Here, in
python :
>>> cursor.execute( "SELECT %s,%s::BYTEA", ("this is a normal string",
>>> psycopg2.Binary( ">\x00< this is a string with a zero byte" ),) )
>>> r = cursor.fetchone()
>>> print r
['this is a normal string', <read-only buffer for 0x2aaaaab32210, size 37,
offset 0 at 0x2aaaace27c70>]
>>> print str(r[1])
> < this is a string with a zero byte
>>> print repr(str(r[1]))
'>\x00< this is a string with a zero byte'
>>> ord(r[1][1])
0
Note : \x00 is python's escaping for the null byte