Thanks for the reply, Tom!
I read the doc once more and now I better understand the "high-bit-set
value" part ;o)
myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200
If I understand correctly, with the input "\x00017F80", I get the
output above because:
- "00" is converted to "\000"
- "01" and "7F" come out as "\x01" and "\x7F" respectively, as they
are neither 0 nor high-bit-set values
- "80" is converted to "\200" since it is a high-bit-set value
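To check the reasoning above, here is a rough Python model of the two steps involved. It assumes that encode(..., 'escape') turns zero and high-bit-set bytes into \nnn octal, doubles backslashes, and copies every other byte unchanged (as the doc says). It also assumes that psql then shows the remaining ASCII control characters as \x followed by two hex digits. Tom points at pg_wcsformat() for this further down. I am only guessing at its rules here, and tab, newline and carriage return are deliberately left out. The function names are made up for illustration and are not PostgreSQL's.

```python
def escape_encode(data: bytes) -> str:
    """Model of encode(data, 'escape'): the text the server sends back."""
    out = []
    for b in data:
        if b == 0 or b & 0x80:
            # zero and high-bit-set bytes become a backslash + 3 octal digits
            out.append("\\%03o" % b)
        elif b == 0x5C:
            # a backslash is doubled
            out.append("\\\\")
        else:
            # every other byte, control characters included, is copied as-is
            out.append(chr(b))
    return "".join(out)


def psql_display(text: str) -> str:
    """Rough model (an assumption) of how psql prints ASCII control chars."""
    out = []
    for ch in text:
        code = ord(ch)
        if (code < 0x20 or code == 0x7F) and ch not in "\t\n\r":
            # shown as \x followed by two uppercase hex digits
            out.append("\\x%02X" % code)
        else:
            out.append(ch)
    return "".join(out)


encoded = escape_encode(bytes.fromhex("00017F80"))
print(repr(encoded))          # the raw text, with real 0x01 and 0x7F characters
print(psql_display(encoded))  # prints \000\x01\x7F\200
```

If this model is right, the "\x01" and "\x7F" are never produced by encode() itself. They only appear when psql prints the result. The same model gives "aaa\x08ccc" for the bytes from the E'aaa\bccc' example in the quoted report.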
I was confused by getting hexadecimal values in the output, and I
hadn't really understood the "high-bit-set" part of the doc.
Do you know why there is a distinction between high-bit-set values
and other non-printable characters?
Also, I still have two more questions.
First, the following seems strange: I cannot decode what the encode()
function returned:
myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200
(1 row)
myDatabaseName=# select decode('\000\x01\x7F\200', 'escape');
ERROR: invalid input syntax for type bytea
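My guess is that decode(..., 'escape') only accepts a backslash followed by three octal digits (the first one 0 to 3) or by a second backslash. If so, the "\x01" that psql displayed, which is not what encode() actually returned, would be rejected. Below is a small Python sketch under that assumption. The function name, the use of ValueError, and the UTF-8 server encoding are all assumptions made for illustration.

```python
def escape_decode(text: str) -> bytes:
    """Sketch of decode(text, 'escape') under the assumptions stated above."""
    raw = text.encode("utf-8")  # assumption: UTF-8 server encoding
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b != 0x5C:
            # not a backslash: copy the byte
            out.append(b)
            i += 1
        elif raw[i + 1:i + 2] == b"\\":
            # "\\" stands for a single backslash byte
            out.append(0x5C)
            i += 2
        elif (i + 3 < len(raw)
              and 0x30 <= raw[i + 1] <= 0x33      # '0'..'3'
              and 0x30 <= raw[i + 2] <= 0x37      # '0'..'7'
              and 0x30 <= raw[i + 3] <= 0x37):    # '0'..'7'
            # "\nnn" is an octal byte value
            out.append(int(raw[i + 1:i + 4].decode("ascii"), 8))
            i += 4
        else:
            # e.g. "\x01": a backslash not followed by a valid sequence
            raise ValueError("invalid input syntax for type bytea")
    return bytes(out)


# The text encode() really returns (real 0x01 and 0x7F characters) decodes fine:
print(escape_decode("\\000\x01\x7f\\200"))   # b'\x00\x01\x7f\x80'

# Retyping what psql displayed fails on the "\x":
try:
    escape_decode("\\000\\x01\\x7F\\200")
except ValueError as err:
    print(err)
```

If that is right, feeding the result of encode() straight into decode() should round-trip, while retyping the psql output does not.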
Second, while poking around the code, I found out about the
"bytea_output" setting. If I set it to "escape", I still get hexadecimal
values. Is that expected?
myDatabaseName=# set bytea_output to escape;
SET
myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200
(1 row)
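As far as I can tell, bytea_output only controls how values of type bytea are printed, while encode() returns text, so the setting would not apply here. The PostgreSQL docs describe the escape output format as turning non-printable octets (0 to 31 and 127 to 255) into \nnn octal and doubling backslashes. That rule differs from encode()'s, which only touches zero and high-bit-set bytes. A Python sketch of the output rule, with a made-up function name:

```python
def bytea_escape_output(data: bytes) -> str:
    """Model of how a bytea value is printed when bytea_output = escape."""
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\\\\")          # a backslash is doubled
        elif b < 0x20 or b > 0x7E:
            out.append("\\%03o" % b)    # non-printable octet -> \nnn octal
        else:
            out.append(chr(b))          # printable ASCII is kept
    return "".join(out)


print(bytea_escape_output(bytes.fromhex("00017F80")))  # \000\001\177\200
```

So, assuming the model is right, select '\x00017F80'::bytea; with bytea_output set to escape would show \000\001\177\200, with no \x sequences.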
Cheers,
On Mon, Jan 27, 2020 at 06:05:45PM -0500, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > From the documentation [0] about the encode function, the "escape" format
> > should "convert zero bytes and high-bit-set bytes to octal sequences (\nnn)
> > and doubles backslashes."
> > However, executing "select encode(E'aaa\bccc', 'escape');" outputs
> > "aaa\x08ccc", although according to the documentation I should get
> > "aaa\010ccc".
>
> No, I don't think so. The \b gives rise to a byte with hex value 08
> (that is, control-H or backspace) in the E'' literal, which converts
> to the same byte value in the bytea value that gets passed to
> encode(). Since that's not either a zero or a high-bit-set value,
> encode() just repeats it literally in the text result, and you end
> up with the same thing as if you'd just done
>
> =# select E'aaa\bccc'::text;
> text
> ------------
> aaa\x08ccc
> (1 row)
>
> I think it must be psql itself that's choosing to represent the
> backspace as \x08, because nothing in the backend does that.
> (pokes around ... yeah, it's pg_wcsformat() that's doing it)
>
> You could certainly make an argument that encode() ought to
> backslashify all ASCII control characters, not only \0. But
> it's behaving as documented, AFAICS.
>
> regards, tom lane
--
Campinas Stéphane