BUG #16236: Invalid escape encoding - Mailing list pgsql-bugs

From Stéphane Campinas
Subject BUG #16236: Invalid escape encoding
Msg-id CAAyNevaL3vLCHVai1vbJQnKp1KY1pMdDchsgB3pFnSPyRoccgw@mail.gmail.com
In response to BUG #16236: Invalid escape encoding  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
Thanks Tom for the reply!

I read the doc once more, and now I better understand the "high-bit-set
value" part ;o)

        myDatabaseName=# select encode('\x00017F80', 'escape');
              encode
        ------------------
         \000\x01\x7F\200

If I understand correctly, with the input "\x00017F80", I get the
output above because:
- "00" is converted to "\000"
- "01" and "7F" get converted to "\x01" and "\x7F" respectively as they
  are not 0 or a high-bit-set value
- "80" is converted to "\200" since it is a high-bit-set value
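
The breakdown above can be modelled in a few lines of Python. This is only a sketch of the documented rule (zero bytes and high-bit-set bytes become \nnn, backslashes are doubled, everything else is copied literally), not the server's code. The helper names encode_escape and display_like_psql are hypothetical, and the display step is a simplified assumption based on Tom's remark that psql, not encode(), renders control characters as \xNN.

```python
def encode_escape(data: bytes) -> str:
    """Model of the documented 'escape' format (not the server's code)."""
    out = []
    for b in data:
        if b == 0 or b >= 0x80:
            out.append("\\%03o" % b)   # zero or high-bit-set byte -> \nnn octal
        elif b == 0x5C:
            out.append("\\\\")         # a backslash is doubled
        else:
            out.append(chr(b))         # everything else is copied literally
    return "".join(out)


def display_like_psql(text: str) -> str:
    """Assumed, simplified display step: ASCII control characters as \\xNN."""
    return "".join(
        "\\x%02X" % ord(ch) if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in text
    )


raw = bytes.fromhex("00017F80")
encoded = encode_escape(raw)
print(repr(encoded))               # the text value holds raw 0x01 and 0x7F bytes
print(display_like_psql(encoded))  # prints \000\x01\x7F\200
```

Under this model, "01" and "7F" are not converted by the encoding step at all; the \x01 and \x7F only appear when the result is displayed.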

I remember being confused by the hexadecimal values in the output, and
I didn't really get the "high-bit-set" part of the doc.

Do you know why there is this distinction between high-bit-set values
and other non-printable characters?

I also still have two more questions.

First, the following is strange: I cannot decode what the encode
function returned:

        myDatabaseName=# select encode('\x00017F80', 'escape');
              encode
        ------------------
         \000\x01\x7F\200
        (1 row)

        myDatabaseName=# select decode('\000\x01\x7F\200', 'escape');
        ERROR:  invalid input syntax for type bytea
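
One possible reading of this error, sketched in Python: if the escape decoder accepts a backslash only when it is followed by three octal digits or by a second backslash, then the pasted "\x01" (a backslash followed by "x") is rejected. This is an assumption about the parsing rule, not a copy of the server's code, and decode_escape is a hypothetical name.

```python
def decode_escape(text: str) -> bytes:
    """Model of an 'escape' decoder: accepts \\nnn (octal) and \\\\ only."""
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "\\":
            out.extend(ch.encode("utf-8"))       # ordinary character, copied as-is
            i += 1
        elif (i + 3 < n and text[i + 1] in "0123"
              and text[i + 2] in "01234567" and text[i + 3] in "01234567"):
            out.append(int(text[i + 1:i + 4], 8))  # \nnn octal escape
            i += 4
        elif i + 1 < n and text[i + 1] == "\\":
            out.append(0x5C)                     # doubled backslash
            i += 2
        else:
            # Any other backslash sequence, such as \x01, is rejected.
            raise ValueError("invalid input syntax for type bytea")
    return bytes(out)


# The string as displayed on screen contains the four characters \x01:
try:
    decode_escape("\\000\\x01\\x7F\\200")
except ValueError as exc:
    print("rejected:", exc)

# The value encode() returned holds the raw bytes 0x01 and 0x7F instead:
print(decode_escape("\\000\x01\x7f\\200").hex())
```

In this model the round trip works on the value itself; it fails only on the on-screen rendering, where the control characters have been replaced by \xNN.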

Second, while poking around the code, I found out about the
"bytea_output" setting. If I set it to "escape", I still get
hexadecimal values. Is that expected?

        myDatabaseName=# set bytea_output to escape;
        SET
        myDatabaseName=# select encode('\x00017F80', 'escape');
              encode
        ------------------
         \000\x01\x7F\200
        (1 row)

Cheers,

On Mon, Jan 27, 2020 at 06:05:45PM -0500, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > From the documentation [0] about the encode function, the "escape" format
> > should "convert zero bytes and high-bit-set bytes to octal sequences (\nnn)
> > and doubles backslashes."
> > However, executing "select encode(E'aaa\bccc', 'escape');" outputs
> > "aaa\x08ccc", although according to the documentation I should get
> > "aaa\010ccc".
>
> No, I don't think so.  The \b gives rise to a byte with hex value 08
> (that is, control-H or backspace) in the E'' literal, which converts
> to the same byte value in the bytea value that gets passed to
> encode().  Since that's not either a zero or a high-bit-set value,
> encode() just repeats it literally in the text result, and you end
> up with the same thing as if you'd just done
>
> =# select E'aaa\bccc'::text;     
>     text   
> ------------
>  aaa\x08ccc
> (1 row)
>
> I think it must be psql itself that's choosing to represent the
> backspace as \x08, because nothing in the backend does that.
> (pokes around ... yeah, it's pg_wcsformat() that's doing it)
>
> You could certainly make an argument that encode() ought to
> backslashify all ASCII control characters, not only \0.  But
> it's behaving as documented, AFAICS.
>
>                       regards, tom lane

--
Campinas Stéphane


