Thread: Unexpected behaviour of encode()

Unexpected behaviour of encode()

From
Bill Moran
Date:
psql -U postgres
psql (9.2.3)
Type "help" for help.

postgres=# select encode('can''t', 'escape');
 encode
--------
 can't
(1 row)

I observed the same behaviour on one of our older systems (8.3.11) as well.

Am I missing something?  I expected "can''t" as the output.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


Re: Unexpected behaviour of encode()

From
Merlin Moncure
Date:
On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> psql -U postgres
> psql (9.2.3)
> Type "help" for help.
>
> postgres=# select encode('can''t', 'escape');
>  encode
> --------
>  can't
> (1 row)
>
> I observed the same behaviour on one of our older systems (8.3.11) as well.
>
> Am I missing something?  I expected "can''t" as the output.

why? that isn't what you passed it.  the input string doubled single
quotes is converted to single single quote per spec.  it's 'ghetto
escaping'.

merlin


Re: Unexpected behaviour of encode()

From
Bill Moran
Date:
In response to Merlin Moncure <mmoncure@gmail.com>:

> On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> >
> > psql -U postgres
> > psql (9.2.3)
> > Type "help" for help.
> >
> > postgres=# select encode('can''t', 'escape');
> >  encode
> > --------
> >  can't
> > (1 row)
> >
> > I observed the same behaviour on one of our older systems (8.3.11) as well.
> >
> > Am I missing something?  I expected "can''t" as the output.
>
> why? that isn't what you passed it.  the input string doubled single
> quotes is converted to single single quote per spec.  it's 'ghetto
> escaping'.

Not sure what you mean by "ghetto secaping" ... but doubling up a '
is the SQL standard escaping method, AFAIK.

If I just do:
SELECT 'can''t'::text;
I get "can't" which is what I'd expect.  I would then expect
encode to escape the ' somehow.  Even c-style escaping, like
"can\'t" would have been less surprising to me.

If there's something I'm missing, I'm still missing it.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


Re: Unexpected behaviour of encode()

From
Merlin Moncure
Date:
On Tue, Mar 26, 2013 at 1:36 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> I get "can't" which is what I'd expect.  I would then expect
> encode to escape the ' somehow.

nope -- encode/escape doesn't escape single quotes. it's not designed
to produce output that can be fed directly back into the database (at
least, not without escaping).  yes, it (escape format) is pretty
terrible.

merlin


Re: Unexpected behaviour of encode()

From
Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes:
> If I just do:
> SELECT 'can''t'::text;
> I get "can't" which is what I'd expect.  I would then expect
> encode to escape the ' somehow.  Even c-style escaping, like
> "can\'t" would have been less surprising to me.

> If there's something I'm missing, I'm still missing it.

The manual says that 'escape' encoding "merely outputs null bytes as
\000 and doubles backslashes".  (The reason to double backslashes is to
make \000 unambiguous, of course.)  The point of this is to sanitize
bytea data sufficiently to allow it to be transported as text.  If you
want to transport it as a SQL literal, that's a tighter constraint that
would require some other escaping method, or at least passing the result
through something like quote_literal.

(Having said that, I wonder though if "escape" doesn't need more
thought.  The output is only valid text in SQL_ASCII or single-byte
encodings, otherwise there's risk of encoding violations.)

            regards, tom lane


Re: Unexpected behaviour of encode()

From
Jasen Betts
Date:
On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The manual says that 'escape' encoding "merely outputs null bytes as
> \000 and doubles backslashes".

> (Having said that, I wonder though if "escape" doesn't need more
> thought.  The output is only valid text in SQL_ASCII or single-byte
> encodings, otherwise there's risk of encoding violations.)

it does that too, since as long as I can remember.
I used decode-hex here so it'll work on older version of pg.

select encode(decode('ee5c2700aa','hex'),'escape');
    encode
--------------------
 \356\\'\000\252

to see the slashes doubled you need to run it through quote-literal

select quote_literal(encode(decode('ee5c2700aa','hex'),'escape'));
      quote_literal
--------------------------
 E'\\356\\\\''\\000\\252'

--
⚂⚃ 100% natural

Re: Unexpected behaviour of encode()

From
Tom Lane
Date:
Jasen Betts <jasen@xnet.co.nz> writes:
> On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The manual says that 'escape' encoding "merely outputs null bytes as
>> \000 and doubles backslashes".

>> (Having said that, I wonder though if "escape" doesn't need more
>> thought.  The output is only valid text in SQL_ASCII or single-byte
>> encodings, otherwise there's risk of encoding violations.)

> it does that too, since as long as I can remember.
> I used decode-hex here so it'll work on older version of pg.

Hah ... that's what I get for believing the manual ;-).  The code
comments tell the truth:

 * We must escape zero bytes and high-bit-set bytes to avoid generating
 * text that might be invalid in the current encoding, or that might
 * change to something else if passed through an encoding conversion
 * (leading to failing to de-escape to the original bytea value).
 * Also of course backslash itself has to be escaped.

It appears that the manual's statement was correct before 8.3, but
when somebody fixed the code to deal with the encoding issue, they
didn't fix the manual.  I'll go improve that ...

            regards, tom lane