Thread: Unexpected behaviour of encode()
psql -U postgres psql (9.2.3) Type "help" for help. postgres=# select encode('can''t', 'escape'); encode -------- can't (1 row) I observed the same behaviour on one of our older systems (8.3.11) as well. Am I missing something? I expected "can''t" as the output. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote: > > psql -U postgres > psql (9.2.3) > Type "help" for help. > > postgres=# select encode('can''t', 'escape'); > encode > -------- > can't > (1 row) > > I observed the same behaviour on one of our older systems (8.3.11) as well. > > Am I missing something? I expected "can''t" as the output. why? that isn't what you passed it. the input string doubled single quotes is converted to single single quote per spec. it's 'ghetto escaping'. merlin
In response to Merlin Moncure <mmoncure@gmail.com>: > On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote: > > > > psql -U postgres > > psql (9.2.3) > > Type "help" for help. > > > > postgres=# select encode('can''t', 'escape'); > > encode > > -------- > > can't > > (1 row) > > > > I observed the same behaviour on one of our older systems (8.3.11) as well. > > > > Am I missing something? I expected "can''t" as the output. > > why? that isn't what you passed it. the input string doubled single > quotes is converted to single single quote per spec. it's 'ghetto > escaping'. Not sure what you mean by "ghetto secaping" ... but doubling up a ' is the SQL standard escaping method, AFAIK. If I just do: SELECT 'can''t'::text; I get "can't" which is what I'd expect. I would then expect encode to escape the ' somehow. Even c-style escaping, like "can\'t" would have been less surprising to me. If there's something I'm missing, I'm still missing it. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Tue, Mar 26, 2013 at 1:36 PM, Bill Moran <wmoran@potentialtech.com> wrote: > I get "can't" which is what I'd expect. I would then expect > encode to escape the ' somehow. nope -- encode/escape doesn't escape single quotes. it's not designed to produce output that can be fed directly back into the database (at least, not without escaping). yes, it (escape format) is pretty terrible. merlin
Bill Moran <wmoran@potentialtech.com> writes: > If I just do: > SELECT 'can''t'::text; > I get "can't" which is what I'd expect. I would then expect > encode to escape the ' somehow. Even c-style escaping, like > "can\'t" would have been less surprising to me. > If there's something I'm missing, I'm still missing it. The manual says that 'escape' encoding "merely outputs null bytes as \000 and doubles backslashes". (The reason to double backslashes is to make \000 unambiguous, of course.) The point of this is to sanitize bytea data sufficiently to allow it to be transported as text. If you want to transport it as a SQL literal, that's a tighter constraint that would require some other escaping method, or at least passing the result through something like quote_literal. (Having said that, I wonder though if "escape" doesn't need more thought. The output is only valid text in SQL_ASCII or single-byte encodings, otherwise there's risk of encoding violations.) regards, tom lane
On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The manual says that 'escape' encoding "merely outputs null bytes as > \000 and doubles backslashes". > (Having said that, I wonder though if "escape" doesn't need more > thought. The output is only valid text in SQL_ASCII or single-byte > encodings, otherwise there's risk of encoding violations.) it does that too, since as long as I can remember. I used decode-hex here so it'll work on older version of pg. select encode(decode('ee5c2700aa','hex'),'escape'); encode -------------------- \356\\'\000\252 to see the slashes doubled you need to run it through quote-literal select quote_literal(encode(decode('ee5c2700aa','hex'),'escape')); quote_literal -------------------------- E'\\356\\\\''\\000\\252' -- ⚂⚃ 100% natural
Jasen Betts <jasen@xnet.co.nz> writes: > On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The manual says that 'escape' encoding "merely outputs null bytes as >> \000 and doubles backslashes". >> (Having said that, I wonder though if "escape" doesn't need more >> thought. The output is only valid text in SQL_ASCII or single-byte >> encodings, otherwise there's risk of encoding violations.) > it does that too, since as long as I can remember. > I used decode-hex here so it'll work on older version of pg. Hah ... that's what I get for believing the manual ;-). The code comments tell the truth: * We must escape zero bytes and high-bit-set bytes to avoid generating * text that might be invalid in the current encoding, or that might * change to something else if passed through an encoding conversion * (leading to failing to de-escape to the original bytea value). * Also of course backslash itself has to be escaped. It appears that the manual's statement was correct before 8.3, but when somebody fixed the code to deal with the encoding issue, they didn't fix the manual. I'll go improve that ... regards, tom lane