Thread: TODO: Expose parser support for decoding unicode escape literals to user
Hi all I just noticed a Stack Overflow question (http://stackoverflow.com/q/20124393/398670) where someone's asking how to decode '\u0000` style escapes *stored in database text fields* into properly encoded text strings. The parser supports this for escape-strings, and you can write E'\u011B' to get 'ě' because of http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. I don't see this exposed in a way that users can call directly, though. 'decode(bytea, text)' has the 'escape' input, but it expects octal. It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the work, but that's downright awful. Am I missing something obvious, or is this something that'd be a good new-developer TODO? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: TODO: Expose parser support for decoding unicode escape literals to user
From
Adrian Klaver
Date:
On 05/15/2014 01:31 AM, Craig Ringer wrote: > Hi all > > I just noticed a Stack Overflow question > (http://stackoverflow.com/q/20124393/398670) where someone's asking how > to decode '\u0000` style escapes *stored in database text fields* into > properly encoded text strings. > > The parser supports this for escape-strings, and you can write E'\u011B' > to get 'ě' because of > http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. > > I don't see this exposed in a way that users can call directly, though. > 'decode(bytea, text)' has the 'escape' input, but it expects octal. > > It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the > work, but that's downright awful. > > Am I missing something obvious, or is this something that'd be a good > new-developer TODO? > Not sure if this is what you want?: test=> SELECT quote_literal(E'test \u011B'); quote_literal --------------- 'test ě' -- Adrian Klaver adrian.klaver@aklaver.com
Re: TODO: Expose parser support for decoding unicode escape literals to user
From
David G Johnston
Date:
Adrian Klaver-4 wrote > On 05/15/2014 01:31 AM, Craig Ringer wrote: >> Hi all >> >> I just noticed a Stack Overflow question >> (http://stackoverflow.com/q/20124393/398670) where someone's asking how >> to decode '\u0000` style escapes *stored in database text fields* into >> properly encoded text strings. >> >> The parser supports this for escape-strings, and you can write E'\u011B' >> to get 'ě' because of >> http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. >> >> I don't see this exposed in a way that users can call directly, though. >> 'decode(bytea, text)' has the 'escape' input, but it expects octal. >> >> It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the >> work, but that's downright awful. >> >> Am I missing something obvious, or is this something that'd be a good >> new-developer TODO? >> > > Not sure if this is what you want?: > > test=> SELECT quote_literal(E'test \u011B'); > quote_literal > --------------- > 'test ě' Except the data is already in the database and there is no way to put an "E" in front of a column name and cause PostgreSQL to process the escapes embedded in the column's value in the same way it processes a literal. WITH src (txt) AS ( VALUES ('A \u011B C') ) SELECT txt FROM src; Hence the need for a function to perform the same process that the parser performs when dealing with literals. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/TODO-Expose-parser-support-for-decoding-unicode-escape-literals-to-user-tp5804012p5804042.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Re: TODO: Expose parser support for decoding unicode escape literals to user
From
Adrian Klaver
Date:
On 05/15/2014 07:13 AM, David G Johnston wrote: > Adrian Klaver-4 wrote >> On 05/15/2014 01:31 AM, Craig Ringer wrote: >>> Hi all >>> >>> I just noticed a Stack Overflow question >>> (http://stackoverflow.com/q/20124393/398670) where someone's asking how >>> to decode '\u0000` style escapes *stored in database text fields* into >>> properly encoded text strings. >>> >>> The parser supports this for escape-strings, and you can write E'\u011B' >>> to get 'ě' because of >>> http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. >>> >>> I don't see this exposed in a way that users can call directly, though. >>> 'decode(bytea, text)' has the 'escape' input, but it expects octal. >>> >>> It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the >>> work, but that's downright awful. >>> >>> Am I missing something obvious, or is this something that'd be a good >>> new-developer TODO? >>> >> >> Not sure if this is what you want?: >> >> test=> SELECT quote_literal(E'test \u011B'); >> quote_literal >> --------------- >> 'test ě' > > Except the data is already in the database and there is no way to put an "E" > in front of a column name and cause PostgreSQL to process the escapes > embedded in the column's value in the same way it processes a literal. Yea, that is a problem. > > WITH src (txt) AS ( VALUES ('A \u011B C') ) > SELECT txt FROM src; > > Hence the need for a function to perform the same process that the parser > performs when dealing with literals. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
Re: TODO: Expose parser support for decoding unicode escape literals to user
From
Craig Ringer
Date:
On 05/15/2014 09:56 PM, Adrian Klaver wrote: > > test=> SELECT quote_literal(E'test \u011B'); > quote_literal > --------------- > 'test ě' That's another case where the function isn't doing what you expect. quote_literal has nothing to do with what's happening, it's escape-string processing in the parser doing the work. Compare: regress=> SELECT 'test \u011B'; ?column? ------------- test \u011B (1 row) regress=> SELECT E'test \u011B'; ?column? ---------- test ě (1 row) now, the problem posed is if you had this: regress=> CREATE TABLE test AS SELECT TEXT 'test \u011B' dummy; SELECT 1 regress=> SELECT * FROM test; dummy ------------- test \u011B (1 row) how would you get 'test ě' ? The parser can do it, but I don't think anyone would consider this an acceptable solution to this problem (anybody reading this, UNDER NO CIRCUMSTANCES USE THIS FUNCTION, EVER): regress=> CREATE OR REPLACE FUNCTION ohmygod(text) RETURNS text AS $$ DECLARE retval text; BEGIN -- If you use this in real code, I hate you EXECUTE 'SELECT E'''||$1||''';' INTO retval; RETURN retval; END; $$ LANGUAGE plpgsql; CREATE FUNCTION regress=> SELECT ohmygod(dummy) FROM test; ohmygod --------- test ě (1 row) It'd be nice to expose this capability to users without requiring that kind of horror. Hence: exposing parser support for decoding unicode escape literals to the user. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: TODO: Expose parser support for decoding unicode escape literals to user
From
Adrian Klaver
Date:
On 05/16/2014 04:12 AM, Craig Ringer wrote: > On 05/15/2014 09:56 PM, Adrian Klaver wrote: >> >> test=> SELECT quote_literal(E'test \u011B'); >> quote_literal >> --------------- >> 'test ě' > > That's another case where the function isn't doing what you expect. > quote_literal has nothing to do with what's happening, it's > escape-string processing in the parser doing the work. Compare: > > regress=> SELECT 'test \u011B'; > ?column? > ------------- > test \u011B > (1 row) > > regress=> SELECT E'test \u011B'; > ?column? > ---------- > test ě > (1 row) Davids comments and some playing around with strings showed me what was going on and the error of my ways. > > now, the problem posed is if you had this: > > regress=> CREATE TABLE test AS SELECT TEXT 'test \u011B' dummy; > SELECT 1 > regress=> SELECT * FROM test; > dummy > ------------- > test \u011B > (1 row) > > how would you get 'test ě' ? > > > The parser can do it, but I don't think anyone would consider this an > acceptable solution to this problem (anybody reading this, UNDER NO > CIRCUMSTANCES USE THIS FUNCTION, EVER): > > > regress=> CREATE OR REPLACE FUNCTION ohmygod(text) RETURNS text AS $$ > DECLARE > retval text; > BEGIN > -- If you use this in real code, I hate you > EXECUTE 'SELECT E'''||$1||''';' INTO retval; > RETURN retval; > END; > $$ LANGUAGE plpgsql; > CREATE FUNCTION Actually I started down this dark path also, before I was interrupted by something else:) > > regress=> SELECT ohmygod(dummy) FROM test; > ohmygod > --------- > test ě > (1 row) > > > > It'd be nice to expose this capability to users without requiring that > kind of horror. > > Hence: exposing parser support for decoding unicode escape literals to > the user. Yes, so as an example something like: decode_u('test \u011B') that would decode the escaped values automatically. > -- Adrian Klaver adrian.klaver@aklaver.com