Re: proposal: unescape_text function - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: unescape_text function
Date
Msg-id CAFj8pRBw-L4gsFLYJMMsLghio=qe9eh+QXWVj037VaCH+h6=eQ@mail.gmail.com
Whole thread Raw
In response to Re: proposal: unescape_text function  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: proposal: unescape_text function  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers


pá 27. 11. 2020 v 15:37 odesílatel Peter Eisentraut <peter.eisentraut@enterprisedb.com> napsal:
On 2020-10-07 11:00, Pavel Stehule wrote:
>         Since the idea originated from unescaping unicode string
>         literals i.e.
>                 select unescape('Odpov\u011Bdn\u00E1 osoba');
>
>         Shouldn't the built-in function support the above syntax as well?
>
>
>     good idea. The prefixes u (4 digits) and U (8 digits) are supported

I don't really get the point of this function.  There is AFAICT no
function to produce this escaped format, and it's not a recognized
interchange format.  So under what circumstances would one need to use this?

Some corporate data can be in CSV format with escaped unicode characters. Without this function it is not possible to decode these files without external application.

Postgres has support for this conversion, but only for string literals.

CREATE OR REPLACE FUNCTION public.unescape(text, text)  RETURNS text LANGUAGE plpgsql AS $function$ DECLARE result text; BEGIN   EXECUTE format('SELECT U&%s UESCAPE %s',                          quote_literal(replace($1, '\u','^')),                         quote_literal($2)) INTO result;   RETURN result; END; $function$

Because unicode is major encoding, I think this conversion should be supported. There is another question about implementation like in this patch implemented unicode_unescape function, or with some new conversion. Using conversion https://www.postgresql.org/docs/current/sql-createconversion.html  is probably better, but I am not sure how intuitive it is, and it is hard to use it (without not nice workarounds) in plpgsql.

I don't expect so Postgres should produce data in unicode escaped format, but can be useful, if Postgres can do some work with data in special format of major encoding.

postgres=# select convert_from(E'Odpov\u011Bdn\u00E1 osoba', 'UTF8');
┌─────────────────┐
│  convert_from   │
╞═════════════════╡
│ Odpovědná osoba │
└─────────────────┘
(1 row)

I can do this with bytea, but it is hard to use it with text fields.

I didn't find any way how to do it without ugly steps.

Regards

Pavel

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Next
From: Andreas Seltenreich
Date:
Subject: [sqlsmith] Planner error on lateral joins