Home > mailing lists

Re: proposal: unescape_text function - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: proposal: unescape_text function
Date	November 29, 2020 20:36:01
Msg-id	CAFj8pRBw-L4gsFLYJMMsLghio=qe9eh+QXWVj037VaCH+h6=eQ@mail.gmail.com Whole thread Raw
In response to	Re: proposal: unescape_text function (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses	Re: proposal: unescape_text function (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List	pgsql-hackers

Tree view

pá 27. 11. 2020 v 15:37 odesílatel Peter Eisentraut <peter.eisentraut@enterprisedb.com> napsal:

On 2020-10-07 11:00, Pavel Stehule wrote:
> Since the idea originated from unescaping unicode string
> literals i.e.
> select unescape('Odpov\u011Bdn\u00E1 osoba');
>
> Shouldn't the built-in function support the above syntax as well?
>
>
> good idea. The prefixes u (4 digits) and U (8 digits) are supported

I don't really get the point of this function. There is AFAICT no
function to produce this escaped format, and it's not a recognized
interchange format. So under what circumstances would one need to use this?

Some corporate data can be in CSV format with escaped unicode characters. Without this function it is not possible to decode these files without external application.

Postgres has support for this conversion, but only for string literals.

CREATE OR REPLACE FUNCTION public.unescape(text, text)  RETURNS text LANGUAGE plpgsql AS $function$ DECLARE result text; BEGIN   EXECUTE format('SELECT U&%s UESCAPE %s',                          quote_literal(replace($1, '\u','^')),                         quote_literal($2)) INTO result;   RETURN result; END; $function$

Because unicode is major encoding, I think this conversion should be supported. There is another question about implementation like in this patch implemented unicode_unescape function, or with some new conversion. Using conversion https://www.postgresql.org/docs/current/sql-createconversion.html is probably better, but I am not sure how intuitive it is, and it is hard to use it (without not nice workarounds) in plpgsql.

I don't expect so Postgres should produce data in unicode escaped format, but can be useful, if Postgres can do some work with data in special format of major encoding.

postgres=# select convert_from(E'Odpov\u011Bdn\u00E1 osoba', 'UTF8');
┌─────────────────┐
│ convert_from │
╞═════════════════╡
│ Odpovědná osoba │
└─────────────────┘
(1 row)

I can do this with bytea, but it is hard to use it with text fields.

I didn't find any way how to do it without ugly steps.

Regards

Pavel

pgsql-hackers by date:

From: Justin Pryzby
Date: 29 November 2020, 20:21:15
Subject: Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From: Andreas Seltenreich
Date: 29 November 2020, 21:40:50
Subject: [sqlsmith] Planner error on lateral joins

Re: proposal: unescape_text function - Mailing list pgsql-hackers

Previous

Next