Re: proposal: unescape_text function - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: unescape_text function
Date
Msg-id CAFj8pRA-z=k5Ra+UNCXf2RqBCYWyQaDnE89YNgQW-0PQ9Jwxtw@mail.gmail.com
Whole thread Raw
In response to Re: proposal: unescape_text function  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: proposal: unescape_text function
List pgsql-hackers


st 2. 12. 2020 v 0:05 odesílatel Andrew Dunstan <andrew@dunslane.net> napsal:

On 11/30/20 8:14 AM, Peter Eisentraut wrote:
> On 2020-11-29 18:36, Pavel Stehule wrote:
>>
>>     I don't really get the point of this function.  There is AFAICT no
>>     function to produce this escaped format, and it's not a recognized
>>     interchange format.  So under what circumstances would one need to
>>     use this?
>>
>>
>> Some corporate data can be in CSV format with escaped unicode
>> characters. Without this function it is not possible to decode these
>> files without external application.
>
> I would like some supporting documentation on this.  So far we only
> have one stackoverflow question, and then this implementation, and
> they are not even the same format.  My worry is that if there is not
> precise specification, then people are going to want to add things in
> the future, and there will be no way to analyze such requests in a
> principled way.
>
>
>


Also, should this be an extension? I'm dubious about including such
marginal uses in the core code unless there's a really good case for it.

I am not sure, and  I am inclined so it should be core functionality.

1. Although this use case is marginal, this is related to most used encodings - ascii and unicode. 8 bit encodings enhanced about escaped multibyte chars will be used for a very long time. Unfortunately - this will be worse, because Postgres will be used more in the corporate environment, where there is a bigger press to conserve very legacy technologies without correct multibyte support. The core problem so this issue is out of concept bytea -> text or text -> bytea transformations supported by Postgres. This is text -> text transformation (for almost all encoding based on ascii), that is not supported by Postgres now.

2. Postgres already has this functionality - but unfortunately there is a limit just only literal constants.

create or replace function uunescape(text)
returns text as $$
declare r text;
begin
  -- don't use this code!!!
  execute 'select e''' || $1 || '''' into r;
  return r;
end;
$$ language plpgsql immutable;

But one way how anybody can use it is SQL injection vulnerable and slow. So some simple buildin solution can be protection against some future security issues. Personally I am happy with just this limited function that will be safe (although the design based on introducing new encoding and conversions can be more complete and accurate). I agree so this case is marginal, but it is a fully valid use case, and supporting unicode escaped codes just by parser is a needless limit.

3. there are new disadvantages of extensions in current DBaaS times. Until the extension is not directly accepted by a cloud provider, then the extension is not available for users. The acceptance of extensions is not too agile - so moving this code to extension doesn't solve this problem. Without DBaaS the implementation of this feature as the extensions can be good enough.

Regards

Pavel





cheers


andrew

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Next
From: Nikolay Samokhvalov
Date:
Subject: Re: pg_stat_statements oddity with track = all