Re: proposal: unescape_text function - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: proposal: unescape_text function |
Date | |
Msg-id | CAFj8pRA-z=k5Ra+UNCXf2RqBCYWyQaDnE89YNgQW-0PQ9Jwxtw@mail.gmail.com Whole thread Raw |
In response to | Re: proposal: unescape_text function (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: proposal: unescape_text function
|
List | pgsql-hackers |
st 2. 12. 2020 v 0:05 odesílatel Andrew Dunstan <andrew@dunslane.net> napsal:
On 11/30/20 8:14 AM, Peter Eisentraut wrote:
> On 2020-11-29 18:36, Pavel Stehule wrote:
>>
>> I don't really get the point of this function. There is AFAICT no
>> function to produce this escaped format, and it's not a recognized
>> interchange format. So under what circumstances would one need to
>> use this?
>>
>>
>> Some corporate data can be in CSV format with escaped unicode
>> characters. Without this function it is not possible to decode these
>> files without external application.
>
> I would like some supporting documentation on this. So far we only
> have one stackoverflow question, and then this implementation, and
> they are not even the same format. My worry is that if there is not
> precise specification, then people are going to want to add things in
> the future, and there will be no way to analyze such requests in a
> principled way.
>
>
>
Also, should this be an extension? I'm dubious about including such
marginal uses in the core code unless there's a really good case for it.
I am not sure, and I am inclined so it should be core functionality.
1. Although this use case is marginal, this is related to most used encodings - ascii and unicode. 8 bit encodings enhanced about escaped multibyte chars will be used for a very long time. Unfortunately - this will be worse, because Postgres will be used more in the corporate environment, where there is a bigger press to conserve very legacy technologies without correct multibyte support. The core problem so this issue is out of concept bytea -> text or text -> bytea transformations supported by Postgres. This is text -> text transformation (for almost all encoding based on ascii), that is not supported by Postgres now.
2. Postgres already has this functionality - but unfortunately there is a limit just only literal constants.
create or replace function uunescape(text)
returns text as $$
declare r text;
begin
returns text as $$
declare r text;
begin
-- don't use this code!!!
execute 'select e''' || $1 || '''' into r;
return r;
end;
$$ language plpgsql immutable;
return r;
end;
$$ language plpgsql immutable;
But one way how anybody can use it is SQL injection vulnerable and slow. So some simple buildin solution can be protection against some future security issues. Personally I am happy with just this limited function that will be safe (although the design based on introducing new encoding and conversions can be more complete and accurate). I agree so this case is marginal, but it is a fully valid use case, and supporting unicode escaped codes just by parser is a needless limit.
3. there are new disadvantages of extensions in current DBaaS times. Until the extension is not directly accepted by a cloud provider, then the extension is not available for users. The acceptance of extensions is not too agile - so moving this code to extension doesn't solve this problem. Without DBaaS the implementation of this feature as the extensions can be good enough.
Regards
Pavel
cheers
andrew
pgsql-hackers by date: