Re: proposal: unescape_text function - Mailing list pgsql-hackers

From Chapman Flack
Subject Re: proposal: unescape_text function
Msg-id 5FC7AAF5.7010209@anastigmatix.net
In response to Re: proposal: unescape_text function  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: proposal: unescape_text function
List pgsql-hackers
On 12/02/20 05:37, Pavel Stehule wrote:
> 2. There can be an optional parameter "prefix" with default "\". But with "\u"
> it can be compatible with Java or Python.

Java's Unicode escape form is one of those early ones that lack
a six-digit form: any character outside the Basic Multilingual Plane
(BMP) has to be represented by two four-digit escapes in a row, encoding
the two surrogates that make up the character's representation
in UTF-16.
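To make the surrogate-pair point concrete, here is a minimal Python sketch of producing the Java-style form. The function name java_escape is hypothetical (not part of the proposal); it emits one \uXXXX escape per UTF-16 code unit, so a non-BMP character comes out as a pair of escapes.

```python
def java_escape(s: str) -> str:
    """Hypothetical helper: escape every non-ASCII character Java-style,
    one \\uXXXX per UTF-16 code unit (so non-BMP becomes a surrogate pair)."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")          # BMP: a single four-digit escape
        else:
            # Outside the BMP: split into high and low surrogates, as UTF-16 does.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


print(java_escape("caf\u00e9 \U0001F600"))  # caf\u00e9 \ud83d\ude00
```

The single character U+1F600 needs two escapes here, which is exactly the awkwardness of this representation.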

Obviously that's an existing form that's out there, so it's not a bad
thing to have some kind of support for it, but it's not a great
representation to encourage people to use.

Python, by contrast, has both \uxxxx and \Uxxxxxxxx where you would use
the latter to represent a non-BMP character directly. So the Java and
Python schemes should be considered distinct.
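A short Python sketch makes the distinction visible. The same non-BMP character, U+1F600, comes out of Python's built-in unicode_escape codec as one eight-digit \U escape, but in the Java form as two surrogate escapes; and Python's decoder does not rejoin those two escapes into one character.

```python
ch = "\U0001F600"  # U+1F600, a character outside the BMP

# Python's own escape form: a single eight-digit \U escape.
python_form = ch.encode("unicode_escape").decode("ascii")

# Java's form: one four-digit \u escape per UTF-16 code unit (a surrogate pair).
utf16 = ch.encode("utf-16-be")
java_form = "".join(
    f"\\u{int.from_bytes(utf16[i:i + 2], 'big'):04x}"
    for i in range(0, len(utf16), 2)
)

print(python_form)  # \U0001f600
print(java_form)    # \ud83d\ude00

# Feeding the Java form to Python's decoder yields two lone surrogates,
# not the original character, so the two schemes need distinct handling.
decoded = java_form.encode("ascii").decode("unicode_escape")
print(len(decoded), decoded == ch)  # 2 False
```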

In Perl, there is a useful extension to regexp substitution where
you specify the replacement not as a string or even a string with &
and \1 \2 ... magic, but as essentially a lambda that is passed the
match and returns a computed replacement. That makes conversions of
the sort discussed here generally trivial to implement. Would it be
worth considering adding something of general utility like that, and
then there could be a small library of pure SQL functions (or a wiki
page or GitHub gist) covering a bunch of the two dozen representations
on that page linked above?
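For comparison, Python's re.sub already accepts a function as the replacement, which is the same idea as Perl's s///e: the function receives the match and returns the computed replacement. A minimal sketch, assuming Java-style \uXXXX input (the names unescape_java and _replace are hypothetical), shows how little code such a conversion needs once the replacement can be computed:

```python
import re

# Either a surrogate pair written as two escapes, or a single four-digit escape.
_JAVA_ESCAPE = re.compile(
    r"\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})|\\u([0-9a-f]{4})",
    re.IGNORECASE,
)


def _replace(m: re.Match) -> str:
    """Compute the replacement for one match, like a Perl s///e expression."""
    if m.group(3) is not None:
        return chr(int(m.group(3), 16))         # single BMP escape
    high = int(m.group(1), 16)
    low = int(m.group(2), 16)
    # Recombine the surrogate pair into one code point.
    return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))


def unescape_java(text: str) -> str:
    """Hypothetical helper: decode Java-style \\uXXXX escapes in text."""
    return _JAVA_ESCAPE.sub(_replace, text)


result = unescape_java(r"caf\u00e9 \ud83d\ude00")
print(result == "caf\u00e9 \U0001F600")  # True
```

A lone surrogate escape that is not part of a pair falls through to the single-escape branch and is returned as a lone surrogate; a real implementation would probably want to reject it instead.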

Regards,
-Chap


