Re: Unicode escapes with any backend encoding - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Unicode escapes with any backend encoding
Date
Msg-id 24911.1578967534@sss.pgh.pa.us
In response to Re: Unicode escapes with any backend encoding  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Responses Re: Unicode escapes with any backend encoding  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On Tue, Jan 14, 2020 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Grepping for other direct uses of unicode_to_utf8(), I notice that
>> there are a couple of places in the JSON code where we have a similar
>> restriction that you can only write a Unicode escape in UTF8 server
>> encoding.  I'm not sure whether these same semantics could be
>> applied there, so I didn't touch that.

> Off the cuff I'd be inclined to say we should keep the text escape
> rules the same. We've already extended the JSON standard by allowing
> non-UTF8 encodings.

Right.  I'm just thinking though that if you can write "é" literally
in a JSON string, even though you're using LATIN1 not UTF8, then why
not allow writing that as "\u00E9" instead?  The latter is arguably
truer to spec.

However, if JSONB collapses "\u00E9" to LATIN1 "é", that would be bad,
unless we have a way to undo it on printout.  So there might be
some more moving parts here than I thought.
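
To make the concern concrete, here is a minimal sketch using Python's standard json module as a stand-in. It is not PostgreSQL code and makes no claim about what JSONB actually does; the helper escape_fits_encoding is a hypothetical name invented for this illustration. It shows that an escape such as "\u00E9" is representable in LATIN1 while "\u20AC" is not, and that once a parser has collapsed the escape into the character, the original spelling can only be approximated on output, not recovered.

```python
import json


def escape_fits_encoding(code_point: int, encoding: str) -> bool:
    """Hypothetical helper: can this code point be stored in `encoding`?"""
    try:
        chr(code_point).encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+00E9 exists in LATIN1, so the escape could be converted;
# U+20AC (euro sign) does not, so it would still have to be rejected.
fits_e_acute = escape_fits_encoding(0x00E9, "latin-1")
fits_euro = escape_fits_encoding(0x20AC, "latin-1")

# Both spellings parse to the same one-character string,
# so the parsed value no longer records which one was written.
literal = json.loads('"é"')
escaped = json.loads('"\\u00E9"')
same_value = literal == escaped

# On output the serializer has to pick one spelling for everything:
# escape all non-ASCII characters, or emit them literally.
reescaped = json.dumps(escaped)                        # all non-ASCII escaped
literal_out = json.dumps(escaped, ensure_ascii=False)  # emitted literally
```

Either output policy is consistent, but neither reproduces the writer's per-character choice, which is the "undo it on printout" problem in miniature.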

            regards, tom lane


