Unicode escapes in literals - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Unicode escapes in literals
Msg-id 490038DB.5070602@gmx.net
List pgsql-hackers
I would like to add an escape mechanism to PostgreSQL for entering 
arbitrary Unicode characters into string literals.  Currently the only 
options are to enter the character directly via the keyboard or 
cut-and-paste, which is difficult for a number of reasons, such as when 
the font doesn't have the character, or to enter the UTF8-encoded bytes 
using the E'...' strings, which is hardly usable.

SQL has the following escape syntax for it:
   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode code point and the optional 
UESCAPE clause selects a different escape character.  So this is pretty 
much just another variant on what the E'...' syntax does.
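
To make the syntax concrete, here is a rough Python sketch of how the body of such a literal could be expanded.  It is only an illustration, not proposed scanner code: the function name is made up, it handles only the four-digit \xxxx form shown above, and it assumes that a doubled escape character stands for the escape character itself.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Expand <escape>xxxx sequences found in the body of a U&'...' literal.

    Hypothetical helper for illustration only; `body` is the text between
    the quotes and `escape` is the character given by UESCAPE.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        # Assumption: a doubled escape character yields one literal escape.
        if body[i + 1:i + 2] == escape:
            out.append(escape)
            i += 2
            continue
        # Otherwise exactly four hexadecimal digits must follow.
        digits = body[i + 1:i + 5]
        if len(digits) != 4 or not all(d in HEX_DIGITS for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        out.append(chr(int(digits, 16)))
        i += 5
    return "".join(out)
```

For example, decode_unicode_escapes(r"d\0061t\0061") returns "data", and so does decode_unicode_escapes("d!0061t!0061", escape="!").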

The trick is that since we have user-definable encoding conversion 
routines, we can't convert the Unicode codepoint to the server encoding 
in the scanner stage.  I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8.  This 
would probably cover most use cases anyway.  We could have limited 
support for characters in the ASCII range for all server encodings.
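
As a sketch of what option 1 would mean in practice, the following Python fragment maps a code point to server-encoded bytes.  The function name is hypothetical, and the ASCII branch assumes that every server encoding represents code points below 0x80 as the same single byte.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Encode one Unicode code point for the given server encoding.

    Hypothetical helper illustrating option 1: full support under UTF8,
    ASCII-only support under every other server encoding.
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError(f"code point out of range: {codepoint:#x}")
    if server_encoding.upper() == "UTF8":
        # UTF8 server encoding: any valid code point can be encoded directly.
        return chr(codepoint).encode("utf-8")
    if codepoint < 0x80:
        # Assumption: ASCII characters are one identical byte in all
        # server encodings, so no conversion routine is needed.
        return bytes([codepoint])
    raise ValueError(
        f"code point {codepoint:#x} needs server encoding UTF8, "
        f"not {server_encoding}"
    )
```

With this, code point 0xE9 becomes the two bytes C3 A9 under UTF8 but is rejected under, say, LATIN1, while 0x41 is accepted under both.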

2. Convert this syntax to a function call.  But that would then create a 
lot of inconsistencies, such as needing functional indexes for matches 
against what should really be a literal.

I'd be happy to start with UTF8 support only.  Other ideas?

