Unicode escapes in literals - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Unicode escapes in literals
Date	October 23, 2008 05:42:10
Msg-id	490038DB.5070602@gmx.net
Responses	Re: Unicode escapes in literals Re: Unicode escapes in literals
List	pgsql-hackers

I would like to add an escape mechanism to PostgreSQL for entering 
arbitrary Unicode characters into string literals.  We currently have 
only two options.  One is entering the character directly via the 
keyboard or cut-and-paste, which is difficult for a number of reasons, 
such as when the font doesn't have the character.  The other is entering 
the UTF8-encoded bytes using the E'...' strings, which is hardly usable.
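
To illustrate why the E'...' route is hardly usable, here is a minimal 
Python sketch that spells out each non-ASCII character as its UTF8 bytes. 
The function name utf8_escape_literal is hypothetical and only serves the 
illustration; it is not part of PostgreSQL.

```python
def utf8_escape_literal(text: str) -> str:
    """Build an E'...' literal with non-ASCII characters written as UTF8 byte escapes."""
    parts = []
    for ch in text:
        if ch == "'":
            parts.append("''")        # a quote is doubled inside the literal
        elif ch == "\\":
            parts.append("\\\\")      # a backslash is doubled inside E'...'
        elif ord(ch) < 128:
            parts.append(ch)          # ASCII is passed through unchanged
        else:
            # One \xHH escape per UTF8-encoded byte of the character.
            parts.extend(f"\\x{b:02X}" for b in ch.encode("utf-8"))
    return "E'" + "".join(parts) + "'"


print(utf8_escape_literal("ä"))  # E'\xC3\xA4'
```

The author has to know the byte sequence of the character in UTF8, and 
the literal only means what was intended when the server encoding is UTF8.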

SQL has the following escape syntax for it:
   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the four-digit hexadecimal Unicode codepoint and the 
optional UESCAPE clause selects a different escape character.  So this is 
pretty much just another variant on what the E'...' syntax does.
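
The decoding step for this syntax can be sketched in Python as follows. 
This is a sketch under stated assumptions, not the proposed scanner code: 
the function name decode_unicode_escapes is hypothetical, the input is the 
literal body with the quotes already stripped, only the four-digit form 
shown above is handled, and a doubled escape character is assumed to stand 
for the escape character itself.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_unicode_escapes(body: str, uescape: str = "\\") -> str:
    """Decode the body of a U&'...' literal into a string of codepoints."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != uescape:
            out.append(ch)                      # ordinary character
            i += 1
        elif body[i + 1 : i + 2] == uescape:
            out.append(uescape)                 # doubled escape (assumption)
            i += 2
        else:
            digits = body[i + 1 : i + 5]
            if len(digits) != 4 or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"invalid Unicode escape at offset {i}")
            out.append(chr(int(digits, 16)))    # \xxxx -> codepoint
            i += 5
    return "".join(out)


print(decode_unicode_escapes("d\\0061t\\0061"))                # data
print(decode_unicode_escapes("d!0061t!0061", uescape="!"))     # data
```

The second call corresponds to writing U&'d!0061t!0061' UESCAPE '!'.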

The trick is that since we have user-definable encoding conversion 
routines, we can't convert the Unicode codepoint to the server encoding 
in the scanner stage.  I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8.  This 
would probably cover most use cases anyway.  We could have limited 
support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call.  But that would then create a 
lot of inconsistencies, such as needing functional indexes for matches 
against what should really be a literal.
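
Option 1 could look roughly like the following Python sketch. The function 
name codepoint_to_server_bytes and the encoding names passed to it are 
hypothetical, and the ASCII branch assumes that every server encoding 
represents the ASCII range with the same single bytes.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Convert one escaped codepoint at scan time, following option 1."""
    if codepoint < 0x80:
        # ASCII range: assumed identical in every server encoding.
        return bytes([codepoint])
    if server_encoding.upper() == "UTF8":
        # UTF8 server encoding: the conversion is fixed, no user-defined
        # conversion routine is involved.
        return chr(codepoint).encode("utf-8")
    # Anything else would need a conversion routine, so reject it here.
    raise ValueError(
        "Unicode escapes above the ASCII range require server encoding UTF8"
    )


print(codepoint_to_server_bytes(0x41, "LATIN1"))  # b'A'
print(codepoint_to_server_bytes(0xE4, "UTF8"))    # b'\xc3\xa4'
```

With this policy the scanner never has to call a conversion routine, which 
is the constraint described above.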

I'd be happy to start with UTF8 support only.  Other ideas?
