Re: Unicode support - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: Unicode support
Date	April 13, 2009 18:04:27
Msg-id	49E3A8D1.9010607@dunslane.net
In response to	Re: Unicode support (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> This isn't about the number of bytes, but about whether or not we should 
>> count characters encoded as two or more combined code points as a single 
>> char or not.
>>     
>
> It's really about whether we should support non-canonical encodings.
> AFAIK that's a hack to cope with implementations that are restricted
> to UTF-16, and we should Just Say No.  Clients that are sending these
> things converted to UTF-8 are in violation of the standard.
>   

I don't believe that the standard forbids the use of combining chars at 
all. RFC 3629 says:
  Security may also be impacted by a characteristic of several
  character encodings, including UTF-8: the "same thing" (as far as a
  user can tell) can be represented by several distinct character
  sequences. For instance, an e with acute accent can be represented
  by the precomposed U+00E9 E ACUTE character or by the canonically
  equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
  UTF-8 provides a single byte sequence for each character sequence,
  the existence of multiple character sequences for "the same thing"
  may have security consequences whenever string matching, indexing,
  searching, sorting, regular expression matching and selection are
  involved. An example would be string matching of an identifier
  appearing in a credential and in access control list entries. This
  issue is amenable to solutions based on Unicode Normalization Forms,
  see [UAX15].
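
A minimal Python sketch of the RFC's e-acute example follows. It is an illustration only, not code from this thread and not how PostgreSQL handles text. The helper names describe and same_text are hypothetical. It shows that the two canonically equivalent forms have different code point counts and different UTF-8 bytes, so a naive comparison treats them as different strings, while comparing after NFC normalization (UAX #15) treats them as equal. Note that NFC only collapses a combining sequence into one code point when a precomposed character exists, so it does not settle the "count as one char" question for every sequence.

```python
import unicodedata

# Precomposed form: one code point, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"
# Canonically equivalent decomposed form: U+0065 followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "\u0065\u0301"


def describe(s: str) -> tuple[int, bytes]:
    """Hypothetical helper: return the code point count and the UTF-8 bytes of s."""
    return len(s), s.encode("utf-8")


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(describe(precomposed))               # (1, b'\xc3\xa9')
    print(describe(decomposed))                # (2, b'e\xcc\x81')
    print(precomposed == decomposed)           # False: different code point sequences
    print(same_text(precomposed, decomposed))  # True: equal after NFC
```

Normalizing before matching, as the RFC suggests, is one way to make the credential/ACL comparison in the quoted example behave as a user would expect.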

cheers

andrew