Re: Unicode support - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: Unicode support
Date	April 14, 2009 12:36:58
Msg-id	200904141532.44618.peter_e@gmx.net
In response to	Re: Unicode support (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: Unicode support
List	pgsql-hackers

On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
> Umm, but isn't that because your encoding is using one code point?
>
> See the OP's explanation w.r.t. canonical equivalence.
>
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

Here is a test case that shows the problem (if your terminal can display
combining characters; xterm appears to work):

SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length
----------+-------------
 é        |           1
(1 row)

SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length
----------+-------------
 é        |           2
(1 row)
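
The same discrepancy can be reproduced outside the database. The following is a minimal Python sketch, not anything PostgreSQL does internally: it uses the standard unicodedata module to compare the raw code point count with the count after NFC normalization, and with a count that skips combining marks. The helper names (code_point_length, nfc_length, base_char_length) are made up for illustration, and skipping combining marks is only a rough approximation of what a user perceives as one character.

```python
import unicodedata

# The two canonically equivalent spellings from the test case above.
PRECOMPOSED = "\u00e9"       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "\u0065\u0301"  # U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT


def code_point_length(s: str) -> int:
    """Count code points, which is what char_length reports in the queries above."""
    return len(s)


def nfc_length(s: str) -> int:
    """Count code points after composing to NFC (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", s))


def base_char_length(s: str) -> int:
    """Count only code points with combining class 0 (hypothetical helper).

    This is an approximation: it ignores combining marks but does not
    implement full grapheme cluster segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    for label, s in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(label, code_point_length(s), nfc_length(s), base_char_length(s))
```

Under these assumptions the two strings differ in raw code point count but agree once normalized, which is the canonical equivalence point raised earlier in the thread.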
