Re: Unicode support - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Unicode support
Date
Msg-id 49E3950E.8020800@dunslane.net
In response to Re: Unicode support  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Unicode support  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Unicode support  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers

Alvaro Herrera wrote:
> - - wrote:
>
>   
>> 1) Functions like char_length() or length() do NOT return the number
>> of characters (the manual says they do), instead they return the
>> number of code points.
>>     
>
> I think you have client_encoding misconfigured.
>
> alvherre=# select length('á'::text);
>  length 
> --------
>       1
> (1 fila)
>
>
>   

Umm, but isn't that because your encoding represents 'á' as a single code point?

See the OP's explanation w.r.t. canonical equivalence.

This isn't about the number of bytes, but about whether we should count
a character encoded as two or more combining code points as a single
character.
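
To make the distinction concrete, here is a minimal sketch in Python (not PostgreSQL code) that compares the two canonically equivalent spellings of 'á'. The helper names code_points and approx_characters are made up for illustration, and counting only code points with combining class 0 is just a rough stand-in for proper grapheme cluster segmentation:

```python
import unicodedata

# Two canonically equivalent spellings of the same user-perceived character.
precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT


def code_points(s: str) -> int:
    """Number of code points, which is what len() reports for a Python str."""
    return len(s)


def approx_characters(s: str) -> int:
    """Rough character count (hypothetical helper): skip combining marks.

    This only ignores code points with a non-zero canonical combining
    class; it is an approximation, not full grapheme cluster segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
        print(label, code_points(s), approx_characters(s))
    # NFC normalization maps the decomposed form onto the precomposed one.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

With the precomposed form both counts agree, which would explain the result of 1 in the quoted session; with the decomposed form the code point count is 2 while the approximate character count stays at 1.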

cheers

andrew



