UTF8 on Debian - Mailing list pgsql-hackers

From	Gregory Stark
Subject	UTF8 on Debian
Date	October 15, 2007 18:07:17
Msg-id	87fy0c2ikl.fsf@oxford.xeocode.com
Responses	Re: UTF8 on Debian Re: UTF8 on Debian Re: UTF8 on Debian
List	pgsql-hackers

Something very strange is going on on my machine with UTF8:

postgres=# show server_encoding;
 server_encoding 
-----------------
 UTF8
(1 row)

postgres=# select length(convert_from(E'\343\203\251\343\202\244\343\202\273\343\203\263','utf8'));
 length 
--------
      8
(1 row)

postgres=# select 'substring(s,'||i||',1)',convert_to(substring(s,i,1),'utf8') from (select
convert_from(E'\343\203\251\343\202\244\343\202\273\343\203\263','utf8')as s)a, (select generate_series(1,8) as i)b;
     ?column?     | convert_to 
------------------+------------
 substring(s,1,1) | \343
 substring(s,2,1) | \203\251
 substring(s,3,1) | \343
 substring(s,4,1) | \202\244
 substring(s,5,1) | \343
 substring(s,6,1) | \202\273
 substring(s,7,1) | \343
 substring(s,8,1) | \203\263
(8 rows)

I believe this is in fact only four katakana characters. (Namely U+30E9 U+30A4
U+30BB U+30F3) \343 is merely the first byte of each three-byte encoding for
the individual characters.
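
As a cross-check outside Postgres, the same twelve bytes can be decoded with any independent UTF-8 decoder. The sketch below uses Python purely as an illustration (it is not part of the original session, and the variable names are made up for the example); it should report four characters, each encoded as three bytes beginning with \343 (0xE3).

```python
# Independent sanity check of the byte string used in the queries above.
# Python is an assumption here; any conforming UTF-8 decoder would do.

# The same octal escapes as in the SQL literal, as a Python bytes literal.
raw = b"\343\203\251\343\202\244\343\202\273\343\203\263"

# Decode as UTF-8 and list the resulting code points.
text = raw.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Re-encode each character separately to see its individual byte sequence.
per_char_bytes = [ch.encode("utf-8") for ch in text]

print(len(raw), len(text))  # prints "12 4": twelve bytes, four characters
print(code_points)
for ch_bytes in per_char_bytes:
    # Show each character's bytes in the same octal style psql uses.
    print("".join(f"\\{b:03o}" for b in ch_bytes))
```

A correct length() over this string would therefore be 4, not the 8 shown above.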

Dave doesn't see the same behaviour on his three machines, so I think it's
something unique to my machine. Possibly not a Postgres bug at all but some
kind of install gotcha.

I'm running Debian unstable with glibc 2.6.1-4 so it is a bit bleeding edge.
But as I understand it the utf8 decoding is all our code anyways so I can't
quite figure out how it could be glibc's fault.

Does anybody else see anything like this?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com
