Re: Bug in UTF8-Validation Code? - Mailing list pgsql-hackers

From	Mark Dilger
Subject	Re: Bug in UTF8-Validation Code?
Date	April 2, 2007 20:04:28
Msg-id	46117D6D.7050705@markdilger.com
In response to	Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Bug in UTF8-Validation Code? Re: Bug in UTF8-Validation Code?
List	pgsql-hackers

Tom Lane wrote:
> Mark Dilger <pgsql@markdilger.com> writes:
>>> pgsql=# select chr(14989485);
>>> chr
>>> -----
>>> 中
>>> (1 row)
> 
> Is there a principled rationale for this particular behavior as
> opposed to any other?
> 
> In particular, in UTF8 land I'd have expected the argument of chr()
> to be interpreted as a Unicode code point, not as actual UTF8 bytes
> with a randomly-chosen endianness.
> 
> Not sure what to do in other multibyte encodings.

"Not sure what to do in other multibyte encodings" was pretty much my rationale 
for this particular behavior.  I standardized on network byte order because 
there are only two endiannesses to choose from, and the other seems to be the 
more surprising choice.
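
To make the two readings concrete, here is a rough Python sketch (not the 
actual backend code; both function names are made up for illustration).  The 
first treats the integer as the big-endian bytes of one UTF-8 character, which 
is the behavior shown above; the second treats it as a Unicode code point, as 
Tom suggests.

```python
def chr_as_utf8_bytes(n: int) -> str:
    """Hypothetical helper: read n as the network-order (big-endian)
    bytes of a single UTF-8 encoded character."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    length = (n.bit_length() + 7) // 8           # bytes needed to hold n
    raw = n.to_bytes(length, byteorder="big")    # 14989485 -> b'\xe4\xb8\xad'
    text = raw.decode("utf-8")                   # raises on invalid UTF-8
    if len(text) != 1:
        raise ValueError("bytes do not encode exactly one character")
    return text


def chr_as_code_point(n: int) -> str:
    """Hypothetical helper: read n as a Unicode code point."""
    return chr(n)                                # raises ValueError above 0x10FFFF


if __name__ == "__main__":
    print(chr_as_utf8_bytes(14989485))   # bytes E4 B8 AD
    print(chr_as_code_point(0x4E2D))     # the same character, by code point
```

Under the byte reading, 14989485 (0xE4B8AD) yields the character whose code 
point is U+4E2D; under the code point reading the same integer is out of range 
and would be rejected.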

I looked around on the web for a standard for how to convert an integer into a 
valid multibyte character and didn't find anything.  Andrew, Supernews has said 
upthread that chr() is clearly wrong and needs to be fixed.  If so, we need a 
clear definition of what "fixed" means.

Any suggestions?

mark
