Re: invalidly encoded strings - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: invalidly encoded strings
Msg-id 20070911063145.GA18260@svana.org
In response to Re: invalidly encoded strings  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On Tue, Sep 11, 2007 at 11:27:50AM +0900, Tatsuo Ishii wrote:
> SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);
>
> Without using convert(), he will get the data in a random order. This is
> because Kanji characters are in random order in UTF-8, while Kanji
> characters are reasonably ordered in EUC_JP.

The usual way to approach this is to make convert return bytea instead
of text. Then your problems vanish. Bytea can still be sorted, but it
won't be treated as a text string and thus does not need to conform to
the requirements of a text string.
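A minimal Python sketch of the idea, assuming Python `bytes` can stand in for bytea. The sort key is the EUC_JP encoding of each string, kept as raw bytes rather than text. Ordering is therefore byte-by-byte, and nothing requires the key to be valid in the database encoding. The helper `euc_jp_sort_key` and the sample words are hypothetical.

```python
# Sketch: sort text by its EUC_JP byte encoding (hypothetical helper).
# Python bytes play the role of bytea: comparable, but never treated as text.

def euc_jp_sort_key(s: str) -> bytes:
    # text -> bytes; the result is only compared, never reinterpreted as a string
    return s.encode("euc_jp")

words = ["東京", "日本", "漢字", "大阪"]

# Ordering by raw EUC_JP bytes, analogous to ORDER BY on a bytea-returning expression
by_euc_jp = sorted(words, key=euc_jp_sort_key)

# For comparison: ordering by UTF-8 bytes, which always matches Unicode code point order
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_euc_jp)
print(by_utf8)
```

Whether the two orders differ depends on the words chosen. The point is only that the key is bytes, so it can be sorted without having to be a valid string in any particular encoding.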

Languages like Perl distinguish between "encode", which is text->bytea,
and "decode", which is bytea->text. We've got "convert" for both, and that
causes problems.
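The encode/decode split can be sketched in Python, whose `str.encode` and `bytes.decode` draw the same line Perl does. Encoding always yields bytes, and only decoding produces text, validating the input along the way. The wrapper names `encode_text`, `decode_bytes` and `is_valid` are hypothetical. They illustrate the two directions described above and are not PostgreSQL functions.

```python
# Sketch of the "encode"/"decode" split (hypothetical wrapper names).

def encode_text(text: str, encoding: str) -> bytes:
    """text -> bytes: the output is raw bytes and is never treated as text."""
    return text.encode(encoding)

def decode_bytes(data: bytes, encoding: str) -> str:
    """bytes -> text: raises UnicodeDecodeError if data is invalid in `encoding`."""
    return data.decode(encoding)

def is_valid(data: bytes, encoding: str) -> bool:
    """Report whether data would decode cleanly as `encoding`."""
    try:
        decode_bytes(data, encoding)
    except UnicodeDecodeError:
        return False
    return True

original = "日本語"
euc = encode_text(original, "euc_jp")    # bytes: safe to store or sort
roundtrip = decode_bytes(euc, "euc_jp")  # back to text, validated on the way

print(roundtrip == original)             # True
print(is_valid(b"\xff\xfe", "utf-8"))    # False: 0xff never appears in UTF-8
```

With this split, bytes that are not valid in the target encoding stay bytes. The error surfaces at the decode step instead of producing an invalidly encoded text value.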

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
