Re: invalidly encoded strings - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: invalidly encoded strings
Date	September 11, 2007 00:20:44
Msg-id	16498.1189480802@sss.pgh.pa.us
In response to	Re: invalidly encoded strings (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: invalidly encoded strings Re: invalidly encoded strings
List	pgsql-hackers

Tatsuo Ishii <ishii@postgresql.org> writes:
>> BTW, it strikes me that there is another hole that we need to plug in
>> this area, and that's the convert() function.  Being able to create
>> a value of type text that is not in the database encoding is simply
>> broken.  Perhaps we could make it work on bytea instead (providing
>> a cast from text to bytea but not vice versa), or maybe we should just
>> forbid the whole thing if the database encoding isn't SQL_ASCII.

> Please don't do that. It will break a useful use case of convert().

The reason we have a problem here is that we've been choosing
convenience over safety in encoding-related issues.  I wonder if we must
stoop to having a "strict_encoding_checks" GUC variable to satisfy
everyone.

> A user has a database encoded in UTF-8. He has English, French,
> Chinese and Japanese data in tables. To sort the tables in
> language order, he would do something like this:

> SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);

> Without using convert(), he will get a random order of data.

I'd say that *with* convert() he will get a random order of data.  This
is making a boatload of unsupportable assumptions about the locale and
encoding of the surrounding database.  There are a lot of bad-encoding
situations for which strcoll() simply breaks down completely and can't
even deliver self-consistent answers.

It might work the way you are expecting if the database uses SQL_ASCII
encoding and C locale --- and I'd be fine with allowing convert() only
when the database encoding is SQL_ASCII.

			regards, tom lane
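
For illustration, a minimal Python sketch of the hole discussed above: bytes produced by converting UTF-8 text to EUC-JP generally do not form valid UTF-8, so a text value holding them would not be in the encoding of a UTF-8 database. The sample string and the helper is_valid_utf8 are assumptions made for this example; they are not taken from the thread or from PostgreSQL.

```python
# Sketch only: shows that EUC-JP bytes are not valid UTF-8.
# The sample value and helper name are hypothetical.


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "Japanese language" written in kanji (U+65E5 U+672C U+8A9E),
# standing in for a value from a column such as japanese_text.
sample = "\u65e5\u672c\u8a9e"

utf8_bytes = sample.encode("utf-8")     # what a UTF-8 database stores
eucjp_bytes = sample.encode("euc_jp")   # what a UTF-8 -> EUC-JP conversion yields

print("stored bytes valid UTF-8:   ", is_valid_utf8(utf8_bytes))
print("converted bytes valid UTF-8:", is_valid_utf8(eucjp_bytes))
```

The second check fails for this sample, which is the sense in which a text result from convert() is not in the database encoding. Handing the converted bytes back as bytea, as the quoted suggestion proposes, would avoid labelling them as text.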
