Re: proposal: UTF8 to_ascii function - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: proposal: UTF8 to_ascii function
Msg-id 48A041CC.1090703@dunslane.net
In response to Re: proposal: UTF8 to_ascii function  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
Responses Re: proposal: UTF8 to_ascii function  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
Re: proposal: UTF8 to_ascii function  ("Pavel Stehule" <pavel.stehule@gmail.com>)
List pgsql-hackers

Jan Urbański wrote:
> Andrew Dunstan wrote:
>>
>>
>> Pavel Stehule wrote:
>>>
>>>
>>> One note: convert_to is correct. But we have to use to_ascii without
>>> decode functions. It behaves the same way: it converts from bytea to
>>> text. Text in an "incorrect" encoding is de facto bytea. So the correct
>>> to_ascii function prototypes are:
>>>
>>> to_ascii(text)
>>> to_ascii(bytea, integer);
>>> to_ascii(bytea, name);
>>>
>>
>> What you have not said is how you propose to convert UTF8 to ASCII.
>>
>> Currently to_ascii() converts a small number of single byte charsets 
>> to ASCII by folding the chars with high bits set, so what we get is a 
>> pure ASCII result which is safe in any server encoding, as they are 
>> all ASCII supersets.
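For illustration, the single-byte folding described in the quoted paragraph can be sketched as a lookup table indexed by byte value. This is a minimal sketch assuming LATIN1 input. The table `_LATIN1_FOLD` and the function `latin1_to_ascii` are hypothetical and are not PostgreSQL's actual tables. Blanking the bytes 0x80 to 0xBF (mostly symbols and control codes) is also an assumption.

```python
# Hypothetical fold table for LATIN1 bytes 0xC0..0xFF (one ASCII char per byte).
# This is NOT PostgreSQL's actual to_ascii table; it only illustrates the idea.
_LATIN1_FOLD = (
    "AAAAAAACEEEEIIII"  # 0xC0-0xCF
    "DNOOOOOxOUUUUYTs"  # 0xD0-0xDF
    "aaaaaaaceeeeiiii"  # 0xE0-0xEF
    "dnooooo/ouuuuyty"  # 0xF0-0xFF
)


def latin1_to_ascii(data: bytes) -> str:
    """Fold LATIN1 bytes to pure ASCII, producing one output char per input byte."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))                  # already ASCII, keep as-is
        elif b >= 0xC0:
            out.append(_LATIN1_FOLD[b - 0xC0])  # accented letter -> base letter
        else:
            out.append(" ")                     # 0x80-0xBF: assumed blanked
    return "".join(out)


print(latin1_to_ascii("Müller".encode("latin-1")))  # Muller
```

Because every input byte produces exactly one ASCII byte, the result is valid in any server encoding that is an ASCII superset.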
>>
>> But what conversion rule will you use for the gazillions of Unicode 
>> characters?
>>
>> I honestly do not understand the use case for this at all.
>
> I do. Clients often want their searches to be insensitive to accented 
> and other language-specific letters, so that searching for 'łódź' also 
> returns 'lodz'. So the use case is there (in fact, the lack of such a 
> facility made me consider not upgrading a particular client to 
> 8.3...).
> Or maybe there's a better way to do it?
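One way to approximate the accent-insensitive behaviour Jan describes for UTF8 text is Unicode compatibility decomposition (NFKD) followed by dropping the combining marks. This is a sketch under that assumption, not a proposed implementation. `fold_to_ascii` and the `_NO_DECOMP` table are hypothetical. The sketch also shows why the question about conversion rules is not trivial: letters such as 'ł' have no Unicode decomposition at all, so they need a hand-maintained table.

```python
import unicodedata

# Hypothetical hand-written table for letters that NFKD cannot decompose
# (e.g. the Polish "l with stroke"). Deliberately small; not exhaustive.
_NO_DECOMP = str.maketrans({"ł": "l", "Ł": "L", "đ": "d", "Đ": "D", "ø": "o", "Ø": "O"})


def fold_to_ascii(text: str) -> str:
    """Strip accents from Unicode text; characters left outside ASCII become '?'."""
    text = text.translate(_NO_DECOMP)
    # NFKD splits e.g. 'ó' into 'o' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (CJK, 'ß', ...) is replaced with '?'.
    return base.encode("ascii", "replace").decode("ascii")


print(fold_to_ascii("łódź"))  # lodz
```

Note that 'ß' has no decomposition either, so this sketch turns it into '?'; whether it should become 's' or 'ss' is exactly the kind of per-character rule that has to be decided by hand.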

Well, my first question would be "Why aren't you using a database 
encoding that supports to_ascii()?"

However, I suppose that your use case would support this signature:
   to_ascii(bytea, name)

where it would just error out if the encoding name were something other 
than LATIN1, LATIN2, LATIN9, or WIN1250.
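A minimal sketch of the `to_ascii(bytea, name)` behaviour suggested here, assuming the four encoding names are mapped to Python's codecs (`latin-1`, `iso8859-2`, `iso8859-15`, `cp1250`) and that the decoded text is folded by Unicode decomposition (NFKD). `to_ascii_bytea`, the case-insensitive name lookup, and the error text are all hypothetical, not PostgreSQL's.

```python
import unicodedata

# Hypothetical mapping from the encoding names the proposed
# to_ascii(bytea, name) would accept to Python codec names.
_SUPPORTED = {
    "LATIN1": "latin-1",
    "LATIN2": "iso8859-2",
    "LATIN9": "iso8859-15",
    "WIN1250": "cp1250",
}


def to_ascii_bytea(data: bytes, encoding: str) -> str:
    """Decode bytes in one of four single-byte encodings and fold them to ASCII."""
    codec = _SUPPORTED.get(encoding.upper())
    if codec is None:
        # Error out for any other encoding name, as suggested in the reply.
        raise ValueError(f"to_ascii: encoding {encoding!r} is not supported")
    text = data.decode(codec)
    # Stroke letters have no Unicode decomposition; map them by hand (assumption).
    text = text.translate(str.maketrans("łŁđĐ", "lLdD"))
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.encode("ascii", "replace").decode("ascii")


print(to_ascii_bytea("łódź".encode("iso8859-2"), "LATIN2"))  # lodz
```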

But what would be the meaning of this?:
   to_ascii(bytea, integer)


cheers

andrew


