Thread: RFC: i18n2ascii(TEXT) stored procedure

From: Michael A Nachbaur
I've created the following stored procedure to allow me to do
international-insensitive text searches, e.g. a search for "Resume" would
match the text "Résumé".

I wanted to know:

a) am I missing any characters that need to be converted?  My first (and only)
language is English, so I'm in the dark where that is concerned;
b) is there a better and/or faster way of implementing this?  I don't want
searches to bog down (at least too badly) as a result of this.

CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
    my ($source) = @_;
    # map each listed accented vowel to its plain ASCII vowel
    $source =~ tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
    return $source;
' LANGUAGE 'plperl';
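As a standalone illustration, the same tr/// mapping can be written with Python's str.maketrans. Python is used instead of PL/Perl so the example runs without a database. The sketch assumes the input is already decoded Unicode text. The names i18n2ascii and contains_folded are used only for illustration. contains_folded is a hypothetical helper showing the intended search: fold both the stored text and the search term, then compare.

```python
# The two character lists from the tr/// in the PL/Perl function above.
ACCENTED = "áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ"
PLAIN = "aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU"

# Build the translation table once; str.maketrans needs equal-length strings.
_FOLD_TABLE = str.maketrans(ACCENTED, PLAIN)


def i18n2ascii(source: str) -> str:
    """Replace each listed accented vowel with its unaccented counterpart."""
    return source.translate(_FOLD_TABLE)


def contains_folded(haystack: str, needle: str) -> bool:
    """Hypothetical search helper: fold both strings, then do a substring test."""
    return i18n2ascii(needle) in i18n2ascii(haystack)


if __name__ == "__main__":
    print(i18n2ascii("Résumé"))                    # Resume
    print(contains_folded("My Résumé", "Resume"))  # True
    print(i18n2ascii("Ñandú"))                     # Ñandu (Ñ is not in the list)
```

The last line bears on question (a): any letter missing from the list, such as Ñ or Ç, passes through unchanged.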

--
/* Michael A. Nachbaur <mike@nachbaur.com>
 * http://nachbaur.com/pgpkey.asc
 */

"Ah," said Arthur, "this is obviously some strange usage
of the word safe that I wasn't previously aware of."



Re: RFC: i18n2ascii(TEXT) stored procedure

From: Manuel Sugawara
Michael A Nachbaur <mike@nachbaur.com> writes:

> b) is there a better and/or faster way of implementing this?  I
> don't want searches to bog down (at least too badly) as a result of
> this.

Use to_ascii(text):

masm=# select to_ascii('áéíóú');
 to_ascii
----------
 aeiou
(1 row)

Regards,
Manuel.


Re: RFC: i18n2ascii(TEXT) stored procedure

From: Michael A Nachbaur
On Thursday 25 September 2003 05:06 pm, Manuel Sugawara wrote:
> Michael A Nachbaur <mike@nachbaur.com> writes:
> > b) is there a better and/or faster way of implementing this?  I
> > don't want searches to bog down (at least too badly) as a result of
> > this.
>
> Use to_ascii(text),
[snip]
D'oh!  I guess that's what I get for not RTFM. :-)

-- 
/* Michael A. Nachbaur <mike@nachbaur.com>
 * http://nachbaur.com/pgpkey.asc
 */

"Oh no, not again." 




Re: RFC: i18n2ascii(TEXT) stored procedure

From: Peter Eisentraut
Michael A Nachbaur writes:

> a) am I missing any characters that need to be converted?

In Unicode, any character can be dynamically combined with any number of
accent characters, so an enumerated list will never do.
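Peter's point suggests folding by decomposition instead of by enumeration. Below is a minimal Python sketch of that idea. The name strip_accents is hypothetical, and this is not a PostgreSQL 7.3 feature. Unicode normalization form NFD splits a precomposed letter such as é into a base letter plus combining marks. Dropping every nonspacing mark (general category Mn) then handles both precomposed input and arbitrary stacks of combining accents.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove nonspacing combining marks, however many follow a base character."""
    # NFD splits precomposed characters (é -> e + U+0301) into base + marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except nonspacing marks (general category "Mn").
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left (e.g. Hangul jamo back into syllables).
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_accents("Résumé"))              # Resume (precomposed é)
    print(strip_accents("Re\u0301sume\u0301"))  # Resume (e + combining acute)
    print(strip_accents("e\u0301\u0323"))       # e (two stacked marks)
    print(strip_accents("Søren"))               # Søren (ø has no decomposition)
```

Letters such as ø, ł, ß and æ have no canonical decomposition, so they pass through unchanged. A small explicit table would still be needed if those should fold too.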

-- 
Peter Eisentraut   peter_e@gmx.net



Re: RFC: i18n2ascii(TEXT) stored procedure

From: "scott.marlowe"
On Thu, 25 Sep 2003, Michael A Nachbaur wrote:

> I've created the following stored procedure to allow me to do 
> international-insensitive text searches, e.g. a search for "Resume" would 
> match the text "Résumé".
> 
> I wanted to know:
> 
> a) am I missing any characters that need to be converted?  My first (and only)
> language is English, so I'm in the dark where that is concerned;
> b) is there a better and/or faster way of implementing this?  I don't want 
> searches to bog down (at least too badly) as a result of this.
> 
> CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
>     my ($source) = @_;
>     $source =~ 
> tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
>     return $source;
> ' LANGUAGE 'plperl';


You could probably accomplish the same thing without using Perl, via the
built-in function translate().  See functions-string.html in the
7.3.x documentation.
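PostgreSQL's translate(string, from, to) replaces every character of string that appears in from with the character at the same position in to. Characters of from with no counterpart in to are removed. The hypothetical pg_translate below is a small Python model of those semantics, not PostgreSQL's own code. It relies on str.translate with a mapping in which None means "delete".

```python
def pg_translate(string: str, frm: str, to: str) -> str:
    """Model of translate(string, from, to): map frm[i] to to[i], delete extras."""
    table = {}
    for i, ch in enumerate(frm):
        if ord(ch) in table:
            continue  # a repeated character keeps its first mapping
        # Characters in frm beyond the length of to map to None, i.e. are deleted.
        table[ord(ch)] = to[i] if i < len(to) else None
    return string.translate(table)


if __name__ == "__main__":
    print(pg_translate("12345", "14", "ax"))         # a23x5
    print(pg_translate("Résumé", "áéíóú", "aeiou"))  # Resume
    print(pg_translate("Résumé", "é", ""))           # Rsum
```

Passing translate() the two 40-character lists from the tr/// in the original function, as from and to, would give an equivalent of i18n2ascii() with no PL/Perl required.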

Also, the regex version of substring() is quite powerful.
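Here is one possible use of regular expressions, offered as an illustration rather than as what the reply had in mind. Leave the stored text alone and expand the search term instead, so each plain vowel matches any of its accented forms. The Python sketch below builds such a pattern. accent_pattern and the VARIANTS table are hypothetical. A pattern of the same shape is what would be handed to substring(... from ...) or the ~ operator on the SQL side.

```python
import re

# Hypothetical table: each plain vowel maps to a bracket expression of its variants.
VARIANTS = {
    "a": "aáàâä", "e": "eéèêë", "i": "iíìîï", "o": "oóòôö", "u": "uúùûü",
    "A": "AÁÀÂÄ", "E": "EÉÈÊË", "I": "IÍÌÎÏ", "O": "OÓÒÔÖ", "U": "UÚÙÛÜ",
}


def accent_pattern(term: str) -> str:
    """Build a regex in which every vowel of term also matches its accented forms."""
    parts = []
    for ch in term:
        if ch in VARIANTS:
            parts.append("[" + VARIANTS[ch] + "]")
        else:
            parts.append(re.escape(ch))  # keep other characters literal
    return "".join(parts)


if __name__ == "__main__":
    pattern = accent_pattern("Resume")
    print(pattern)                                      # R[eéèêë]s[uúùûü]m[eéèêë]
    print(re.search(pattern, "My Résumé") is not None)  # True
```

The trade-off against folding is that the stored data never changes, but every search has to run the pattern over the text.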