Re: [GENERAL] Case insensitive searches (and accents) - Mailing list pgsql-general

From Paulo Parola
Subject Re: [GENERAL] Case insensitive searches (and accents)
Date
Msg-id 007601be98ca$89aa2ec0$0300000a@cpqivx-2
Whole thread Raw
List pgsql-general
On Thu, 6 May 1999, Michael J Davis wrote:

>> Try:
>>
>> select some_field
>>    from table
>>    where lower(another_field) like '%substring_entered_by_user%'


>Michael,  I stuck with this format for awhile, in the interests of trying
>to write generic code and then the '~*' operator was just too handy.  ;-)

>SELECT * FROM foo WHERE bar ~* 'kun';

>Will return `Kunstler' from table `foo'...I don't know about the umlaut
>though...

>Cheers,
>Tom

Would both approaches be equivalent? How about performance?

As for the 'accent insensitive' searches: I should strip all accents
at the returned values perhaps at the select statement itself like
we can do with the 'lower' function and compare them to the
variables input by the user after also stripping the accents, perhaps
like the PHP3 command below, which I use to strip the accents
and different characters from the Portuguese language:

  $substring_entered_by_user = strtr($substring_entered_by_user,
"áàâãéêíóôõüúçÁÀÂÃÉÊÍÓÔÕÜÚÇ", "aaaaeeiooouucAAAAEEIOOOUUC");

I found out the PostgreSQL function 'translate' can be of help, but see how
it works with the examples below:

select translate('12345','1','a')\g
translate
---------
a2345
(1 row)

select translate('áéíóúfgçdfÇfg', 'áóÇ', 'aoc') \g
translate
-------------
aéíóúfgçdfÇfg
(1 row)

select translate('áéíóúfgçdfÇfg', 'Çáó', 'cao') \g
translate
-------------
áéíóúfgçdfcfg
(1 row)

select translate(translate('áé','é','e'),'á','a')\g
translate
---------
ae
(1 row)

So if I would like to strip the accents from Portuguese I would have to
issue a where clause filled with ugly things like

select ... from ... where
translate( translate( translate( translate( lower(string), 'á','a' ),
'à','a' ),
'â','a' ), 'é','e'), ..... like '%$variable1%' and translate(.....) ....
like '%$variable2%' ...

so that each comparison at the where clause would have to have 13 embedded
calls to the translate function to strip all the 13 characters "with accent"
from the Portuguese language!

Would this be the only way to do that???

How about performance and code legibility?

Any help would be greatly appreciated!

Paulo
pparola@brazilinfo.com




pgsql-general by date:

Previous
From: "Jonny Hinojosa"
Date:
Subject: Regression test failures
Next
From: Kristopher Yates
Date:
Subject: accessing PostgreSQL DB via PERL