Thread: case insensitive search in a column

case insensitive search in a column

From
Roland Dirlewanger
Date:
Hope the following problem is related enough to interfaces  ...

I'm looking for the most efficient way to look through a PGSQL database
with a case insensitive search.

I tried the following queries :
1. select * from mybase where myattr like 'word%';
2. select * from mybase where myattr ~* '^word';

The first one is case-sensitive. The second one matches my needs but is
about 5 or 6 times slower than the first one.

Before I start to convert the whole database in either lowercase or
uppercase, is there a way to fasten up case insensitive searches for
words a the begining of a column or even in the middle of a row ?

Thanks a lot in advance for your help.

--
Roland Dirlewanger
CNRS - Délégation Aquitaine et Poitou-Charentes
Esplanade des Arts et Métiers
33402 TALENCE CEDEX

Mél : rd@dr15.cnrs.fr, Tél : 05.57.35.58.52, Fax : 05.57.35.58.01



Re: [INTERFACES] case insensitive search in a column

From
Tom Lane
Date:
Roland Dirlewanger <rd@dr15.cnrs.fr> writes:
> I tried the following queries :
> 1. select * from mybase where myattr like 'word%';
> 2. select * from mybase where myattr ~* '^word';
> The first one is case-sensitive. The second one matches my needs but is
> about 5 or 6 times slower than the first one.

Probably you have an index on myattr?

The system knows how to limit the scan using the index for
case-sensitive queries --- basically it rewrites a query like the
above toselect * from mybase where myattr like 'word%' AND    myattr >= 'word' AND myattr <= 'word\377';
and then the index scanner knows what to do with the extra clauses,
so that most of the table doesn't even get visited.
(This also works for myattr ~ '^word', btw.)

This trick doesn't work for case-insensitive queries, however.

You might be able to get somewhere with a downcased functional
index, iecreate index mybase_myattr_loweron mybase(lower(myattr) text_ops);
(I might have the syntax slightly off, but it's close) and then
writeselect * from mybase where lower(myattr) like 'word%';
I am not sure whether the auto rewriting works in this scenario
however --- you might have to do it yourself, ie actually write outselect * from mybase where lower(myattr) like
'word%'AND    lower(myattr) >= 'word' AND lower(myattr) <= 'word\377';
 
You could check by seeing whether EXPLAIN says that the simpler
version is being done by sequential scan or index scan.

> Before I start to convert the whole database in either lowercase or
> uppercase, is there a way to fasten up case insensitive searches for
> words a the begining of a column or even in the middle of a row ?

If you want to find any word in a table, you need something like
pgsql/contrib/fulltextindex/ --- but the overhead is pretty high ...
        regards, tom lane

PS: this'd be more on-topic in pgsql-sql, please direct any
followups there.