Thread: Help with LIKE

Help with LIKE

From
"David Olbersen"
Date:
I have a table with 8,628,633 rows that I'd LIKE to search (ha ha).

I have a very simple query:
  SELECT * FROM tableA WHERE column1 LIKE '%something%';

tableA.column1 has an index on it and the database has been vacuumed recently. My problem is with the output of
EXPLAIN:

+----------------------------------------------------------------+
|                         QUERY PLAN                             |
+----------------------------------------------------------------+
| Seq Scan on tableA  (cost=0.00..212651.61 rows=13802 width=46) |
|   Filter: (column1 ~~ '%something%'::text)                     |
+----------------------------------------------------------------+

I don't like that cost (2,12,651) at all! Is there anyway I can optimize this query? Make a different kind of index
(it'scurrently btree)? Use substr or indexof or something instead of LIKE? 

Thoughts?

--------------------------
David Olbersen
iGuard Engineer
11415 West Bernardo Court
San Diego, CA 92127
1-858-676-2277 x2152

Re: Help with LIKE

From
Josh Berkus
Date:
David,

> I have a table with 8,628,633 rows that I'd LIKE to search (ha ha).
>
> I have a very simple query:
>   SELECT * FROM tableA WHERE column1 LIKE '%something%';

That's what's called an "unanchored text search".   That kind of query cannot
be indexed using a regular index.

What you need is called "Full Text Indexing" or "Full Text Search".   Check
out two resources:

1) contrib/tsearch in your PostgreSQL source code;
2) OpenFTS (www.openfts.org).

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Help with LIKE

From
"David Olbersen"
Date:
Josh,

> That's what's called an "unanchored text search".   That kind
> of query cannot be indexed using a regular index.

Duh, should have tried the anchors to get what I wanted...

> What you need is called "Full Text Indexing" or "Full Text
> Search".   Check
> out two resources:

This isn't actually what I was looking for, the anchor works better (only 5.87 now!)

Thanks for the reminder!

--------------------------
David Olbersen
iGuard Engineer
11415 West Bernardo Court
San Diego, CA 92127
1-858-676-2277 x2152

Re: Help with LIKE

From
"David Olbersen"
Date:
My mistake, things don't get much better.

I'm selecting URLs out of a database like this:

  SELECT * FROM table WHERE url ~ '^http://.*something.*$';

This still uses a sequential scan but cuts the time down to 76,351 from 212,651 using

  WHERE url LIKE '%something%';

The full text indexing doesn't look quite right as there are no spaces in this data.

Also, using something like:

  WHERE position( 'something', url ) > 0

is a bit worse, giving 84,259.

--------------------------
David Olbersen
iGuard Engineer
11415 West Bernardo Court
San Diego, CA 92127
1-858-676-2277 x2152


> -----Original Message-----
> From: David Olbersen
> Sent: Thursday, March 20, 2003 3:19 PM
> To: pgsql-sql@postgresql.org
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Help with LIKE
>
>
> Josh,
>
> > That's what's called an "unanchored text search".   That kind
> > of query cannot be indexed using a regular index.
>
> Duh, should have tried the anchors to get what I wanted...
>
> > What you need is called "Full Text Indexing" or "Full Text
> > Search".   Check
> > out two resources:
>
> This isn't actually what I was looking for, the anchor works
> better (only 5.87 now!)
>
> Thanks for the reminder!
>
> --------------------------
> David Olbersen
> iGuard Engineer
> 11415 West Bernardo Court
> San Diego, CA 92127
> 1-858-676-2277 x2152
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>

Re: Help with LIKE

From
Josh Berkus
Date:
David,

> My mistake, things don't get much better.
>
> I'm selecting URLs out of a database like this:
>
>   SELECT * FROM table WHERE url ~ '^http://.*something.*$';

That search still requires a seq scan, since it has "gaps" in the seqence of
characters.  That is,

url ~ '^http://www.something.*' could use an index, but your search above
cannot.

You may be right that the standard OpenFTS indexing won't help you in this
case, since you're really searching for fragments of a continuous text
string.

One thing I might suggest is that you look for ways that you might be able to
break out the text you're searching for with a function.  For example, if you
were searching strictly on the domain SLD name, then you could create an
"immutable"  function called if_split_sld(TEXT) and index on that.

If you are really searching for "floating" text within the string, I believe
that there are some options in tseach to help you, but they may not end up
improving performance much.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Help with LIKE

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> SELECT * FROM table WHERE url ~ '^http://.*something.*$';

> That search still requires a seq scan, since it has "gaps" in the seqence of
> characters.  That is,

> url ~ '^http://www.something.*' could use an index, but your search above
> cannot.

Actually, it *can* use an index ... but the index condition will only
use the characters before the ".*", ie, "http://".  Which is just about
useless if you're searching a column of URLs :-(

I agree that tsearch or OpenFTS are the tools to be looking at.

            regards, tom lane