Re: LIKE search and performance - Mailing list pgsql-performance

From Magnus Hagander
Subject Re: LIKE search and performance
Date
Msg-id 4655E64B.5050503@hagander.net
Whole thread Raw
In response to Re: LIKE search and performance  (James Mansion <james@mansionfamily.plus.com>)
Responses Re: LIKE search and performance  (James Mansion <james@mansionfamily.plus.com>)
List pgsql-performance
James Mansion wrote:
> Alexander Staubo wrote:
>> On 5/23/07, Andy <frum@ar-sd.net> wrote:
>>> An example would be:
>>> SELECT * FROM table
>>>                              WHERE name like '%john%' or street like
>>> '%srt%'
>>>
>>> Anyway, the query planner always does seq scan on the whole table and
>>> that
>>> takes some time. How can this be optimized or made in another way to be
>>> faster?
>>
>> There's no algorithm in existence that can "index" arbitrary
>> substrings the way you think. The only rational way to accomplish this
>> is to first break the text into substrings using some algorithm (eg.,
>> words delimited by whitespace and punctuation), and index the
>> substrings individually.
> That seems rather harsh.  If I'd put an index on each of these colomns
> I'd certainly
> expect it to use the indices - and I'm pretty sure that Sybase would.
> I'd expect
> it to scan the index leaf pages instead of the table itself - they
> should be much
> more compact and also likely to be hot in cache.

If Sybase is still like SQL Server (or the other way around), it *may*
end up scanning the index *IFF* the index is a clustered index. If it's
a normal index, it will do a sequential scan on the table.

Yes, the leaf page of the index is more compact, but you also have to
scan the intermediate pages to get to the leaf pages. But again, it can
be a win. On such a system.

It's not a win on PostgreSQL, because of our MVCC implementation. We
need to scan *both* index *and* data pages if we go down that route, in
which case it's a lot faster to just scan the data pages alone.

I don't really know how MSSQL deals with this now that they have
MVCC-ish behavior, and I have no idea at all if sybase has anything like
MVCC.


> Why *wouldn't* the planner do this?

The question should be why the optimizer doesn't consider it, and the
executor uses it. The planner really doesn't decide that part :-)
Hopefully, the answer can be found above.

//Magnus

pgsql-performance by date:

Previous
From: James Mansion
Date:
Subject: Re: LIKE search and performance
Next
From: James Mansion
Date:
Subject: Re: LIKE search and performance