Re: LIKE search and performance - Mailing list pgsql-performance

From Mark Lewis
Subject Re: LIKE search and performance
Date
Msg-id 1180040560.31471.262.camel@archimedes
Whole thread Raw
In response to Re: LIKE search and performance  (James Mansion <james@mansionfamily.plus.com>)
Responses Re: LIKE search and performance  (Craig James <craig_james@emolecules.com>)
Re: LIKE search and performance  (mark@mark.mielke.cc)
Re: LIKE search and performance  (PFC <lists@peufeu.com>)
List pgsql-performance
On Thu, 2007-05-24 at 21:54 +0100, James Mansion wrote:
> > If Sybase is still like SQL Server (or the other way around), it *may*
> > end up scanning the index *IFF* the index is a clustered index. If it's
> > a normal index, it will do a sequential scan on the table.
> >
> >
> Are you sure its not covered?  Have to check at work - but I'm off next
> week so it'll have to wait.
>
> > It's not a win on PostgreSQL, because of our MVCC implementation. We
> > need to scan *both* index *and* data pages if we go down that route, in
> > which case it's a lot faster to just scan the data pages alone.
> >
> >
> Why do you need to go to all the data pages - doesn't the index
> structure contain all the keys so
> you prefilter and then check to see if the *matched* items are still in
> view?  I'll be first to admit I
> know zip about Postgres, but it seems odd - doesn't the index contain
> copies of the key values?.
>
> I suspect that I mis-spoke with 'leaf'.  I really just mean 'all index
> pages with data', since the scan
> does not even need to be in index order, just a good way to get at the
> data in a compact way.

PG could scan the index looking for matches first and only load the
actual rows if it found a match, but that could only be a possible win
if there were very few matches, because the difference in cost between a
full index scan and a sequential scan would need to be greater than the
cost of randomly fetching all of the matching data rows from the table
to look up the visibility information.

So yes it would be possible, but the odds of it being faster than a
sequential scan are small enough to make it not very useful.

And since it's basically impossible to know the selectivity of this kind
of where condition, I doubt the planner would ever realistically want to
choose that plan anyway because of its poor worst-case behavior.

-- Mark

pgsql-performance by date:

Previous
From: James Mansion
Date:
Subject: Re: LIKE search and performance
Next
From: Craig James
Date:
Subject: Re: LIKE search and performance