Re: Strange (?) Index behavior? - Mailing list pgsql-performance

From Allen Landsidel
Subject Re: Strange (?) Index behavior?
Date
Msg-id 88f1825a041105095631064cb0@mail.gmail.com
Whole thread Raw
In response to Re: Strange (?) Index behavior?  (Bruno Wolff III <bruno@wolff.to>)
Responses Re: Strange (?) Index behavior?
Re: Strange (?) Index behavior?
List pgsql-performance
On Fri, 5 Nov 2004 11:51:59 -0600, Bruno Wolff III <bruno@wolff.to> wrote:
> On Fri, Nov 05, 2004 at 09:39:16 -0500,
>  Allen Landsidel <alandsidel@gmail.com> wrote:
> >
> > For some reason it's a requirement that partial wildcard searches are
> > done on this field, such as "SELECT ... WHERE field LIKE 'A%'"
> >
> > I thought an interesting way to do this would be to simply create
> > partial indexes for each letter on that field, and it works when the
> > query matches the WHERE clause in the index exactly like above.  The
> > problem is thus:
>
> That may not help much except for prefixes that have a below average
> number of occurences. If you are going to be select 1/26 of the records,
> you are probably going to do about as well with a sequential scan as an
> index scan.

The thing isn't that I want 1/26th of the records since the
distribution is not exactly equal among different letters, but more
importantly, there are about 76million rows currently, and for some
reason I am being told by the people with the pointy hair that a query
like "select foo,bar from table where foo like 'abc%';" is not an
uncommon type of query to run.  I don't know why it's common and to be
honest, I'm afraid to ask. ;)

With that many rows, and a normal index on the field, postgres figures
the best option for say "I%" is not an index scan, but a sequential
scan on the table, with a filter -- quite obviously this is slow as
heck, and yes, I've run analyze several times and in fact have the
vacuum analyze automated.

With the partial index the index scan is used and the cost drops from
0..2million to 0..9000 -- a vast improvement.

So I'm going to go with the partial indexes, and have a total of 36 of
them -- A-Z and 0-9.

> Just having a normal index on the column will work if the database locale
> is C. In 7.4 you can create an index usable by LIKE even in the database
> locale isn't C, but I don't remember the exact syntax. You will be better
> off having just one index rather than 26 partial indexes.

I haven't written a line of C in years, and it was never my strong
suit, so despite all my years doing development and sysadminning, the
locale stuff is still something of a mystery to me.

The locale though is C, the default, and will for the time being at
least be storing only ascii strings -- no unicode, other character
sets, or anything funky like that.

-Allen

pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Checking = with timestamp field is slow
Next
From: Matt Nuzum
Date:
Subject: What is the difference between these?