Re: Strange (?) Index behavior? - Mailing list pgsql-performance

From Allen Landsidel
Subject Re: Strange (?) Index behavior?
Date
Msg-id 88f1825a04110513028759dd3@mail.gmail.com
Whole thread Raw
In response to Re: Strange (?) Index behavior?  ("Matt Clark" <matt@ymogen.net>)
Responses Re: Strange (?) Index behavior?  (Matt Clark <matt@ymogen.net>)
List pgsql-performance
On Fri, 5 Nov 2004 18:34:23 -0000, Matt Clark <matt@ymogen.net> wrote:
> > With that many rows, and a normal index on the field,
> > postgres figures the best option for say "I%" is not an index
> > scan, but a sequential scan on the table, with a filter --
> > quite obviously this is slow as heck, and yes, I've run
> > analyze several times and in fact have the vacuum analyze automated.
>
> Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an
> ordinary index OK?  If so then...

That is correct.

> The planner would usually assume (from what Tom usually says) that 1/26
> selectivity isn't worth doing an index scan for, but in your case it's wrong
> (maybe because the rows are very big?)

The rows aren't big, it's a text field, a few ints, and a few
timestamps.  That's all.  The text field is the one we're querying on
here and lengthwise it's typically not over 32 chars.

> You may be able to get the planner to go for an index scan on "like 'I%'" by
> tweaking the foo_cost variables in postgresql.conf

That's true but I'd rather not, there are times when the seqscan will
have a faster net result (for other queries) and I'd rather not have
them suffer.

> Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%'
> ... ", or do that as a stored proc.

Holy cow.  Yeah that seems a little outrageous.  It would be cleaner
looking in "\d table" than having all these indexes at the cost of
having one very ugly query.

> > With the partial index the index scan is used and the cost
> > drops from 0..2million to 0..9000 -- a vast improvement.
>
> So there are really only 9000 rows out of 76 million starting with 'I'?  How
> about combining some techniques - you could create an index on the first two
> chars of the field (should be selective enough to give an index scan),
> select from that, and select the actual data with the like clause.

I was talking about the cost, not the number of rows.  About 74,000
rows are returned but the query only takes about 8 seconds to run. --
with the partial index in place.

> CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
> CREATE INDEX idx_all ON table (field);
> SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
> AS approx WHERE field LIKE 'DE%';

That looks like a pretty slick way to create an index, I didn't know
there was such a way to do it.. but It appears that this will not work
with queries where the WHERE clause wants to find substrings longer
than 2 characters.

I will give it a try and see how it goes though I think I'm fairly
"settled" on creating all the other indexes, unless there is some
specific reason I shouldn't -- they are used in all cases where the
substring is >= 1 character, so long as I make sure the first where
clause (or inner select in a subquery) is the most ambiguous from an
index standpoint.

Going back to the initial problem -- having only one large, complete
index on the table (no partial indexes) the query "SELECT field FROM
table WHERE field LIKE 'A%';" does not use the index.  The query
"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
the single large index if it exists.

Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
table (field) WHERE field LIKE 'A%';" causes all queries with
substrings of any length to do index scans.provided I issue the query
as:

SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
 -- or even --
SELECT field FROM table WHERE field LIKE 'A%';

The latter query, without the partial index described, does a
sequential scan on the table itself instead of an index scan.

-Allen

pgsql-performance by date:

Previous
From: Gabriele Bartolini
Date:
Subject: Question regarding the file system
Next
From: Tom Lane
Date:
Subject: Re: Strange (?) Index behavior?