Home > mailing lists

Re: Strange (?) Index behavior? - Mailing list pgsql-performance

From	Matt Clark
Subject	Re: Strange (?) Index behavior?
Date	November 6, 2004 00:14:48
Msg-id	418C1757.70909@ymogen.net Whole thread Raw
In response to	Re: Strange (?) Index behavior? (Allen Landsidel <alandsidel@gmail.com>)
Responses	Re: Strange (?) Index behavior?
List	pgsql-performance

Tree view

>>So there are really only 9000 rows out of 76 million starting with 'I'?  How
>>about combining some techniques - you could create an index on the first two
>>chars of the field (should be selective enough to give an index scan),
>>select from that, and select the actual data with the like clause.
>>
>>
>
>I was talking about the cost, not the number of rows.  About 74,000
>rows are returned but the query only takes about 8 seconds to run. --
>
>
Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an
indexscan is better, and also no surprise that the planner can't know
that I is such an uncommon initial char.

>with the partial index in place.
>
>
>
>>CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
>>CREATE INDEX idx_all ON table (field);
>>SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
>>AS approx WHERE field LIKE 'DE%';
>>
>>
>
>That looks like a pretty slick way to create an index, I didn't know
>there was such a way to do it.. but It appears that this will not work
>with queries where the WHERE clause wants to find substrings longer
>than 2 characters.
>
>
I don't see why not, it just uses the functional index to grap the
1/(ascii_chars^2) of the rows that are of obvious interest, and then
uses the standard index to filter that set..  Where it won't work is
where you just want one initial char!  Which is why I suggested the
silly query rewrite...

>Going back to the initial problem -- having only one large, complete
>index on the table (no partial indexes) the query "SELECT field FROM
>table WHERE field LIKE 'A%';" does not use the index.  The query
>"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
>the single large index if it exists.
>
>
>
If you were planning the query, what would you do?  Assuming we're
talking about A-Z as possible first chars, and assuming we don't know
the distribution of those chars, then we have to assume 1/26 probability
of each char, so a seq scan makes sense.  Whereas like 'JK%' should only
pull 1/500 rows.

>Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
>table (field) WHERE field LIKE 'A%';" causes all queries with
>substrings of any length to do index scans.provided I issue the query
>as:
>
>SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
> -- or even --
>SELECT field FROM table WHERE field LIKE 'A%';
>
>The latter query, without the partial index described, does a
>sequential scan on the table itself instead of an index scan.
>
>
Yes, because (I assume, Tom will no doubt clarify/correct), by creating
the partial indices you create a lot more information about the
distribution of the first char - either that, or the planner simply
always uses an exactly matching partial index if available.

I _think_ that creating 26 partial indexes on '?%' is essentially the
same thing as creating one functional index on substr(field,1,1), just
messier, unless the partial indexes cause the planner to do something
special...

M

pgsql-performance by date:

From: Mark Wong
Date: 05 November 2004, 23:13:59
Subject: ia64 results with dbt2 and 8.0beta4

From: patrick ~
Date: 06 November 2004, 00:27:21
Subject: Re: vacuum analyze slows sql query

Re: Strange (?) Index behavior? - Mailing list pgsql-performance

Previous

Next