Re: Strange (?) Index behavior? - Mailing list pgsql-performance

From Matt Clark
Subject Re: Strange (?) Index behavior?
Date
Msg-id 418C1757.70909@ymogen.net
Whole thread Raw
In response to Re: Strange (?) Index behavior?  (Allen Landsidel <alandsidel@gmail.com>)
Responses Re: Strange (?) Index behavior?
List pgsql-performance
>>So there are really only 9000 rows out of 76 million starting with 'I'?  How
>>about combining some techniques - you could create an index on the first two
>>chars of the field (should be selective enough to give an index scan),
>>select from that, and select the actual data with the like clause.
>>
>>
>
>I was talking about the cost, not the number of rows.  About 74,000
>rows are returned but the query only takes about 8 seconds to run. --
>
>
Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an
indexscan is better, and also no surprise that the planner can't know
that I is such an uncommon initial char.

>with the partial index in place.
>
>
>
>>CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
>>CREATE INDEX idx_all ON table (field);
>>SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
>>AS approx WHERE field LIKE 'DE%';
>>
>>
>
>That looks like a pretty slick way to create an index, I didn't know
>there was such a way to do it.. but It appears that this will not work
>with queries where the WHERE clause wants to find substrings longer
>than 2 characters.
>
>
I don't see why not, it just uses the functional index to grap the
1/(ascii_chars^2) of the rows that are of obvious interest, and then
uses the standard index to filter that set..  Where it won't work is
where you just want one initial char!  Which is why I suggested the
silly query rewrite...

>Going back to the initial problem -- having only one large, complete
>index on the table (no partial indexes) the query "SELECT field FROM
>table WHERE field LIKE 'A%';" does not use the index.  The query
>"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
>the single large index if it exists.
>
>
>
If you were planning the query, what would you do?  Assuming we're
talking about A-Z as possible first chars, and assuming we don't know
the distribution of those chars, then we have to assume 1/26 probability
of each char, so a seq scan makes sense.  Whereas like 'JK%' should only
pull 1/500 rows.

>Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
>table (field) WHERE field LIKE 'A%';" causes all queries with
>substrings of any length to do index scans.provided I issue the query
>as:
>
>SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
> -- or even --
>SELECT field FROM table WHERE field LIKE 'A%';
>
>The latter query, without the partial index described, does a
>sequential scan on the table itself instead of an index scan.
>
>
Yes, because (I assume, Tom will no doubt clarify/correct), by creating
the partial indices you create a lot more information about the
distribution of the first char - either that, or the planner simply
always uses an exactly matching partial index if available.

I _think_ that creating 26 partial indexes on '?%' is essentially the
same thing as creating one functional index on substr(field,1,1), just
messier, unless the partial indexes cause the planner to do something
special...

M

pgsql-performance by date:

Previous
From: Mark Wong
Date:
Subject: ia64 results with dbt2 and 8.0beta4
Next
From: patrick ~
Date:
Subject: Re: vacuum analyze slows sql query