Re: Strange (?) Index behavior? - Mailing list pgsql-performance
From | Matt Clark |
---|---|
Subject | Re: Strange (?) Index behavior? |
Date | |
Msg-id | 418C1757.70909@ymogen.net Whole thread Raw |
In response to | Re: Strange (?) Index behavior? (Allen Landsidel <alandsidel@gmail.com>) |
Responses |
Re: Strange (?) Index behavior?
|
List | pgsql-performance |
>>So there are really only 9000 rows out of 76 million starting with 'I'? How >>about combining some techniques - you could create an index on the first two >>chars of the field (should be selective enough to give an index scan), >>select from that, and select the actual data with the like clause. >> >> > >I was talking about the cost, not the number of rows. About 74,000 >rows are returned but the query only takes about 8 seconds to run. -- > > Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an indexscan is better, and also no surprise that the planner can't know that I is such an uncommon initial char. >with the partial index in place. > > > >>CREATE INDEX idx_firstletters ON table (substr(field, 1, 2)); >>CREATE INDEX idx_all ON table (field); >>SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') >>AS approx WHERE field LIKE 'DE%'; >> >> > >That looks like a pretty slick way to create an index, I didn't know >there was such a way to do it.. but It appears that this will not work >with queries where the WHERE clause wants to find substrings longer >than 2 characters. > > I don't see why not, it just uses the functional index to grap the 1/(ascii_chars^2) of the rows that are of obvious interest, and then uses the standard index to filter that set.. Where it won't work is where you just want one initial char! Which is why I suggested the silly query rewrite... >Going back to the initial problem -- having only one large, complete >index on the table (no partial indexes) the query "SELECT field FROM >table WHERE field LIKE 'A%';" does not use the index. The query >"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use >the single large index if it exists. > > > If you were planning the query, what would you do? Assuming we're talking about A-Z as possible first chars, and assuming we don't know the distribution of those chars, then we have to assume 1/26 probability of each char, so a seq scan makes sense. Whereas like 'JK%' should only pull 1/500 rows. >Adding the partial index "CREATE INDEX idx_table_substrfield_A ON >table (field) WHERE field LIKE 'A%';" causes all queries with >substrings of any length to do index scans.provided I issue the query >as: > >SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%'; > -- or even -- >SELECT field FROM table WHERE field LIKE 'A%'; > >The latter query, without the partial index described, does a >sequential scan on the table itself instead of an index scan. > > Yes, because (I assume, Tom will no doubt clarify/correct), by creating the partial indices you create a lot more information about the distribution of the first char - either that, or the planner simply always uses an exactly matching partial index if available. I _think_ that creating 26 partial indexes on '?%' is essentially the same thing as creating one functional index on substr(field,1,1), just messier, unless the partial indexes cause the planner to do something special... M
pgsql-performance by date: