Re: Strange (?) Index behavior? - Mailing list pgsql-performance
From | Allen Landsidel |
---|---|
Subject | Re: Strange (?) Index behavior? |
Date | |
Msg-id | 88f1825a04110513028759dd3@mail.gmail.com Whole thread Raw |
In response to | Re: Strange (?) Index behavior? ("Matt Clark" <matt@ymogen.net>) |
Responses |
Re: Strange (?) Index behavior?
|
List | pgsql-performance |
On Fri, 5 Nov 2004 18:34:23 -0000, Matt Clark <matt@ymogen.net> wrote: > > With that many rows, and a normal index on the field, > > postgres figures the best option for say "I%" is not an index > > scan, but a sequential scan on the table, with a filter -- > > quite obviously this is slow as heck, and yes, I've run > > analyze several times and in fact have the vacuum analyze automated. > > Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an > ordinary index OK? If so then... That is correct. > The planner would usually assume (from what Tom usually says) that 1/26 > selectivity isn't worth doing an index scan for, but in your case it's wrong > (maybe because the rows are very big?) The rows aren't big, it's a text field, a few ints, and a few timestamps. That's all. The text field is the one we're querying on here and lengthwise it's typically not over 32 chars. > You may be able to get the planner to go for an index scan on "like 'I%'" by > tweaking the foo_cost variables in postgresql.conf That's true but I'd rather not, there are times when the seqscan will have a faster net result (for other queries) and I'd rather not have them suffer. > Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%' > ... ", or do that as a stored proc. Holy cow. Yeah that seems a little outrageous. It would be cleaner looking in "\d table" than having all these indexes at the cost of having one very ugly query. > > With the partial index the index scan is used and the cost > > drops from 0..2million to 0..9000 -- a vast improvement. > > So there are really only 9000 rows out of 76 million starting with 'I'? How > about combining some techniques - you could create an index on the first two > chars of the field (should be selective enough to give an index scan), > select from that, and select the actual data with the like clause. I was talking about the cost, not the number of rows. About 74,000 rows are returned but the query only takes about 8 seconds to run. -- with the partial index in place. > CREATE INDEX idx_firstletters ON table (substr(field, 1, 2)); > CREATE INDEX idx_all ON table (field); > SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') > AS approx WHERE field LIKE 'DE%'; That looks like a pretty slick way to create an index, I didn't know there was such a way to do it.. but It appears that this will not work with queries where the WHERE clause wants to find substrings longer than 2 characters. I will give it a try and see how it goes though I think I'm fairly "settled" on creating all the other indexes, unless there is some specific reason I shouldn't -- they are used in all cases where the substring is >= 1 character, so long as I make sure the first where clause (or inner select in a subquery) is the most ambiguous from an index standpoint. Going back to the initial problem -- having only one large, complete index on the table (no partial indexes) the query "SELECT field FROM table WHERE field LIKE 'A%';" does not use the index. The query "SELECT field FROM table WHERE field LIKE 'AB%';" however, does use the single large index if it exists. Adding the partial index "CREATE INDEX idx_table_substrfield_A ON table (field) WHERE field LIKE 'A%';" causes all queries with substrings of any length to do index scans.provided I issue the query as: SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%'; -- or even -- SELECT field FROM table WHERE field LIKE 'A%'; The latter query, without the partial index described, does a sequential scan on the table itself instead of an index scan. -Allen
pgsql-performance by date: