Thread: Slow query - index not used
I'm running Postgres 7.3.4 on Solaris. Here's a bare-bones version of the query that runs unreasonably slowly:

  SELECT count(*) FROM note_links_aux WHERE nt_note_name = 'KnowledgeBase';

The query plan shows that the db runs through the entire table:

  Aggregate  (cost=1983.58..1983.58 rows=1 width=0)
    ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5360 width=0)
          Filter: (nt_note_name = 'KnowledgeBase'::character varying)

even though the field nt_note_name is indexed:

  Indexes: nla_nl_id_fk_i btree (nl_id),
           nla_nt_note_name_fk_i btree (nt_note_name),
           ...

Running the same query against another varchar field on this table, I see that the index is being used:

  ->  Index Scan using ...

and that query runs much faster. Analyzing the table did not help. Setting DEFAULT_STATISTICS_TARGET = 100 made the db use the index for field = 'abc', but field = 'xyz' still results in a table scan. Also, the 'rows' number reported in the plan is way off. What can be done here?

Thanks, Mike.
Michael Brusser <michael@synchronicity.com> writes:
> Setting DEFAULT_STATISTICS_TARGET = 100 made the db use the index for
> field = 'abc', but field = 'xyz' still results in a table scan.

What's the actual distribution of values in these columns? Are you searching for values that are particularly common or uncommon?

> Also, the 'rows' number reported in the plan is way off.

That would seem to be the crux of the problem, but you'd better be more specific than "way off".

			regards, tom lane
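[The distribution Tom is asking about can be pulled with a simple aggregate. This is a sketch added for reference, using the table and column names given in the thread; it works on Postgres of that era since it needs nothing beyond GROUP BY:]

```sql
-- Count how often each value of nt_note_name occurs, most common first
SELECT nt_note_name, count(*) AS cnt
FROM note_links_aux
GROUP BY nt_note_name
ORDER BY cnt DESC;
```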
> What's the actual distribution of values in these columns?
> Are you searching for values that are particularly common
> or uncommon?

This column always has a predefined set of values. Usually the app would search for one of the existing values.

Total records: 74654

  nt_note_name       | count
  -------------------+-------
  Collection         | 10068
  Component          |  1200
  CustTicket         | 15009
  Deliver            |  1201
  Download           |  1999
  GroupProv          |   464
  IP_News            |  5950
  IP_Publish_Request |  4000
  IP_Usage           |  2000
  KnowledgeBase      | 15002
  LevelInfo          |    10
  OtherParam         |  4000
  Request            |  4501
  TestMethod         |  4050
  VerTech            |  4000
  Version            |  1200

I started from scratch: took the DEFAULT_STATISTICS_TARGET parameter out of the config file, restarted the db, ran VACUUM ANALYZE, and then got these plans:

# explain select count(1) from note_links_aux where nt_note_name = 'KnowledgeBase';
 Aggregate  (cost=1982.68..1982.68 rows=1 width=0)
   ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5002 width=0)
         Filter: (nt_note_name = 'KnowledgeBase'::character varying)

# explain select count(1) from note_links_aux where nt_note_name = 'OtherParam';
 Aggregate  (cost=1984.78..1984.78 rows=1 width=0)
   ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5840 width=0)
         Filter: (nt_note_name = 'OtherParam'::character varying)

# explain select count(1) from note_links_aux where nt_note_name = 'LevelInfo';
 Aggregate  (cost=58.91..58.91 rows=1 width=0)
   ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux
         (cost=0.00..58.87 rows=15 width=0)
         Index Cond: (nt_note_name = 'LevelInfo'::character varying)

# explain select count(1) from note_links_aux where nt_note_name = 'NoSuchThing';
 Aggregate  (cost=5.83..5.83 rows=1 width=0)
   ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux
         (cost=0.00..5.83 rows=1 width=0)
         Index Cond: (nt_note_name = 'NoSuchThing'::character varying)

So the 'rows' values are incorrect. Also, looking at the queries with 'KnowledgeBase' and 'OtherParam': does a seq scan make sense? I mean, 'rows' has a value of about 5000 records out of the 75000 records in the table. This ratio does not seem high enough to assume that an index scan won't be beneficial. And even if we consider the real number of records - 15000 - that is still only 20% of the total. Would an index scan still be faster?

Sorry if I put here more info than you need.
Thanks, Mike.
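[To see what the planner actually recorded for this column, one can query the pg_stats system view. A sketch, added for reference; the column set shown is the standard one for that view. If 'KnowledgeBase' does not appear in most_common_vals - plausible with the small default statistics target and 16 distinct values - the planner falls back to an average-frequency guess, roughly 74654 / 16 ≈ 4700 rows, which would explain the ~5000-row estimates above:]

```sql
-- What ANALYZE recorded for nt_note_name: the most common values and
-- the fraction of the table the planner thinks each one occupies
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'note_links_aux'
  AND attname = 'nt_note_name';
```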
On Wed, 21 Jan 2004, Michael Brusser wrote:

> So the 'rows' values are incorrect.

You can increase the statistics gathering for that column with ALTER TABLE and probably get better estimates.

> Also, looking at the queries with 'KnowledgeBase'
> and 'OtherParam': does a seq scan make sense?
>
> I mean, 'rows' has a value of about 5000 records out of the 75000
> records in the table.

It doesn't sound so strange to me. I don't know exactly what limits postgresql uses, but it probably needs to fetch every page in the table to find all those 5000 records. If it has to do that, then the index scan would not help much (it might even make things slower).

It's easy to test what happens if it does the index scan instead of the seq scan. Just do

  SET enable_seqscan TO false;

before you try the query. Then you can compare the times with and without the index scan.

Remember, even if it finds a row in the index, it still has to fetch the actual row from the table as well. So if it needs to fetch all pages from the table, the total amount of IO is "all of the table" + "the relevant part of the index". Whether that's faster or not depends on such things as whether the data is already cached in memory. Setting effective_cache_size correctly lets postgresql take into account how much file cache you have, which can affect the plan.

-- 
/Dennis Björklund
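[Dennis's two suggestions, spelled out as a sketch; the statistics target of 100 and the 'KnowledgeBase' value are just examples taken from the thread:]

```sql
-- 1. Gather a larger sample for this one column, then re-analyze
ALTER TABLE note_links_aux ALTER COLUMN nt_note_name SET STATISTICS 100;
ANALYZE note_links_aux;

-- 2. Temporarily forbid seq scans so the index-scan plan can be timed,
--    then compare EXPLAIN ANALYZE output with and without it
SET enable_seqscan TO false;
EXPLAIN ANALYZE
SELECT count(*) FROM note_links_aux WHERE nt_note_name = 'KnowledgeBase';
SET enable_seqscan TO true;
```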
Michael Brusser <michael@synchronicity.com> writes:
> So the 'rows' values are incorrect.

You sound like you are expecting them to be exact. They're just estimates. They're all plenty close enough for planning purposes, except maybe the one for 'KnowledgeBase' is a little further off than I would have expected. That could just be a chance sampling thing --- if you rerun ANALYZE and then check again, how much does that estimate move around over a few tries?

> I mean, 'rows' has a value of about 5000 records out of the 75000
> records in the table. This ratio does not seem high enough to assume
> that an index scan won't be beneficial.

You're mistaken. You didn't give any indication of how wide the rows are in this table, but supposing for purposes of argument that they're roughly 100 bytes apiece, there would be room for 80 rows on each 8K page of the table. A scan that needs to visit 1/15th of the table is statistically certain to read nearly every page of the table, many of them multiple times. (This assumes the needed rows are randomly scattered, which wouldn't be true if the table is "clustered" on the index, but you didn't say anything about having done that.) So an indexscan doesn't save any I/O, and may cost I/O if some of the pages fall out of buffer cache between visits. Add to that the penalty for random instead of sequential reads, and the cost of reading the index itself, and you come out way behind a seqscan.

This all assumes that I/O is far more expensive than CPU cycles. If you have a database that's small enough to fit in RAM, then the cost of checking the WHERE clauses at each tuple could become the dominant factor. If that's your situation, you should look at altering the planner's cost parameters --- in particular, lower random_page_cost and increase the various CPU cost settings.

With the default cost settings, the planner will not pick an indexscan if it thinks that more than a couple percent of the table needs to be visited. (I think the breakeven point is dependent on table size, but I don't have the details in my head right now.) You can experiment with EXPLAIN ANALYZE and the various enable_xxx settings to see whether it's right in your environment. See the pgsql-performance archives for more discussion.

			regards, tom lane
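[Tom's page-count argument can be checked with a bit of arithmetic. Under his stated assumptions - 80 rows per 8K page, 1 row in 15 matching, matches randomly scattered - the probability that a given page contains at least one matching row is 1 - (14/15)^80. A sketch of the calculation in SQL:]

```sql
-- Chance that an 8K page (holding ~80 rows) contains at least one match
-- when 1 row in 15 matches and matches are scattered randomly.
-- 1 - (14/15)^80 comes out around 0.996: the index scan still has to
-- read essentially every page of the table.
SELECT round((1 - power(14.0::float8 / 15.0, 80))::numeric, 3)
       AS frac_pages_touched;
```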