Thread: Slow query - index not used

From: Michael Brusser

I'm running Postgres v.7.3.4 on Solaris.

Here's a bare-bones version of the query that runs unreasonably slowly:
SELECT count (*) from note_links_aux WHERE nt_note_name='KnowledgeBase';

The query plan shows that the db runs through the entire table:

Aggregate  (cost=1983.58..1983.58 rows=1 width=0)
  ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5360 width=0)
        Filter: (nt_note_name = 'KnowledgeBase'::character varying)

even though the field nt_note_name is indexed:

Indexes: nla_nl_id_fk_i btree (nl_id),
         nla_nt_note_name_fk_i btree (nt_note_name),
         ...

Running the same query against another varchar field on this table,
I see that the index is being used:

  ->  Index Scan using ...

This query runs much faster.

Analyzing the table did not help.
Setting DEFAULT_STATISTICS_TARGET = 100 made the db use the index for
field = 'abc', but field = 'xyz' still results in a table scan.
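(That is, the following line in postgresql.conf; the 7.3 default for this
setting is 10:)

    default_statistics_target = 100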

Also, the 'rows' number reported in the plan is way off.
What can be done here?

Thanks,
Mike.



Re: Slow query - index not used

From: Tom Lane

Michael Brusser <michael@synchronicity.com> writes:
> Using "DEFAULT_STATISTICS_TARGET = 100" made db use index for 
> field = 'abc', but field = 'xyz' still results in the table scan.

What's the actual distribution of values in these columns?  Are you
searching for values that are particularly common or uncommon?

> Also, the 'rows' number reported in the plan is way off.

That would seem to be the crux of the problem, but you'd better be more
specific than "way off".
        regards, tom lane


Re: Slow query - index not used

From: Michael Brusser

> What's the actual distribution of values in these columns?
> Are you searching for values that are particularly common
> or uncommon?

This column always has a predefined set of values.
Usually the app would search for one of the
existing values.

---------------------------
Total records:      74654
---------------------------

    nt_note_name    | count
--------------------+-------
 Collection         | 10068
 Component          |  1200
 CustTicket         | 15009
 Deliver            |  1201
 Download           |  1999
 GroupProv          |   464
 IP_News            |  5950
 IP_Publish_Request |  4000
 IP_Usage           |  2000
 KnowledgeBase      | 15002
 LevelInfo          |    10
 OtherParam         |  4000
 Request            |  4501
 TestMethod         |  4050
 VerTech            |  4000
 Version            |  1200
---------------------------

I started from scratch: took the DEFAULT_STATISTICS_TARGET param out of the
config file, restarted the db, ran VACUUM ANALYZE, then got these statistics:

------------------------------------------------------------------------
# explain select count (1) from note_links_aux
      where nt_note_name = 'KnowledgeBase';

Aggregate  (cost=1982.68..1982.68 rows=1 width=0)
  ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5002 width=0)
        Filter: (nt_note_name = 'KnowledgeBase'::character varying)

------------------------------------------------------------------------
# explain select count (1) from note_links_aux
      where nt_note_name = 'OtherParam';

Aggregate  (cost=1984.78..1984.78 rows=1 width=0)
  ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5840 width=0)
        Filter: (nt_note_name = 'OtherParam'::character varying)

------------------------------------------------------------------------
# explain select count (1) from note_links_aux
      where nt_note_name = 'LevelInfo';

Aggregate  (cost=58.91..58.91 rows=1 width=0)
  ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux  (cost=0.00..58.87 rows=15 width=0)
        Index Cond: (nt_note_name = 'LevelInfo'::character varying)

------------------------------------------------------------------------
# explain select count (1) from note_links_aux
      where nt_note_name = 'NoSuchThing';

Aggregate  (cost=5.83..5.83 rows=1 width=0)
  ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux  (cost=0.00..5.83 rows=1 width=0)
        Index Cond: (nt_note_name = 'NoSuchThing'::character varying)

------------------------------------------------------------------------

So the 'rows' values are incorrect. Also, looking at the queries with
'KnowledgeBase' and 'OtherParam' - does a seq. scan make sense?

I mean, 'rows' has a value of about 5000 records out of the total of 75000
records in the table. This ratio does not seem high enough to assume
that an index scan won't be beneficial.
And even if we consider the real number of records - 15000 - this is still
only 20% of the total. Would an index scan still be faster?

Sorry if I've put more info here than you need.
Thanks,
Mike.




Re: Slow query - index not used

From: Dennis Bjorklund

On Wed, 21 Jan 2004, Michael Brusser wrote:

> So the 'rows' values are incorrect.

You can increase the statistics-gathering for that column with ALTER TABLE 
and probably get better estimates.
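
A sketch, using the table and column from your post (the target value of
200 is just an illustrative choice, not a recommendation):

    -- Raise the per-column statistics target, then re-gather stats
    -- so the planner sees the more detailed histogram.
    ALTER TABLE note_links_aux ALTER COLUMN nt_note_name SET STATISTICS 200;
    ANALYZE note_links_aux;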

> Also, looking at the queries with 'KnowledgeBase'
> and 'OtherParam' - does a seq. scan make sense?
> 
> I mean, 'rows' has a value of about 5000 records out of the total of 75000
> records in the table.

It doesn't sound so strange to me. I don't know exactly what limits
postgresql uses, but it probably needs to fetch every page in the table to
find all those 5000 records. If it has to do that, then the index scan
would not help that much (it might even make it slower).

It's easy to test what happens if it does the index scan instead of the seq.
scan. Just do SET enable_seqscan TO false; before you try the query. Then
you can compare the times with and without the index scan.
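
A sketch, reusing the 'KnowledgeBase' query from the earlier post
(EXPLAIN ANALYZE runs the query and prints the actual elapsed time):

    SET enable_seqscan TO false;   -- steer the planner away from a seq scan
    EXPLAIN ANALYZE SELECT count(*) FROM note_links_aux
        WHERE nt_note_name = 'KnowledgeBase';
    SET enable_seqscan TO true;    -- restore the default afterwards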

Remember, even if it finds a row in the index, it still has to fetch the
actual row from the table as well. So if it needs to fetch all pages from the
table, the total amount of IO is "all of the table" + "the relevant part of
the index". Whether or not it's faster depends on such things as whether the
data is already cached in memory. Setting effective_cache_size correctly lets
postgresql take into account how much file cache you have, which can affect
the plan.
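
In 7.3 effective_cache_size is expressed in 8K disk pages, so for example
(the 400MB figure is just an assumed amount of OS file cache, not a
measurement of your box):

    # postgresql.conf -- 51200 pages * 8KB = ~400MB of assumed file cache
    effective_cache_size = 51200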

-- 
/Dennis Björklund



Re: Slow query - index not used

From: Tom Lane

Michael Brusser <michael@synchronicity.com> writes:
> So the 'rows' values are incorrect.

You sound like you are expecting them to be exact.  They're just
estimates.  They're all plenty close enough for planning purposes,
except maybe the one for 'KnowledgeBase' is a little further off
than I would have expected.  That could just be a chance sampling
thing --- if you rerun ANALYZE and then check again, how much does that
estimate move around in a few tries?
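
Roughly, something like this, repeated a few times:

    ANALYZE note_links_aux;
    EXPLAIN SELECT count(*) FROM note_links_aux
        WHERE nt_note_name = 'KnowledgeBase';
    -- watch how far the Seq Scan rows estimate moves between runs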

> I mean, 'rows' has a value of about 5000 records out of the total of 75000
> records in the table. This ratio does not seem high enough to assume
> that an index scan won't be beneficial.

You're mistaken.  You didn't give any indication of how wide the rows
are in this table, but supposing for purposes of argument that they're
roughly 100 bytes apiece, there would be room for 80 rows on each 8K
page of the table.  A scan that needs to visit 1/15th of the table is
statistically certain to read nearly every page of the table, many of
them multiple times.  (This assumes the needed rows are randomly
scattered, which wouldn't be true if the table is "clustered" on the
index, but you didn't say anything about having done that.)  So an
indexscan doesn't save any I/O, and may cost I/O if some of the pages
fall out of buffer cache between visits.  Add to that the penalty for
random instead of sequential reads, and the cost to read the index
itself, and you come out way behind a seqscan.
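
To see the actual numbers behind that arithmetic, the planner's view of the
table's size is kept in the pg_class catalog:

    -- relpages = table size in 8K pages, reltuples = estimated row count
    SELECT relpages, reltuples FROM pg_class
        WHERE relname = 'note_links_aux';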

This all assumes that I/O is far more expensive than CPU cycles.
If you have a database that's small enough to fit in RAM then the cost
of checking the WHERE clauses at each tuple could become the dominant
factor.  If that's your situation you should look at altering the
planner's cost parameters --- in particular lower random_page_cost and
increase the various CPU cost settings.
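
For example (the values here are only illustrative, not recommendations):

    SET random_page_cost = 2;          -- default 4; lower favors indexscans
    SET cpu_tuple_cost = 0.02;         -- default 0.01
    SET cpu_index_tuple_cost = 0.005;  -- default 0.001
    SET cpu_operator_cost = 0.005;     -- default 0.0025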

With the default cost settings, the planner will not pick an indexscan
if it thinks that more than a couple percent of the table needs to be
visited.  (I think the breakeven point is dependent on table size, but
don't have the details in my head right now.)  You can experiment with
EXPLAIN ANALYZE and the various enable_xxx settings to see if it's right
or not in your environment.

See the pgsql-performance archives for more discussion.
        regards, tom lane