Re: Optimizing Query - Mailing list pgsql-sql

From Tom Lane
Subject Re: Optimizing Query
Date
Msg-id 4315.983851855@sss.pgh.pa.us
Whole thread Raw
In response to Re: Optimizing Query  (Justin Long <justinlong@strategicnetwork.org>)
Responses Re: Optimizing Query  (Justin Long <justinlong@strategicnetwork.org>)
List pgsql-sql
Justin Long <justinlong@strategicnetwork.org> writes:
> Ok, now I have another question... it doesn't seem to be accessing the index.

> explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 
> WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and 
> w1.wordid=85369))

> NOTICE:  QUERY PLAN:

> Merge Join  (cost=32339.30..35496.97 rows=19262538 width=24)
>    ->  Merge Join  (cost=16530.24..16668.77 rows=233274 width=20)
>          ->  Sort  (cost=15809.06..15809.06 rows=8257 width=4)
>                ->  Seq Scan on kbwords w1  (cost=0.00..15271.85 rows=8257 
> width=4)
>          ->  Sort  (cost=721.18..721.18 rows=2825 width=16)
>                ->  Seq Scan on knowledge k  (cost=0.00..559.25 rows=2825 
> width=16)
>    ->  Sort  (cost=15809.06..15809.06 rows=8257 width=4)
>          ->  Seq Scan on kbwords w0  (cost=0.00..15271.85 rows=8257 width=4)

> Note the sequential scans... there is a wordindex where w0.wordid=42743... 
> why isn't it doing an indexscan? wouldn't that be more efficient?

It probably thinks not, because the estimated number of hits (8257) is
so high.  That estimate is currently driven by the frequency of the most
common value in the column (mainly because that's the only stat we have
:-().  I am guessing that you have a few very common words, which are
skewing the stats for kbwords and causing it not to pick an indexscan.

Does your setup have a notion of "stop words" that shouldn't be indexed,
like "a", "an", "the", etc?  Perhaps you need to add such a feature, or
throw in a few more stopwords if you already have 'em.
        regards, tom lane


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] MySQLs Describe emulator!
Next
From: Josh Berkus
Date:
Subject: No Documentation for to_char(INTERVAL, mask)