Tom Lane wrote:
>
> mlw <markw@mohawksoft.com> writes:
> > btw anyone trying this query should use: "attdispersion"
>
> Sorry about that --- I just copied-and-pasted the query from some notes
> that are obsolete as of 7.1...
>
> > cdinfo=# explain select * from ztitles where artistid = 100000220 ;
> > NOTICE: QUERY PLAN:
>
> > Seq Scan on ztitles (cost=0.00..4740.75 rows=3163 width=296)
>
> > And this is with "-o -fs"
>
> > Index Scan using ztitles_artistid_ndx on ztitles (cost=0.00..5915.01
> > rows=3163 width=296)
>
> > attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval |
staloval | stahival
> > artistid | 0.0477198 | 19274 | 2 | 97 | 0 | 0.149362 | 100050450 |
100000000 | 100055325
>
> The reason why the thing is going for a sequential scan is that
> astonishingly high stacommonfrac statistic. Does artistid 100050450
> really account for 14.9% of all the rows in your table? (Who is that
> anyway? ;-)) If so, a search for artistid 100050450 definitely *should*
> use a sequential scan.
I tested this statement against the database and you are right, about 14
seconds with the index, 4 without.
BTW ID # 100050450 is "Various Artists"
This is sort of a point I was trying to make in previous emails. I think
this situation, and this sort of ratio is far more likely than the
attention it has been given.
In about every project I have used postgres I have run into this. It is
only recently that I have understood what the problem was and how to get
around it (sort of).
This one entry is destroying any intelligent performance we could hope
to attain. As I said, I always see this sort of behavior in some
implementation.
> The problem at hand is estimating the frequency
> of entries for some other artistid, given that we only have this much
> statistical info available. Obviously the stats are insufficient, and
> I hope to do something about that in a release or two, but it ain't
> gonna happen for 7.1. In the meantime, if you've got huge outliers
> like that, you could try reducing the value of NOT_MOST_COMMON_RATIO
> in src/backend/utils/adt/selfuncs.c.
I did some playing with this value, and I can seem to have it
differentiate between 100050450 and anything else.
--
http://www.mohawksoft.com