Re: SourceForge & Postgres - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SourceForge & Postgres
Date
Msg-id 3490.976656435@sss.pgh.pa.us
Whole thread Raw
In response to Re: SourceForge & Postgres  (mlw <markw@mohawksoft.com>)
List pgsql-hackers
mlw <markw@mohawksoft.com> writes:
> btw anyone trying this query should use: "attdispersion"

Sorry about that --- I just copied-and-pasted the query from some notes
that are obsolete as of 7.1...

> cdinfo=# explain select * from ztitles where artistid = 100000220 ;
> NOTICE:  QUERY PLAN:
> Seq Scan on ztitles  (cost=0.00..4740.75 rows=3163 width=296)
> And this is with "-o -fs"

> Index Scan using ztitles_artistid_ndx on ztitles  (cost=0.00..5915.01
> rows=3163 width=296)
>   attname   | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac |      stacommonval      |
     staloval         |     stahival      
 
>  artistid   |     0.0477198 |    19274 |         2 |    97 |           0 |      0.149362 | 100050450              |
100000000               | 100055325
 

The reason why the thing is going for a sequential scan is that
astonishingly high stacommonfrac statistic.  Does artistid 100050450
really account for 14.9% of all the rows in your table?  (Who is that
anyway? ;-))  If so, a search for artistid 100050450 definitely *should*
use a sequential scan.  The problem at hand is estimating the frequency
of entries for some other artistid, given that we only have this much
statistical info available.  Obviously the stats are insufficient, and
I hope to do something about that in a release or two, but it ain't
gonna happen for 7.1.  In the meantime, if you've got huge outliers
like that, you could try reducing the value of NOT_MOST_COMMON_RATIO
in src/backend/utils/adt/selfuncs.c.
        regards, tom lane


pgsql-hackers by date:

Previous
From: ncm@zembu.com (Nathan Myers)
Date:
Subject: Re: RFC: CRC datatype
Next
From: ncm@zembu.com (Nathan Myers)
Date:
Subject: Re: RFC C++ Interface