Re: SourceForge & Postgres - Mailing list pgsql-hackers

From mlw
Subject Re: SourceForge & Postgres
Date
Msg-id 3A36C0B3.ADAF69FE@mohawksoft.com
Whole thread Raw
In response to Re: SourceForge & Postgres  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > btw anyone trying this query should use: "attdispersion"
> 
> Sorry about that --- I just copied-and-pasted the query from some notes
> that are obsolete as of 7.1...
> 
> > cdinfo=# explain select * from ztitles where artistid = 100000220 ;
> > NOTICE:  QUERY PLAN:
> 
> > Seq Scan on ztitles  (cost=0.00..4740.75 rows=3163 width=296)
> 
> > And this is with "-o -fs"
> 
> > Index Scan using ztitles_artistid_ndx on ztitles  (cost=0.00..5915.01
> > rows=3163 width=296)
> 
> >   attname   | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac |      stacommonval      |
       staloval         |     stahival
 
> >  artistid   |     0.0477198 |    19274 |         2 |    97 |           0 |      0.149362 | 100050450              |
100000000               | 100055325
 
> 
> The reason why the thing is going for a sequential scan is that
> astonishingly high stacommonfrac statistic.  Does artistid 100050450
> really account for 14.9% of all the rows in your table?  (Who is that
> anyway? ;-))  If so, a search for artistid 100050450 definitely *should*
> use a sequential scan.

I tested this statement against the database and you are right, about 14
seconds with the index, 4 without.

BTW ID # 100050450 is "Various Artists"

This is sort of a point I was trying to make in previous emails. I think
this situation, and this sort of ratio is far more likely than the
attention it has been given.

In about every project I have used postgres I have run into this. It is
only recently that I have understood what the problem was and how to get
around it (sort of).

This one entry is destroying any intelligent performance we could hope
to attain. As I said, I always see this sort of behavior in some
implementation.


>  The problem at hand is estimating the frequency
> of entries for some other artistid, given that we only have this much
> statistical info available.  Obviously the stats are insufficient, and
> I hope to do something about that in a release or two, but it ain't
> gonna happen for 7.1.  In the meantime, if you've got huge outliers
> like that, you could try reducing the value of NOT_MOST_COMMON_RATIO
> in src/backend/utils/adt/selfuncs.c.

I did some playing with this value, and I can seem to have it
differentiate between 100050450 and anything else.


-- 
http://www.mohawksoft.com


pgsql-hackers by date:

Previous
From: ncm@zembu.com (Nathan Myers)
Date:
Subject: Re: (one more time) Patches with vacuum fixes available .
Next
From: ncm@zembu.com (Nathan Myers)
Date:
Subject: Re: RFC C++ Interface