On Fri, 24 Jan 2003, Stephan Szabo wrote:
>
> I think it's a clumping effect.
Yup, I think that's exactly the effect.
A proposal.... (yes I I'm volunteering if people point me in the right
direction)... would be to have a "plugable" set of analyze functions so that a
huge database that runs analyze infrequently could choose to have a very slow
analyze that might work better for it's data.
I see no reason different analyze functions would to be compiled into
the source code ... but could probably exists as PL/pgSQL languages.
The one thing compiling it in would help with is to let me know
the exact number of tuples on each individual page, but I guess
reltuples/relpages from pg_class is a good estimate.
> For example, I made a table (ordered) with 20 values of a, 50 values of b
> (each showing up in each a) and 100 values of c (not used, just means 100
> rows for each (a,b) combination. It's got 541 pages it looks like. Analyze
> sets the correlation to about 0.08 on the table and so a query like:
> select * from test1 where b=1; prefers a sequence scan (1791 vs 2231)
> while the index scan actually performs about 5 times better.
That sounds like the same situation I was in. If my logic is right, this
means you had about 184 tuples/page (200*50*100/541), so it looks to me
like for each "a", you get half-a-page where "b=1".
If you had 'c' have 200 values, I think you'd get even a bigger speedup
because half the page is still "wasted" with b=2 values.
If you had 'c' have 10000 values, I think you'd get even a slightly bigger
speedup because you'd have so many b=1 pages next to each other you'd
benefit from more sequential disk access.
> I guess the reason is that in general, the index scan *really* is reading
> something on the order of 40 pages rather than the much larger estimate
> (I'd guess something on the order of say 300-400? I'm not sure how to
> find that except by trying to reverse engineer the estimate number),
Or by adding a printf()... I think it'd be in cost_index in costsize.c.
> because pretty much each value of a will probably have 1 or 2 pages with
> b=1.
>
> I'm not really sure how to measure that, however.
As I said... I'm happy to volunteer and experiment if people point
me in a good direction.
Ron