> > 3. if at all, an automatic analyze should do the samples on small tables,
> > and accurate stats on large tables
>
> Other way 'round, surely? It already does that: if your table has fewer
> rows than the sampling target, they all get used.
I mean that it is probably not useful to maintain distribution statistics at all
for a table that small (e.g. <= 3000 rows and less than 512 kB in size).
So let me reword: do the samples for medium-sized tables.
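(Just a sketch of what I mean, not a proposal for the actual implementation:
such a cutoff could be expressed against the reltuples/relpages estimates in
pg_class; 64 pages corresponds to 512 kB with the default 8 kB block size.)

  -- find tables arguably too small to be worth distribution statistics
  SELECT relname, reltuples, relpages
  FROM pg_class
  WHERE relkind = 'r'          -- ordinary tables only
    AND reltuples <= 3000
    AND relpages  <= 64;       -- 64 * 8 kB = 512 kB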
> > When on the other hand the optimizer does a "mistake" on a huge table
> > the difference is easily a matter of hours, thus you want accurate stats.
>
> Not if it takes hours to get the stats. I'm more interested in keeping
> ANALYZE cheap and encouraging DBAs to run it frequently, so that the
> stats stay up-to-date. It doesn't matter how perfect the stats were
> when they were made, if the table has changed since then.
That is true, but this is certainly a tradeoff. For a huge table that is
quite static you would certainly want the most accurate statistics possible,
even if they take hours to compute once a month.
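(Again only a sketch, assuming a version with per-column statistics targets;
the table and column names are made up.) You could crank up the target on the
big static table and accept a rare, expensive ANALYZE, while keeping the busy
tables cheap:

  ALTER TABLE big_static_table ALTER COLUMN dist_col SET STATISTICS 1000;
  ANALYZE big_static_table;    -- expensive, run e.g. once a month

  ALTER TABLE busy_table ALTER COLUMN flag_col SET STATISTICS 10;
  ANALYZE busy_table;          -- cheap, run frequently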
My comments are based on practice, not theory :-) Of course, current
state-of-the-art optimizer implementations may well lag behind the
state-of-the-art theory from ACM SIGMOD :-)
Andreas