Thread: AW: AW: Call for alpha testing: planner statistics revisions

AW: AW: Call for alpha testing: planner statistics revisions

From: Zeugswetter Andreas SB
Date:
> > 3. if at all, an automatic analyze should do the samples on small tables,
> > and accurate stats on large tables
> 
> Other way 'round, surely?  It already does that: if your table has fewer
> rows than the sampling target, they all get used.

I mean that it is probably not useful to maintain distribution statistics
at all for a table that small (e.g. <= 3000 rows and less than 512 KB in size).
So let me reword: do the samples for medium-sized tables.
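
For illustration, a minimal sketch of how one might find the tables below
those thresholds in the PostgreSQL catalogs (the 64-page cutoff assumes the
default 8 KB page size; the row and size thresholds are the ones from this
message):

    -- List plain tables at or below ~3000 rows and under 512 KB
    -- (64 pages x 8 KB), where per-column distribution stats may
    -- not be worth maintaining.
    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relkind = 'r'
      AND reltuples <= 3000
      AND relpages < 64;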

> > When on the other hand the optimizer does a "mistake" on a huge table
> > the difference is easily a matter of hours, thus you want accurate stats.
> 
> Not if it takes hours to get the stats.  I'm more interested in keeping
> ANALYZE cheap and encouraging DBAs to run it frequently, so that the
> stats stay up-to-date.  It doesn't matter how perfect the stats were
> when they were made, if the table has changed since then.

That is true, but it is certainly a tradeoff. For a huge table that is
quite static, you would certainly want the most accurate statistics possible,
even if they take hours to compute once a month.

My comments are based on practice, not theory :-) Of course, current
state-of-the-art optimizer implementations may well lag behind the
state-of-the-art theory from ACM SIGMOD :-)

Andreas


Re: AW: AW: Call for alpha testing: planner statistics revisions

From: Tom Lane
Date:
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> I mean that it is probably not useful to maintain distribution statistics
> at all for a table that small (e.g. <= 3000 rows and less than 512 KB
> in size).

Actually, stats are quite interesting for smaller tables too.  Maybe not
so much for the table itself (i.e., deciding between seq and index scan is
not so critical), but for estimating the sizes of joins against other tables.
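
For example (EXPLAIN and ANALYZE are the standard commands; the two table
names here are made up for illustration), the planner's estimated row count
for a join depends on the stats of both sides, even when one side is tiny:

    -- Refresh stats on the small table, then check how the planner
    -- estimates the join size.  "regions" and "sales" are
    -- hypothetical names.
    ANALYZE regions;
    EXPLAIN SELECT *
    FROM sales s JOIN regions r ON s.region_id = r.id;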

>> Not if it takes hours to get the stats.  I'm more interested in keeping
>> ANALYZE cheap and encouraging DBAs to run it frequently, so that the
>> stats stay up-to-date.  It doesn't matter how perfect the stats were
>> when they were made, if the table has changed since then.

> That is true, but it is certainly a tradeoff. For a huge table that is
> quite static, you would certainly want the most accurate statistics
> possible, even if they take hours to compute once a month.

Sure.  My thought is that one would do this by increasing the SET
STATISTICS targets for such tables, thus yielding more detailed stats
that take longer to compute.  What we need now is experimentation to
find out how well this works in practice.  It might well be that more
knobs will turn out to be useful, but let's not add complexity until
we've proven it to be necessary ...
        regards, tom lane
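
For anyone experimenting along these lines, a minimal sketch of the
per-column knob being discussed (ALTER TABLE ... SET STATISTICS is the
actual PostgreSQL command; the table and column names are made up):

    -- Raise the statistics target for one column of a large, mostly
    -- static table, then re-ANALYZE so the more detailed stats are
    -- actually computed.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;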