AW: AW: Call for alpha testing: planner statistics revisions - Mailing list pgsql-hackers

From Zeugswetter Andreas SB
Subject AW: AW: Call for alpha testing: planner statistics revisions
Date
Msg-id 11C1E6749A55D411A9670001FA687963368330@sdexcsrv1.f000.d0188.sd.spardat.at
Responses Re: AW: AW: Call for alpha testing: planner statistics revisions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> > 3. if at all, an automatic analyze should do the samples on small tables,
> > and accurate stats on large tables
> 
> Other way 'round, surely?  It already does that: if your table has fewer
> rows than the sampling target, they all get used.

What I mean is that it is probably not useful to maintain distribution statistics
at all for a table that small (e.g. <= 3000 rows and less than 512 kB in size).
So let me reword: sample only medium-sized tables.
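
A minimal sketch of the three-tier policy I have in mind, in C. The
thresholds, names, and enum here are hypothetical illustrations for this
discussion, not actual PostgreSQL code:

    #include <stdbool.h>

    #define TINY_TABLE_ROWS   3000          /* below this: skip distribution stats */
    #define TINY_TABLE_BYTES  (512 * 1024)  /* ... and smaller than 512 kB */
    #define HUGE_TABLE_ROWS   10000000      /* above this: consider exact stats */

    typedef enum
    {
        ANALYZE_SKIP,       /* tiny table: not worth keeping stats at all */
        ANALYZE_SAMPLE,     /* medium table: cheap random sample */
        ANALYZE_FULL_SCAN   /* huge, mostly static table: accurate stats */
    } AnalyzeStrategy;

    static AnalyzeStrategy
    choose_analyze_strategy(double reltuples, long relbytes, bool is_static)
    {
        if (reltuples <= TINY_TABLE_ROWS && relbytes <= TINY_TABLE_BYTES)
            return ANALYZE_SKIP;
        if (reltuples > HUGE_TABLE_ROWS && is_static)
            return ANALYZE_FULL_SCAN;
        return ANALYZE_SAMPLE;
    }

The point of the sketch is only the dispatch: a full scan is reserved for
the huge-and-static case discussed below, everything else gets the cheap
sample.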

> > When on the other hand the optimizer does a "mistake" on a huge table
> > the difference is easily a matter of hours, thus you want accurate stats.
> 
> Not if it takes hours to get the stats.  I'm more interested in keeping
> ANALYZE cheap and encouraging DBAs to run it frequently, so that the
> stats stay up-to-date.  It doesn't matter how perfect the stats were
> when they were made, if the table has changed since then.

That is true, but it is certainly a tradeoff. For a huge table that is
largely static, you would certainly want the most accurate statistics
possible, even if they take hours to compute once a month.

My comments are based on practice, not theory :-) Of course, current
state-of-the-art optimizer implementations may well lag behind state-of-the-art
theory from ACM SIGMOD :-)

Andreas

