Thread: Self-generating statistics?

Self-generating statistics?

From
Mario Weilguni
Date:
Something that might be worth considering:

Many of the performance problems on pgsql-general seem to be related to the fact that no analyze is performed after the
creationon the tables, so 
maybe this might be an option to fix that (in future releases): when a table has no statistics at all, and the first
seq-scanon the table is 
performed, it might improve further performance if this seq-scan is used to get table statistics too. This should not
betoo expensive since reading the 
table has to be done only once. Further queries will have at least preliminary statistics at hand.

I'm not sure how (CPU) expensive statistic-gathering is, but if most of the work is reading the tuples, it might be a
winto do this. 

Regards,
    Mario Weilguni




Re: Self-generating statistics?

From
Neil Conway
Date:
Mario Weilguni <mweilguni@sime.com> writes:
> Many of the performance problems on pgsql-general seem to be related
> to the fact that no analyze is performed after the creation on the
> tables

Well, there are lots of other ways an incompetent DBA can screw up a
database. The need to VACUUM and ANALYZE is stated clearly in the
docs. Providing workarounds for negligence isn't the right path to get
started down, IMHO.

That said, the general idea of a self-tuning database system has
merit, IMHO. For example, this paper proposes a histogram data
structure that can be updated fairly cheaply based on data gathered
from query execution:

        http://citeseer.nj.nec.com/255752.html

A bunch of industry players (IBM, Microsoft, etc.) are putting some
work into this area (IBM calls it "autonomic computing", for
example). It might be an interesting area to look at in the future...

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Re: Self-generating statistics?

From
Andrew Sullivan
Date:
On Mon, Oct 21, 2002 at 08:23:33AM +0200, Mario Weilguni wrote:
> Something that might be worth considering:
>
> Many of the performance problems on pgsql-general seem to be
> related to the fact that no analyze is performed after the creation
> on the tables, so maybe this might be an option to fix that (in
> future releases): when a table has no statistics at all, and the
> first seq-scan on the table is

It's never the case that a table has no statistics at all.  It has
default ones.  Maybe they're right; it's hard to know.

Someone has posted on gborg an anto-vacuum daemon that might be of
use in this situation.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110