Re: Simple postgresql.conf wizard - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Simple postgresql.conf wizard
Date
Msg-id 87wseqppji.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: Simple postgresql.conf wizard  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Simple postgresql.conf wizard
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Dann Corbit" <DCorbit@connx.com> writes:
>> I also do not believe that there is any value that will be the right
>> answer.  But a table of data might be useful both for people who want to
>> toy with altering the values and also for those who want to set the
>> defaults.  I guess that at one time such a table was generated to
>> produce the initial estimates for default values.
>
> Sir, you credit us too much :-(.  The actual story is that the current
> default of 10 was put in when we first implemented stats histograms,
> replacing code that kept track of only a *single* most common value
> (and not very well, at that).  So it was already a factor of 10 more
> stats than we had experience with keeping, and accordingly conservatism
> suggested not boosting the default much past that.

I think that's actually too little credit. The sample size is chosen quite
carefully based on solid mathematics to provide a specific confidence interval
estimate for queries covering ranges the size of a whole bucket.

The actual number of buckets more of an arbitrary choice. It depends entirely
on how your data is distributed and how large a range your queries are
covering. A uniformly distributed data set should only need a single bucket to
generate good estimates. Less evenly distributed data sets need more.

I wonder actually if there are algorithms for estimating the number of buckets
needed for a histogram to achieve some measurable goal. That would close the
loop. It would be much more reassuring to base the size of the sample on solid
statistics than on hunches.

> So we really don't have any methodically-gathered evidence about the
> effects of different stats settings.  It wouldn't take a lot to convince
> us to switch to a different default, I think, but it would be nice to
> have more than none.

I think the difficulty (aside from testing being laborious at the best of
times) is that it's heavily dependent on data sets which are hard to generate
good examples for. Offhand I would think the census data might make a good
starting point -- it should have columns which range from perfectly uniform to
highly skewed.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Simple postgresql.conf wizard
Next
From: Tom Lane
Date:
Subject: Re: Visibility map, partial vacuums