Thread: change sample size for statistics

From
Willy-Bas Loos
Date:
Hi,

Is there a way to change the sample size for the statistics that ANALYZE gathers?
It is said to be 10%. I would like to raise that, because we are getting bad estimates for n_distinct.

Cheers,

WBL

--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: change sample size for statistics

From
Josh Berkus
Date:
On 6/10/11 5:15 AM, Willy-Bas Loos wrote:
> Hi,
>
> Is there a way to change the sample size for the statistics that ANALYZE
> gathers?
> It is said to be 10%. I would like to raise that, because we are getting bad
> estimates for n_distinct.

It's not 10%.  We use a fixed sample size, which is configurable on the
system, table, or column basis.

Some reading (read all these pages to understand what you're doing):
http://www.postgresql.org/docs/9.0/static/planner-stats.html
http://www.postgresql.org/docs/9.0/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER
http://www.postgresql.org/docs/9.0/static/planner-stats-details.html
http://www.postgresql.org/docs/9.0/static/sql-altertable.html
(scroll down to "set storage" on that last page)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: change sample size for statistics

From
Nathan Boley
Date:
[ Sorry, forgot to cc list ]

>> It is said to be 10%. I would like to raise that, because we are getting bad
>> estimates for n_distinct.
>
> More to the point, the estimator we use is going to be biased for many
> (probably most) distributions no matter how large your sample size
> is.
>
> If you need to fix ndistinct, a better approach may be to do it manually.
>
> Best,
> Nathan
>
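The estimator Nathan refers to is, as far as I know, the Haas-Stokes formula cited in PostgreSQL's analyze.c: n*d / (n - f1 + f1*n/N), where n is the number of sampled rows, N the total row count, d the number of distinct values in the sample, and f1 the number of values that appear exactly once in it. Below is a minimal Python sketch of that formula alone. The real code adds special cases (for example, a sample with no repeated values is treated as a unique column), and the function name and demo data are hypothetical.

```python
import random
from collections import Counter


def haas_stokes_ndistinct(sample, total_rows):
    """Haas-Stokes estimate of the distinct values in a column of total_rows
    rows, computed from a uniform random sample (hypothetical helper)."""
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                  # distinct values seen in the sample
    f1 = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    estimate = n * d / (n - f1 + f1 * n / total_rows)
    # The true count lies between d and total_rows, so clamp to that range.
    return min(max(estimate, float(d)), float(total_rows))


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical skewed column: 100 "hot" values with 1000 rows each,
    # plus 100,000 values that occur exactly once (200,000 rows in total).
    population = [v for v in range(100) for _ in range(1000)]
    population += list(range(100, 100_100))
    true_distinct = len(set(population))
    sample = random.sample(population, 30_000)  # 15% of this table
    print("true:", true_distinct,
          "estimated:", round(haas_stokes_ndistinct(sample, len(population))))
```

On a mix like this, the estimate lands far below the true count of 100,100 even though the sample covers 15% of the table. That is why fixing n_distinct by hand can beat enlarging the sample.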

Re: change sample size for statistics

From
Willy-Bas Loos
Date:


On Fri, Jun 10, 2011 at 9:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
> It's not 10%.  We use a fixed sample size, which is configurable on the
> system, table, or column basis.

It seems that you are referring to "ALTER COLUMN SET STATISTICS" and "default_statistics_target", which set the number of histogram buckets (and MCVs).
I mean the number of rows that ANALYZE samples to compute the planner statistics, especially n_distinct.
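The statistics target does double duty. Besides the number of histogram buckets and MCV entries, it also sets how many rows ANALYZE samples. As far as I know, analyze.c requests 300 rows per unit of statistics target for each column and samples the largest of those requests for the whole table. Since default_statistics_target is 100 from 8.4 on, that comes to about 30,000 rows. The result is a fixed row count, not a percentage, and raising a column's target also raises the sample. Below is a minimal Python sketch of that arithmetic. The helper names are hypothetical, and the real sampling code has further details this ignores.

```python
ROWS_PER_TARGET = 300  # assumption: rows requested per unit of statistics target


def effective_target(column_target, default_statistics_target):
    """A per-column target of -1 means 'use default_statistics_target'."""
    return default_statistics_target if column_target < 0 else column_target


def analyze_sample_rows(column_targets, total_rows, default_statistics_target=100):
    """Rows ANALYZE reads from one table (hypothetical helper): the largest
    per-column request, capped at the table's row count."""
    targets = [effective_target(t, default_statistics_target) for t in column_targets]
    wanted = ROWS_PER_TARGET * max(targets, default=0)
    return min(wanted, total_rows)


if __name__ == "__main__":
    table_rows = 10_000_000
    # Default target on every column vs. one column raised to 1000.
    for cols in ([-1, -1], [-1, 1000]):
        rows = analyze_sample_rows(cols, table_rows)
        print(cols, rows, f"{rows / table_rows:.1%} of the table")
```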


On Fri, Jun 10, 2011 at 10:06 PM, Nathan Boley <npboley@gmail.com> wrote:
> If you need to fix ndistinct, a better approach may be to do it manually.

That would be nice, but how do I prevent ANALYZE from overwriting n_distinct without also blocking the generation of new histogram values etc. for that column?

We are using version 8.4 at the moment (on Debian squeeze).

Cheers,

WBL

Re: change sample size for statistics

From
Robert Haas
Date:
On Mon, Jun 13, 2011 at 6:33 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
> On Fri, Jun 10, 2011 at 9:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>
>> It's not 10%.  We use a fixed sample size, which is configurable on the
>> system, table, or column basis.
>
> It seems that you are referring to "ALTER COLUMN SET STATISTICS" and
> "default_statistics_target", which set the number of histogram buckets
> (and MCVs).
> I mean the number of rows that ANALYZE samples to compute the planner
> statistics, especially n_distinct.

In 9.0+ you can do ALTER TABLE .. ALTER COLUMN .. SET (n_distinct = ...);
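This also answers the question about keeping the other statistics. The per-column override (9.0+, so not available on 8.4) only replaces the n_distinct estimate that subsequent ANALYZE runs store. Histograms and MCV lists are still recomputed, and the override takes effect at the next ANALYZE. A negative value is read as a fraction of the row count, so it scales as the table grows. Below is a minimal Python sketch of building the statement and of how the stored value is interpreted, assuming the semantics described in the ALTER TABLE and pg_stats documentation. The helper functions and the table and column names are hypothetical.

```python
def n_distinct_override_sql(table, column, n_distinct):
    """Build the 9.0+ per-column override (hypothetical helper; identifiers
    are assumed to be already safely quoted)."""
    if n_distinct < -1:
        raise ValueError("n_distinct must be >= -1")
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET (n_distinct = {n_distinct});"


def planner_ndistinct(n_distinct, reltuples):
    """Roughly how a stored n_distinct is read: a positive value is an
    absolute count, a negative one a fraction of the row count
    (-1 means as many distinct values as rows), and 0 means unknown."""
    if n_distinct == 0:
        return None
    return n_distinct if n_distinct > 0 else -n_distinct * reltuples


if __name__ == "__main__":
    # Hypothetical column where about a quarter of the rows hold distinct values.
    print(n_distinct_override_sql("orders", "customer_id", -0.25))
    print(planner_ndistinct(-0.25, 2_000_000))
```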

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company