
Re: default_statistics_target WAS: max_wal_senders must die - Mailing list pgsql-hackers

From	Josh Berkus
Subject	Re: default_statistics_target WAS: max_wal_senders must die
Date	October 21, 2010 01:41:48
Msg-id	4CBF9A50.8040604@agliodbs.com
In response to	Re: default_statistics_target WAS: max_wal_senders must die (Greg Stark <gsstark@mit.edu>)
Responses	Re: default_statistics_target WAS: max_wal_senders must die Re: default_statistics_target WAS: max_wal_senders must die
List	pgsql-hackers


> I don't see why the MCVs would need a particularly large sample size
> to calculate accurately. Have you done any tests on the accuracy of
> the MCV list?

Yes, although I don't have them at my fingertips.  In sum, though, you
can't take 10,000 samples from a 1-billion-row table and expect to get a
remotely accurate MCV list.
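
To put rough numbers on this, here is a minimal sketch of the binomial arithmetic for how often a value shows up in a sample. It assumes simple random row sampling, which is an idealization of what ANALYZE does; the function names are hypothetical and exist only for illustration.

```python
import math

def expected_sample_hits(sample_rows: int, value_fraction: float) -> float:
    """Expected number of sampled rows holding a value that makes up
    `value_fraction` of the table (simple random sampling assumed)."""
    return sample_rows * value_fraction

def relative_std_error(sample_rows: int, value_fraction: float) -> float:
    """Relative standard error of the sampled frequency estimate,
    from the binomial variance n*p*(1-p)."""
    return math.sqrt((1.0 - value_fraction) / (sample_rows * value_fraction))

if __name__ == "__main__":
    # Illustrative frequencies only; these are not measured data.
    for frac in (0.01, 0.001, 0.0001):
        hits = expected_sample_hits(10_000, frac)
        err = relative_std_error(10_000, frac)
        print(f"fraction={frac}: expected hits={hits:.1f}, relative error={err:.2f}")
```

Under these assumptions, a value held by 0.01% of the rows (100,000 rows of a 1-billion-row table) is expected to appear about once in a 10,000-row sample, so its estimated frequency is dominated by noise.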

A while back I did a fair bit of reading on ndistinct and large tables
from the academic literature.  The consensus of many papers was that it
took a sample of at least 3% (or 5% for block-based) of the table in
order to have 95% confidence that the ndistinct estimate is within 3X of
the true value.  I can't imagine that MCV is easier than this.

> And mostly
> what it tells me is that we need a robust statistical method and the
> data structures it requires for estimating the frequency of a single
> value.

Agreed.

>  Binding the length of the MCV list to the size of the histogram is
> arbitrary but so would any other value and I haven't seen anyone
> propose any rationale for any particular value.

histogram size != sample size.  The two are tied together in our code,
but that's a bug and not a feature.
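
As a rough illustration of that coupling, the sketch below assumes ANALYZE samples 300 rows per unit of statistics target, which is my understanding of the default behavior in analyze.c and should be checked against the source. The constant and function names are hypothetical.

```python
# Assumption: rows sampled per unit of statistics target (believed to be
# the multiplier used by ANALYZE's default sampling; verify in analyze.c).
ROWS_PER_TARGET = 300

def analyze_sample_rows(statistics_target: int) -> int:
    """Rows sampled for a given statistics target, under the assumption above."""
    return ROWS_PER_TARGET * statistics_target

def sample_fraction(statistics_target: int, table_rows: int) -> float:
    """Fraction of the table covered by the sample, capped at the whole table."""
    return min(1.0, analyze_sample_rows(statistics_target) / table_rows)

if __name__ == "__main__":
    table_rows = 1_000_000_000  # the 1-billion-row case discussed above
    for target in (10, 100, 1000):
        rows = analyze_sample_rows(target)
        frac = sample_fraction(target, table_rows)
        print(f"target={target}: sample rows={rows}, fraction={frac:.6%}")
```

Under that assumption, raising the target is the only way to enlarge the sample, and it enlarges the histogram and MCV list along with it. A target of 100 samples 30,000 rows, far below 3% of a 1-billion-row table.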

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
