Re: estimating # of distinct values - Mailing list pgsql-hackers

From Csaba Nagy
Subject Re: estimating # of distinct values
Date
Msg-id 1295517901.1436.172.camel@clnt-sysecm-cnagy
In response to Re: estimating # of distinct values  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: estimating # of distinct values  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
Hi Tomas,

On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
> No, the multi-column statistics do not require constant updating. There
> are cases where a sampling is perfectly fine, although you may need a
> bit larger sample. Generally if you can use a multi-dimensional
> histogram, you don't need to scan the whole table.

In the cases where sampling is enough, you can apply the same idea to
the updates: sample the changes, so that only every Nth change makes it
to the estimator. Even better if you can also tune N dynamically,
growing it as the statistics stabilize and lowering it again when you
detect high variance.

If the analyze process could be decoupled from the backends, and
perhaps just have the data handed over for asynchronous processing,
that could be a feasible way to keep the statistics always up to date
when the bottleneck is I/O and CPU power is in excess. If that in turn
leads to better plans, the win could easily exceed the overhead.
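As a rough illustration of such a hand-off, here is a sketch using a
plain pthread ring buffer in place of whatever shared-memory
infrastructure the real thing would need; the ChangeQueue type and the
fixed queue size are assumptions invented for the example:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 1024

typedef struct ChangeQueue
{
    uint32_t        values[QUEUE_SIZE]; /* e.g. hashes of changed values */
    int             head;
    int             tail;
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
} ChangeQueue;

static ChangeQueue change_queue = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER
};

/*
 * Backend side: never block.  If the queue is full, drop the change;
 * the statistics are approximate anyway, so losing samples under load
 * only makes the estimator converge a bit more slowly.
 */
static bool
queue_try_push(ChangeQueue *q, uint32_t v)
{
    bool    pushed = false;

    pthread_mutex_lock(&q->lock);
    if ((q->head + 1) % QUEUE_SIZE != q->tail)
    {
        q->values[q->head] = v;
        q->head = (q->head + 1) % QUEUE_SIZE;
        pthread_cond_signal(&q->nonempty);
        pushed = true;
    }
    pthread_mutex_unlock(&q->lock);
    return pushed;
}

/*
 * Analyze-side worker: consume the queued changes at its own pace and
 * fold them into the distinct-value estimator.
 */
static uint32_t
queue_pop(ChangeQueue *q)
{
    uint32_t    v;

    pthread_mutex_lock(&q->lock);
    while (q->head == q->tail)
        pthread_cond_wait(&q->nonempty, &q->lock);
    v = q->values[q->tail];
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    pthread_mutex_unlock(&q->lock);
    return v;
}

The important property is that the backend side never waits on the
analyze process: it either queues the change or drops it and moves on.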

If this analyze process (or several of them) could also pick up the
data directly from the modified buffers in a cyclic fashion, so that
the backends have nothing extra to do, then I see no performance
disadvantage beyond possible extra locking contention on the buffers
and the non-determinism of when exactly a change makes it into the
statistics. Then you just need more CPU power and higher memory
bandwidth to pay for the accurate statistics.
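Roughly like this; FakeBuffer and its fields are simplified stand-ins
for the real buffer manager structures, only meant to show the access
pattern and where the locking contention would come from:

#include <pthread.h>
#include <stdbool.h>

#define NBUFFERS 1024
#define BLCKSZ   8192

typedef struct FakeBuffer
{
    pthread_rwlock_t lock;      /* stand-in for the buffer content lock */
    bool             dirty;     /* stand-in for the BM_DIRTY flag */
    char             page[BLCKSZ];
} FakeBuffer;

static FakeBuffer buffers[NBUFFERS];    /* stand-in for shared buffers */

static void
buffers_init(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        pthread_rwlock_init(&buffers[i].lock, NULL);
}

/* Stub: the real thing would extract the interesting column values
 * from the page and push them into the estimator. */
static void
feed_estimator_from_page(const char *page)
{
    (void) page;
}

/*
 * Cycle over the buffer pool forever, feeding the estimator from
 * whatever dirty pages happen to be resident.  The only extra cost
 * imposed on the backends is the shared lock held while a page is
 * being read; the price is non-determinism in when a change shows
 * up in the statistics.
 */
static void *
analyze_worker_loop(void *arg)
{
    int     buf = 0;

    (void) arg;
    for (;;)
    {
        if (buffers[buf].dirty)
        {
            pthread_rwlock_rdlock(&buffers[buf].lock);
            feed_estimator_from_page(buffers[buf].page);
            pthread_rwlock_unlock(&buffers[buf].lock);
        }
        buf = (buf + 1) % NBUFFERS;
    }
    return NULL;
}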

Cheers,
Csaba.
