Re: estimating # of distinct values - Mailing list pgsql-hackers

From Csaba Nagy
Subject Re: estimating # of distinct values
Date
Msg-id 1294657351.2962.37.camel@clnt-sysecm-cnagy
In response to Re: estimating # of distinct values  (tv@fuzzy.cz)
List pgsql-hackers
On Fri, 2011-01-07 at 12:32 +0100, tv@fuzzy.cz wrote:
> the problem is you will eventually need to drop the results and rebuild
> it, as the algorithms do not handle deletes (ok, Florian mentioned an
> algorithm L_0 described in one of the papers, but I'm not sure we can use
> it).

Yes, but even then you start with much better cards if you already have
an estimate of the distribution, thanks to the continuous updating. For
example, you will have a fairly good estimate of the bounds on the
number of distinct values, whereas starting truly from scratch you have
nothing to go on and must cover the whole range from "all values are
distinct" down to "there are only a few of them".
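Just to illustrate the kind of incrementally maintained estimator being
discussed (this is a toy sketch, not anything proposed for Postgres; the
names and constants are mine), a K-minimum-values counter keeps the K
smallest hashes seen so far and can be updated one tuple at a time:

```c
#include <stdint.h>
#include <string.h>

#define K 256  /* sketch size: relative error is roughly 1/sqrt(K) */

typedef struct {
    uint64_t h[K];   /* K smallest distinct hash values, kept sorted */
    int n;           /* number of slots currently filled */
} kmv_t;

/* splitmix64: a well-known 64-bit mixing function */
static uint64_t hash64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void kmv_add(kmv_t *s, uint64_t value) {
    uint64_t h = hash64(value);
    if (s->n == K && h >= s->h[K - 1])
        return;                      /* not among the K smallest */
    /* binary search for the insertion point; skip duplicates */
    int lo = 0, hi = s->n;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (s->h[mid] < h) lo = mid + 1; else hi = mid;
    }
    if (lo < s->n && s->h[lo] == h)
        return;                      /* value already seen */
    int last = (s->n < K) ? s->n : K - 1;
    memmove(&s->h[lo + 1], &s->h[lo], (last - lo) * sizeof(uint64_t));
    s->h[lo] = h;
    if (s->n < K) s->n++;
}

static double kmv_estimate(const kmv_t *s) {
    if (s->n < K)
        return s->n;                 /* fewer than K distinct values: exact */
    /* the K-th smallest hash, normalized to (0,1), estimates K / D */
    double u = (double) s->h[K - 1] / (double) UINT64_MAX;
    return (K - 1) / u;
}
```

The point relevant to the thread: even after a "rebuild", a structure
like this carries over a bounded estimate instead of the full
all-distinct-to-few-distinct range.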

> I'm not sure a constantly running background process is a good idea. I'd
> prefer storing an info about the modified tuples somewhere, and starting
> analyze only when a given threshold is reached. I'm not sure how to do
> that, though.
> 
> Another thing I'm not sure about is where to store those intermediate
> stats (used to get the current estimate, updated incrementally).

The forks implementation proposed in other responses is probably the
best idea, if it is usable. It would also solve the memory-limitation
problem, at the expense of more resources if the structure grows too
big, but it should still be fast enough as long as it stays small and
in cache.
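As for the threshold idea quoted above, something along the lines of
autovacuum's existing analyze trigger (a base threshold plus a scale
factor times the table size) would probably do; the function and
constant names below are made up for illustration:

```c
#include <stdbool.h>

/* illustrative constants, not actual GUC values */
#define NDISTINCT_BASE_THRESHOLD  50.0
#define NDISTINCT_SCALE_FACTOR     0.10

/*
 * Decide whether enough tuples changed since the last rebuild of the
 * distinct-value stats to make recomputing them worthwhile.
 */
static bool
ndistinct_needs_rebuild(double changes_since_rebuild, double reltuples)
{
    double threshold = NDISTINCT_BASE_THRESHOLD
                     + NDISTINCT_SCALE_FACTOR * reltuples;
    return changes_since_rebuild > threshold;
}
```

That would avoid a constantly running background process: the counter
of modified tuples gets checked, and the rebuild only fires past the
threshold.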

Cheers,
Csaba.



