Re: estimating # of distinct values - Mailing list pgsql-hackers

From tv@fuzzy.cz
Subject Re: estimating # of distinct values
Date
Msg-id 1d0cb03ef3ab1c04fe87d6910b51d243.squirrel@sq.gransy.com
Whole thread Raw
In response to Re: estimating # of distinct values  (Csaba Nagy <ncslists@googlemail.com>)
Responses Re: estimating # of distinct values  (Jim Nasby <jim@nasby.net>)
Re: estimating # of distinct values  (Csaba Nagy <ncslists@googlemail.com>)
List pgsql-hackers
> On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
>> How is an incremental ANALYZE going to work at all?
>
> How about a kind of continuous analyze ?
>
> Instead of analyzing just once and then drop the intermediate results,
> keep them on disk for all tables and then piggyback the background
> writer (or have a dedicated process if that's not algorithmically
> feasible) and before writing out stuff update the statistics based on
> the values found in modified buffers. Probably it could take a random
> sample of buffers to minimize overhead, but if it is done by a
> background thread the overhead could be minimal anyway on multi-core
> systems.

Hi,

the problem is you will eventually need to drop the results and rebuild
it, as the algorithms do not handle deletes (ok, Florian mentioned an
algorithm L_0 described in one of the papers, but I'm not sure we can use
it).

I'm not sure a constantly running background process is a good idea. I'd
prefer storing an info about the modified tuples somewhere, and starting
analyze only when a given threshold is reached. I'm not sure how to do
that, though.

Another thing I'm not sure about is where to store those intermediate
stats (used to get the current estimate, updated incrementally). I was
thinking about pg_stats but I'm not sure it's the right place - depending
on the algorithm, this may be a fet kilobytes up to several megabytes. And
it's not needed except when updating it. Any ideas?

regards
Tomas



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: system views for walsender activity
Next
From: Joachim Wieland
Date:
Subject: Re: Snapshot synchronization, again...