Re: estimating # of distinct values - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: estimating # of distinct values
Date
Msg-id AD87588E-8BCE-403C-B8F7-871C9182157F@nasby.net
Whole thread Raw
In response to Re: estimating # of distinct values  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
On Dec 31, 2010, at 7:34 AM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue dic 30 23:02:04 -0300 2010:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> I was thinking that we could have two different ANALYZE modes, one
>>> "full" and one "incremental"; autovacuum could be modified to use one or
>>> the other depending on how many changes there are (of course, the user
>>> could request one or the other, too; not sure what should be the default
>>> behavior).
>>
>> How is an incremental ANALYZE going to work at all?  It has no way to
>> find out the recent changes in the table, for *either* inserts or
>> deletes.  Unless you want to seqscan the whole table looking for tuples
>> with xmin later than something-or-other ... which more or less defeats
>> the purpose.
>
> Yeah, I was thinking that this incremental ANALYZE would be the stream
> in the "stream-based estimator" but evidently it doesn't work that way.
> The stream that needs to be passed to the estimator consists of new
> tuples as they are being inserted into the table, so this would need to
> be done by the inserter process ... or it'd need to transmit the CTIDs
> for someone else to stream them ... not an easy thing, in itself.

Perhaps listen/notify could be used for this, now that it allows passing a payload.

BTW, if we reduce the frequency at which full scans of large tables are needed then presumably the cost of the scans
couldbe largely ignored. If we don't need to scan frequently then we shouldn't care very much about how long a scan
takes,which means we could throttle it heavily. Presumably even a heavily used system can spare 500kB/s of IO to
performbackground scanning... 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Sync Rep Design
Next
From: Robert Haas
Date:
Subject: Re: and it's not a bunny rabbit, either