Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: ANALYZE sampling is too good
Date
Msg-id CAA4eK1K1R011==4-xuYe9WYFqWQiT=Hayp-Aa4J=gc0Xy9=2xA@mail.gmail.com
In response to Re: ANALYZE sampling is too good  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block.  In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system.  However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only a fresh ANALYZE costly, or are subsequent ones equally costly?

Doing it in some background operation might not be a bad idea, but doing it
during backend query execution (a seq scan) could add overhead to query
response time, especially if part or most of the table's data is already in
RAM: in that case the overhead of the actual reads might not be very high,
but the computation for ANALYZE (such as sorting) would still make it costly.
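To make the trade-off Josh describes concrete, here is a toy, hedged sketch (plain Python, not PostgreSQL code) contrasting row-level sampling with block-level sampling on a table whose values are physically correlated with block position. The table layout, block counts, and the use of a simple mean as the estimated statistic are all illustrative assumptions; the point is only that a block sample gathers the same number of rows from far fewer block reads, at the price of extra variance from intra-block clustering.

```python
import random

random.seed(42)

# Toy table: 10,000 blocks of 100 rows each. Values are clustered by
# block position to mimic the physical correlation (skew) that makes
# naive block sampling biased.
BLOCKS, ROWS_PER_BLOCK = 10_000, 100
table = [[b // 100 + random.randint(0, 5) for _ in range(ROWS_PER_BLOCK)]
         for b in range(BLOCKS)]

def row_sample(n_rows):
    """Uniform row-level sample: touches roughly n_rows distinct blocks."""
    return [table[random.randrange(BLOCKS)][random.randrange(ROWS_PER_BLOCK)]
            for _ in range(n_rows)]

def block_sample(n_blocks):
    """Whole-block sample: the same row count from far fewer block reads."""
    rows = []
    for b in random.sample(range(BLOCKS), n_blocks):
        rows.extend(table[b])
    return rows

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean([v for blk in table for v in blk])
# 500 sampled blocks yield 50,000 rows from only 500 block reads,
# versus ~50,000 block reads for an equally sized row-level sample.
print(true_mean, mean(row_sample(50_000)), mean(block_sample(500)))
```

Both estimators land near the true mean here; the block sample's estimate is noisier because rows within a block are correlated, which is exactly what the skew-correcting algorithms Josh mentions are meant to account for.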


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


