Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: ANALYZE sampling is too good
Date
Msg-id CAA4eK1K1R011==4-xuYe9WYFqWQiT=Hayp-Aa4J=gc0Xy9=2xA@mail.gmail.com
In response to Re: ANALYZE sampling is too good  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block.  In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system.  However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only a fresh ANALYZE costly, or are subsequent ones equally costly?

Doing it in some background operation might not be a bad idea, but doing it
during backend query execution (a seq scan) could add overhead to query
response time, especially if part or most of the table's data is already in
RAM: in that case the overhead of the actual reads might not be very high,
but the computation for ANALYZE (such as sorting) would still make it costly.
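To make the trade-off Josh describes concrete, here is a toy, hedged sketch (plain Python, not PostgreSQL code) contrasting row-level sampling with block-level sampling on a table whose values are physically correlated with block position. The table layout, block counts, and the use of a simple mean as the estimated statistic are all illustrative assumptions; the point is only that a block sample gathers the same number of rows from far fewer block reads, at the price of extra variance from intra-block clustering.

```python
import random

random.seed(42)

# Toy table: 10,000 blocks of 100 rows each. Values are clustered by
# block position to mimic the physical correlation (skew) that makes
# naive block sampling biased.
BLOCKS, ROWS_PER_BLOCK = 10_000, 100
table = [[b // 100 + random.randint(0, 5) for _ in range(ROWS_PER_BLOCK)]
         for b in range(BLOCKS)]

def row_sample(n_rows):
    """Uniform row-level sample: touches roughly n_rows distinct blocks."""
    return [table[random.randrange(BLOCKS)][random.randrange(ROWS_PER_BLOCK)]
            for _ in range(n_rows)]

def block_sample(n_blocks):
    """Whole-block sample: the same row count from far fewer block reads."""
    rows = []
    for b in random.sample(range(BLOCKS), n_blocks):
        rows.extend(table[b])
    return rows

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean([v for blk in table for v in blk])
# 500 sampled blocks yield 50,000 rows from only 500 block reads,
# versus ~50,000 block reads for an equally sized row-level sample.
print(true_mean, mean(row_sample(50_000)), mean(block_sample(500)))
```

Both estimators land near the true mean here; the block sample's estimate is noisier because rows within a block are correlated, which is exactly what the skew-correcting algorithms Josh mentions are meant to account for.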


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


