Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: ANALYZE sampling is too good
Date	December 6, 2013 01:52:39
Msg-id	CAM3SWZREK9cRovD2X=3pMqYgq1QfhG6xmfdwD_gN0FEsH9td+w@mail.gmail.com
In response to	Re: ANALYZE sampling is too good (Josh Berkus <josh@agliodbs.com>)
Responses	Re: ANALYZE sampling is too good Re: ANALYZE sampling is too good Re: ANALYZE sampling is too good
List	pgsql-hackers

On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> There are fairly well researched algorithms for block-based sampling
> which estimate for the skew introduced by looking at consecutive rows in
> a block.  In general, a minimum sample size of 5% is required, and the
> error is no worse than our current system.  However, the idea was shot
> down at the time, partly because I think other hackers didn't get the math.

I think that this certainly warrants revisiting. The benefits would be
considerable.

Has anyone ever thought about opportunistic ANALYZE piggy-backing on
other full-table scans? That doesn't really help Greg, because his
complaint is mostly that a fresh ANALYZE is too expensive, but it
could be an interesting, albeit risky, approach.
Opportunistically/unpredictably acquiring a ShareUpdateExclusiveLock
would be kind of weird, for one thing, but if a full table scan really
is very expensive, would it be so unreasonable to attempt to amortize
that cost?
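
To make the piggy-backing idea concrete, here is a minimal sketch of how a sample could be collected as a side effect of a scan that is already running. It is not PostgreSQL code: `PiggybackSampler` and `scan_with_sampling` are hypothetical names, the "table" is just a Python iterable, and the sampling is plain reservoir sampling (Algorithm R) over rows, which ignores locking and the block-level questions raised above.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class PiggybackSampler:
    """Hypothetical helper: keeps a uniform sample of up to k rows seen so far."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        self.k = k
        self.seen = 0
        self.sample: List[T] = []
        self._rng = random.Random(seed)

    def observe(self, row: T) -> None:
        # Algorithm R: the first k rows fill the reservoir; afterwards the
        # n-th row replaces a random slot with probability k / n.
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(row)
        else:
            j = self._rng.randrange(self.seen)
            if j < self.k:
                self.sample[j] = row


def scan_with_sampling(rows: Iterable[T], sampler: PiggybackSampler) -> Iterator[T]:
    # Pass every row through unchanged to the consumer of the scan,
    # feeding the sampler on the way.
    for row in rows:
        sampler.observe(row)
        yield row


# Usage: the "query" sums the rows, and the sample is a by-product.
sampler = PiggybackSampler(k=100, seed=0)
total = sum(scan_with_sampling(range(10_000), sampler))
```

The per-row cost is one counter increment and, once the reservoir is full, one random draw. Whether that overhead is acceptable on every full-table scan is part of what makes the approach risky.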

-- 
Peter Geoghegan