Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: ANALYZE sampling is too good
Msg-id CA+U5nMKQ-b=34u37A1yOMOExQ2me+Tif8_-cYHDM3vODOrLDuA@mail.gmail.com
In response to Re: ANALYZE sampling is too good  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 10 December 2013 23:43, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby <jim@nasby.net> wrote:
>>> I agree that looking for information on block level sampling
>>> specifically, and its impact on estimation quality is likely to not
>>> turn up very much, and whatever it does turn up will have patent
>>> issues.
>>
>>
>> We have an entire analytics dept. at work that specializes in finding
>> patterns in our data. I might be able to get some time from them to at least
>> provide some guidance here, if the community is interested. They could
>> really only serve in a consulting role though.
>
> I think that Greg had this right several years ago: it would probably
> be very useful to have the input of someone with a strong background
> in statistics. It doesn't seem that important that they already know a
> lot about databases, provided they can understand what our constraints
> are, and what is important to us. It might just be a matter of having
> them point us in the right direction.

err, so what does stats target mean exactly in statistical theory?
Waiting for a statistician, and confirming his credentials before you
believe him above others here, seems like wasted time.

What your statistician will tell you is that YMMV, depending on the data.

So we'll still need a parameter to fine-tune things when the default
is off. We can argue about the default later, at various levels of
rigour.

Block sampling, with a parameter to specify sample size. +1
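
For concreteness, a minimal sketch of block sampling with a sample-size
parameter might look like the following. This is illustrative Python, not
PostgreSQL's ANALYZE code: block_sample, table_blocks and sample_blocks are
hypothetical names, and the table is modelled as an in-memory list of blocks,
each holding a list of rows.

```python
import random


def block_sample(table_blocks, sample_blocks, seed=None):
    """Pick whole blocks at random and return every row stored in them.

    table_blocks  -- list of blocks, each a list of rows (a hypothetical
                     in-memory stand-in for heap pages)
    sample_blocks -- the tunable sample size, counted in blocks
    seed          -- optional seed, only to make the sketch reproducible
    """
    rng = random.Random(seed)
    # Never ask for more blocks than exist, and never for a negative count.
    n = max(0, min(sample_blocks, len(table_blocks)))
    # Sample block numbers without replacement; sort them so that a real
    # reader would visit the blocks in physical order.
    chosen = sorted(rng.sample(range(len(table_blocks)), n))
    # Keep every row of each chosen block, rather than individual rows.
    rows = [row for blkno in chosen for row in table_blocks[blkno]]
    return chosen, rows


if __name__ == "__main__":
    # 1000 blocks of 50 rows each; each row records (block number, offset).
    table = [[(b, i) for i in range(50)] for b in range(1000)]
    chosen, rows = block_sample(table, sample_blocks=30, seed=1)
    print(len(chosen), "blocks read,", len(rows), "rows sampled")
```

The only knob is sample_blocks. Raising it reads more blocks and gives a
larger sample, which is the fine-tuning parameter argued for above. How well
the resulting estimates behave still depends on how the data is laid out
within blocks.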

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


