On 10 December 2013 23:43, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby <jim@nasby.net> wrote:
>>> I agree that looking for information on block level sampling
>>> specifically, and its impact on estimation quality is likely to not
>>> turn up very much, and whatever it does turn up will have patent
>>> issues.
>>
>>
>> We have an entire analytics dept. at work that specializes in finding
>> patterns in our data. I might be able to get some time from them to at least
>> provide some guidance here, if the community is interested. They could
>> really only serve in a consulting role though.
>
> I think that Greg had this right several years ago: it would probably
> be very useful to have the input of someone with a strong background
> in statistics. It doesn't seem that important that they already know a
> lot about databases, provided they can understand what our constraints
> are, and what is important to us. It might just be a matter of having
> them point us in the right direction.
err, so what does stats target mean exactly in statistical theory?
Waiting for a statistician, and confirming his credentials before you
believe him above others here, seems like wasted time.
What your statistician will tell you is that YMMV, depending on the data.
So we'll still need a parameter to fine-tune things when the default
is off. We can argue about the default later, at various levels of
rigour.
Block sampling, with parameter to specify sample size. +1
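To make the proposal concrete, here is a rough sketch (in Python, purely illustrative, not PostgreSQL code) of what block sampling means versus row sampling: pick whole blocks at random and take every row in each chosen block, until the requested sample size is reached. The function name and table layout are invented for the example; the point is that rows within a block are often correlated, which is exactly why the statistician's answer is "YMMV, depending on the data".

```python
import random

def block_sample(table, block_size, target_rows, rng=random):
    """Illustrative block-level sampling: draw whole blocks at random
    and keep every row in each chosen block until ~target_rows rows
    are collected. (Hypothetical helper, not the ANALYZE implementation.)"""
    blocks = [table[i:i + block_size] for i in range(0, len(table), block_size)]
    rng.shuffle(blocks)
    sample = []
    for block in blocks:
        sample.extend(block)          # take every row in the chosen block
        if len(sample) >= target_rows:
            break
    return sample[:target_rows]

# A worst case for block sampling: values are clustered by block,
# e.g. data loaded in sorted order. 1000 rows, 100 blocks of 10.
table = [v for v in range(100) for _ in range(10)]
sample = block_sample(table, block_size=10, target_rows=100)
print(len(sample))       # 100 rows, but drawn from only ~10 blocks
print(len(set(sample)))  # few distinct values -> ndistinct underestimated
```

With uniformly shuffled data the same sample would look much like a row-level sample; with clustered data it badly skews ndistinct estimates, which is why a tuning parameter for sample size matters.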
--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services