On 12/07/2014 03:54 AM, Tomas Vondra wrote:
> The one interesting case is the 'step skew' with statistics_target=10,
> i.e. estimates based on a mere 3000 rows. In that case, the adaptive
> estimator significantly overestimates:
>
> values    current    adaptive
> ------------------------------
>    106         99         107
>    106          8     6449190
>   1006         38     6449190
>  10006        327       42441
>
> I don't know why I didn't get these errors in the previous runs, because
> when I repeat the tests with the old patches I get similar results with
> a 'good' result from time to time. Apparently I had a lucky day back
> then :-/
>
> I've been messing with the code for a few hours, and I haven't found any
> significant error in the implementation, so it seems that the estimator
> does not perform terribly well for very small samples (in this case,
> 3000 rows out of 10,000,000, i.e. ~0.03%).

The paper [1] gives an equation for an upper bound on the error of the
GEE estimator. How do the above numbers compare with that bound?
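
As a rough way to frame that comparison, here is a minimal sketch. It
assumes the bound in [1] is on the expected ratio error and scales as
sqrt(n/r), where n is the table size and r the sample size; the constant
factor hidden in the O() is not included, so this gives only the order of
magnitude. It also assumes the "values" column above is the actual number
of distinct values. The function names are made up for illustration.

```python
import math

N_ROWS = 10_000_000   # table size n, from the quoted message
SAMPLE_ROWS = 3000    # sample size r at statistics_target=10


def ratio_error(estimate, actual):
    """Ratio error: max(estimate/actual, actual/estimate), always >= 1."""
    return max(estimate / actual, actual / estimate)


def gee_bound_scale(n, r):
    """sqrt(n/r), the assumed scale of the GEE ratio-error bound
    (constant factor omitted)."""
    return math.sqrt(n / r)


# (actual distinct values, current estimate, adaptive estimate),
# copied from the quoted table.
RESULTS = [
    (106, 99, 107),
    (106, 8, 6449190),
    (1006, 38, 6449190),
    (10006, 327, 42441),
]

if __name__ == "__main__":
    scale = gee_bound_scale(N_ROWS, SAMPLE_ROWS)
    print(f"sqrt(n/r) = {scale:.1f}")
    for actual, current, adaptive in RESULTS:
        print(f"{actual:>6}  current: {ratio_error(current, actual):10.1f}"
              f"  adaptive: {ratio_error(adaptive, actual):10.1f}")
```

Any row whose adaptive ratio error is orders of magnitude above sqrt(n/r)
would be hard to explain by sampling error alone.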

[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

- Heikki