Re: PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)
Msg-id 549943DA.4090603@vmware.com
In response to PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)
List pgsql-hackers
On 12/07/2014 03:54 AM, Tomas Vondra wrote:
> The one interesting case is the 'step skew' with statistics_target=10,
> i.e. estimates based on a mere 3000 rows. In that case, the adaptive
> estimator significantly overestimates:
>
>      values   current    adaptive
>      ------------------------------
>      106           99         107
>      106            8     6449190
>      1006          38     6449190
>      10006        327       42441
>
> I don't know why I didn't get these errors in the previous runs, because
> when I repeat the tests with the old patches I get similar results with
> a 'good' result from time to time. Apparently I had a lucky day back
> then :-/
>
> I've been messing with the code for a few hours, and I haven't found any
> significant error in the implementation, so it seems that the estimator
> does not perform terribly well for very small samples (in this case it's
> 3000 rows out of 10,000,000, i.e. ~0.03%).
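The sample-size arithmetic above can be sketched as follows, assuming ANALYZE's default of 300 sampled rows per unit of statistics target (the 300 * statistics_target rule used by PostgreSQL's standard typanalyze). The function names are hypothetical, and the sketch ignores the cap at the table's actual row count.

```python
def analyze_sample_rows(statistics_target: int) -> int:
    # Assumed rule: ANALYZE asks for 300 rows per unit of statistics target
    return 300 * statistics_target


def sampling_fraction_pct(table_rows: int, statistics_target: int) -> float:
    # Percentage of the table that ends up in the ANALYZE sample
    return 100.0 * analyze_sample_rows(statistics_target) / table_rows


# statistics_target=10 on a 10,000,000-row table
print(analyze_sample_rows(10))                # 3000
print(sampling_fraction_pct(10_000_000, 10))  # 0.03
```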

The paper [1] gives an upper bound on the error of this GEE estimator.
How do the above numbers compare with that bound?
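A minimal Python sketch of that comparison follows. It makes two assumptions. First, the bound meant is the GEE guarantee that the ratio error max(estimate/actual, actual/estimate) is of order sqrt(n/r) for a table of n rows and a sample of r rows, with the paper's constants left out. Second, the 'values' column above is the true number of distinct values. The helper names gee_estimate, ratio_error and reference_factor are hypothetical. gee_estimate follows the GEE formula sqrt(n/r) * f1 + sum of f_j for j >= 2, where f_j is the number of values seen exactly j times in the sample. It is not necessarily the estimator in the patch.

```python
import math
from collections import Counter


def gee_estimate(sample, total_rows):
    """GEE: sqrt(n/r) * f1 + sum of f_j for j >= 2 (n = table rows, r = sample rows)."""
    r = len(sample)
    value_counts = Counter(sample)                 # value -> occurrences in the sample
    freq_of_freq = Counter(value_counts.values())  # j -> number of values seen exactly j times
    f1 = freq_of_freq.get(1, 0)
    seen_more_than_once = sum(c for j, c in freq_of_freq.items() if j >= 2)
    return math.sqrt(total_rows / r) * f1 + seen_more_than_once


def ratio_error(estimate, actual):
    # Symmetric ratio error: 1.0 is perfect, larger is worse in either direction
    return max(estimate / actual, actual / estimate)


def reference_factor(total_rows, sample_rows):
    # Order of the GEE ratio-error guarantee, sqrt(n/r); constants omitted
    return math.sqrt(total_rows / sample_rows)


# (actual ndistinct, adaptive estimate) pairs from the quoted table,
# assuming the 'values' column is the true number of distinct values
rows = [(106, 107), (106, 6449190), (1006, 6449190), (10006, 42441)]
bound = reference_factor(10_000_000, 3000)
for actual, est in rows:
    print(actual, est, round(ratio_error(est, actual), 2), round(bound, 2))
```

Under those assumptions the adaptive column gives ratio errors of about 1.01, 60841, 6411 and 4.24, against sqrt(n/r) of about 57.7. Only the order of magnitude is meaningful here, because the constants from the paper are omitted.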

[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

- Heikki
