
Re: More thoughts about planner's cost estimates - Mailing list pgsql-hackers

From	Josh Berkus
Subject	Re: More thoughts about planner's cost estimates
Date	June 2, 2006 16:24:12
Msg-id	200606021223.35168.josh@agliodbs.com
In response to	Re: More thoughts about planner's cost estimates (Greg Stark <gsstark@mit.edu>)
Responses	Re: More thoughts about planner's cost estimates
List	pgsql-hackers


Greg,

>     Using a variety of synthetic and real-world data sets, we show that
>     distinct sampling gives estimates for distinct values queries that
> are within 0%-10%, whereas previous methods were typically 50%-250% off,
> across the spectrum of data sets and queries studied.

Aha.  It's a question of the level of error permissible.  For our 
estimates, being 100% off is actually OK.  That's why I was looking at 5% 
block sampling; it stays within +/- 50% of the true n-distinct in 95% of 
cases.
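To make "5% block sampling" and "+/- 50% of n-distinct" concrete, here is a minimal Python sketch. It assumes the Haas-Stokes Duj1 estimator, n*d / (n - f1 + f1*n/N), which PostgreSQL's ANALYZE uses for n-distinct. The function names, block size and demo column are hypothetical; this is an illustration, not PostgreSQL code. The demo column is uniformly random, so it does not model values clustered within pages, which is what makes block sampling harder on real tables.

```python
import random
from collections import Counter


def block_sample(rows, rows_per_block, fraction, seed=0):
    """Return every row from a random `fraction` of fixed-size blocks.

    Rows are grouped into consecutive blocks of `rows_per_block` (standing in
    for heap pages); whole blocks are sampled, not individual rows.
    """
    blocks = [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]
    k = min(len(blocks), max(1, round(len(blocks) * fraction)))
    chosen = random.Random(seed).sample(blocks, k)
    return [row for block in chosen for row in block]


def duj1_ndistinct(sample, total_rows):
    """Haas-Stokes Duj1 estimate of the number of distinct values in the table.

    n  = sample size, d = distinct values seen in the sample,
    f1 = values seen exactly once in the sample, N = total_rows.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    # The denominator is always > 0 when n > 0, so no special case is needed.
    return n * d / (n - f1 + f1 * n / total_rows)


def within_tolerance(estimate, actual, tolerance=0.5):
    """True if `estimate` lies within +/- tolerance * actual of the true value."""
    return abs(estimate - actual) <= tolerance * actual


if __name__ == "__main__":
    # Hypothetical column: 100,000 rows drawn from up to 5,000 distinct values.
    gen = random.Random(42)
    column = [gen.randrange(5000) for _ in range(100_000)]
    actual = len(set(column))
    sample = block_sample(column, rows_per_block=100, fraction=0.05, seed=1)
    est = duj1_ndistinct(sample, len(column))
    print(actual, round(est), within_tolerance(est, actual))
```

Repeating the demo over many seeds and counting how often `within_tolerance` holds is one way to check a "95% of cases" style claim against a particular data distribution.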

> Doing a bit of basic searching around I think the tool we're looking for
> here is called a "chi-squared test for independence".

Augh.  I wrote a program (in Pascal) to do this back in 1988.  Now I can't 
remember the math.  For a two-column test it's relatively 
computation-light, though, as I recall ... but I don't remember whether 
standard chi-square works with a random sample.
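For reference, a minimal Python sketch of Pearson's chi-squared test for independence between two columns, assuming the input is a list of (a, b) pairs, one per sampled row. The function name is hypothetical. The cost is one pass over the sample plus one term per cell of the contingency table, which is why a two-column test is computation-light. The standard test treats the sampled rows as independent draws and is usually considered reliable only when expected cell counts are around 5 or more. This sketch does not settle whether block-sampled rows satisfy that assumption.

```python
from collections import Counter


def chi_squared_independence(pairs):
    """Pearson's chi-squared statistic for independence of two columns.

    `pairs` is a list of (a, b) values, e.g. one tuple per sampled row.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one sampled row")
    joint = Counter(pairs)                   # observed count per (a, b) cell
    a_counts = Counter(a for a, _ in pairs)  # marginal counts of column a
    b_counts = Counter(b for _, b in pairs)  # marginal counts of column b
    stat = 0.0
    for a, ca in a_counts.items():
        for b, cb in b_counts.items():
            expected = ca * cb / n           # count expected if a and b were independent
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(a_counts) - 1) * (len(b_counts) - 1)
    return stat, dof


if __name__ == "__main__":
    # Hypothetical sample where column b is fully determined by column a.
    dependent = [(0, 0)] * 50 + [(1, 1)] * 50
    print(chi_squared_independence(dependent))  # (100.0, 1)
```

The statistic is compared against a chi-squared table for the returned degrees of freedom. For example, 3.841 is the 5% critical value at 1 degree of freedom, so 100.0 rejects independence decisively.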


-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
