
Re: [PERFORM] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-hackers

From	Mischa Sandberg
Subject	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date	May 3, 2005 18:33:18
Msg-id	1115155990.4277ee16aba34@webmail.telus.net
In response to	Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Markus Schaber <schabi@logix-tt.com>)
Responses	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
List	pgsql-hackers

Quoting Markus Schaber <schabi@logix-tt.com>:

> Hi, Josh,
>
> Josh Berkus wrote:
>
> > Yes, actually. We need 3 different estimation methods:
> > 1 for tables where we can sample a large % of pages (say, >= 0.1)
> > 1 for tables where we sample a small % of pages but are "easily estimated"
> > 1 for tables which are not easily estimated but we can't afford to sample a large % of pages.
> >
> > If we're doing sampling-based estimation, I really don't want people
> > to lose sight of the fact that page-based random sampling is much
> > less expensive than row-based random sampling. We should really be
> > focusing on methods which are page-based.

Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)
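To make both points concrete, here is a small simulation under a loud assumption: a hypothetical, perfectly clustered table in which each distinct value fills 50 consecutive rows and each page holds 100 rows. Row-based sampling of 3,000 rows ends up touching most of the table's 1,000 pages, while page-based sampling reads only 30 pages for the same number of rows (the cost argument above). But because every page holds just two distinct values, the page sample sees only a sliver of the 2,000 distinct values (the n-distinct failure). The function names are illustrative only; this is not how ANALYZE is implemented.

```python
import random

# Hypothetical sketch, not PostgreSQL's ANALYZE code.


def make_clustered_table(n_rows, rows_per_page, run_length):
    # Assumed layout: each distinct value fills `run_length` consecutive rows,
    # so rows that share a page tend to share values (strong clustering).
    values = [i // run_length for i in range(n_rows)]
    return [values[i:i + rows_per_page] for i in range(0, n_rows, rows_per_page)]


def row_sample(pages, rows_per_page, k, rng):
    # Row-based sampling: k row positions chosen uniformly without replacement.
    # Every distinct page that a chosen row lives on must be read.
    n_rows = sum(len(p) for p in pages)
    positions = rng.sample(range(n_rows), k)
    sample = [pages[pos // rows_per_page][pos % rows_per_page] for pos in positions]
    pages_read = len({pos // rows_per_page for pos in positions})
    return sample, pages_read


def page_sample(pages, rows_per_page, k, rng):
    # Page-based sampling: read just enough whole pages to collect k rows.
    n_pages = -(-k // rows_per_page)  # ceiling division
    chosen = rng.sample(range(len(pages)), n_pages)
    sample = [v for p in chosen for v in pages[p]]
    return sample, n_pages


if __name__ == "__main__":
    rng = random.Random(42)
    pages = make_clustered_table(n_rows=100_000, rows_per_page=100, run_length=50)
    for name, fn in (("row-based", row_sample), ("page-based", page_sample)):
        sample, pages_read = fn(pages, 100, 3000, rng)
        print(f"{name}: read {pages_read} pages, saw {len(set(sample))} distinct values")
```

Under these assumptions the page-based run always reads 30 pages and sees exactly 60 distinct values (each 100-row page holds two 50-row runs), while the row-based run reads most of the 1,000 pages and sees far more of the 2,000 distinct values. Real tables are rarely this clustered, so the gap is deliberately exaggerated.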

How about applying the distinct-sampling filter to a small extra data
stream fed to the stats collector?
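The message does not spell the filter out, so here is one hedged reading: a simplified hash-based distinct sampler. Each distinct value is kept with probability 2^-level, decided by its hash alone, so repeated values can never inflate the sample; when memory fills up, the level rises and roughly half the sample is discarded. The class `DistinctSampler`, its `capacity` parameter, and the choice of hash are assumptions for illustration, and how the stats collector would receive such a stream is not shown. This is not PostgreSQL code.

```python
import hashlib

# Hypothetical sketch of a hash-based distinct-sampling filter; not PostgreSQL code.


def _hash64(value):
    # Stable 64-bit hash of the value's repr (any well-mixed hash would do).
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _trailing_zeros(h):
    # A value reaches level L with probability 2**-L under a uniform hash.
    if h == 0:
        return 64
    return (h & -h).bit_length() - 1


class DistinctSampler:
    def __init__(self, capacity):
        self.capacity = capacity  # max values kept in memory (assumed parameter)
        self.level = 0            # current sampling level
        self.sample = {}          # value -> its hash level

    def add(self, value):
        lvl = _trailing_zeros(_hash64(value))
        if lvl < self.level:
            return                # filtered out at the current level
        # Re-adding a value already in the sample changes nothing, so
        # duplicates cannot bias the result.
        self.sample[value] = lvl
        while len(self.sample) > self.capacity:
            # Raise the level and drop values that no longer qualify;
            # on average this halves the sample.
            self.level += 1
            self.sample = {v: l for v, l in self.sample.items() if l >= self.level}

    def estimate(self):
        # Each distinct value is retained with probability 2**-level.
        return len(self.sample) * 2 ** self.level
```

For example, feeding `range(20_000)` through a `DistinctSampler(capacity=1024)` five times over yields the same estimate as feeding it once, and that estimate should land near 20,000 while the sampler never holds more than 1,024 values.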

--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.
