Re: [HACKERS] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-performance

From	Josh Berkus
Subject	Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date	May 3, 2005 18:42:38
Msg-id	200505031443.44859.josh@agliodbs.com
In response to	Re: [HACKERS] Bad n_distinct estimation; hacks suggested? (Mischa Sandberg <mischa.sandberg@telus.net>)
Responses	Re: [HACKERS] Bad n_distinct estimation; hacks suggested? Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
List	pgsql-performance

Mischa,

> Okay, although given the track record of page-based sampling for
> n-distinct, it's a bit like looking for your keys under the streetlight,
> rather than in the alley where you dropped them :-)

Bad analogy, but funny.

The issue with page-based vs. pure random sampling is that sampling, for
example, 10% of rows purely at random would actually mean loading 50% of
pages.  With 20% of rows, you might as well scan the whole table.
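
A back-of-the-envelope sketch of where numbers like these come from. It
assumes each row is picked independently with probability f and that every
page holds the same number of rows r; both are simplifying assumptions, and
the rows-per-page values below are illustrative, not measurements from any
real table. Under that model the expected fraction of pages holding at least
one sampled row is 1 - (1 - f)^r.

```python
def fraction_of_pages_touched(row_fraction: float, rows_per_page: int) -> float:
    """Expected share of pages that contain at least one sampled row.

    Assumes independent per-row sampling with probability `row_fraction`
    and exactly `rows_per_page` rows on every page (hypothetical model).
    """
    if not 0.0 <= row_fraction <= 1.0:
        raise ValueError("row_fraction must be between 0 and 1")
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    # A page is skipped only if none of its rows is sampled.
    return 1.0 - (1.0 - row_fraction) ** rows_per_page


if __name__ == "__main__":
    # Illustrative page densities; real tables vary with row width.
    for rows_per_page in (7, 20, 100):
        for row_fraction in (0.10, 0.20):
            touched = fraction_of_pages_touched(row_fraction, rows_per_page)
            print(f"rows/page={rows_per_page:3d}  "
                  f"sample={row_fraction:.0%}  pages touched={touched:.1%}")
```

Under this model, the 50% figure corresponds to roughly 7 rows per page;
with denser pages a 10% row sample already touches most of the table.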

Unless, of course, we use indexes for sampling, which seems like a *really
good* idea to me ....

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
