Re: [HACKERS] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Msg-id 200505031443.44859.josh@agliodbs.com
In response to Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg <mischa.sandberg@telus.net>)
Responses Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
List pgsql-performance
Mischa,

> Okay, although given the track record of page-based sampling for
> n-distinct, it's a bit like looking for your keys under the streetlight,
> rather than in the alley where you dropped them :-)

Bad analogy, but funny.

The issue with page-based vs. pure random sampling is that sampling, for
example, 10% of rows purely at random would actually mean reading 50% of the
table's pages.  At 20% of rows, you might as well scan the whole table.
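A rough sketch of the arithmetic behind this, under two assumptions that are not from the original post: each row is picked independently with probability f, and every page holds r rows. A page escapes the sample only if none of its r rows is picked, so the expected fraction of pages read is 1 - (1 - f)^r. With about 7 rows per page, 10% of rows already touches roughly half the pages, and narrower rows (more rows per page) make it worse. The function name below is hypothetical.

```python
def expected_page_fraction(row_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages containing at least one sampled row.

    Assumes (hypothetically) independent per-row sampling with probability
    row_fraction and a uniform rows_per_page across the table.
    """
    # A page is skipped only when none of its rows is chosen.
    return 1.0 - (1.0 - row_fraction) ** rows_per_page


if __name__ == "__main__":
    for rows_per_page in (7, 20, 100):
        for row_fraction in (0.10, 0.20):
            pct = 100 * expected_page_fraction(row_fraction, rows_per_page)
            print(f"{rows_per_page:>3} rows/page, {row_fraction:.0%} of rows "
                  f"-> {pct:.0f}% of pages")
```

Real heap pages vary in fill and row width, so this only shows the trend, not an exact I/O cost.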

Unless, of course, we use indexes for sampling, which seems like a *really
good* idea to me ....

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
