Thread: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From
"Dave Held"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, April 25, 2005 10:23 AM
> To: Simon Riggs
> Cc: josh@agliodbs.com; Greg Stark; Marko Ristola; pgsql-perform;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
>
> [...]
> It's not just the scan --- you also have to sort, or something like
> that, if you want to count distinct values.  I doubt anyone is really
> going to consider this a feasible answer for large tables.

How about an option to create a stat hashmap for the column
that maps distinct values to their number of occurrences?  Obviously
the map would need to be updated on INSERT/DELETE/UPDATE, but if the
table is dominated by reads, and an accurate n_distinct is very
important, there may be people willing to pay the extra time and space
cost.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129